智能与分布计算实验室
  频繁子图挖掘算法及其在洗钱模式发现中的应用研究
姓名 钟刚
论文答辩日期 2008.06.05
论文提交日期 2008.06.10
论文级别 硕士
中文题名 频繁子图挖掘算法及其在洗钱模式发现中的应用研究
英文题名 An Application Research on Frequent Subgraph Mining in Money-laundering Model Detecting
导师1 李玉华
导师2
中文关键词 金融交易网络;频繁模式挖掘;建模
英文关键词 financial transaction network;frequent subgraph mining;modeling
中文文摘 由于金融帐户之间的交易所天然具有的转入、转出方式,具有相互交易的一个交易团体的交易数据通过转入帐户和转出帐户之间的关联,形成了一个交易的网络,可以非常直观的用图的方式进行描述。根据金融反洗钱研究表明,洗钱交易存在各种各样的模式,对应于交易网络图,即图模式。采用频繁子图挖掘的方法可以发现洗钱交易的频繁使用的模式。分析表明,金融交易网络可以用有向加权图模型来表示。用节点表示帐户,边表示交易,方向表示帐户之间的资金转移方向,权值依据交易信息来设定。考虑金融交易背景十分复杂,选取其中关键的几个属性如金额、企业类型相关度、交易地区进行规约,转化为交易边的权值。现有的频繁子图挖掘算法多数是针对无向带标号图的挖掘,对有向图、图模式的挖掘较少。图的模式是指具体节点标号不同但是结构相同的图抽象为统一的表示方式,而现有子图挖掘算法将同一个模式的标号不同的具体子图视为不同的模式。基于模式增长的mSpan频繁模式挖掘算法通过最小扩展得到一个最小边权值编码和一个抽象节点序列编码对有向图唯一标识,解决了图模式的同构问题以及冗余扩展问题。算法中把边的方向依据扩展的方向设置为一个方向编码并入到边的权值编码中,可以唯一的确定一个图模式。实验表明,mSpan算法可以挖掘出输入数据中包含的所有频繁模式。
英文文摘 As the transaction between finance account has an in-out relationship naturally,a group that have trade relationship must form a network by the contact of accounts,and can be expressed by graph type directly.According to the finance anti-laundering research,there are lots of trade models in money-laundering behaviors,which corresponding to the subgraph models in trade network.Using frequent subgraph mining methods can help finding the frequent ones in these models. Analysis shows that the financial transaction network can use a map to the directed,weighted model. With nodes that account, while that transaction, the direction that the transfer of funds between accounts direction, the weight based on transaction information to set. Consider the background of very complex financial transactions, select one of several key attributes such as the amount, type-related enterprises, trading areas statute, into the weight transactions. Nowadays,the developed frequent subgraph mining algorithms aim at non-direction,non-weight marked graph mining mostly,but few aim at directed graph pattern mining.The graph pattern means a uniform expresstion of graphs that have different marked nodes but same structure.Most frequent subgraph mining algorithms treat them different patterns.The mSpan frequent pattern mining algorithm base on FP-growth uses a min expand way gets a min edges's code and a abstract nodes's codes sequence.It can mark a graph pattern one and only,and solve the graph pattern's isomorphic problem and the redundant expand problem.The algorithm combins the code of the direction of an edge into the weight of the edge as the instance of the expand direction.The experiment shows mSpan can mine all frequent directed,weighted graph models.