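Intelligent and Distributed Computing Laboratory
  Research on Behavioral Analysis in Foreign Exchange Transactions Based on Data Mining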
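Name: 郭洁 (Guo Jie)
Thesis defense date: 2006.05.08
Thesis submission date: 2006.05.15
Thesis level: Master's
Chinese title: 基于数据挖掘的外汇交易行为分析研究
English title: Research on Behavioral Analysis in Foreign Exchange Transaction based on Data Mining
Advisor 1: 卢正鼎 (Lu Zhengding)
Advisor 2: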
Chinese keywords: 行为分析, 数据挖掘, 属性离散化, 聚类, 超图模型
English keywords: Behavioral Analysis, Data Mining, Attribute Discretization, Clustering, Hypergraph Model
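Chinese abstract (translated): In foreign exchange trading, some enterprises exhibit similar trading behavior because of their own characteristics or because they engage in similar lines of business. Describing the behavior of trading entities (business objects) by these natural features, and grouping the entities by behavioral criteria, helps to find entities with similar trading behavior within massive volumes of transaction data. This helps regulators understand the trading entities in depth, become familiar with what similar entities have in common in their trading behavior, and identify the patterns in that behavior. It also provides useful information for macro-level decision-making in foreign exchange administration.
Taking the application background into account, behavioral analysis of foreign exchange trading by data mining is carried out in two stages: data preprocessing and behavioral analysis. Data preprocessing prepares for behavioral analysis and is a process of consolidating data. It consists mainly of data extraction, feature attribute selection, attribute discretization, and data model construction, and it ends by producing a trading behavior description table. Behavioral analysis is completed in two steps. The first is data transformation, which converts the trading behavior description table into a trading entity table that describes the trading entities. The second applies a hypergraph-based clustering algorithm to group the trading entities, which achieves the goal of the behavioral analysis.
In data preprocessing, the discretization of certain attributes must be solved first. The thesis adopts an unsupervised discretization algorithm that achieves discretization through statistical analysis with a sliding window. The window slides across the whole value range, and at each position only the statistical distribution inside the window is considered. This lowers the complexity of the algorithm and solves the discretization problem for attributes whose value range is wide and whose distribution within that range is highly uneven. The algorithm is simple and easy to implement, and it usually needs only one scan of the statistical histogram to segment an attribute. Even when the value range of the attribute is very large, the algorithm still produces a reasonable segmentation that largely follows the data distribution.
Traditional clustering algorithms mostly measure the similarity between data objects by a distance metric and cluster high-dimensional categorical data poorly. The hypergraph-based behavioral analysis method instead builds a hypergraph model from association rules. It describes the similarity of trading entities' behavior by the confidence of the association rules contained in each hyperedge and by how tightly the hyperedges are connected, and then partitions the hypergraph to perform the clustering. Trading entities are thereby grouped by behavioral features, so that entities with similar trading behavior are discovered. During clustering, hyperedges are also used to coarsen the data objects and reduce the size of the data set, which makes the algorithm suitable for clustering large volumes of data.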
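The sliding-window discretization mentioned in the abstract can be illustrated with a small sketch. The abstract does not state the rule used to choose cut points, so the rule below is an assumption made only for illustration: a histogram bin becomes a cut point when it holds the lowest count inside the window centred on it. The function names `sliding_window_cuts` and `interval_of`, the `window` parameter, and the sample histogram are all hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def sliding_window_cuts(counts: Sequence[int], window: int = 5) -> List[int]:
    """Scan a histogram once and return the bin indices used as cut points.

    Assumed rule (hypothetical, not taken from the thesis): a bin is a cut
    point when it holds the lowest count inside the window centred on it,
    i.e. it sits in a valley of the local distribution.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd number >= 3")
    half = window // 2
    cuts: List[int] = []
    for i in range(half, len(counts) - half):
        local = counts[i - half:i + half + 1]  # only the window is inspected
        is_valley = counts[i] == min(local) and counts[i] < max(local)
        # Require more than `half` bins between cuts so that adjacent tied
        # bins do not each become a cut point.
        if is_valley and (not cuts or i - cuts[-1] > half):
            cuts.append(i)
    return cuts


def interval_of(bin_index: int, cuts: Sequence[int]) -> int:
    """Map a histogram bin to the index of the interval that contains it."""
    return bisect_left(cuts, bin_index)


# Made-up histogram with two dense regions separated by a sparse gap.
histogram = [1, 8, 9, 7, 1, 0, 1, 6, 9, 8, 2]
cuts = sliding_window_cuts(histogram, window=3)
labels = [interval_of(i, cuts) for i in range(len(histogram))]
```

Because each position inspects only `window` bins, the work is proportional to the number of bins times the window width, which matches the abstract's point that the histogram is scanned once rather than analysed globally.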
English abstract: In foreign exchange transactions, some enterprises behave similarly because of their own characteristics or because they conduct similar business. Using these natural features to describe the behavioral characteristics of business objects, and grouping the business objects by behavioral criteria, helps to discover business objects that behave similarly. It lets the supervisory department understand the business objects more deeply, learn the behavioral features that similar objects have in common, and grasp the actual patterns of business behavior. Discovering and clustering business objects with similar behavior through these natural features is therefore a useful supplement to the management and supervision of foreign exchange. Behavioral analysis of foreign exchange transactions using data mining, combined with the practical application, is carried out in two phases: data preprocessing and behavioral analysis. Data preprocessing is the basis of behavioral analysis. It prepares the data through steps that include choosing data, selecting attributes, and building the data model, and it produces the business behavior table. Behavioral analysis consists of two steps. The first is data transformation, which converts the business behavior table into the business object table; the second clusters the business objects with a hypergraph-based clustering algorithm. Attribute discretization must be resolved before business behavior can be described. This thesis describes an unsupervised discretization algorithm based on statistical analysis with a sliding window, which gives good results for continuous attributes that have a large value range and an uneven distribution. In addition, the algorithm is simple and easy to implement, and it usually completes in a single scan of the statistical histogram. Even when the value range of the attribute is large, it yields reasonable intervals that follow the data distribution.
Conventional clustering algorithms measure the similarity between objects with a distance metric and do not cluster high-dimensional categorical data well. The hypergraph-based behavioral analysis method instead builds a hypergraph model from association rules, describes similarity through the confidence of the association rules in each hyperedge and the tightness of the connections among hyperedges, and partitions the hypergraph to cluster objects according to the features of their foreign exchange dealings. In this way, business objects with similar behavior can be discovered. During clustering, the method also uses hyperedges to coarsen the data objects and reduce the size of the data set, which makes it applicable to clustering large data sets.
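As a rough illustration of how association-rule confidence can weight a hyperedge, the sketch below treats each frequent set of business objects as one hyperedge and scores it by the average confidence of the rules that predict one member from the others. This weighting follows a common formulation of association-rule hypergraph clustering and is an assumption here, since the abstract does not give the exact formula. The function names, the `min_support` and `max_size` parameters, and the toy rows (each row lists the objects that share one behavior pattern) are hypothetical, and the hypergraph partitioning step is not shown.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Sequence


def support_counts(rows: Iterable[Sequence[str]],
                   max_size: int = 3) -> Dict[FrozenSet[str], int]:
    """Count how many rows contain each object set of up to max_size members."""
    counts: Dict[FrozenSet[str], int] = {}
    for row in rows:
        members = sorted(set(row))
        for size in range(1, max_size + 1):
            for combo in combinations(members, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return counts


def weighted_hyperedges(rows: Iterable[Sequence[str]],
                        min_support: int = 2,
                        max_size: int = 3) -> Dict[FrozenSet[str], float]:
    """Return frequent object sets as hyperedges with an assumed weight.

    Assumed weight: the mean confidence of every rule of the form
    (edge minus one member) -> (that member).
    """
    counts = support_counts(rows, max_size)
    edges: Dict[FrozenSet[str], float] = {}
    for members, support in counts.items():
        if len(members) < 2 or support < min_support:
            continue
        # conf(X -> y) = support(X and y together) / support(X)
        confidences = [support / counts[members - {m}] for m in members]
        edges[members] = sum(confidences) / len(confidences)
    return edges


# Toy data: each row lists the objects that exhibit one behavior pattern.
rows = [["A", "B", "C"], ["A", "B"], ["A", "B", "C"], ["C", "D"]]
edges = weighted_hyperedges(rows, min_support=2)
```

In this toy data the pair A and B always occur together, so their hyperedge receives the maximum weight, while C and D share only one row and fall below `min_support`. A partitioner would then try to cut only low-weight hyperedges.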