A Study on Feature Extraction in Money-Laundering Cases Based on Ontology - Intelligent and Distributed Computing Laboratory

Author	Wang Guangwu (王光武)
Defense date	2007.05.31
Submission date	2007.06.10
Thesis level	Master's
Chinese title	基于本体的洗钱案例特征提取研究
English title	A Study On Feature Extraction in money-laundering cases Based on Ontology
Advisor 1	Li Yuhua (李玉华)
Advisor 2
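Chinese keywords	领域本体;洗钱案例;模式匹配算法;特征提取 (domain ontology; money-laundering case; pattern-matching algorithm; feature extraction)
English keywords	Ontology; money-laundering case; pattern-matching algorithms; feature extraction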
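Chinese abstract	The feature values of money-laundering cases are an important reference for identifying money-laundering activity in the financial domain. In monitoring and screening based on case-based reasoning, the first task is to enter the feature values from case reports into the case base. Because money-laundering case reports conceal their key information and are unstructured, this work is still done manually, and it falls short of requirements in both efficiency and accuracy. To address this, an ontology-based feature extraction method is proposed, and automatic acquisition of knowledge from text is designed and implemented. An ontology, as an explicit specification of a conceptualization, describes the concepts and relations that exist in the objective world. Ontologies are usually built under the guidance of domain experts. In this work, a large number of money-laundering case reports were analyzed and abstracted into a conceptual model, and the keywords that represent money-laundering characteristics were extracted as the classes of the ontology. The same approach was then used to define the subclasses of these classes and the property relations between subclasses and their parent classes; finally, instances were defined and constraints added. Feature extraction combines pattern matching with grammar definitions. Pattern matching determines where the index keywords occur in the text vector; the grammar definitions specify the form in which the data to be extracted appears, and the data definitions provide the reference standard for normalizing that data. In addition, pattern-matching algorithms were studied in depth: the strengths, weaknesses, and complexity of each were analyzed, and an existing algorithm was improved. Finally, a prototype system was designed. It is written in Java and runs in browser/server (B/S) mode, using the open-source tools Protégé 3.1 to edit the ontology and Jena 2.4 to parse it. The experimental input consists of officially provided sample money-laundering case reports, and the output is structured data that can be stored in a relational database.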
English abstract	The features of money-laundering cases are an important reference for identifying money-laundering activity in the financial field. In money-laundering detection with CBR (Case-Based Reasoning), the first task is to enter the features into the case database. Because money-laundering case reports conceal their key information, this work is still done manually, and neither its efficiency nor its accuracy is satisfactory. To address this, the thesis presents an ontology-based method. By introducing a domain ontology, it automates knowledge acquisition from the full text and frees people from heavy manual labor. As an explicit specification of a conceptualization, an ontology describes the concepts and relations that exist in the objective world. In practice, an ontology is usually constructed with the help of domain experts. We analyze a large number of money-laundering case reports and abstract their concepts into a model. Keywords are extracted as the classes of the ontology, and then the subclasses and the property relations among the classes are defined. Finally, instances are assigned to the classes whose instances are known, and restrictions are added between the classes. For knowledge acquisition, the thesis combines pattern matching with text definitions. Pattern matching determines the positions of the index keywords in the text vector, and the production rules and data rules of the text definitions constrain the form of the extracted information. In addition, we study pattern-matching algorithms in depth and improve an existing algorithm. Finally, we design a prototype system, which is developed in Java and runs in B/S (browser/server) mode.
During development, we use the open-source toolkit Protégé to edit the ontology and Jena to process it. The experimental input consists of sample money-laundering case reports supplied by the official agency, and the output is structured data stored in a database.
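The abstract describes the extraction step only at a high level. Pattern matching finds where index keywords occur in the report text, and grammar and data rules then constrain and normalize the value found next to each keyword. It does not name the pattern-matching algorithm that was improved, nor does it give the rules themselves. The following Java sketch therefore rests on stated assumptions. It uses standard Boyer-Moore-Horspool matching, which is not the thesis's improved variant, and a hypothetical rule that extracts an RMB amount. The keyword `transferred`, the `AMOUNT_RULE` pattern, the class and method names, and the sample sentence are all invented for illustration.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: the keyword, AMOUNT_RULE and the sample text are hypothetical,
// and plain Boyer-Moore-Horspool stands in for the thesis's (unspecified) improved algorithm.
final class FeatureExtractionSketch {

    // Hypothetical "data rule": an RMB amount, optionally followed by "million".
    private static final Pattern AMOUNT_RULE =
            Pattern.compile("\\s*RMB\\s*([0-9][0-9,]*(?:\\.[0-9]+)?)(\\s*million)?");

    /** Returns the start index of every (possibly overlapping) occurrence of pattern in text. */
    static List<Integer> horspoolFindAll(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        int m = pattern.length();
        int n = text.length();
        if (m == 0 || m > n) {
            return hits;
        }
        // Bad-character table: for each char in pattern[0..m-2], the distance from its last
        // occurrence to the pattern's final position. A HashMap keeps this valid for CJK text.
        Map<Character, Integer> shift = new HashMap<>();
        for (int i = 0; i < m - 1; i++) {
            shift.put(pattern.charAt(i), m - 1 - i);
        }
        int pos = 0;
        while (pos <= n - m) {
            // Compare the pattern against the current window from right to left.
            int j = m - 1;
            while (j >= 0 && text.charAt(pos + j) == pattern.charAt(j)) {
                j--;
            }
            if (j < 0) {
                hits.add(pos);
            }
            // Shift by the entry for the text char under the pattern's last position;
            // a char absent from the table allows a full shift of m.
            pos += shift.getOrDefault(text.charAt(pos + m - 1), m);
        }
        return hits;
    }

    /** Applies AMOUNT_RULE right after each keyword hit and normalizes the amount to yuan. */
    static List<Long> extractAmounts(String text, String keyword) {
        List<Long> amounts = new ArrayList<>();
        for (int hit : horspoolFindAll(text, keyword)) {
            Matcher matcher = AMOUNT_RULE.matcher(text);
            // Limit matching to the text after the keyword; lookingAt() requires the rule
            // to match at the start of that region.
            matcher.region(hit + keyword.length(), text.length());
            if (matcher.lookingAt()) {
                BigDecimal value = new BigDecimal(matcher.group(1).replace(",", ""));
                if (matcher.group(2) != null) {
                    value = value.multiply(BigDecimal.valueOf(1_000_000));
                }
                amounts.add(value.longValueExact());
            }
        }
        return amounts;
    }

    public static void main(String[] args) {
        String report = "Account A transferred RMB 3.5 million to account B, "
                + "then transferred RMB 1,200,000 to account C.";
        System.out.println(horspoolFindAll(report, "transferred"));
        System.out.println(extractAmounts(report, "transferred"));
    }
}
```

Running `main` prints the two keyword positions, followed by `[3500000, 1200000]`. The matcher locates the keywords, and the rule then normalizes "3.5 million" and "1,200,000" to plain yuan values suitable for a relational table. The bad-character table is a `HashMap` rather than a 256-entry array, so the same matcher also works on Chinese report text.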