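Intelligent and Distributed Computing Laboratory
  Research and Implementation of a Semantic Retrieval System in a Peer-to-Peer Network Environment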
Name 万宇涛
Thesis defense date 2007.05.31
Thesis submission date 2007.06.22
Thesis level Master's
Chinese title 对等网环境下语义检索系统研究与实现
English title The Research and Implementation of P2P Semantic Search System
Advisor 1 卢正鼎
Advisor 2
Chinese keywords 对等网;文件共享;信息检索;语义搜索
English keywords P2P; file-sharing; Information Retrieval; semantic search
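Chinese abstract (translated) P2P technology is now widely used for file sharing and search among network nodes. P2P-based search can keep pace with data updates, improve the effectiveness of access and the efficiency of retrieval, and broaden and deepen the pool of shared resources. However, existing P2P file-sharing systems typically support only weakly semantic sharing (or none at all), and so cannot effectively meet users' needs. Building on a study and analysis of current mainstream information retrieval algorithms, this thesis focuses on the statistical language model based on query-condition probability, introduces the statistical translation model from the field of machine translation, and improves the classic statistical language model algorithm, the unigram language model, proposing a unigram language model retrieval technique based on probabilistic translation. This relaxes the unigram model's assumption that words are unrelated to one another: synonymy between words is taken into account through probabilistic translation, and the generation of the query in the classic language model approach is treated as a mapping, via the translation model, from words occurring in the document to related words in the query. This mitigates, to some extent, the unigram model's inherent weakness of treating document words as uncorrelated, and yields better semantic retrieval performance. On this basis, the improved semantic retrieval algorithm is introduced into a super-peer-based P2P information-sharing model to build a P2P information-sharing model with semantic support. Super-peers handle node management, message forwarding, and semantic queries. The model exploits the potential advantages of P2P technology, removing the limitations of traditional centralized file-sharing systems in resource-discovery efficiency and scalability, while effectively supporting semantics-based retrieval. Finally, the P2P semantic retrieval model is applied in a prototype system, several implementation problems are resolved, and the experimental and actual running results are analyzed, further verifying by experiment the effectiveness of using this model for semantic document sharing over P2P networks.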
English abstract P2P technology is now widely applied in file-sharing systems. P2P networks break with the traditional client/server model, shorten the data-update cycle, and improve search efficiency, while also broadening the scope of file sharing. However, current P2P file-sharing systems generally do not support semantic search and therefore cannot satisfy users' queries effectively. Based on a study and analysis of commonly used IR models, especially the statistical language model, which is grounded in probability theory, a unigram language model based on a probabilistic translation method is designed and established. The method develops from the unigram language model: to improve semantic search performance, it relaxes that model's assumption that words are unrelated by taking thesaurus (synonym) relations between words into account. It treats the generation of the query as a process of translating words in the document into related words in the query. In this way the method mitigates the unigram model's neglect of thesaurus relations between words and improves semantic retrieval performance. Based on the theory above, we propose a P2P semantic sharing model that incorporates the semantic IR model into a super-peer-based P2P file-sharing system. In this model, super-peers are responsible for peer management, message relay, and semantic search. The model not only takes advantage of P2P's resource-discovery efficiency and scalability but also supports semantic search. Finally, a P2P semantic file-sharing system based on the P2P semantic sharing model is designed and implemented, and experiments and system running tests are carried out to demonstrate the model's effectiveness.
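Illustrative note (not part of the thesis record): the abstract describes scoring a document by treating each query word as a probabilistic "translation" of the words in the document. The Python sketch below shows one common way such a translation-based unigram score can be written. It is not the thesis's implementation. The function names, the `translate` table layout, the Jelinek-Mercer smoothing with weight `lam`, and the toy probabilities are all assumptions made for illustration.

```python
from collections import Counter


def doc_language_model(tokens):
    """Maximum-likelihood unigram model P(w|d) for one document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def translation_prob(translate, q, w):
    """t(q|w): probability that document word w 'translates' to query word q.

    Assumption: a word missing from the table translates only to itself,
    which reduces the score to the plain unigram model.
    """
    if w in translate:
        return translate[w].get(q, 0.0)
    return 1.0 if q == w else 0.0


def translation_score(query, doc_tokens, translate, collection_model, lam=0.5):
    """P(query|doc) = prod_q [(1 - lam) * sum_w t(q|w) P(w|d) + lam * P(q|C)].

    The collection-model smoothing term is an assumption of this sketch.
    """
    doc_model = doc_language_model(doc_tokens)
    score = 1.0
    for q in query:
        p_trans = sum(
            translation_prob(translate, q, w) * p for w, p in doc_model.items()
        )
        score *= (1.0 - lam) * p_trans + lam * collection_model.get(q, 0.0)
    return score


# Hypothetical toy data: "car" may be expressed as "automobile" in a query.
translate = {"car": {"car": 0.7, "automobile": 0.3}}
doc_a = ["car", "engine"]
doc_b = ["fruit", "apple"]
print(translation_score(["automobile"], doc_a, translate, {}, lam=0.0))
print(translation_score(["automobile"], doc_b, translate, {}, lam=0.0))
```

With an empty translation table every word translates only to itself, so `doc_a` would score zero for the query "automobile" under the plain unigram model. The synonym entry is what lets it rank above `doc_b`.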