Intelligent and Distributed Computing Laboratory
  The Research of Personalized Recommendation Based on Web Mining
Name 丁一
Thesis defense date 2004.05.10
Thesis submission date 2004.05.12
Thesis level Master's
Chinese title 基于Web挖掘的个性化推荐服务研究
English title The Research of Personalized Recommendation Based on Web Mining
Advisor 1 卢正鼎
Advisor 2
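Chinese keywords 数据挖掘 (data mining); 个性化 (personalization); 信息检索 (information retrieval); 推荐服务 (recommendation service); 聚类分析 (cluster analysis)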
English keywords Data mining;Personalization;information search;recommendation server;clustering
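Chinese abstract (English translation) As network technology continues to develop, how to use data mining techniques to extract useful resources from the vast amount of information on the Web has become an active research topic. Information recommendation technology must solve three problems: first, understanding the user's needs; second, executing query tasks efficiently and accurately; and third, organizing the results well before presenting them to the user. The more mature information recommendation technologies today address these problems from a network-information-oriented perspective. This approach, however, struggles to serve users' personalized retrieval needs, so a user-oriented personalized recommendation model is proposed. Based on a study of general-purpose search engines and meta search engines, a personalized recommendation model is proposed that is divided into an offline part and an online part. The offline part consists of data preprocessing and a specific access mining task: data preprocessing turns the Web server's access log files and related site files into user files and transaction files, and the access mining task uses a clustering algorithm to generate Web page clusters. The online part uses the page clusters generated offline, together with the user's current access behavior, to dynamically recommend the user's next access operation. It consists mainly of a user interface, an interest learner, a personalization analyzer, a reasoner, a network data connection manager, a personalization filter, and a Web server. The key algorithms of the online part are the interest learning, personalization analysis, personalization filtering, and reasoning algorithms. In addition, some simple syntax rules of the model are defined, and a recommendation model with a simple interface is implemented in an experimental environment.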
English abstract With the development of network technology, how to use data mining techniques to search the Web has become an active research area in information retrieval. Information discovery on the Web must solve three problems. First, it must understand the user’s needs correctly. Second, it must execute query tasks efficiently and accurately. Third, it must organize the results before showing them to the user. The popular, mature IR technologies address these problems in a network-information-oriented way. However, such technologies cannot understand and serve users’ personal needs. This dissertation therefore proposes a new, user-oriented model for information recommendation on the Web. After weighing the advantages and disadvantages of the general search model and the meta search model, it presents a personalized information recommendation model based on Web mining. The model is divided into an offline part and an online part. The dissertation first introduces the data preprocessing of the offline part and then discusses its specific access mining task. Data preprocessing generates user files and transaction files from the Web server’s access logs and related site files. The access mining task produces Web URL clusters using a clustering algorithm. The online part uses these URL clusters, together with the user’s current access operations, to recommend the next access operation dynamically. It consists mainly of a user interface, an interest learner, a personality analyzer, a case-based reasoner, an Internet database connector, a personality re-sorter, and a Web server. The key algorithms of the online part are discussed: interest learning, personal analysis, personal re-sorting, and case-based reasoning. In addition, some simple syntax rules of the model are defined. The overall structure of the model is demonstrated through experiments, which verify its capability.
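The offline/online split described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not name the clustering algorithm or the log format, so everything here is an assumption made for illustration. That includes the `(user, timestamp, url)` record shape, the 30-minute session timeout, the Jaccard threshold, and the function names `build_transactions`, `cluster_pages`, and `recommend`. Connected components over a page-similarity graph stand in for the unspecified clustering step.

```python
from collections import defaultdict
from itertools import combinations

SESSION_GAP = 30 * 60  # seconds; assumed timeout, not taken from the thesis


def build_transactions(log, gap=SESSION_GAP):
    """Offline preprocessing: group (user, timestamp, url) records into sessions."""
    by_user = defaultdict(list)
    for user, ts, url in sorted(log, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, url))
    transactions = []
    for visits in by_user.values():
        current, last_ts = [], None
        for ts, url in visits:
            # A long pause between requests starts a new transaction.
            if last_ts is not None and ts - last_ts > gap:
                transactions.append(set(current))
                current = []
            current.append(url)
            last_ts = ts
        if current:
            transactions.append(set(current))
    return transactions


def cluster_pages(transactions, threshold=0.5):
    """Offline mining: link pages whose session sets have Jaccard similarity
    >= threshold; clusters are the connected components of that graph."""
    sessions_of = defaultdict(set)
    for i, pages in enumerate(transactions):
        for url in pages:
            sessions_of[url].add(i)
    parent = {url: url for url in sessions_of}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(sorted(sessions_of), 2):
        inter = len(sessions_of[a] & sessions_of[b])
        union = len(sessions_of[a] | sessions_of[b])
        if inter / union >= threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for url in sessions_of:
        clusters[find(url)].add(url)
    return list(clusters.values())


def recommend(current_session, clusters):
    """Online step: suggest unvisited pages from the best-overlapping cluster."""
    visited = set(current_session)
    best = max(clusters, key=lambda c: len(c & visited), default=set())
    if not best & visited:
        return []  # the session matches no known cluster
    return sorted(best - visited)
```

With sessions such as `{"/a", "/b", "/c"}` and `{"/a", "/b"}` in the log, a visitor currently on `/a` would be offered `/b` and `/c`. The interest learner, personalization filter, and case-based reasoner named in the abstract are not modeled here.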