Intelligent and Distributed Computing Laboratory

A New Vector Space Model Exploiting Semantic Correlations of Social Annotations for Web Page Clustering

Published in:
  • Conference: The 12th International Conference on Web-Age Information Management (WAIM 2011)
  • Location: Wuhan, China
  • Dates: September 14-16, 2011
  • Pages: 106-117
Abstract:

Text clustering can effectively improve the search results and user experience of information retrieval systems. Traditional text clustering approaches are based on the vector space model, in which a document is represented as a vector using a term-frequency-based weighting scheme. The main disadvantage of this model is that it cannot fully exploit the semantic correlations between social annotations and document contents, because a term-frequency-based weighting scheme captures only the number of occurrences of each term in the document. Social annotations of web pages, however, carry fundamental and valuable semantic information and can therefore be exploited to improve information retrieval systems. In this paper, we investigate and evaluate several extended vector space models that combine social annotations with web page text. In particular, we propose a novel vector space model that computes the semantic correlations between social annotations and web page words. Our experiments show that, compared with other vector space models, using the semantic correlations between social tags and web page words improves clustering accuracy, increasing the RI score by 4% to 7%.
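
The abstract does not state how the tag-word correlations are computed or folded into the document vectors. As a minimal sketch of the general idea only, the Python below assumes a hypothetical scheme: the correlation between a tag and a word is approximated by the cosine similarity of the sets of pages they occur on, and each word's term frequency is scaled by 1 plus the sum of its correlations with the page's own tags. The function names (`tag_word_correlation`, `semantic_vector`) and the toy corpus are illustrative assumptions, not the authors' model.

```python
import math
from collections import Counter


def tf_vector(words):
    """Plain term-frequency vector: word -> number of occurrences."""
    return {w: float(c) for w, c in Counter(words).items()}


def tag_word_correlation(docs):
    """Hypothetical tag-word correlation: cosine similarity between the sets
    of documents in which a tag and a word occur (co-occurrence based)."""
    word_docs, tag_docs = {}, {}
    for i, (words, tags) in enumerate(docs):
        for w in set(words):
            word_docs.setdefault(w, set()).add(i)
        for t in set(tags):
            tag_docs.setdefault(t, set()).add(i)
    corr = {}
    for t, t_ids in tag_docs.items():
        for w, w_ids in word_docs.items():
            shared = len(t_ids & w_ids)
            if shared:
                corr[(t, w)] = shared / math.sqrt(len(t_ids) * len(w_ids))
    return corr


def semantic_vector(words, tags, corr):
    """Hypothetical weighting: TF scaled by (1 + sum of the word's
    correlations with this page's tags)."""
    vec = {}
    for w, tf in tf_vector(words).items():
        boost = sum(corr.get((t, w), 0.0) for t in set(tags))
        vec[w] = tf * (1.0 + boost)
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Toy corpus of (page words, social tags); purely illustrative.
DOCS = [
    (["python", "code", "tutorial"], ["programming"]),
    (["python", "snake", "zoo"], ["animals"]),
    (["java", "code", "tutorial"], ["programming"]),
]
CORR = tag_word_correlation(DOCS)
VECS = [semantic_vector(w, t, CORR) for w, t in DOCS]

if __name__ == "__main__":
    tf = [tf_vector(w) for w, _ in DOCS]
    print(round(cosine(tf[0], tf[2]), 2), round(cosine(tf[0], tf[1]), 2))
    print(round(cosine(VECS[0], VECS[2]), 2), round(cosine(VECS[0], VECS[1]), 2))
```

On this toy corpus, plain TF vectors give similarities of 0.67 (the two programming pages) and 0.33 (programming page vs. animal page); the tag-boosted vectors give 0.76 and 0.24, so words that correlate with a page's tags pull same-topic pages closer together. Any real model would also need a principled normalization of the boost, which this sketch omits.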

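The abstract reports results as an RI score without defining it; in clustering evaluation, RI is the usual abbreviation for the Rand Index, the fraction of item pairs that a clustering and the ground-truth labelling treat consistently (both group the pair together, or both keep it apart). A minimal Python sketch that computes it directly over all pairs, which is O(n²) and meant only for illustration; the label lists in the example are made up and are not the paper's data.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Rand Index: fraction of item pairs on which two clusterings agree,
    i.e. both put the pair together or both keep it apart."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        return 1.0  # convention for fewer than two items
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    # Ground-truth categories vs. a clustering that splits the second category.
    print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

In the example, five of the six pairs are treated consistently; only the pair (2, 3) is together in the ground truth but split by the clustering, so the score is 5/6 ≈ 0.833. Cluster label values themselves do not matter, only whether pairs share a label.
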
Keywords:
  • social annotation; clustering; information retrieval