Research on a Clustering-Based Index System for Image Copy Detection - Intelligent and Distributed Computing Laboratory

Research on a Clustering-Based Index System for Image Copy Detection

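Name	王越 (Wang Yue)
Thesis defense date	2008.06.05
Thesis submission date	2008.06.10
Degree level	Master's
Chinese title	基于聚类的图像拷贝检测索引系统的研究
English title	Research on Index Structure Based on Clustering in Content-based Image Copy Detection
Advisor 1	卢正鼎 (Lu Zhengding)
Advisor 2
Chinese keywords (translated)	content-based image copy detection; cluster analysis; tree index
English keywords	Content-based Image Copy Detection; Cluster Analysis; Index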
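Chinese abstract (translated)	With the continued development of information technology, and Internet technology in particular, multimedia information keeps emerging, image data is growing rapidly, and copying and redistributing digital images has become very convenient. Effectively tracking and monitoring the dissemination and use of multimedia has become an important part of multimedia copyright protection schemes. Copy detection is a technique with considerable potential for solving this class of problems, and it has therefore gradually become a research hotspot in multimedia security. Traditional content-based image copy detection retrieves by sequential search; for large-volume, high-dimensional image data, this approach is clearly no longer efficient enough. Preprocessing the image database and building an index to improve retrieval efficiency has become increasingly important. The thesis first introduces the k-d tree as an index structure for multidimensional image feature points and modifies it to suit higher-dimensional data, although the recall and precision of image copy detection decline somewhat. It then proposes clustering the high-dimensional feature vectors first: k-means and an improved variant are used to cluster the feature vectors as a preprocessing step, and a k-d tree index is then built over the cluster centers. In this way, the recall of image copy detection can be improved fairly effectively without increasing algorithmic complexity. Building on this theoretical basis, an experimental platform for content-based image copy detection indexing was designed. Experimental data comparing index efficiency and index results show that a clustering-based tree index structure is efficient and practical for content-based image copy indexing.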
English abstract	With the development of information technology, especially Internet technology, multimedia information, including image data, is growing rapidly. Copying and disseminating digital images has become more and more convenient. How to track and monitor the copying and dissemination of multimedia effectively has become an important part of multimedia copyright protection schemes. At present, content-based image copy detection (CBICD) is a promising technology for solving this problem, and it has therefore been drawing more and more research attention in the multimedia security area in recent years. Traditional CBICD uses sequential retrieval. However, for large-volume, high-dimensional image data, this retrieval method is clearly no longer efficient enough. It is increasingly important to preprocess the image database and build an index to improve retrieval efficiency. This thesis therefore first introduces a k-d tree index structure and improves the algorithm to suit high-dimensional data, but the recall and precision are not ideal. The thesis then introduces the idea of clustering the high-dimensional feature vectors first. The clustering method is k-means, which is then improved. The main idea is to preprocess the high-dimensional feature vectors with the improved k-means clustering method, and then build a k-d tree index over the cluster centers. With this method, recall increases noticeably. On this theoretical basis, the author designs a CBICD experimental platform. A comparison of retrieval efficiency and retrieval results shows that a tree index structure based on clustering is efficient and practical in CBICD.
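The two-stage idea in the abstract (cluster the feature vectors, then index the cluster centers with a k-d tree) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. Plain Lloyd's k-means with a simple evenly spaced initialization stands in for the improved k-means variant, whose details the abstract does not give. The function names, the 4-dimensional toy features, and the distance threshold are all hypothetical choices made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means (assumed stand-in for the thesis's improved variant)."""
    # Assumption: evenly spaced points as initial centroids, for determinism.
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    # Final assignment so labels match the returned centroids.
    labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
    return centroids, labels

def build_kdtree(items, depth=0):
    """Build a k-d tree over (point, payload) items, cycling through the axes."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    return (items[mid], axis,
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1))

def nearest(node, query, best=None):
    """Exact nearest-neighbor search; returns the closest (point, payload) item."""
    if node is None:
        return best
    item, axis, left, right = node
    if best is None or dist(query, item[0]) < dist(query, best[0]):
        best = item
    diff = query[axis] - item[0][axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(diff) < dist(query, best[0]):
        best = nearest(far, query, best)
    return best

def build_index(features, k):
    """Cluster the features, then index the cluster centers with a k-d tree."""
    centroids, labels = kmeans(features, k)
    buckets = {c: [] for c in range(k)}
    for i, label in enumerate(labels):
        buckets[label].append(i)
    tree = build_kdtree([(centroids[c], c) for c in range(k)])
    return tree, buckets

def find_copies(tree, buckets, features, query, threshold):
    """Find the nearest cluster via the tree, then scan only that cluster."""
    _, cluster = nearest(tree, query)
    return [i for i in buckets[cluster] if dist(query, features[i]) <= threshold]

# Toy 4-dimensional "image features" forming two well-separated groups.
features = [(0, 0, 0, 0), (1, 0, 1, 0), (0, 1, 0, 1),
            (10, 10, 10, 10), (11, 10, 11, 10), (10, 11, 10, 11)]
tree, buckets = build_index(features, k=2)
# A slightly distorted copy of features[4].
matches = find_copies(tree, buckets, features, (10.9, 10.1, 11.0, 9.9), threshold=0.5)
print(matches)  # → [4]
```

In this sketch a query is compared only against the members of its nearest cluster rather than against the whole database, which is where the efficiency gain over sequential retrieval would come from. A copy whose feature vector lies near a cluster boundary could be missed by this single-cluster scan, so a practical system might probe several nearby clusters.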