Modern Information Retrieval


Notice

Class time and location: Fall 2016, Weeks 2-9; Monday 3:55-5:30 pm in D9-D212 and Wednesday 2:00-3:35 pm in C12-N303.

Objective

As the amount of online textual information (e.g., web pages, email, news articles, office documents, and scientific literature) grows explosively, it is increasingly important to develop tools to help us manage and exploit the huge amount of information. Web search engines, such as Google, Yahoo!, and MSN, are good examples of such tools, and they are now an essential part of everyone's life.

As the underlying science of search engines, information retrieval (IR) has been studied for several decades, but the research results of the IR community did not have a major impact until the birth of the Web. Information retrieval has since become a very active research area that attracts growing attention. The purpose of this course is twofold: (1) to introduce the foundational concepts, principles, and techniques of IR and review a representative set of frontier topics; (2) to discuss the general methodology and specific strategies for doing research. Students will learn the basic principles and algorithms for managing text information and be exposed to some advanced frontier topics. They will also gain hands-on experience in writing a research proposal and conducting IR research. A few lab assignments will complement the course, giving a practical understanding of the framework and design of an IR system.

Textbook

The course uses Modern Information Retrieval by Ricardo Baeza-Yates and Berthier Ribeiro-Neto as the textbook. The course has been taught in Chinese to Ph.D. candidates and some master's students.

    Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008

Details about the textbook:

Information retrieval (IR) has changed considerably in recent years with the expansion of the World Wide Web and the advent of modern and inexpensive graphical user interfaces and mass storage devices. As a result, traditional IR textbooks have become quite out of date and this has led to the introduction of new IR books. Nevertheless, we believe that there is still great need for a book that approaches the field in a rigorous and complete way from a computer-science perspective (as opposed to a user-centered perspective). This book is an effort to partially fulfill this gap and should be useful for a first course on information retrieval as well as for a graduate course on the topic.

The book comprises two portions which complement and balance each other. The core portion includes nine chapters authored or co-authored by the designers of the book. The second portion, which is fully integrated with the first, is formed by six state-of-the-art chapters written by leading researchers in their fields. The same notation and glossary are used in all the chapters. Thus, despite the fact that several people have contributed to the text, this book is really much more a textbook than an edited collection of chapters written by separate authors. Furthermore, unlike a collection of chapters, we have carefully designed the contents and organization of the book to present a cohesive view of all the important aspects of modern information retrieval.

From IR models to indexing text, from IR visual tools and interfaces to the Web, from IR multimedia to digital libraries, the book provides both breadth of coverage and richness of detail. It is our hope that, given the now clear relevance and significance of information retrieval to modern society, the book will contribute to further disseminate the study of the discipline at information science, computer science, and library science departments throughout the world.

Lecture Notes

Instructors: Ruixuan Li, Xiwu Gu, Kunmei Wen

TAs: Xinhua Dong, Zhengyuan Xue

    Lecture 0.   Overview of the Course (PDF)
    Lecture 1.   Introduction to Information Retrieval, Boolean retrieval (PDF)
    Lecture 2.   Term vocabulary & postings lists (PDF) (PDF)
    Lecture 3.   Index construction & compression (PDF) (PDF)
    Lecture 4.   Scoring, term weighting & the vector space model
    Lecture 5.   Computing scores in a complete search system
    Lecture 6.   Language models for information retrieval
    Lecture 7.   Text classification & Naive Bayes, vector space classification (PDF)
    Lecture 8.   Support vector machines & machine learning on documents
    Lecture 9.   Flat clustering & Hierarchical clustering
    Lecture 10. Matrix decompositions & latent semantic indexing
    Lecture 11. Evaluation in information retrieval (PDF)
    Lecture 12. Relevance feedback & query expansion
    Lecture 13. Snippet generation based on GPU (PDF)
    Lecture 14. Web search (PDF)
    Lecture 15. Information retrieval on SSD (PDF) (PDF)
    Lecture 16. Micro-blog Information retrieval

Handouts and Solutions

There are no specific handouts. The course requirements are as follows:

    1.  Interactive discussions (10%)
    2.  High quality presentation (30%)
    3.  Good term paper (60%)  (the cover page of the term paper)

Students' work: The presentation list preview.

Some useful notes on how to prepare and deliver a good presentation can be found here.

A list of questions you may want to keep in mind when reading papers can be found here.

Projects

Here is a small project on Lucene, Nutch, Hadoop, and MapReduce.

Recommended Readings

1. Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008
2. Baeza-Yates, R. & B. Ribeiro-Neto. eds. Modern Information Retrieval. Addison Wesley Longman Publishing Co. Inc., 2005
3. Witten, Ian et al. Managing Gigabytes. Orlando, FL: Morgan Kaufmann Publishers Incorporated, 1999
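4. William Frakes & Ricardo Baeza-Yates, Information Retrieval: Data Structures and Algorithms. Prentice Hall, 1992
5. Karen Sparck Jones & Peter Willett eds. Readings in Information Retrieval, Morgan Kaufmann, 1997
6. Li Xiaoming, Yan Hongfei, and Wang Jimin. Search Engine: Principle, Technology and Systems (搜索引擎--原理、技术与系统). Beijing: Science Press, 2005
7. Li Guohui et al. Organization and Retrieval of Information (信息的组织与检索). Science Press, 2003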

Useful Links

If you are ready to devote yourself to database systems, you should cherish these web sites, which direct you to numerous valuable web resources.


Ruixuan Li
School of Computer Science and Technology,
Huazhong University of Science and Technology 
Wuhan 430074, Hubei, P. R. China
Phone: +86-27-87544285
E-mail: rxli([at])sina([.])com