Modern Information Retrieval
Notice
Class time and location: Fall 2016, Weeks 2-9, Monday 3:55-5:30pm in D9-D212 and Wednesday 2:00-3:35pm in C12-N303.
As the amount of online textual information (e.g.,
web pages, email, news articles, office documents, and scientific literature)
grows explosively, it is increasingly important to develop tools to help us
manage and exploit the huge amount of information. Web search engines, such as
Google, Yahoo!, and MSN, are good examples of such tools, and they are now an
essential part of everyone's life.
As the underlying science of search engines, information retrieval (IR) has been
studied for several decades, but the research results of the information
retrieval community did not have a large impact until the birth of the
Web. Information retrieval has since become a very active research area that
attracts growing attention. The purpose of this course is
two-fold: (1) Introduce the foundational concepts, principles, and techniques of
IR and review a representative set of frontier topics. (2) Discuss the general
methodology and specific strategies for doing research. Students will learn the
basic principles and algorithms for managing text information and be exposed to
some advanced frontier topics. They will also gain hands-on experience in
writing a research proposal and conducting IR research. A few lab assignments will complement the course,
giving a practical understanding of the framework and design of IR systems.
The course uses Modern Information Retrieval by Ricardo Baeza-Yates and Berthier Ribeiro-Neto as the textbook. The course is given in Chinese to Ph.D. candidates and some master's students.
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, An Introduction to Information Retrieval, Cambridge University Press, 2008
Details about the textbook:
Information retrieval (IR) has changed
considerably in recent years with the expansion of the World Wide Web and the
advent of modern and inexpensive graphical user interfaces and mass storage
devices. As a result, traditional IR textbooks have become quite out of date
and this has led to the introduction of new IR books. Nevertheless, we believe
that there is still great need for a book that approaches the field in a
rigorous and complete way from a computer-science perspective (as opposed to a
user-centered perspective). This book is an effort to partially fulfill this gap
and should be useful for a first course on information retrieval as well as for
a graduate course on the topic.
The book comprises two portions which complement and balance each other. The
core portion includes nine chapters authored or co-authored by the designers of
the book. The second portion, which is fully integrated with the first, is
formed by six state-of-the-art chapters written by leading researchers in their
fields. The same notation and glossary are used in all the chapters. Thus,
despite the fact that several people have contributed to the text, this book is
really much more a textbook than an edited collection of chapters written by
separate authors. Furthermore, unlike a collection of chapters, we have
carefully designed the contents and organization of the book to present a
cohesive view of all the important aspects of modern information retrieval.
From IR models to indexing text, from IR visual tools and interfaces to the Web,
from IR multimedia to digital libraries, the book provides both breadth of
coverage and richness of detail. It is our hope that, given the now clear
relevance and significance of information retrieval to modern society, the book
will contribute to further disseminate the study of the discipline at
information science, computer science, and library science departments
throughout the world.
Instructors: Ruixuan Li, Xiwu Gu, Kunmei Wen
TAs: Xinhua Dong, Zhengyuan Xue
Lecture 0. Overview of the Course (PDF)
Lecture 1. Introduction to Information Retrieval, Boolean retrieval (PDF)
Lecture 2. Term vocabulary & postings lists (PDF) (PDF)
Lecture 3. Index construction & compression (PDF) (PDF)
Lecture 4. Scoring, term weighting & the vector space model
Lecture 5. Computing scores in a complete search system
Lecture 6. Language models for information retrieval
Lecture 7. Text classification & Naive Bayes; Vector space classification (PDF)
Lecture 8. Support vector machines & machine learning on documents
Lecture 9. Flat clustering & Hierarchical clustering
Lecture 10. Matrix decompositions & latent semantic indexing
Lecture 11. Evaluation in information retrieval (PDF)
Lecture 12. Relevance feedback & query expansion
Lecture 13. Snippet generation based on GPU (PDF)
Lecture 14. Web search (PDF)
Lecture 15. Information retrieval on SSD (PDF) (PDF)
Lecture 16. Microblog information retrieval
Handouts and Solutions
No specific handouts, but the requirements are as follows:
1. Interactive discussions (10%)
2. High-quality presentation (30%)
3. Good term paper (60%) (the cover of the test paper)
Students' work: The presentation list preview.
Some useful notes on how to prepare and deliver a good presentation can be found here.
A list of questions you may want to keep in mind when reading papers can be found here.
Here is a small project about Lucene, Nutch, Hadoop, and MapReduce.
1. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. An Introduction to Information Retrieval. Cambridge University Press, 2008
2. Ricardo Baeza-Yates and Berthier Ribeiro-Neto, eds. Modern Information Retrieval. Addison Wesley Longman Publishing Co. Inc., 2005
3. Ian Witten et al. Managing Gigabytes. Orlando, FL: Morgan Kaufmann Publishers Incorporated, 1999
4. William Frakes and Ricardo Baeza-Yates. Information Retrieval: Data Structures and Algorithms. Prentice Hall, 1992
5. Karen Sparck Jones and Peter Willett, eds. Readings in Information Retrieval. Morgan Kaufmann, 1997
6. Li Xiaoming, Yan Hongfei, and Wang Jimin. Search Engines: Principles, Technology, and Systems (搜索引擎--原理、技术与系统). Beijing: Science Press, 2005
7. Li Guohui et al. Information Organization and Retrieval (信息的组织与检索). Science Press, 2003
If you are ready to devote yourself to information retrieval, you should find these websites valuable; they point to numerous useful web resources.
Information Retrieval and Web Search at Stanford, Spring 2012. Instructors: Chris Manning and Pandu Nayak.
Advanced Topics in Information Retrieval at UIUC, Fall 2012. Instructor: Chengxiang Zhai.
Information Retrieval Course at CMU, Spring 2012. Instructors: Jamie Callan and Yiming Yang.
Information Retrieval and Web Search at UT Austin, Fall 2012. Instructor: Raymond J. Mooney.
Information Retrieval Course at UMass, Fall 2010. Instructor: James Allan.
Intelligent Information Retrieval at DePaul U., Winter 2006. Instructor: Bamshad Mobasher.
Information Retrieval and Extraction at Taiwan U., 2005. Instructor: Xinxi Chen.
Ruixuan Li
School of Computer Science and Technology,
Huazhong University of Science and Technology
Wuhan 430074, Hubei, P. R. China
Phone: +86-27-87544285
E-mail: rxli([at])sina([.])com