EECS767 Information Retrieval, Spring 2010

Important: we use KU Blackboard in this course.

Overview

This course covers algorithms and applications for retrieving information from large document repositories, including the Web. We will cover classic IR methods for text documents and databases, as well as more recent developments in web search. Topics include text algorithms, probabilistic modeling, performance evaluation, indexing, clustering, web structures, multimedia information retrieval, and social network analysis.

Time and Location:

Class: Thursdays 6:10PM - 9:10PM (Edwards Campus)
Office hours: Th 3:30pm - 5:30pm, or by appointment
Instructor: Bo Luo (bluo <at> ku <dot> <edu>), AIM: bluoku, Google Talk: through homepage.
Grader: TBA

Textbook:

Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Cambridge University Press. 2008.

References:

Modern Information Retrieval, by Ricardo Baeza-Yates and Berthier Ribeiro-Neto, ACM Press, 1999.

Information Retrieval: Data Structures and Algorithms, by William B. Frakes and Ricardo Baeza-Yates, Prentice Hall, 1992.

The Geometry of Information Retrieval, by C. J. van Rijsbergen, Cambridge University Press, 2004.

Tasks and grading:

Homeworks (25%): 6 homeworks (lowest score will be dropped, 5% each).
Projects (35%): two projects, done in groups of 2-3. Project details can be found here.
Exams (35%): two exams, closed book and closed notes; one page of notes (letter size, double sided) is allowed.
Class participation (5%): students are expected to attend class and participate in class discussions.

A: 90+ or top 30%
B: 75+ or top 60%
C: 60+ or top 90%
D/F: 0-59
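
As a worked illustration of the weights above, the sketch below computes a weighted course total. It is an estimation aid only, not the official grading procedure: the function name, the 0-100 scale for every input, and the equal split between the two exams (17.5% each) are assumptions, and the "top N%" part of the letter-grade rule is not modeled.

```python
def course_total(homeworks, project1, project2, exam1, exam2, participation):
    """Weighted course total on a 0-100 scale.

    Assumptions (not stated in the syllabus): every input is a percentage
    in [0, 100], and the two exams count equally (17.5% each).
    """
    if len(homeworks) != 6:
        raise ValueError("expected six homework scores")
    # Drop the lowest homework; the remaining five are worth 5% each.
    kept = sorted(homeworks)[1:]
    hw_part = sum(score * 0.05 for score in kept)
    # Project 1 is worth 15 points and Project 2 is worth 20 (35% together).
    project_part = project1 * 0.15 + project2 * 0.20
    exam_part = (exam1 + exam2) * 0.175
    participation_part = participation * 0.05
    return hw_part + project_part + exam_part + participation_part


if __name__ == "__main__":
    # One missed homework is absorbed by the drop-lowest rule.
    print(course_total([0, 90, 90, 90, 90, 90], 85, 80, 75, 88, 100))
```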

Policies:

Academic Integrity

"Academic integrity is a central value in higher education. It rests on two principles: first, that academic work is represented truthfully as to its source and its accuracy, and second, that academic results are obtained by fair and authorized means. 'Academic misconduct' occurs when these values are not respected." -- Office of the Vice Provost for Student Success.

Office of the Vice Provost for Student Success Academic Integrity Webpage

University Policy (Academic Misconduct)

Schedule

Week     Date   Schedule                                                      HW   Due
Week 1   01/14  0. course intro; 1. intro to IR
Week 2   01/21  2. evaluation - in-class mini-project
Week 3   01/28  3.1. boolean model; 3.2. vector model
Week 4   02/04  4. text algorithms
Week 5   02/11  5. IR systems, evaluation revisited; meta search
Week 6   02/18  6. relevance feedback and query expansion; group discussion
Week 7   02/25  7. metadata, midterm review
Week 8   03/04  Exam I
Week 9   03/11  Project presentation
Week 10  03/18  Spring break
Week 11  03/25  9. web search I: crawling
Week 12  04/01  10. web search II: link analysis, PageRank
Week 13  04/08  11. classification and clustering
Week 14  04/15  12. social network analysis
Week 15  04/22  Exam II
Week 16  04/29  13. vertical search, multimedia information retrieval
Week 17  05/06  Project presentation

Online stemmer available here.

Projects

The projects are designed for students to apply their knowledge to real-world applications. In project 1, students will design and implement a working meta-search engine, which sends user queries to multiple search engines and aggregates the results to improve search quality. In project 2, students are expected to deploy an open-source search engine for vertical search.
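
To make the aggregation step of a meta-search engine concrete, here is a minimal sketch of one possible fusion method, a Borda count over the ranked lists returned by each engine. The course does not prescribe a particular method; the function name `borda_merge` and the sample result lists are hypothetical, and a real system would also need to normalize URLs before comparing them.

```python
from collections import defaultdict


def borda_merge(ranked_lists):
    """Merge ranked result lists from several engines with a Borda count.

    A URL at 0-based position p in a list earns (depth - p) points, where
    depth is the length of the longest list. URLs missing from a list earn
    nothing from it. Ties are broken alphabetically for determinism.
    """
    depth = max((len(results) for results in ranked_lists), default=0)
    scores = defaultdict(int)
    for results in ranked_lists:
        seen = set()
        for position, url in enumerate(results):
            if url in seen:
                continue  # count each URL once per engine
            seen.add(url)
            scores[url] += depth - position
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    # Hypothetical top-3 results from three engines for the same query.
    engine_a = ["x", "y", "z"]
    engine_b = ["y", "x", "w"]
    engine_c = ["y", "z", "x"]
    print(borda_merge([engine_a, engine_b, engine_c]))
```

A rank-based method like this avoids comparing raw relevance scores, which are usually not comparable across engines.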

  • Project 1 (15 points): build a basic meta-search engine that queries at least 3 independent search engines.
    • Note: you can use any programming language and SDK. However, you should NOT use any open-source or third-party code, except for HTML generated by WYSIWYG editors.
  • Project 2 (20 points): download, install, and configure an open-source search engine, and test it with some manually collected documents (10 points). Implement a niche crawler to collect documents and feed them to the search engine (10 points).
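
As a rough illustration of what a niche crawler does, the sketch below runs a breadth-first crawl that keeps only pages mentioning a topic keyword and follows links only from those pages. It is a simplified outline under stated assumptions, not a required design: `niche_crawl`, `LinkParser`, and the caller-supplied `fetch(url)` function (returning HTML text or None) are hypothetical names, and the keyword test is a plain substring match on the raw HTML.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def niche_crawl(seed_urls, fetch, keywords, max_pages=50):
    """Breadth-first crawl that keeps only on-topic pages.

    fetch(url) is supplied by the caller (an assumption of this sketch) and
    returns the page HTML as a string, or None on failure.
    """
    frontier = deque(seed_urls)
    visited = set(seed_urls)
    collected = {}
    fetched = 0
    while frontier and fetched < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:
            continue
        text = html.lower()
        if not any(keyword.lower() in text for keyword in keywords):
            continue  # off-topic: neither keep the page nor follow its links
        collected[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            link, _fragment = urldefrag(urljoin(url, href))
            if link.startswith(("http://", "https://")) and link not in visited:
                visited.add(link)
                frontier.append(link)
    return collected
```

Passing `fetch` in as a parameter keeps the crawl logic testable without network access. A crawler run against real sites should also honor robots.txt and rate-limit its requests, which this sketch leaves out.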

Project Report

The group report should show what you did in the project, in particular how you designed and implemented the meta-search engine, how you implemented the crawler, etc. Each group is expected to submit two reports, one for each project. Please clearly state your overall approach, design, implementation and results, and carefully proofread and edit the report before submission.

You are expected to hand in reports in the following format:

  • Major sections organized as Introduction, Algorithms (if any), System design, Implementation notes, Test results, Necessary Appendixes, etc.
  • A cover page (including project title) with group name and group members
  • A table of contents with page numbers
  • Double-spaced text, for convenient grading
  • Electronic copies only. Email the report to the grader, and cc the instructor and all team members.

Project Log

With each report, please include a project log, which describes your activities in this project.

  • Clearly state the responsibility of each group member. If possible, give a table to tell who did which task, who implemented which component, who wrote which part of the report, who coordinated the group work activities, etc.
  • Give a log of your group activity, such as what you did on which day and how many people attended.

References

http://en.wikipedia.org/wiki/Metasearch_engine
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/MetaSearch.html
http://portal.acm.org/citation.cfm?id=256164
https://www.aaai.org/ojs/index.php/aimagazine/article/viewArticle/1290
http://portal.acm.org/citation.cfm?id=383952.384007
http://portal.acm.org/citation.cfm?id=505284