EECS767 Information Retrieval

Overview

This class covers algorithms and applications for retrieving information from large document repositories, including the Web. We will cover classic IR methods for text documents and databases, as well as more recent developments in web search. Topics include text algorithms, probabilistic modeling, performance evaluation, indexing, clustering, web structures, multimedia information retrieval, and social network analysis.

Time and Location:

Class: M 6:10PM - 9:10PM, Regnier H 163 (Edwards Campus)
Calendar: you can view or subscribe to this Google Calendar
Office hours: M 4:00pm - 6:00pm, or by appointment
Instructor: Bo Luo (bluo <at> ku <dot> edu), AIM: bluoku, Google Talk: through homepage.
Grader: Prashanth Ramani (pramani <at> ittc <dot> ku <dot> edu)

Textbook:

Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Cambridge University Press. 2008.

References:

Modern Information Retrieval, by Ricardo Baeza-Yates and Berthier Ribeiro-Neto, ACM Press, 1999.

Information Retrieval: Data Structures and Algorithms, by William B. Frakes and Ricardo Baeza-Yates, Prentice Hall, 1992.

The Geometry of Information Retrieval, by C. J. van Rijsbergen, Cambridge University Press, 2004.

Tasks and grading:

Homework (30%): 6 assignments at 6% each; the lowest score will be dropped. Email your answers to the grader and cc the instructor before 11:59pm on the due date.
Projects (40%): work in groups of 3 (groups will be assigned on 02/02). Project details can be found here.
Midterm exam (20%): closed book, closed notes, one page cheat sheet (letter size, double sided) allowed.
Class participation (10%): students are expected to attend class and participate in class discussions.

Policies:

Academic Integrity

"Academic integrity is a central value in higher education. It rests on two principles: first, that academic work is represented truthfully as to its source and its accuracy, and second, that academic results are obtained by fair and authorized means. 'Academic misconduct' occurs when these values are not respected." -- Office of the Vice Provost for Student Success.

Office of the Vice Provost for Student Success Academic Integrity Webpage

University Policy (Academic Misconduct)

Schedule

| Week | Date | Schedule | HW | Due |
| Week 1 | - | - | | |
| Week 2 | 01/19 | MLK holiday | | |
| Week 3 | 01/26 | 0. course intro (color); 1. intro to IR (color) | HW1 | |
| Week 4 | 02/02 | 2. evaluation (color) - IIR.08; 3.1. boolean model (color) - IIR.01 | HW2 | HW1 |
| Week 5 | 02/09 | 3.2. vector model (color) - IIR.06 | HW3 | answer |
| Week 6 | 02/16 | 4. text algorithms | | HW3 |
| Week 7 | 02/23 | 5. IR systems, evaluation revisited; 7.2 meta search | HW4 [1] | |
| Week 8 | 03/02 | 6. relevance feedback and query expansion; group discussion | | HW4 |
| Week 9 | 03/09 | 7. metadata, midterm review | | report 1 * |
| Week 10 | 03/16 | Spring break | | |
| Week 11 | 03/23 | Midterm | | |
| Week 12 | 03/30 | 9. web search 1: crawling | | |
| Week 13 | 04/06 | 10. web search 2: link analysis, PageRank | HW5 | report 2 |
| Week 14 | 04/13 | 11. classification and clustering | | HW5 |
| Week 15 | 04/20 | 12. social network analysis | HW6 | |
| Week 16 | 04/27 | 13. vertical search, multimedia information retrieval | | |
| Week 17 | 05/04 | Project presentation (& pizza) | | HW6 |
| | 05/17 | Extended deadline for project report. Firm! | | Final report |

* Report 1 due: Sunday, 03/15.
[1] Online stemmer available here.

Project

This project is designed for students to apply their knowledge to real-world applications. Students are expected to design and implement a working meta-search engine, which sends user queries to multiple search engines and aggregates the results to improve search quality.

  • Task 1: build a basic general-purpose meta-search engine, which queries at least 3 independent search engines.
  • Task 2: open-ended problem: add a cool feature to your basic meta-search engine, e.g. result clustering, relevance feedback, image meta-search, etc.
  • Note: you can use any programming language and SDK. However, you should NOT use any open-source or third-party code, except for HTML generated by WYSIWYG editors.
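
To make the aggregation step in Task 1 concrete, here is a minimal sketch of one possible way to merge ranked lists, using a Borda count. It is an illustration only, not the required or recommended algorithm for the project. The function names (normalize_url, borda_merge), the depth parameter, and the example URLs are all hypothetical, and the URL normalization is deliberately crude.

```python
from collections import defaultdict


def normalize_url(url):
    """Crudely normalize a URL so the same page from two engines matches.

    Assumption for illustration: dropping the scheme, a leading "www.",
    and trailing slashes is enough to identify duplicates.
    """
    url = url.strip()
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
            break
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def borda_merge(ranked_lists, depth=10):
    """Merge ranked URL lists from several engines with a Borda count.

    A URL at 0-based position i in one list earns (depth - i) points from
    that list. URLs below `depth`, or missing from a list, earn nothing
    from it.
    """
    scores = defaultdict(int)
    for results in ranked_lists:
        seen = set()
        for position, url in enumerate(results[:depth]):
            key = normalize_url(url)
            if key in seen:  # count each page once per engine
                continue
            seen.add(key)
            scores[key] += depth - position
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda k: (-scores[k], k))


# Hypothetical result lists from three engines for the same query.
engine_a = ["http://example.org/a", "http://example.org/b", "http://example.org/c"]
engine_b = ["https://www.example.org/b/", "http://example.org/d", "http://example.org/a"]
engine_c = ["http://example.org/b", "http://example.org/a"]

merged = borda_merge([engine_a, engine_b, engine_c], depth=3)
print(merged)
```

In this example, example.org/b ranks first because it is at or near the top of all three lists. Pages returned by only one engine (example.org/c and example.org/d) fall to the bottom. Other aggregation schemes, such as ones that weight engines differently or use the engines' own relevance scores, would fit the same interface.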

Project Report

The group report should show what you did in the project, especially how you designed and implemented the meta-search engine. The first interim report shall include the overall system design and the aggregation algorithm for task 1. The second interim report shall include the implementation and test results of task 1, as well as a preliminary design of task 2. The final report shall include everything. You are allowed (and encouraged) to reuse content from previous reports. Please state your results clearly, and carefully proofread and edit the report before submission.

You are expected to hand in a final report in the following format:

  • Major sections organized as Introduction, Meta-search algorithms, System design, Implementation notes, Test results, Necessary Appendixes, etc.
  • A cover page (including project title) with group name and group members
  • A table of contents with page numbers
  • Double-spaced text, for convenient grading
  • E-copies only. Email to our grader, cc to the instructor and all team members.

Project Log

With each report, please include a project log, which describes your activities in this project.

  • Clearly state the responsibility of each group member. If possible, give a table to tell who did which task, who implemented which component, who wrote which part of the report, who coordinated the group work activities, etc.
  • Give a log of your group activity, such as what you did on which day, and how many people attended.

Grading (40 points total)

  • Report 1: 5 points
  • Report 2: 5 points
  • Final report: 10 points
  • Presentation: 5 points
  • Performance and code quality: 10 points
  • Project log: 5 points
  • Individual adjustment based on project log: -20 points to +20 points

References

http://en.wikipedia.org/wiki/Metasearch_engine
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/MetaSearch.html
http://portal.acm.org/citation.cfm?id=256164
https://www.aaai.org/ojs/index.php/aimagazine/article/viewArticle/1290
http://portal.acm.org/citation.cfm?id=383952.384007
http://portal.acm.org/citation.cfm?id=505284