Assignments
LING 467

New assignments will be added here each week.

Assignment 1 -- Due 23 Jan 08

1. Reading:
 a. Chapter 1 of Modern Information Retrieval by Baeza-Yates and Ribeiro-Neto
 b. Article by Grefenstette and Tapanainen
    What is a Word? What is a Sentence? (1994)

2. Send me an email with the following information:

Your name
Your email address
Your status (undergrad, grad, which program you are in)
What prior programming experience do you have?
What would you like to get from this course?
Do you have a personal computer? If so, what kind?

3. Download perl and install it on your machine.


Assignment 2 -- Due 30 Jan 08

1. Reading
Chapter 2 of Information Retrieval by van Rijsbergen.

2. Program
Write a Perl program to find the words in a text file, count them, and
display a frequency list for the words in the text. Your program
should use a subroutine that accepts a string as input and returns
a list of the words found. Try to make your function a good one,
not just the simplest thing that works.
  • Here is some Test Data that could be used to test your program.

3. Find at least one example of some string that would be problematic for
an English word finding program, e.g. "4x4" or "$1million". (Send me an
email with your example(s)).

Bonus Points! Send at least one problematic example from a language other
than English. Please include a translation and a full description of why
your example might give an IR system difficulties.
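
As a starting point for the program in item 2, here is a minimal sketch of a word finder plus frequency list. It is deliberately naive and is not the expected solution: the names find_words and count_words and the sample sentence are made up for illustration, and the rule "a word is a run of ASCII letters, digits, or apostrophes" is only an assumption. Running it on text containing "4x4" and "$1million" shows one way such strings cause trouble: the "$" is silently dropped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Naive word finder (hypothetical name): lowercases the string and returns
# every maximal run of ASCII letters, digits, or apostrophes as a "word".
sub find_words {
    my ($text) = @_;
    my @words = lc($text) =~ /[a-z0-9']+/g;
    return @words;
}

# Count how often each word occurs; returns a reference to a hash
# mapping word => number of occurrences.
sub count_words {
    my ($text) = @_;
    my %freq;
    $freq{$_}++ for find_words($text);
    return \%freq;
}

# Made-up sample text containing the two problem strings from item 3.
my $sample = 'The 4x4 cost $1million. The dealer said: "the 4x4 is worth it."';
my $freq   = count_words($sample);

# Print the most frequent words first; break ties alphabetically.
for my $word (sort { $freq->{$b} <=> $freq->{$a} || $a cmp $b } keys %$freq) {
    printf "%-10s %d\n", $word, $freq->{$word};
}
```

To read from a file instead, the same count_words call could be given the file's contents as one string. Whether "1million" is an acceptable token for "$1million" is exactly the kind of decision item 3 asks you to think about.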

    Assignment 3 -- Due 6 Feb 08

    1. Reading - Chapter 3 of Baeza-Yates/Ribeiro-Neto.
       Also read the web pages on Zipfian distributions
       and stopwords.
    
    2. Program
    Refine your word-finding program. This time, you should
    make a subroutine called TokenizeWords that takes a
    text string as an argument and returns an array of the
    words in the string. Your program should work with
    the driver program provided HERE. 
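
The interface described above (one string in, an array of words out) might look like the sketch below. Only the name TokenizeWords comes from the assignment; the tokenizing rule is an assumption (a word is a run of ASCII letters or digits that may contain internal apostrophes or hyphens), and the call at the bottom is a hypothetical stand-in for the driver program, which may invoke the subroutine differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# TokenizeWords: takes one text string and returns a list of lowercased words.
# Assumed rule: a word is a run of letters/digits, optionally joined to more
# runs by a single internal apostrophe or hyphen ("don't", "over-think").
# Quotation marks and other punctuation around a word are discarded.
sub TokenizeWords {
    my ($text) = @_;
    return () unless defined $text;
    my @words = lc($text) =~ /[a-z0-9]+(?:['-][a-z0-9]+)*/g;
    return @words;
}

# Hypothetical stand-in for the driver: tokenize one string and print
# the tokens separated by "|".
my @tokens = TokenizeWords("Don't over-think it: 'quoted' words, 4x4s.");
print join("|", @tokens), "\n";
```

Compared with splitting on whitespace, this keeps contractions and hyphenated words whole while stripping surrounding quotes. It still handles only ASCII text, so accented or non-Latin words would need a different character class.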
    
    

    Assignment 4 -- Due 12 Mar 08

    1. Study for midterm
    The midterm will cover everything in the course so far. 
    This includes anything covered in lecture, material in the 
    readings and the assignments.
    
    2. Program
    Begin work on your indexer. For this assignment, your indexer 
    need only process the documents far enough to create the 
    Document Information File. Later assignments will enhance
    this version to create the other two files.
    
    

    Assignment 5 -- Due 19 Mar 08

    1. Enhance your indexer so that it generates all three files of the index.
    

    Assignment 6 -- Due 26 Mar 08

    1. Read the Wikipedia page on the Google PageRank algorithm.
    
    2. Continue working on your indexer. Add in the additional information
    that you need to support TF-IDF and any other features that you may want 
    to use for your search engine.
    
    3. Write up a preliminary idea of what you will do with your search 
    engine to make it uniquely yours. This is preliminary. If you 
    find that you want to change it later, that will be all right.
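
To make the TF-IDF requirement in item 2 concrete, here is a small sketch of the two counts an index has to keep: how often each term occurs in each document (tf) and how many documents contain each term (df). It uses one common weighting, tf(t, d) * log(N / df(t)) with the natural logarithm. That choice is an assumption, and the variant used in class may differ. The three documents, the whitespace tokenizing, and the name tf_idf are all made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy collection (made up): document ID => text.
my %docs = (
    d1 => 'the cat sat on the mat',
    d2 => 'the dog sat on the log',
    d3 => 'cats and dogs',
);

# %tf: term counts per document.  %df: number of documents containing a term.
my (%tf, %df);
for my $id (keys %docs) {
    $tf{$id}{$_}++ for split ' ', lc $docs{$id};
    $df{$_}++ for keys %{ $tf{$id} };
}

my $n_docs = keys %docs;    # N, the number of documents in the collection

# tf-idf(t, d) = tf(t, d) * log(N / df(t)); 0 if the term is not in the document.
sub tf_idf {
    my ($term, $id) = @_;
    return 0 unless exists $tf{$id} && exists $tf{$id}{$term};
    return $tf{$id}{$term} * log($n_docs / $df{$term});
}

# List the terms of d1 from highest to lowest weight.
for my $term (sort { tf_idf($b, 'd1') <=> tf_idf($a, 'd1') || $a cmp $b }
              keys %{ $tf{d1} }) {
    printf "%-5s %.3f\n", $term, tf_idf($term, 'd1');
}
```

In this toy collection "cat" outranks "the" in d1 even though "the" occurs twice, because "the" also appears in another document. A term found in every document would get weight 0, which is why the index needs document frequencies in addition to per-document counts.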