Lab 2: CPS100e: Vectors, Classes, Sorting, Searching

Introduction

This lab will provide practice using vectors, classes, and structs. It will also provide a good lead-in to part 2 of the second assignment. Be sure that you have created a "cps100e" subdirectory. If you haven't and don't know how, be sure to ask a TA for help. It's a good idea to make the directory readable by the professors teaching the course and by your lab TA. To do this you can type what's shown below (and substitute any login-id for "ola").

     fs setacl cps100e ola read

Change into your "cps100e" directory and create a "lab2" subdirectory (use "mkdir lab2"). Then change into your lab2 subdirectory and copy files by typing what's below (don't forget the trailing dot).

      cp ~ola/cps100e/lab2/*  .

From the "lab2" subdirectory, make a link to several text data files. To do this type

    ln -s ~ola/data data

Then you'll be able to use the directory "data" to access several big text files. You can type "ls data" to see what's in it.

If you type "ls" you should see the following files: Makefile, words.cc, logsearch.cc, logins.h, logins.cc (and data which is a link).

Use the menu-bar to invoke emacs or type "emacs &" to get an emacs window. You want to use the ampersand & which leaves you with control of your xterm (so you can run programs from it). Be sure to leave the emacs window up, don't quit it until you log off.

Counting Words

Use the emacs commands "C-x C-f" to find/load the file words.cc". Then compile this command using "C-x c" and type "make words" in the minibuffer. If this doesn't invoke the compile command you can always type "M-x compile" (that's ESCAPE-x compile).

When the program runs, you'll be prompted for the name of a file. Enter "data/poe.txt" and the program will keep track of all the unique words in Edgar Allan Poe's ``The Cask of Amontillado''. The program stores each word in a vector, and checks to see if a word is already present in the array before adding it again. The program prints how long reading takes, and how long sorting the words read in takes.

How long does it take to sort the words in Poe's ``cask'' __________________? How long to read them in ___________________? You should show these answers to a TA.

In its current version, the program prints all the words to the screen. You are to change this so that the words are printed to a file whose name you enter.

Replace the call "list.Print(cout)" with the code sequence below.

    ofstream output;
    cout << "output file: ";
    cin >> filename;
    output.open(filename);

    list.Print(output);

Now run the program again. Find the alphabetically fifth word in Poe's ``cask'', what is it ______________________ ?

Word Counting

You are to modify the word.cc program to print the number of times each unique word appears in the text.

The struct "WordList" contains a vector of strings. You're to change it so that it uses a vector of type "Info" defined below. (Do NOT forget the semi-colon after the } !!).

Define this struct before the class "WordList" and change the vector definition to "Vector mylist".

    struct Info
    {
        string word;
        int count;
    };

Try to compile the program. You'll get errors due to the change in the definition. For example, in "Search" you'll need to change the line:

    if (myList[k] == key)

to

    if (myList[k].word == key)

so that the "word" field is compared to key. In the "Update" function, if the word is already in "myList" then the count of how many times it occurs should be incremented. Note that "Search" is called in "Update". This is an example of one member function calling another member function.

If "key" is NOT in "myList", the new word should be added to the end of "myList" as is done in the original version of "Update".

You'll need to make some changes in "Sort" and Print" as well, but these changes are syntactic and based on the change to using "Info" vectors rather than "string" vectors.

How do you know if your program works correctly? You should create a small data file with about ten words, and test your program on it.

After your program works, change the "Sort" function so that it compares "count" fields of each item in the list rather than using an alphabetic comparison. Doing this requires changing ONLY the line that compares "list[k].word" and "list[min].word".

Then rerun the program with Poe's cask and determine what the top 5 most frequently occurring words are. Put these words and their counts, suitably labelled, in your README.

You should run the program on "data/melville.txt" as well, this should take longer. Put the top 5 most frequently occurring words and their counts from this file (Bartelby the Scrivener) into your README.

Login Search

Now load the file "logsearch.cc", compile it, and run it. When prompted for the name of an input file, enter "data/acpub.logs", a file that contains all names and logins of users on the acpub system.

Search (using the 'l' command) for yourself by entering your login. Then search using the 'n' command for anyone with your name. Search again using the 'n' command for anyone with 'owen' as part of their name. How many such users are there ________________________ ?

You should study the programs "logsearch.cc" and "logins.cc" carefully since they can help with the CD data base assignment. You should probably print these files.

Re-reading

In the current program, to read another file of logins (e.g., one from computer science machines) you quit the program and run it again. Add an option for the user specified by the character 'r' that will read logins from a file specified by the user. To do this, move the code that prompts for a file name and calls "ReadLogs" into a new function called "DoRead". You should do this using the mouse to cut-and-paste (see the Edit menu). Drag over some text with the mouse while pressing the left button to highlight it. Then choose cut (or copy) followed by paste from the Edit menu. If you don't have an Edit menu you can cut a chunk of text by putting the cursor at the beginning of the text and marking the start by typing "C-' '", that's control-spacebar. Then move the cursor to the end of the block of text you want to copy. You can cut this block of text by typing "C-w" (control-W) and you can copy it by typing "M-w" (escape-w). To paste it move the cursor and type "C-y". You can undo without a menu by typing "C-x u" (control-x followed by u).

Before opening a new file, you'll need to close the current stream. To close the stream called open, you would use:

    input.close();

Printing the program

Practice printing. To print the file logins.cc by typing

      print -Pteerlp1 logins.cc

If this doesn't work ask a TA. (You can substitute another printer for teerlp1).

Submission

To turn in programs: When your modified programs works, you can submit them, along with a README describing how long it took you and what your impressions of the lab are. If you edit the Makefile, you can type "make submit" to submit the appropriate files, or you can type the command below (Replace N in lab2secN with your section number: 1, 2, or 3).

submit100e lab2secN README words.cc logsearch.cc

The README file should include your name, how long you worked on the program, and anyone you received significant help from. You should also include any comments you have about the lab.

You should receive a message telling you that the programs were submitted correctly. If you get a message indicating that "submit100e" isn't found, you can always type the full path name:

~ola/bin/submit100e lab2secN README words.cc logsearch.cc