OOOH-KWIC


In addition to the original specifications, your team must add the following functionality to your program. These features emphasize specific design issues and are meant to help you think about what it means to "hard-code" things within your program and how to allow the user to gain access to those things when using your program.

Specifications

A typical architecture for many computer programs divides a program's execution into three stages: input, in which data is provided to the program; process, in which that data is transformed; and output, in which the program displays the results of that transformation. This input/process/output (IPO) model of programming is used in simple programs like this one as well as in million-line programs that forecast the weather or predict stock market fluctuations. Your final program should be clearly separated into three independent modules, each containing one or more classes flexible enough to accommodate a variety of options without requiring either of the other modules to change. To do this, you must think carefully about what the result of each step is so that it can be safely received by the next step.
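As a minimal sketch of this separation, assuming Java (a `.properties` options file is a common Java convention, but the specification does not name a language), each stage can sit behind its own interface, with a small data type carrying the result of one stage to the next. Every name here (`WordSource`, `KeywordProcessor`, `EntryFormatter`, `KwicEntry`, `Kwic`, and the sample implementations) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical data handed from the process stage to the output stage:
// one occurrence of a keyword together with its surrounding context.
final class KwicEntry {
    final String keyword;
    final List<String> before; // words preceding the keyword
    final List<String> after;  // words following the keyword

    KwicEntry(String keyword, List<String> before, List<String> after) {
        this.keyword = keyword;
        this.before = before;
        this.after = after;
    }
}

// Input stage: any source that can produce a list of words.
interface WordSource {
    List<String> words();
}

// Process stage: turns words into concordance entries.
interface KeywordProcessor {
    List<KwicEntry> process(List<String> words);
}

// Output stage: renders entries in some format.
interface EntryFormatter {
    String format(List<KwicEntry> entries);
}

// Sample input: splits an in-memory string on whitespace.
final class StringSource implements WordSource {
    private final String text;

    StringSource(String text) { this.text = text; }

    public List<String> words() {
        List<String> result = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) result.add(w);
        }
        return result;
    }
}

// Sample processor: every word is a keyword, with up to `before`/`after` words of context.
final class ContextProcessor implements KeywordProcessor {
    private final int before, after;

    ContextProcessor(int before, int after) { this.before = before; this.after = after; }

    public List<KwicEntry> process(List<String> words) {
        List<KwicEntry> entries = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            List<String> pre = new ArrayList<>(words.subList(Math.max(0, i - before), i));
            List<String> post = new ArrayList<>(words.subList(i + 1, Math.min(words.size(), i + 1 + after)));
            entries.add(new KwicEntry(words.get(i), pre, post));
        }
        return entries;
    }
}

// Sample output: one plain-text line per entry, keyword in brackets.
final class TextFormatter implements EntryFormatter {
    public String format(List<KwicEntry> entries) {
        StringBuilder sb = new StringBuilder();
        for (KwicEntry e : entries) {
            List<String> parts = new ArrayList<>(e.before);
            parts.add("[" + e.keyword + "]");
            parts.addAll(e.after);
            sb.append(String.join(" ", parts)).append('\n');
        }
        return sb.toString();
    }
}

final class Kwic {
    // The pipeline sees only the three interfaces, so each stage can be swapped independently.
    static String run(WordSource in, KeywordProcessor proc, EntryFormatter out) {
        return out.format(proc.process(in.words()));
    }
}
```

Because `Kwic.run` depends only on the three interfaces, a file or URL reader could replace `StringSource`, or an HTML formatter could replace `TextFormatter`, without changing the other two stages. With `new StringSource("the cat sat")`, `new ContextProcessor(1, 1)`, and `new TextFormatter()`, the result is the three lines `[the] cat`, `the [cat] sat`, and `cat [sat]`.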

The requirements for each module are described below:

Input

Your program should be able to read text files from a variety of sources. For example, 

Process

Your program should be flexible in how it orders and chooses its keywords. For example, 

Output

Your program should be flexible in the formatting of the output. For example,

Your team may implement all of these options, or additional options to distinguish itself from the masses (i.e., for extra credit). However, note that the amount of extra credit will be proportional to the intellectual effort needed to implement the option. For example, accepting regular expressions in addition to exact words would be worth a lot of credit because it would require learning about regular expressions and mastering the available implementation. On the other hand, adding yet another way to set apart the keyword on a line would not be worth very much. Of course, a well-tested, perfectly working program that has fewer features (but plenty of clear paths to easy expansion) is always worth more than the leaky kitchen sink.

In short, to maximize your grade, you should implement enough variety in your program to clearly demonstrate that your design supports such extensions.
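One way to show such extensibility in the Process module is to express each keyword-selection rule as a separate predicate and combine them, so adding a rule never requires editing the existing ones. This is a hypothetical sketch: `KeywordFilters` and its methods are illustrative names, the three rules loosely correspond to the `min`, `exclude`, and `include` options described under Options, and reading the exclusion file into a set is left out.

```java
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical keyword-selection rules, each usable on its own or combined.
final class KeywordFilters {
    // min=<int>: require at least `min` letters (non-letter characters are not counted).
    static Predicate<String> minLength(int min) {
        return w -> w.chars().filter(Character::isLetter).count() >= min;
    }

    // exclude=<filename>: reject words in the excluded set; assumes the set holds lowercase words.
    static Predicate<String> excluding(Set<String> excluded) {
        return w -> !excluded.contains(w.toLowerCase());
    }

    // include=<reg_exp>: accept only words that match the whole expression.
    static Predicate<String> including(String regex) {
        Pattern p = Pattern.compile(regex);
        return w -> p.matcher(w).matches();
    }

    // A word is a keyword only if every rule accepts it.
    @SafeVarargs
    static Predicate<String> allOf(Predicate<String>... rules) {
        Predicate<String> combined = w -> true;
        for (Predicate<String> rule : rules) {
            combined = combined.and(rule);
        }
        return combined;
    }
}
```

For example, `allOf(minLength(3), excluding(Set.of("the", "and")), including("[a-z]+"))` accepts `cat` but rejects `at` (too short), `The` (excluded), and `dog2` (does not match the expression).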

Options

By default, your program should work as described in the original specifications. However, if there is a file called kwic.properties in the directory where the program is being run, then it should be able to customize the output based on the following options. You can add other options for extra credit.
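If the program is written in Java (an assumption), `java.util.Properties` already supports this behavior: a `Properties` object constructed with a defaults table falls back to those defaults for any key the file does not set. The sketch is hypothetical (`KwicSettings` and its methods are illustrative names); its default values copy the Default column of the options table, and options whose default is "none" or "all" are simply left unset.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical settings loader: built-in defaults, overridden by kwic.properties when present.
final class KwicSettings {
    // Defaults copied from the options table; "none"/"all" options are left unset.
    static Properties defaults() {
        Properties d = new Properties();
        d.setProperty("before", "3");
        d.setProperty("after", "3");
        d.setProperty("order", "alphabetical");
        d.setProperty("aligned", "true");
        d.setProperty("min", "3");
        d.setProperty("output", "text");
        return d;
    }

    // Keys read from `source` override the defaults; getProperty falls back to them otherwise.
    static Properties load(Reader source) throws IOException {
        Properties p = new Properties(defaults());
        p.load(source);
        return p;
    }

    // Reads kwic.properties from the current working directory if it exists.
    static Properties loadFromWorkingDirectory() throws IOException {
        Path file = Paths.get("kwic.properties");
        if (!Files.exists(file)) {
            return new Properties(defaults());
        }
        try (Reader r = Files.newBufferedReader(file)) {
            return load(r);
        }
    }

    // Parses options from a string, e.g. for tests.
    static Properties parse(String text) {
        try {
            return load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
    }

    // Reads an integer option that has a default, such as before, after, or min.
    static int intOption(Properties p, String key) {
        return Integer.parseInt(p.getProperty(key).trim());
    }
}
```

For example, after `KwicSettings.parse("before=5")`, `before` reports 5 while `after` still reports its default of 3.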
Option Format       Default       Description
before=<int>        3             maximum number of words of context to print before the keyword
after=<int>         3             maximum number of words of context to print after the keyword
order=<string(s)>   alphabetical  output order: length means by word length; number means by most occurrences; chronological means by first appearance; reverse means the opposite order
                                  Any number of these orderings may appear in this option, separated by spaces. The order in which each appears determines its priority. For example, given the option "length number alphabetical", the output should be sorted first by word length, then by number of occurrences, and finally, if both of those are equal, alphabetically.
offset=<string>     none          the output keyword should be surrounded immediately before and after by the string given in the option
color=<#hexcolor>   none          output the keyword in the given browser color (valid only in HTML output)
aligned=<boolean>   true          output keywords so that they are aligned in a column
min=<int>           3             minimum number of letters a word must have to be considered a keyword in the concordance
exclude=<filename>  none          exclude the words listed in the given file from being keywords in the concordance
include=<reg_exp>   all           exclude all words from being keywords except those that match the given expression
max=<int>           all           maximum number of occurrences of a keyword to print at a time
output=<string>     text          output format: either html or text
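The priority rule for `order` maps naturally onto chained comparators: the first name builds the primary comparator and each later name is added with `Comparator.thenComparing`, so it only breaks ties. In this hypothetical sketch, `KeywordStats` and `OrderOption` are illustrative names; `length` is assumed to put shorter words first, `alphabetical` is assumed to ignore case, and `reverse` is assumed to flip the ordering built from the names before it, since the specification does not settle these details.

```java
import java.util.Comparator;

// Hypothetical per-keyword summary the Process stage might compute.
final class KeywordStats {
    final String word;
    final int count;      // number of occurrences in the text
    final int firstIndex; // word position of the first occurrence

    KeywordStats(String word, int count, int firstIndex) {
        this.word = word;
        this.count = count;
        this.firstIndex = firstIndex;
    }
}

final class OrderOption {
    // Turns an order string such as "length number alphabetical" into one comparator.
    // Earlier names take priority; each later name only breaks remaining ties.
    static Comparator<KeywordStats> parse(String order) {
        Comparator<KeywordStats> result = null;
        for (String name : order.trim().split("\\s+")) {
            if (name.isEmpty()) {
                continue;
            }
            if (name.equals("reverse")) {
                // Assumption: "reverse" flips the ordering built from the names before it.
                if (result != null) {
                    result = result.reversed();
                }
                continue;
            }
            Comparator<KeywordStats> next = byName(name);
            result = (result == null) ? next : result.thenComparing(next);
        }
        // Fall back to the documented default when nothing usable was given.
        return (result != null) ? result : byName("alphabetical");
    }

    private static Comparator<KeywordStats> byName(String name) {
        switch (name) {
            case "length":        // assumed: shorter words first
                return Comparator.comparingInt(s -> s.word.length());
            case "number":        // most occurrences first
                return Comparator.comparingInt((KeywordStats s) -> s.count).reversed();
            case "chronological": // earliest first appearance first
                return Comparator.comparingInt(s -> s.firstIndex);
            case "alphabetical":  // assumed: case-insensitive
                return (a, b) -> a.word.compareToIgnoreCase(b.word);
            default:
                throw new IllegalArgumentException("unknown order: " + name);
        }
    }
}
```

Sorting `cat` (2 occurrences), `dog` (5), `horse` (1), and `ant` (2) with `OrderOption.parse("length number alphabetical")` gives `dog ant cat horse`: the three-letter words come first, `dog` leads them by count, and the tie between `ant` and `cat` falls back to alphabetical order.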

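The `offset`, `color`, and `aligned` options can likewise stay independent of one another: how a keyword is set apart is one choice, and how lines are laid out is another. This hypothetical sketch (`OutputStyles` and its methods are illustrative names) pads each line's preceding context so that every keyword starts in the same column; it assumes each row arrives as `{before, keyword, after}` with the context already joined into strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical output helpers: marking the keyword is separate from laying out the line.
final class OutputStyles {
    // offset=<string>: surround the keyword immediately before and after with the given string.
    static UnaryOperator<String> offset(String s) {
        return k -> s + k + s;
    }

    // color=<#hexcolor>: wrap the HTML-escaped keyword in a colored span (HTML output only).
    static UnaryOperator<String> htmlColor(String hex) {
        return k -> "<span style=\"color:" + hex + "\">"
                + k.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                + "</span>";
    }

    // aligned=true: left-pad each row's preceding context so every keyword starts in the same column.
    // Each row is {before, keyword, after}; trailing spaces are trimmed from each finished line.
    static List<String> aligned(List<String[]> rows, UnaryOperator<String> marker) {
        int width = 0;
        for (String[] r : rows) {
            width = Math.max(width, r[0].length());
        }
        List<String> lines = new ArrayList<>();
        for (String[] r : rows) {
            String pad = " ".repeat(width - r[0].length());
            lines.add((pad + r[0] + " " + marker.apply(r[1]) + " " + r[2]).stripTrailing());
        }
        return lines;
    }
}
```

The HTML helper escapes only the keyword, so a complete HTML formatter would also need to escape the context and preserve the padding (for example, inside a `<pre>` element) for the alignment to survive in a browser.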
Deliverables

  1. Monday, February 7. Submit a README containing the address of your team's website, which should contain:
    1. a name for your team's "company"
    2. a description of your team's shared vision of the project (think of this as the advertising blurb on the box in which your software will be sold)
    3. a Programmer's Manual that justifies whose KWIC implementation you intend to use and notes the specific changes you think will be necessary in the current code to implement this project's new features.
    We will check this page frequently to monitor your progress, so you will need to update this website as you develop your project.
    Before submitting this deliverable, you must discuss your design and plan with your mentor TA.
  2. Friday, February 11. Submit a program that, at a minimum, should be able to
    1. process an options file such that its values override the default values
    2. read from or write to multiple formats (not necessarily both and not necessarily perfectly)
    3. sort the output in a variety of ways
    Your website should include current versions of your user and programmer manuals that correctly describe the current implementation.
    Before submitting, you must demo this program with your mentor TA.
  3. Tuesday, February 15. Submit the final version of your program, including all user and programmer documentation.
    Before submitting, you must demo this program with your mentor TA.
  4. Thursday, February 17. Your individual project analysis is due.