Compsci 100, Spring 2010, RSG

You should snarf the RSG assignmentfiles from http://www.cs.duke.edu/courses/spring10/cps100/snarf or browse the code directory for code files provided. See the howto file for help and more instructions.

Genesis of Assignment

double spiral
(image source and it's grammar)
The Random Sentence Generator was one of the original (1999) SIGCSE/Nifty Assignments. This version uses regular expressions for parsing, Java's Collection classes, and is consequently more straightforward than the original.

There are many examples of randomly-generated text, graphics, art, and so on. The ones referenced here all use context free grammars like those you'll be using in this assignment. The AD Generator combines slogans with Flickr images to create random ads built on real slogans. The famous SCIgen project generates random computer science papers including those that were actually accepted for publication, albeit in shady conference venues. The site has videos and a complete description of the history of SCIgen.

Context Free Art includes information on generating different images using grammars and computerized drawing.

For example, here are some Duke Compsci excuses generated by this grammar.

I can't believe I haven't started working on this week's APT assignment. The problems were unbelievably hard and I couldn't find my computer .

I finished working on this week's APT assignment. The problems were like trivial and Eclipse crashed .

I gave up working on this week's APT assignment. The problems were really , really , so impossible and I got h1n1 .

I gave up working on this week's APT assignment. The problems were so , like easy and I had a midterm .

I finished working on this week's APT assignment. The problems were like trivial .


High Level View of Assignment

You'll do three things in this assignment.


Grammar Background

In Computer Science a grammar is basically a set of rules. Grammars can be used to recognize and validate text, e.g., a grammar from a programming language is used to parse, compile, or interpret programs written in the language. Grammars can also be used in a generative mode, to generate texts or images or sounds that conform to the rules of the grammar.

The format of the grammar used in this assignment is described briefly here, there are more details in the howto, but basically you're supposed to figure it out by looking at the examples.

A grammar processed by your program consists of a collection of definitions and rules for each definition. For example in the grammar below each definition is enclosed by curly-braces. The definition consists of the non-terminal being defined followed by the rules for that the definition. Random text is always generated beginning with the non-terminal <start> as can be seen in the examples shown above generated by this grammar.

{ <start> I <status> working on this week's APT assignment. The problems <description> .; } { <status> gave up ; finished ; am still ; can't believe I haven't started ; } { <description> were <adjective> <difficult> ; were <adjective> <difficult> and <excuse>; } { <adjective> really ; unbelievably ; so ; like ; <adjective> , <adjective> ; } { <difficult> hard ; impossible ; easy ; trivial ; } { <excuse> I had a midterm ; I got h1n1 ; I couldn't find my computer ; Eclipse crashed ; <excuse> and <excuse> ; }

By examining the randomly generated examples you can see how sometimes a string of adjectives is generated, e.g., like, really, really, so, unbelievably. In theory the length of this sequence of adjectives, generated by repeatedly choosing the last of the rules for the non-terminal <adjective>, could be arbitrarily long, but in practice choosing this rule happens with probability 0.2 (1/5) so choosing it repeatedly isn't too likely.

Some non-terminals, like <difficult> and <status> don't result in more rules/definitions being chosen. But the others do generate more choices and texts since the rules associated with the definitions also have non-terminals in them.


Grading

For this assignment you'll get engineering points for your choice and implementation in how you store the grammar and generate random text from your stored data structure. You'll get correctness/algorithmic points for how well your program adheres to the standard expansion rules which are described in the howto. There are no analysis points in this assignment. You'll submit the non-GUI version first (rsgI) and then the GUI-based version (rsgII). You're urged to work on the non-GUI one first, but this isn't required. You will, however, receive two grades: one for each submission.

Submit using the assignment target rsgI and rsgII from Eclipse.