Steganography, Compsci 100, Fall 2009

You should snarf assignment stego from http://www.cs.duke.edu/courses/fall09/cps100/snarf or browse the code directory for the code files provided. See the Steganography howto file for help and more instructions.

The code provided in this assignment uses the Picture class provided as part of Introduction to Programming in Java: An Interdisciplinary Approach by Robert Sedgewick and Kevin Wayne.

Genesis of Assignment

A steganography assignment appears in the Nifty Assignments Archive from 2009. That idea gave birth to this assignment, though this one is substantially different. The nifty assignment was originally developed by Tom Murtaugh and Brent Heeringa from Williams College.

Background

This image used by permission of The Hermitage Museum from their Image Usage Policy.
Steganography (see the Wikipedia entry for details) deals with hiding information, typically so that even the existence of the hidden information is hidden. An encrypted message might attract attention --- even if it's not possible to decrypt it, some people will try. But a message whose existence is hidden is hard to decipher because ... well hopefully you get the picture.
The images to the left and right are of the 1908 Picasso painting Three Women which is held at the Hermitage Museum in St Petersburg, Russia. One of the images (perhaps both) has a hidden watermark so that the use of the image can be tracked by the Hermitage. Watermarks are a form of steganography where the hidden information is related to the object in which the information is hidden. The other image has been altered by a program equivalent to the one you'll write as part of this assignment to store another, hidden image. Clicking on this link will reveal the hidden image extracted from the Three Women image. The extracted hidden image is a lower-quality version of the original image that was hidden, because it's not possible to hide the complete, full quality image without essentially erasing the Three Women image. (In this case the hidden image isn't of a very high quality to begin with.)
You'll also write a program to hide text in an image. This version of Three Women has the complete text of Melville's Bartleby, the Scrivener (a story of Wall-street) hidden in it. Your programs will be able to hide and extract text in an image.



There are several websites that hide text in an image --- exactly what one of the programs you write will do. Of course these sites will extract the hidden text as well, just as your suite of programs will. Sites for hiding text in an image include imagecipher, mozaiq/encrypt, and utilitymill.com.

What You Will Do

You will write five classes for this assignment. Most of them consist of simple changes to the class HideImage, which you'll write first and for which starter code is provided. Each of the classes you write will also have a main method to run it. There is an opportunity for extra credit.

HideImage prompts the user for two images and a number of bits and hides one image (the source) in the other (the target) using the specified number of bits. A starter version of this class HideImage.java is provided that prompts the user for two image files and the number of bits. More details are in the howto.
ExtractImage extracts an image from one provided by the user when the user specifies the number of bits to use in the extraction. A starter version of this class ExtractImage.java is provided that prompts the user for the image file and the number of bits. More details are in the howto.
HideText is similar to HideImage but hides text in an image using either one or two bits. The user specifies whether one or two bits will be used as well as both the image file and the file of text to be hidden.
ExtractText extracts text hidden in an image specified by the user. The user also specifies whether one or two bits will be used in extracting text --- see the howto for details.
StegoBenchmark processes every image file in a directory chosen by the user and determines which of the image files contain hidden text as created by the HideText program. You'll need to use the method(s) you wrote in ExtractText and try both one- and two-bit encodings. You'll need to write a method to determine if a string represents text; ideas can be found in the howto. A starter file StegoBenchmark.java is provided that processes all files in a directory chosen by the user.
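
As a rough illustration of the "does this string represent text" check that StegoBenchmark needs, here is a minimal sketch. It is only one possible heuristic and is not taken from the howto: the class name TextCheckSketch, the method looksLikeText, and the 90% threshold are all assumptions made for this example.

```java
// Hypothetical helper: one possible heuristic, not the howto's method.
final class TextCheckSketch {
    // Returns true when at least 90% of the characters are printable ASCII
    // or common whitespace. The 90% cutoff is an arbitrary assumption.
    static boolean looksLikeText(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int printable = 0;
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            boolean visible = ch >= ' ' && ch <= '~';
            boolean whitespace = ch == '\n' || ch == '\r' || ch == '\t';
            if (visible || whitespace) {
                printable++;
            }
        }
        return printable >= 0.9 * s.length();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeText("I would prefer not to."));
        System.out.println(looksLikeText("\u0001\u00ff\u0003\u0080"));
    }
}
```

Bits extracted from an image with no hidden text tend to decode to many control and non-ASCII characters, which is why a simple printable-character ratio can separate the two cases reasonably often.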

Overview and Development

The howto has details about each of the classes you'll implement for this assignment. There are also details there about how pictures are composed of pixels, which are represented in the programs you'll write by the java.awt.Color class.

A brief big-picture look at the ideas used in these programs is provided here to help guide your reading of the more detailed howto.

The key to hiding a source image or text in a target image is to use some bits of each pixel in the target image to store the hidden source. When hiding source text, you'll hide as many characters as possible --- except for really long text or really small target images you'll be able to hide the entire text without substantially degrading the image. When hiding a source image, you'll reduce the quality of the source image and then hide this reduced-quality image.

For example, the class/program ClearBitsFromImage clears the low-order bits from each r,g,b value in each pixel. The table below shows the gradual degradation of an image as the number of cleared bits increases. It's pretty amazing that after clearing seven of the eight bits in each RGB value of each pixel (image on far right below), the original image is still discernible.

[Table of images: Original | Clear 2 bits | Clear 4 bits | Clear 6 bits | Clear 7 bits]
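
To make the idea of clearing low-order bits concrete, here is a minimal sketch of how one RGB value and one java.awt.Color could be masked. It is not the provided ClearBitsFromImage class; the class name ClearBitsSketch and its methods are hypothetical and written only for illustration.

```java
import java.awt.Color;

// Hypothetical helper, not the course's ClearBitsFromImage class.
final class ClearBitsSketch {
    // Clears the 'bits' low-order bits of one 8-bit channel value (0..255).
    static int clearLowBits(int value, int bits) {
        int mask = (0xFF << bits) & 0xFF;   // bits = 2 gives binary 11111100
        return value & mask;
    }

    // Applies the same clearing to the red, green, and blue parts of a pixel.
    static Color clearLowBits(Color c, int bits) {
        return new Color(clearLowBits(c.getRed(), bits),
                         clearLowBits(c.getGreen(), bits),
                         clearLowBits(c.getBlue(), bits));
    }

    public static void main(String[] args) {
        Color original = new Color(255, 130, 77);
        for (int bits : new int[] {2, 4, 6, 7}) {
            Color c = clearLowBits(original, bits);
            System.out.printf("clear %d bits: (%d, %d, %d)%n",
                              bits, c.getRed(), c.getGreen(), c.getBlue());
        }
    }
}
```

With seven bits cleared, every channel is either 0 or 128, which is why the far-right image keeps only the coarsest light/dark structure of the original.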

Development

You should be sure that each of the programs/classes you write works before proceeding to the next. Before you make HideImage work when the number of bits is specified by the user, first make it work with a fixed number of bits, e.g., with two bits used for each RGB value in each pixel. To determine if HideImage works you may need to implement ExtractImage, although you can gain some confidence that hiding an image works by printing a few RGB values, e.g., by selective use of System.out.printf statements.

You should think about what it means to use two bits in each RGB value to hide an image. You'll clear the low-order two bits of each RGB value in each pixel of the target, i.e., the target image will start looking like the image second from the left in the table above. However, instead of leaving these two bits as 0 (or clear) you'll store two bits from the source image. This means you'll scale or reduce each RGB value of the source image to fit into two bits, i.e., you'd store the image second from the right in the table above if that was the source image, since clearing six bits essentially means taking an RGB value of 255 and reducing it to 3 --- dividing by 64 = 2⁶ and discarding the remainder. Similarly 128 would be reduced to 2, but 127 is reduced to 1. More details are found in the howto.
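
The arithmetic above can be sketched for a single RGB value as follows. This is an illustration under assumptions, not the howto's code: the class HideBitsSketch and the methods hide and extract are hypothetical names, and the sketch assumes the hidden bits are simply shifted back to the high-order positions on extraction.

```java
// Hypothetical helper showing the per-channel arithmetic only.
final class HideBitsSketch {
    // Keeps the top (8 - bits) bits of the target value and stores the top
    // 'bits' bits of the source value in the cleared low-order positions.
    static int hide(int target, int source, int bits) {
        int cleared = (target >> bits) << bits;   // low-order bits set to 0
        int reduced = source >> (8 - bits);       // for bits = 2: divide by 64
        return cleared | reduced;
    }

    // Recovers a lower-quality version of the source value by moving the
    // hidden low-order bits back to the high-order positions.
    static int extract(int combined, int bits) {
        int hidden = combined & ((1 << bits) - 1);
        return hidden << (8 - bits);
    }

    public static void main(String[] args) {
        int target = 200;
        for (int source : new int[] {255, 128, 127}) {
            int combined = hide(target, source, 2);
            System.out.printf("source %d -> stored %d -> extracted %d%n",
                              source, combined, extract(combined, 2));
        }
    }
}
```

Note that a source value of 255 comes back as 192 rather than 255; this loss is the reason the extracted image is a lower-quality version of the image that was hidden.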

Text/ASCII

When hiding text, you'll break each character into its constituent pieces. Whereas pictures are composed of pixels, each of which has three RGB values, each character in a string is made of bytes, and a byte is eight bits. We'll break strings into these eight-bit bytes. Java's char type represents a character with 16 bits, but as explained in the howto, breaking a String into bytes will make things work even when Unicode characters are used. You'll hide each byte from the source text in the cleared bits of the target image, similarly to how your code hid pixels when hiding a source image.
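
As a sketch of what breaking text into bytes and then into one- or two-bit pieces might look like, consider the following. The class TextChunksSketch and its methods are hypothetical, and two choices are assumptions rather than requirements from the howto: the bytes come from UTF-8 encoding, and each byte is split with its most significant bits first.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; encoding and chunk order are assumptions.
final class TextChunksSketch {
    // Splits each byte of the text into 8/bits chunks of 'bits' bits each,
    // most significant chunk first. Intended for bits = 1 or bits = 2.
    static int[] toChunks(String text, int bits) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        int perByte = 8 / bits;
        int mask = (1 << bits) - 1;
        int[] chunks = new int[bytes.length * perByte];
        int k = 0;
        for (byte b : bytes) {
            int value = b & 0xFF;   // treat the byte as unsigned
            for (int shift = 8 - bits; shift >= 0; shift -= bits) {
                chunks[k++] = (value >> shift) & mask;
            }
        }
        return chunks;
    }

    // Reverses toChunks: reassembles bytes and decodes them as UTF-8.
    static String fromChunks(int[] chunks, int bits) {
        int perByte = 8 / bits;
        byte[] bytes = new byte[chunks.length / perByte];
        for (int i = 0; i < bytes.length; i++) {
            int value = 0;
            for (int j = 0; j < perByte; j++) {
                value = (value << bits) | chunks[i * perByte + j];
            }
            bytes[i] = (byte) value;
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int[] chunks = toChunks("A", 2);   // 'A' is 65, binary 01000001
        System.out.println(java.util.Arrays.toString(chunks));
        System.out.println(fromChunks(chunks, 2));
    }
}
```

Each chunk is small enough to fit in the cleared low-order bits of one RGB value, so with two bits per value a single byte of text occupies four RGB values.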

Grading

Each program you write should be robust in the sense that if the inputs don't work, your program should exit gracefully. For example, when hiding a source image in a target, the images must be the same size. That makes things simpler for you as the programmer, so if the images are not the same size, your program should exit gracefully with a message to the user, not crash. When hiding text, you may have more text than can fit in an image. If this is the case, your program should certainly not crash. Ideally you'd inform the user that not all the text was hidden when there's not enough space.
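
The checks described above might be sketched as follows. The class RobustnessSketch and its methods are hypothetical, and the capacity formula assumes that every RGB value of every pixel carries hidden bits and that no space is reserved for bookkeeping such as a length marker; the howto may specify something different.

```java
// Hypothetical helper; the capacity formula is an assumption.
final class RobustnessSketch {
    // True when two images have identical dimensions.
    static boolean sameSize(int w1, int h1, int w2, int h2) {
        return w1 == w2 && h1 == h2;
    }

    // Whole bytes that fit when 'bits' bits of each of the three RGB
    // values in every pixel are used for hidden data.
    static long capacityInBytes(int width, int height, int bits) {
        return (long) width * height * 3 * bits / 8;
    }

    public static void main(String[] args) {
        if (!sameSize(640, 480, 800, 600)) {
            System.out.println("Images must be the same size; nothing was hidden.");
        }
        long capacity = capacityInBytes(640, 480, 2);
        int textLength = 300_000;   // made-up length for illustration
        if (textLength > capacity) {
            System.out.printf("Only %d of %d bytes fit; the rest was not hidden.%n",
                              capacity, textLength);
        }
    }
}
```

Checking these conditions before touching any pixels lets the program report the problem and exit gracefully instead of failing partway through with an exception.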

If your program works only with a fixed number of bits rather than a user-specified number of bits, you can still get full credit for the programs, but you won't get full credit for generality. For generality you should be able to hide text using either 1 or 2 bits per RGB value of each pixel of the target image. For hiding images you should be able to hide using from 1 to 8 bits per RGB value.

This assignment is worth 56 points with 10 extra points available; the breakdown is as follows:

functionality	points
HideImage	5
ExtractImage	5
HideText	8
ExtractText	8
robustness	6
generality	6
StegoBenchmark	8
README	10

Your README file should list your testing results, all the people with whom you collaborated, and the TAs/UTAs you consulted with. You should include an estimate of how long you spent on the program and what your thoughts are about the assignment.

Submit your README and all of your source code using Eclipse with assignment name stego.

Extra Credit

For extra credit you should write HideText and ExtractText to allow the user the choice of hiding the text in row-major order as described in the howto or in column-major order. Essentially your code will process the pixels in the target image either row-by-row starting with the top row of pixels (this is row-major) or column-by-column starting with the leftmost column (this is column-major). You should give the user the choice of which order to use by appropriate use of the JOptionPane.showOptionDialog method, e.g., as shown below.

    int opt = JOptionPane.showOptionDialog(null, "Choose Row or Column", "Hiding Text Options",
            JOptionPane.YES_NO_CANCEL_OPTION, JOptionPane.QUESTION_MESSAGE, null,
            new String[]{"row", "column"}, "row");

You should ask the user which order to use both when hiding and when extracting. You should also try both row- and column-major orders when writing your StegoBenchmark program. Be sure to document in your README that you tried the extra credit and how much of it you did.
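
The difference between the two orders can be sketched as a pair of nested loops whose roles swap. The class TraversalSketch and the method order are hypothetical names for this illustration; a real solution would read or write pixels inside the loops rather than collect coordinates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists the order in which pixels are visited.
final class TraversalSketch {
    // Returns pixel coordinates as {col, row} pairs. Row-major visits the
    // top row left to right first; column-major visits the leftmost
    // column top to bottom first.
    static List<int[]> order(int width, int height, boolean rowMajor) {
        List<int[]> coords = new ArrayList<>();
        int outer = rowMajor ? height : width;
        int inner = rowMajor ? width : height;
        for (int i = 0; i < outer; i++) {
            for (int j = 0; j < inner; j++) {
                int row = rowMajor ? i : j;
                int col = rowMajor ? j : i;
                coords.add(new int[] {col, row});
            }
        }
        return coords;
    }

    public static void main(String[] args) {
        for (boolean rowMajor : new boolean[] {true, false}) {
            StringBuilder sb = new StringBuilder(rowMajor ? "row-major:    " : "column-major: ");
            for (int[] p : order(3, 2, rowMajor)) {
                sb.append(String.format("(%d,%d) ", p[0], p[1]));
            }
            System.out.println(sb.toString().trim());
        }
    }
}
```

JOptionPane.showOptionDialog returns the index of the chosen option, or JOptionPane.CLOSED_OPTION if the dialog is dismissed, so with the options array {"row", "column"} a value of 0 would select row-major and 1 would select column-major.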