Steganography Howto, Compsci 100, Fall 2009

In developing and writing your code, please use the classes provided as part of this assignment as a starting point. You'll need a basic understanding of pictures/pixels and text/bytes to hide and extract images and text, respectively. Details on these and other aspects of the code you'll develop are provided here, along with some hints for developing the programs.

There are four sections in this howto: the caveat section below; an explanation of pixels and the details you'll need for HideImage and ExtractImage; an explanation of hiding text and bit-chunks you'll need for HideText and ExtractText; and details about the design of the code for each class you must write.

  1. Caveats
  2. Here's a list of things to keep in mind when developing these programs. These caveats can lead to subtle and hard-to-find bugs if you're not careful in avoiding them.

    1. Save images by supplying a ".png" suffix. The Picture class supports saving with either a .jpg or a .png suffix --- you must supply the suffix in the file name you provide. However, .png is a lossless compression scheme whereas .jpg is lossy -- information is lost when compressing with the jpg format. You need to keep all the information when you hide text or an image so you must use a .png suffix when saving images using the Picture class.

    2. Except when using the String.getBytes method which returns an array of bytes, use int values. Bytes in Java are signed, i.e., have values between -128 and 127.

    3. Make sure that the order in which you store bit chunks when hiding text, which should be from most- to least-significant, is the same order in which you reconstruct the text. Your program ExtractText must be able to extract text from the images in which your program HideText hid the information, but ideally your program should be able to extract text from every image in which the text is stored most-significant-bits first, e.g., from images in which your classmates' code hid text.

    4. Your code should create new Picture objects in HideImage, ExtractImage and HideText. Make the Picture object visible to the user by calling its show method; this allows the user to save the image. Each of these classes, and ExtractText as well, should have a main method that allows it to run, prompting the user for files and bits as appropriate.

    5. When extracting text from an image, don't process any eight-bit chunks that are all zero, i.e., if a constructed byte/int has the value 0, don't create a character from it to concatenate onto the String you're building. Just skip such values. On a related note --- when hiding text you may run out of pixels in which to hide text, or you may run out of text to hide. In the latter case, you should hide bit-chunks (one- or two-bit as chosen by the user) with the value zero. These won't result in any characters being extracted because your code will skip zeros, so you won't get gibberish when extracting from pixels in which nothing was hidden. In summary: process every pixel when hiding text; when there is no text to hide, hide the value zero.
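    Caveat 2 above can be illustrated with a short sketch. The class name ByteToIntSketch and the helper toUnsigned are made up for illustration; masking with 0xFF is one common way (not necessarily the one the starter code uses) to turn a signed Java byte into an int between 0 and 255 before doing mod/div arithmetic on it.

```java
public class ByteToIntSketch {
    // Hypothetical helper: map a signed byte (-128..127) to an int in 0..255.
    public static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 233;             // stored as -23 because bytes are signed
        System.out.println(b);           // -23
        System.out.println(b % 4);       // -3, not a usable two-bit chunk
        int value = toUnsigned(b);
        System.out.println(value);       // 233
        System.out.println(value % 4);   // 1
    }
}
```

    Plain ASCII text only produces byte values from 0 to 127, so the problem shows up only for other characters; working with int values avoids it in either case.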

  3. Pixels
  4. Images are made of individual picture elements or pixels. Although there are different color models, including gray-scale, RGB, and CMYK, we'll be using the standard RGB or red, green, blue color model. You can find details in the Wikipedia RGB entry. For the purposes of this assignment you need to understand that each pixel is made of a red, green, and blue value, and each of these values is an integer between 0 and 255 (inclusive). For example, you can extract the values of a pixel represented by the java.awt.Color class we are using as shown in the code below. This code illustrates how to construct a Color object representing a pixel from RGB values and how to extract the individual RGB values from a pixel/Color.

    java.awt.Color c1 = new Color(255,0,0);      // this is red
    java.awt.Color c2 = new Color(0,255,0);      // this is green
    java.awt.Color c3 = new Color(0,0,255);      // this is blue
    java.awt.Color c4 = new Color(0,0,0);        // this is black
    java.awt.Color c5 = new Color(255,255,255);  // this is white
    java.awt.Color cc = Color.magenta;
    int vr = cc.getRed();    // 255
    int vg = cc.getGreen();  // 0
    int vb = cc.getBlue();   // 255

    The program ClearBitsFromImage shows the standard way you'll loop over every pixel in an image, processing each pixel to create a new image. You'll use similar code in all the classes you write to hide and extract text and images. The code shown in method clear (and reproduced below) is very similar to the code in the methods you'll write, but the method reduce that constructs a new pixel/Color will change. For complete details, see the Java code. The basic idea is to process every pixel, creating a new pixel for the image being produced (e.g., when hiding text or an image or extracting an image).

    public Picture clear(Picture target){
        int width = target.width();
        int height = target.height();
        target.show();
        Picture pic = new Picture(width,height);
        for(int i=0; i < width; i++){
            for(int j=0; j < height; j++){
                Color sc = target.get(i, j);
                Color cc = reduce(sc);
                pic.set(i,j,cc);
            }
        }
        return pic;
    }

    If you change the body of method reduce as shown in the code fragment below, the effects are shown by the images below the code. Magenta is a combination of maximal red and blue values, which is why the rightmost image has a magenta hue.

    public Color reduce(Color c) {
        return new Color(c.getRed(), VALUE, c.getBlue());
    }

    [Images, left to right: original; VALUE=255 (all green); VALUE=0 (no green)]

    Hiding an Image

    To hide an image using two bits per RGB value/pixel you'll need to clear two bits in each value of the target image RGB values and store two bits from the source image value in this cleared space. For example, if you cleared two digits from the number 1578 you'd get 1500. You can then store 23 in the cleared space yielding 1523. The target value of 1578 has been replaced by 1523 in the newly created image, and the value 23 can be extracted from 1523 by arithmetic operations.

    In the diagrams below the process of hiding a source RGB value in the target value is explained. You don't need to use binary values in any calculations for this program, the binary values are shown to provide a deeper understanding of the process of hiding and extracting image values.

    [Diagram: first stage] The target value in which we'll hide information is 179. This could be either the R, G, or B value from a Color/pixel. We're using two bits to store hidden information, so we clear two bits by dividing and multiplying by 4 = 2^2, which gives 179/4*4 = 176.
    [Diagram: second stage] The source value we're hiding is 142. Since we have two bits in which to hide the value, we must hide a value that's either 0, 1, 2, or 3. We hide the value 142/64 = 2, and we use 64 because it's 2^(8-2) = 2^6 = 64. If we were using one bit to hide, we'd divide by 128 since we can represent two values with one bit and 256/2 = 128. If we had 3 bits to hide information in, we'd divide by 32 since we can represent eight values with three bits and 256/8 = 32.
    [Diagram: third stage] The target value of 179 has been replaced by 176 + 2 = 178 --- this is the value used in the constructed image that represents both the target image and the hidden, source image. In the diagram, the bits in red are the bits we're hiding.
    [Diagram: fourth stage] When we're extracting the hidden image we need to know that two bits have been used to hide a value. We extract the hidden value by calculating 178 % 4 = 2 since 178 = 44*4+2. The value used to create part of a pixel in the reconstructed hidden image is 2*64 = 128. We rescale by 64 since that's the number we divided by when hiding the original source value of 142. We've now got 128 as the value that was originally 142 in the source image before the 142 was scaled down and hidden.
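    The four stages can be traced in a few lines of decimal arithmetic. This is only a sketch for a single R, G, or B value with two bits; the class name TwoBitValueSketch and the method names hide and extract are invented for illustration and are not part of the provided classes.

```java
public class TwoBitValueSketch {
    // Clear the two low bits of a target value (0-255) and store the
    // top two bits of a source value (0-255) in the cleared space.
    public static int hide(int target, int source) {
        int cleared = target / 4 * 4;   // 179 -> 176
        int chunk = source / 64;        // 142 -> 2
        return cleared + chunk;         // 176 + 2 = 178
    }

    // Recover an approximation of the hidden source value.
    public static int extract(int combined) {
        int chunk = combined % 4;       // 178 -> 2
        return chunk * 64;              // 2 -> 128
    }

    public static void main(String[] args) {
        int combined = hide(179, 142);
        System.out.println(combined);            // 178
        System.out.println(extract(combined));   // 128
    }
}
```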

    Information is stored using binary values, but you don't need to use binary arithmetic in doing the simple arithmetic operations that are part of inserting and extracting values/bits. Although you don't need to use the binary representations in the code you write, understanding the binary may help when you debug. However, you can do this entire assignment using only decimal/base-10 arithmetic operations.

    Suppose a source pixel/color is represented by the RGB triple (57, 108, 213). In base two this triple is (00111001, 01101100, 11010101). What happens if you want to store/hide this source pixel of (57, 108, 213) using two bits? If you are using two bits to store this value you must reduce each value by a factor of 64 so that it's one of four different values, since you can represent four values with two bits: 0, 1, 2, 3. Reducing (57, 108, 213) by dividing by 64 (integer division) yields (0, 1, 3), which matches the leftmost two bits of each binary value: 00, 01, 11. These are the values that would be hidden. When they're extracted, the values would be multiplied by 64 yielding (0, 64, 192). As a result, the original source pixel of (57, 108, 213) is hidden and then extracted as (0, 64, 192).

    Alternatively, suppose the RGB triple (57, 108, 213) is a target pixel/value in which information will be hidden using two bits. To clear two bits you divide and multiply by 4 as shown in the diagram above, e.g., you replace 57 by 57/4*4 = 56. This results in replacing (57, 108, 213) by (56, 108, 212). In binary these values are (00111000, 01101100, 11010100). Note that the rightmost two bits of each value are zero -- they've been "cleared". In the image you create, you don't store these cleared values in a pixel directly; you first add the hidden information from the source to them, and then you store the result in a pixel.

    You'll use regular decimal/arithmetic operations to reduce values unless you have a good understanding of bit operators. You can convert between base-10 and base-2 using the Google query 57 in base 2 or 0b00011011 in base 10. You can debug by using the static Integer.toBinaryString(i) method that returns a string representing the base two/binary version of an int value.

    HideImage Development

    First use two bits to store a source image in a target. You'll clear each of the three RGB values in a pixel by two bits using arithmetic operations and the number 4, e.g., to clear 1578 to 1500 in base 10 you could simply divide by 100 and then multiply by 100: that clears two decimal digits. You do the same thing with 4 instead of 100 to clear two bits (binary digits).

    You can store any of the values 0, 1, 2, or 3 in the two cleared bits since you can represent four values with two bits. For each RGB value in each pixel of the source (to be hidden) image you divide the RGB value by 64 since the original values were in the range 0-255 and you need to map them to 0-3.

    You'll likely need to write ExtractImage to see that you've hidden an image successfully. When extracting RGB values from each pixel you'll get 0, 1, 2, or 3. When creating a new image, scale these by multiplying by 64 to get a value in the range 0-255. When you've got this working with two bits, try to parameterize your methods to work with any number of bits between 0 and 8, though using 0 or 8 bits doesn't make much sense.
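    One possible way to parameterize the arithmetic by the number of bits is sketched below. The class name HidePixelSketch and its method names are hypothetical, and the sketch works on single java.awt.Color values rather than on the Picture class. The expression 1 << bits is used only as a short way to write 2 to the power bits; everything else is the same div/mod arithmetic described above.

```java
import java.awt.Color;

public class HidePixelSketch {
    // Hide the top `bits` bits of one source value in the low bits of one target value.
    public static int hideValue(int target, int source, int bits) {
        int clearBy = 1 << bits;         // 2^bits, e.g. 4 when bits == 2
        int scaleBy = 1 << (8 - bits);   // 2^(8-bits), e.g. 64 when bits == 2
        return target / clearBy * clearBy + source / scaleBy;
    }

    // Recover the scaled-up hidden value from a combined value.
    public static int extractValue(int combined, int bits) {
        int clearBy = 1 << bits;
        int scaleBy = 1 << (8 - bits);
        return combined % clearBy * scaleBy;
    }

    // Apply hideValue to each of the three RGB values of a pixel.
    public static Color hide(Color target, Color source, int bits) {
        return new Color(hideValue(target.getRed(), source.getRed(), bits),
                         hideValue(target.getGreen(), source.getGreen(), bits),
                         hideValue(target.getBlue(), source.getBlue(), bits));
    }

    // Apply extractValue to each of the three RGB values of a pixel.
    public static Color extract(Color combined, int bits) {
        return new Color(extractValue(combined.getRed(), bits),
                         extractValue(combined.getGreen(), bits),
                         extractValue(combined.getBlue(), bits));
    }

    public static void main(String[] args) {
        Color target = new Color(179, 200, 33);
        Color source = new Color(57, 108, 213);
        Color combined = hide(target, source, 2);
        Color recovered = extract(combined, 2);
        System.out.printf("%d %d %d%n", combined.getRed(),
                          combined.getGreen(), combined.getBlue());   // 176 201 35
        System.out.printf("%d %d %d%n", recovered.getRed(),
                          recovered.getGreen(), recovered.getBlue()); // 0 64 192
    }
}
```

    In a full program, methods like these would play the role that reduce plays in ClearBitsFromImage: they are called once per pixel inside the nested loops over width and height.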


  5. Text
  6. In computing, text is represented using characters and each character is represented by an ASCII or Unicode value. Java uses Unicode to represent each character, and a Java char uses 16 bits. However, we'll ignore the ASCII/Unicode distinction and treat text/strings as a sequence of bytes, where bytes are eight-bit values. The text stored in files on computers is typically stored in bytes, and two bytes can be combined to create a Unicode character when that's what's actually stored, just as four bytes can be combined to create an int when that's what is stored. For our purposes we'll treat Strings as simply a sequence of bytes. This simplifies the process of hiding text.

    Fortunately, the String class provides the getBytes method as shown below. Using printf makes it simple to print a value as either a character or an integer to illustrate what's going on.

    Code:

    String s = "abcd efg";
    byte[] array = s.getBytes();
    for(byte b : array){
        System.out.printf("%c %d %s\n", b,b,Integer.toBinaryString(b));
    }

    Output:

    a 97 1100001
    b 98 1100010
    c 99 1100011
    d 100 1100100
      32 100000
    e 101 1100101
    f 102 1100110
    g 103 1100111

    To hide text in an image, you'll need to convert an entire file to a sequence of bytes. We can do that simply in Java with the following code that leverages the power of the java.util.Scanner class.

    String filename = ... //somehow get a filename, e.g., with JFileChooser
    Scanner scan = new Scanner(new File(filename));   // create scanner
    String all = scan.useDelimiter("\\Z").next();     // read entire file
    byte[] text = all.getBytes();                     // convert to bytes

    Once you've converted a file to a sequence/array of bytes, you'll need to hide the bytes using either one or two bits of each RGB value in the target image in which the text will be hidden. You clear either one or two bits using the techniques and arithmetic operations described above in the section on hiding images. When hiding text, you don't reduce the information content of the text by scaling as was done with the hidden image. If the target image doesn't have enough pixels in which to hide all the text, hide as much as can fit.

    The process of hiding text in an image is similar to the process of hiding one image in another, but in the image hiding code we made the assumption that both images were the same size in width and height. Thus one pair of nested loops was sufficient to process both source and target pixels.

    In hiding text, you'll need to either loop over pixels in the target image or bytes/bits in the source text. Do not try to loop over both at the same time. For example, if you loop over pixels in the target image, you'll need to hide different bits from the source text in each RGB value of each pixel. If these bits are in an array, you can access them simply by indexing into the array and incrementing the index after each access.

    Getting bits from a byte

    When hiding text, you'll store either four or eight "bit-chunks" per byte of the hidden text depending on whether you're using two or one bit from each RGB value in the target image, respectively. You'll need to extract these text bit-chunks from each byte. Perhaps the simplest way to extract the chunks is to use code similar to the following which prints each decimal digit of a number from least significant digit to most significant using mod/div operators -- this code shows how to use the arithmetic operators with base-10 digits.

    Code:

    int value = 12345;
    System.out.printf("value of %d backwards: ",value);
    for(int k=0; k < 5; k++) {
        int lsd = value % 10;
        value /= 10;
        System.out.printf("%d",lsd);
    }
    System.out.println();

    Output:

    value of 12345 backwards: 54321

    Using values of 2 or 4 rather than 10 when extracting digits in the loop above would result in getting eight one-bit chunks or four two-bit chunks per byte, respectively. You'll need to change the number of times the loop iterates, e.g., eight or four times depending on whether you're extracting 1 or 2 bits, respectively. However, with this code you'll be extracting the bits in order from least to most significant just as the 5 is printed first in the code fragment above. You'll need to somehow reverse this order, or extract differently, so that your code will store the most significant bit-chunk from each byte first, then the next most significant, and finally the least significant bit(s).

    For example, if you're using two bits per RGB value to hide the value whose binary representation is given by 01110001 you'll need to store the two-bit chunks in the order 01, 11, 00 and 01. If the first three chunks are stored in one pixel, the last chunk, 01 will be hidden/stored in the next pixel.
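    One way to "extract differently" is to divide by a shrinking power of the base, so that the most significant chunk comes out first. The sketch below is an assumption about how this could be done, not the required approach; the class name ChunkOrderSketch and the method chunksOf are made up for illustration.

```java
import java.util.Arrays;

public class ChunkOrderSketch {
    // Split an eight-bit value (0-255) into chunks of `bits` bits (1 or 2),
    // most significant chunk first.
    public static int[] chunksOf(int value, int bits) {
        int base = (bits == 1) ? 2 : 4;   // number of values one chunk can hold
        int count = 8 / bits;             // eight or four chunks per byte
        int[] chunks = new int[count];
        int divisor = 256 / base;         // 128 for one bit, 64 for two bits
        for (int k = 0; k < count; k++) {
            chunks[k] = value / divisor % base;
            divisor /= base;
        }
        return chunks;
    }

    public static void main(String[] args) {
        int value = 0b01110001;   // 113, the example value above
        System.out.println(Arrays.toString(chunksOf(value, 2)));  // [1, 3, 0, 1]
        System.out.println(Arrays.toString(chunksOf(value, 1)));  // [0, 1, 1, 1, 0, 0, 0, 1]
    }
}
```

    The two-bit result [1, 3, 0, 1] corresponds to the chunks 01, 11, 00 and 01 in the order they should be stored.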

    One way to make the process of getting bits from a byte simpler is to create a new array from the one obtained from String.getBytes. The new array would be either four or eight times bigger depending on whether you're using two-bit or one-bit chunks, respectively. This method is fine to use, though more memory intensive than avoiding the creation of the additional array. However, creating the second array makes it much simpler to iterate over the array when creating new pixels/Colors from the target image in which to store the chunks representing the hidden message. You can also extract bits using arithmetic operations or bit-shifting operations.
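    The expanded-array idea, together with the rule about hiding zeros once the text runs out, might look like the sketch below. To keep it runnable without the Picture class, a flat int array stands in for the R, G, B values of the target image in pixel order; that stand-in, the class name TextChunksSketch, and all method names are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TextChunksSketch {
    // Expand text bytes into bit-chunks, most significant first, one chunk per slot.
    public static int[] toChunks(byte[] text, int bits) {
        int base = 1 << bits;              // 2 or 4
        int perByte = 8 / bits;            // 8 or 4 chunks per byte
        int[] chunks = new int[text.length * perByte];
        for (int i = 0; i < text.length; i++) {
            int value = text[i] & 0xFF;    // use an int in 0..255, not a signed byte
            // mod/div yields the least significant chunk first, so fill from the end
            for (int k = perByte - 1; k >= 0; k--) {
                chunks[i * perByte + k] = value % base;
                value /= base;
            }
        }
        return chunks;
    }

    // Hide one chunk in every channel value; once the chunks run out, hide zero.
    public static int[] hide(int[] channels, int[] chunks, int bits) {
        int base = 1 << bits;
        int[] result = new int[channels.length];
        for (int i = 0; i < channels.length; i++) {
            int chunk = (i < chunks.length) ? chunks[i] : 0;
            result[i] = channels[i] / base * base + chunk;
        }
        return result;
    }

    // Rebuild bytes from the low bits of each channel value, skipping zero bytes.
    public static String extract(int[] channels, int bits) {
        int base = 1 << bits;
        int perByte = 8 / bits;
        StringBuilder text = new StringBuilder();
        for (int start = 0; start + perByte <= channels.length; start += perByte) {
            int value = 0;
            for (int k = 0; k < perByte; k++) {
                value = value * base + channels[start + k] % base;
            }
            if (value != 0) {
                text.append((char) value);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        int[] channels = new int[30];      // stand-in for 10 pixels x 3 RGB values
        Arrays.fill(channels, 200);
        byte[] text = "hi!".getBytes(StandardCharsets.US_ASCII);
        int[] hidden = hide(channels, toChunks(text, 2), 2);
        System.out.println(extract(hidden, 2));   // hi!
    }
}
```

    In the real programs the loop in hide would be the nested loop over pixels, with an index into the chunk array that advances three times per pixel (once each for red, green, and blue).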

  7. Code and Class Details
  8. The class ClearBitsFromImage illustrates one way to organize code for the classes you write in this assignment. One idea that can help as you develop your programs is to write a method that returns a new Picture based on the parameters to the method. In ClearBitsFromImage the method clear does this -- and you can see its use in the main method. Writing a method to alter each pixel, as illustrated by the method reduce, is also a good idea. Developing the code by isolating functionality in a method helps when debugging and checking that your code is working properly.

    Starter classes for HideImage and ExtractImage are provided to take care of some of the boilerplate code in opening files.

    When developing ExtractText it will help to write a method that returns the String representing the extracted text. Writing such a method will help as you try to find all the images in a directory that contain text: part of writing StegoBenchmark. It's not a good idea for the method that extracts text to simply print the text. If you return the text, you can print the string returned if you want to, e.g., for testing/debugging. But you can also pass the string to other methods, e.g., for determining if the string represents text or is simply gibberish.

    This means three of the four hide/extract classes you write will have a method that returns a Picture object. You should make sure that you call the show method on this object since that pops up a frame in which the image appears and the frame has a Save-option that allows the user to save the image. When writing ExtractText you won't generate a new Picture, you'll generate the String extracted from an image. You should likely print this String in the main method you write, but you won't print it when you call the method from StegoBenchmark.

    StegoBenchmark

    The idea in writing the benchmark program is to determine which images in a directory of images store text. You'll need to try both one- and two-bit parameters to see which of them (if any) results in extracting text from an image. To determine if the String extracted from an image is text you'll need to write a method in the benchmark code that returns a boolean value indicating whether a String is text. You should do this by using properties of text: the average word length is one of the properties you'll use. You can use the Google query "average word length" to determine what the average word length is in English and other languages. You can find words in a String using the String.split method, e.g., the code below breaks a String on white-space into an array of "words" -- this is typically how we break text represented by a String into its component words.

    String s = "the quick brown fox";
    String[] all = s.split("\\s+"); // all contains "the" "quick" "brown" "fox"

    In addition to using average word length to determine if a String represents text you might look at the characters to make sure that they're actually letters and not gibberish. The static method Character.isLetter returns true if its parameter is a letter (think 'a'-'z', upper or lowercase). Of course you can also use a dictionary of English words, but for identifying text this is probably not necessary.
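    A text-detecting method built from these two properties could be sketched as follows. The class name LooksLikeTextSketch, the method name looksLikeText, and especially the three threshold constants are assumptions chosen only for illustration; you would need to pick and test your own values.

```java
public class LooksLikeTextSketch {
    // Assumed thresholds, chosen only for illustration.
    static final double MIN_AVG_WORD_LENGTH = 2.0;
    static final double MAX_AVG_WORD_LENGTH = 10.0;
    static final double MIN_LETTER_FRACTION = 0.6;

    // Guess whether a String is text, using average word length and
    // the fraction of non-whitespace characters that are letters.
    public static boolean looksLikeText(String s) {
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        String[] words = trimmed.split("\\s+");
        int totalLength = 0;
        int letters = 0;
        for (String word : words) {
            totalLength += word.length();
            for (char ch : word.toCharArray()) {
                if (Character.isLetter(ch)) {
                    letters++;
                }
            }
        }
        double avgWordLength = (double) totalLength / words.length;
        double letterFraction = (double) letters / totalLength;
        return avgWordLength >= MIN_AVG_WORD_LENGTH
            && avgWordLength <= MAX_AVG_WORD_LENGTH
            && letterFraction >= MIN_LETTER_FRACTION;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeText("the quick brown fox"));   // true
        System.out.println(looksLikeText("#$%^&*()12345"));         // false
    }
}
```

    Because the method returns a boolean rather than printing, StegoBenchmark can call it on the String returned by the text-extracting method for each image and each bit setting.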