March 14, 2011 (Happy Pi Day!)
For group work, please turn in one sheet of paper with all the names of those who participated in your group.
| a | 011000001 |
| b | 011000010 |
| c | 011000011 |
| d | 011000100 |
| ? | 00111111 |
| @ | 010000000 |
| A | 010000001 |
We will look at a very small file to focus on encoding, but you can imagine files that are much bigger.
| letter | encoding of letter | letter | encoding of letter | letter | encoding of letter | letter | encoding of letter |
|---|---|---|---|---|---|---|---|
| . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . |
Now let's try to get a "smaller" encoding.
| letter | number occurences | letter | number occurences | letter | number occurences | letter | number occurences |
|---|---|---|---|---|---|---|---|
| . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . |
| letter | encoding of letter | letter | encoding of letter | letter | encoding of letter | letter | encoding of letter |
|---|---|---|---|---|---|---|---|
| . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . |
Huffman Coding builds a structure in which characters that are used a lot are near the top and characters not used much are near the bottom. The tree is built by starting with all the characters and the number of times they appear in the file. The smallest two numbers are joined together.
For example if the characters a and b are both used once, they can be joined together and the structure is labeled with their sum.
If the character e occurs twice, it can be joined with the previous
structure
and labeled with the sum of the two.
In general, two characters or structures are joined together if they
have the smallest two values. This continues until there is just one
structure. Then codes are assigned to letters by walking down the
structure. Going left means "0" and going right means "1". This way
each character has a unique code.
In the example above with the stucture for a, b and e, the code for
a is 00, b is 01 and e is 1.
letter number occurences letter number occurences letter
number occurences letter number occurences
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
letter encoding of letter letter encoding of letter letter
encoding of letter letter encoding of letter
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .