| LZW compression used to encode/decode a GIF file by Bob Montgomery [73357,3140] |
| |
| ENCODER |
| Consider the following input data stream in a 4 color (A, B, C, D) system. |
| We will build a table of codes which represent strings of colors. Each time |
| we find a new string, we will give it the next code, and break it into a |
| prefix string and a suffix color. The symbols \ or --- represent the prefix |
| string, and / represents the suffix color. The first 4 entries in the table |
| are the 4 colors with codes 0 thru 3, each of which represents a single |
| color. The next 2 codes (4 and 5) are the clear code and the end of image |
| code. The first available code to represent a string of colors is 6. Each |
| time we find a new code, we will send the prefix for that code to the |
| output data stream. |
| |
| A B A B A B A B B B A B A B A A C D A C D A D C A B A A A B A B ..... |
| \/\/---/-----/\/---/-------/\/ \/ \ /\/---/---/\ /\/-----/---/---/ |
| 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 Code |
| 6 8 10 8 14 16 8 13 7 Prefix |
| |
| The encoder table is built from input data stream. Always start with the |
| suffix of last code, and keep getting colors until you have a new |
| combination. |
| |
| Color Code Prefix Suffix String Output |
| A 0 - |
| B 1 - |
| C 2 - |
| D 3 - |
| Clear 4 - |
| End 5 - |
| A \ A A First color is a special case. |
| B / \ 6 A B AB A |
| A | / 7 B A BA B |
| B | |
| A / | 8 6 A ABA 6 |
| B | |
| A | |
| B \ / 9 8 B ABAB 8 |
| B / | 10 B B BB B |
| B | |
| A | / 11 10 A BBA 10 |
| B | |
| A | |
| B | |
| A / \ 12 9 A ABABA 9 |
| A \ / 13 A A AA A |
| C / \ 14 A C AC A |
| D \ / 15 C D CD C |
| A / | 16 D A DA D |
| C | |
| D | / 17 14 D ACD 14 |
| A | |
| D / \ 18 16 D DAD 16 |
| C \ / 19 D C DC D |
| A / | 20 C A CA C |
| B | |
| A | |
| A | / 21 8 A ABAA 8 |
| A | |
| B / | 22 13 B AAB 13 |
| A | |
| B / 23 7 B BAB 7 |
| |
| The resultant output stream is: A B 6 8 B 10 9 A A C D 14 16 D C 8 .... |
| The GIF encoder starts with a code length of 2+1=3 bits for 4 colors, so |
| when the code reaches 8 we will have to increase the code size to 4 bits. |
| Similarly, when the code gets to 16 we will have to increse the code size |
| to 5 bits, etc. If the code gets to 13 bits, we send a clear code and start |
| over. See GIFENCOD.GIF for a flow diagram of the encoding process. This |
| uses a tree method to search if a new string is already in the table, which |
| is much simpler, faster, and easier to understand than hashing. |
| |
| =========================================================================== |
| |
| DECODER |
| |
| We will now see if we can regenerate the original data stream and duplicate |
| the table looking only at the output data stream generated by the encoder |
| on the previous page. The output data stream is: |
| |
| A B 6 8 B 10 9 A A C D 14 16 D C 8 .... |
| |
| The docoding process is harder to see, but easier to implement, than the |
| encoding process. The data is taken in pairs, and a new code is assigned to |
| each pair. The prefix is the left side of the pair, and the suffix is the |
| color that the right side of the pair decomposes to from the table. The |
| decomposition is done by outputing the suffix of the code, and using the |
| prefix as the new code. The process repeats until the prefix is a single |
| color, and it is output too. The output of the decomposition is pushed onto |
| a stack, and then poped off the stack to the display, which restores the |
| original order that the colors were seen by the encoder. We will go thru |
| the first few entries in detail, which will hopefully make the process |
| clearer. |
| The first pair is (A B), so the prefix of code 6 is A and the suffix |
| is B, and 6 represents the string AB. The color A is sent to the display. |
| The 2nd pair is (B 6), so the prefix of code 7 is B and the suffix is |
| the the last color in the decomposition of code 6. Code 6 decomposes into |
| BA, so code 7 = BA, and has a suffix A. The color B is sent to the display. |
| The 3rd pair is (6 8) and the next code is 8. How can we decompose 8. |
| We know that the prefix of code 8 is 6, but we don't know the suffix. The |
| answer is that we use the suffix of the prefix code; A in this case since |
| the suffix of 6 is A. Thus, code 8 = ABA and has a suffix A. We decompose 6 |
| to get BA, which becomes AB when we pop it off the stack to the display. |
| The 4th pair is (8 B), so code 9 has a prefix of 8 and a suffix of B, |
| and code 9 = ABAB. We output ABA to the stack, and pop it off to the |
| display as ABA. |
| The 5th pair is (B 10) and the next code is 10. The prefix of code 10 |
| is B and the suffix is B (since the prefix is B). Code 10 = BB, and we |
| output the prefix B to the display. |
| The 6th pair is (10 9) and the next code is 11. Thus the prefix of |
| code 11 is 10 and the suffix is the last color in the decomposition of 9, |
| which is A. Thus code 11 = BBA, And we output BB to the display. |
| So far, we have output the correct colors stream A B AB ABA B BB to |
| the display, and have duplicated the codes 6 thru 11 in the encoder table. |
| This process is repeated for the whole data stream to reconstruct the |
| original color stream and build a table identical to the one built by the |
| encoder. We start the table with codes 0-5 representing the 4 colors, the |
| clear code, and the end code. When we get to code 8, we must increse the |
| code size to 5 bits, etc. See GIFDECOD.GIF for a flow diagram of the |
| decoding process. |
| |
| I Hope this helps take some of the mystery out of LZW compression, which is |
| really quite easy once you 'see' it. Bob Montgomery |