LZW Compression and Decompression
Here are a sketches of a compression/decompression pair of
algorithms. These sketches aren't thorough (e.g. what happens
when the dictionary fills up?), but they'll serve for
your homework.
Based on
A Technique for High-Performance Data Compression,
Terry A. Welch, IEEE Computer, June 1984, pp. 8-19, and
question 70 from part 2 of the comp.compression
data compression FAQ.
Compression
In the pseudo-code below:
- bool GetNextChar( char& K ) puts the
next input character into K and returns true if there's
a character to get, and returns false otherwise.
- CODE(w) returns the dictionary code (index)
corresponding to the string w.
- The plus sign (+) stands for string/character
concatenation.
char K;
string w;
w = ""
while( GetNextChar( K ) )
{
if( w + K is in the dictionary )
w = w + K
else
{
send CODE(w) to output
Add w + K to the dictionary
w = K
}
}
Decompression
In this pseudo-code:
- D(code) is the dictionary string corresponding to the
given index/code.
- bool GetNextCode( int& code ) puts the next code from
the input into code, returning true if there is a code
to get, and returning false at the end of the input.
- + still means string/char concatenation.
int code
string w, tmp
GetNextCode( code )
w = D(code)
send w to output
while( GetNextCode(code) )
{
if( there's a string whose index is code in the dictionary )
{
tmp = w
w = D(code)
Add tmp + w[0] to the dictionary
}
else
{
Add w + w[0] to the dictionary
w = D(code)
}
send w to output
}
Let me know if you have questions or find errors.
Jeff Ondich,
Department of Mathematics and Computer Science,
Carleton College, Northfield, MN
55057, (507) 646-4364,
jondich@carleton.edu