LZW Decompression

Still under construction

From A Technique for High-Performance Data Compression, Terry A. Welch, IEEE Computer, June 1984, pp. 8-19.

"The abnormal kase occurs whenever an input character string contains the sequence KwKwK, where Kw already appears in the compressor string table. The compressor will parse out Kw, send CODE(Kw), and add KwK to its string table. Next it will parse out KwK and send the just-generated CODE(KwK). The decompressor, on receiving CODE(KwK), will not yet have added that code to its string table because it does not yet know an extension character for the prior string."

This is exactly the problem we had in class on 1/23. We had already added "AB" to our dictionary when we encountered the string "ABABA". In Welch's language, K = 'A' and w = "B". You can get the same bad effect most quickly by using an input string like "AAA" (K = 'A', w = empty string).

What follows is Welch's decompression algorithm, reformatted a bit. It's not clear why the compression FAQ authors didn't include this, but they didn't. In this algorithm, DICT(code) refers to the string stored at index "code", and CODE(w) refers to the index at which the string w is stored. Similarly, CODE(K) refers to the index at which the 1-character string K is stored in the dictionary.

The line marked ???? leaves me a bit mystified, and I'm tired enough that there's no way I'm going to figure it out this afternoon. Fortunately, I don't think the issue comes up in your assignment. I'll think about it more later.

Initialize:

	the stack = empty
	oldCode = code = The first input code
	send DICT(code) to output
	finChar = DICT(code)

Next Code:
	inCode = code = The next input code
	if( no new code )
		Exit
	
	if( code is not defined )
	{
		Send finChar to output
		code = oldCode
		inCode = CODE( oldCode, finChar ) ????
	}

Next Symbol:
	if( code == CODE(wK) for some non-empty string w and character K )
	{
		Push K on the stack
		code = CODE(w)  // Why does CODE(w) exist?
		Go to Next Symbol
	}
		
	if( code == CODE(K) for some character K )
	{
		Send K to output
		finChar = K

		while( the stack is not empty )
		{
			Pop top of stack and send to output
		}
		
		Add the concatenation of oldCode and K to the dictionary
		oldCode = inCode
		
		Go to Next Code
	}

Gotta love that structured programming. Yikes. My best guess for the meaning of "finChar" is "final character from previous



Jeff Ondich, Department of Mathematics and Computer Science, Carleton College, Northfield, MN 55057
(507) 646-4364, jondich@carleton.edu