From A Technique for High-Performance Data Compression, Terry A. Welch, IEEE Computer, June 1984, pp. 8-19.
"The abnormal kase occurs whenever an input character string contains the sequence KwKwK, where Kw already appears in the compressor string table. The compressor will parse out Kw, send CODE(Kw), and add KwK to its string table. Next it will parse out KwK and send the just-generated CODE(KwK). The decompressor, on receiving CODE(KwK), will not yet have added that code to its string table because it does not yet know an extension character for the prior string."
This is exactly the problem we had in class on 1/23. We had already added "AB" to our dictionary when we encountered the string "ABABA". In Welch's language, K = 'A' and w = "B". You can get the same bad effect most quickly by using an input string like "AAA" (K = 'A', w = empty string).
What follows is Welch's decompression algorithm, reformatted a bit. It's not clear why the compression FAQ authors didn't include this, but they didn't. In this algorithm, DICT(code) refers to the string stored at index "code", and CODE(w) refers to the index at which the string w is stored. Similarly, CODE(K) refers to the index at which the 1-character string K is stored in the dictionary.
The line marked ???? leaves me a bit mystified, and I'm tired enough that there's no way I'm going to figure it out this afternoon. Fortunately, I don't think the issue comes up in your assignment. I'll think about it more later.
Initialize: the stack = empty oldCode = code = The first input code send DICT(code) to output finChar = DICT(code) Next Code: inCode = code = The next input code if( no new code ) Exit if( code is not defined ) { Send finChar to output code = oldCode inCode = CODE( oldCode, finChar ) ???? } Next Symbol: if( code == CODE(wK) for some non-empty string w and character K ) { Push K on the stack code = CODE(w) // Why does CODE(w) exist? Go to Next Symbol } if( code == CODE(K) for some character K ) { Send K to output finChar = K while( the stack is not empty ) { Pop top of stack and send to output } Add the concatenation of oldCode and K to the dictionary oldCode = inCode Go to Next Code }
Gotta love that structured programming. Yikes. My best guess
for the meaning of "finChar" is "final character from previous