LZWL is a syllable-based variant of the character-based LZW compression algorithm[1][2]. It works on syllables produced by any syllable decomposition algorithm, and it can also be applied to words.
Algorithm
In the character-based LZW algorithm, the dictionary is initialized with all characters of the alphabet. In each subsequent step, the encoder searches for the longest string S that is in the dictionary and matches a prefix of the not yet coded part of the input. The number of phrase S is sent to the output. A new phrase is then added to the dictionary; it is formed by concatenating S with the character that follows S in the file. The input position is moved forward by the length of S. Decoding has only one special case to handle: the decoder can receive the number of a phrase that is not yet in its dictionary. In this case, the phrase can be reconstructed by concatenating the previously decoded phrase with its own first character.
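The character-based steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `lzw_encode` and `lzw_decode` are hypothetical, the alphabet is passed in explicitly, and every character of the input is assumed to belong to it. Phrase numbers are returned as a plain list of integers rather than being packed into bits.

```python
def lzw_encode(text, alphabet):
    """Return the list of phrase numbers for text (hypothetical helper)."""
    # Initialization: one dictionary entry per character of the alphabet.
    dictionary = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    pos = 0
    while pos < len(text):
        # Extend S one character at a time while it stays in the dictionary.
        s = text[pos]
        while pos + len(s) < len(text) and s + text[pos + len(s)] in dictionary:
            s += text[pos + len(s)]
        codes.append(dictionary[s])
        nxt = pos + len(s)
        if nxt < len(text):
            # New phrase: S followed by the character after S in the input.
            dictionary[s + text[nxt]] = len(dictionary)
        pos = nxt
    return codes


def lzw_decode(codes, alphabet):
    """Rebuild the text from phrase numbers (hypothetical helper)."""
    phrases = list(alphabet)
    out = []
    prev = None
    for code in codes:
        if code < len(phrases):
            cur = phrases[code]
        else:
            # Special case: the phrase is not in the dictionary yet, so it
            # must be the previous phrase plus that phrase's first character.
            cur = prev + prev[0]
        if prev is not None:
            phrases.append(prev + cur[0])
        out.append(cur)
        prev = cur
    return "".join(out)


codes = lzw_encode("abababa", "ab")
print(codes)
print(lzw_decode(codes, "ab"))
```

With the input `abababa`, the last emitted number refers to a phrase the decoder has not added yet, so the example exercises the special case described above.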
The syllable-based version uses a list of syllables as its alphabet. In the initialization step, the empty syllable and small syllables from a database of frequent syllables are added to the dictionary. Finding the string S and coding its number works as in the character-based version, except that S is a string of syllables. The number of phrase S is written to the output. The string S can be the empty syllable.
If S is the empty syllable, the encoder reads one syllable K from the file and encodes it with the methods for coding new syllables. Syllable K is then added to the dictionary. The input position is moved forward by the length of S or, when S is the empty syllable, by the length of K.
Adding a phrase to the dictionary differs from the character-based version. Let S1 denote the phrase found in the next step. If S and S1 are both non-empty strings of syllables, a new phrase is added to the dictionary. The new phrase is formed by concatenating S with the first syllable of S1. This solution has two advantages. First, no phrases are built from syllables that have appeared only once. Second, the decoder can never receive the number of a phrase that is not in its dictionary.
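The syllable-based procedure can be sketched in the same way. The sketch below rests on several assumptions that are not part of the source: the input is already split into a list of syllables, the names `lzwl_encode`, `lzwl_decode` and `EMPTY` are hypothetical, a new syllable K is written to the output verbatim as a stand-in for the methods for coding new syllables, a phrase that is already in the dictionary is not added a second time, and the new phrase is taken to be S followed by the first syllable of S1.

```python
EMPTY = ()  # the empty syllable, always phrase number 0 in this sketch


def lzwl_encode(syllables, initial_syllables):
    """Encode a list of syllables (hypothetical helper)."""
    dictionary = {EMPTY: 0}
    for syl in initial_syllables:
        dictionary.setdefault((syl,), len(dictionary))
    output = []
    pos = 0
    prev = EMPTY  # phrase found in the previous step
    while pos < len(syllables):
        # Longest dictionary phrase matching the input; it may stay empty.
        s = EMPTY
        while pos + len(s) < len(syllables):
            candidate = s + (syllables[pos + len(s)],)
            if candidate not in dictionary:
                break
            s = candidate
        output.append(dictionary[s])
        if s == EMPTY:
            k = syllables[pos]
            output.append(k)  # assumption: new syllable emitted verbatim
            dictionary[(k,)] = len(dictionary)
            pos += 1
        else:
            if prev != EMPTY:
                # Delayed update: previous phrase + first syllable of this one.
                dictionary.setdefault(prev + (s[0],), len(dictionary))
            pos += len(s)
        prev = s
    return output


def lzwl_decode(output, initial_syllables):
    """Decode the output of lzwl_encode (hypothetical helper)."""
    phrases = [EMPTY]
    known = {EMPTY}

    def add(phrase):
        if phrase not in known:
            known.add(phrase)
            phrases.append(phrase)

    for syl in initial_syllables:
        add((syl,))
    result = []
    prev = EMPTY
    tokens = iter(output)
    for code in tokens:
        s = phrases[code]  # always present, so no special case is needed
        if s == EMPTY:
            k = next(tokens)  # the verbatim new syllable follows number 0
            add((k,))
            result.append(k)
        else:
            if prev != EMPTY:
                add(prev + (s[0],))
            result.extend(s)
        prev = s
    return result


encoded = lzwl_encode(["ba", "na", "na", "ba", "na", "na"], ["na"])
print(encoded)
print(lzwl_decode(encoded, ["na"]))
```

Because the encoder adds a phrase only after the next phrase has been found, the decoder can make the same update before it ever sees the new phrase number, which is why the lookup `phrases[code]` needs no fallback.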
References
- http://www.cs.vsb.cz/dateso/2005/slides/slides6.pps
- Salomon, David; Motta, Giovanni (2010-01-18). Handbook of Data Compression. Springer. ISBN 9781848829039. Retrieved 2014-07-11.