I’ve already written about two PKZIP bugs related to the “Implode” compression method. Now I’ve come across another one, so I guess I’ll investigate it as well. Here are the first two:
There’s an old collection of files called the Pier 1 Shareware CDROM (#1). On it is a file named BADIMPLO.ZIP. Here are two places you might be able to find it:
You have to unzip it first — this step will work with any unzip program. That gives you a file named BADIMPLO.ARJ.
Then you have to un-ARJ the BADIMPLO.ARJ file — I suggest using 7-Zip to do that. That gives you two files: README.DOC and BAD4ZIP.DOC.
In README.DOC, the anonymous author claims that the companion BAD4ZIP.DOC file cannot be correctly compressed by PKZIP 1.10, but that he or she does not know why.
The CDROM editor’s description of the BADIMPLO.ZIP file calls it an example of “the so-called ‘Postscript’ bug”. I don’t know why it would have been called that, but the name suggests that the bug was at least somewhat well known at the time.
Let’s verify the claim. Using the DOSBox DOS emulator, I’ll use PKZIP v1.10 to compress BAD4ZIP.DOC into a ZIP file:
C:\ZIPTEST> 110\PKZIP.EXE B110.ZIP BAD4ZIP.DOC
Creating ZIP: B110.ZIP
Adding: BAD4ZIP.DOC imploding (94%), done.
Now I’ll test the resulting ZIP file, using the same version of PKZIP. To save some effort, I’ll rely primarily on the verification feature (-t) instead of actually extracting the file.
C:\ZIPTEST> 110\PKUNZIP.EXE -t B110.ZIP
Searching ZIP: B110.ZIP
Testing: BAD4ZIP.DOC PKUNZIP: Warning! file fails CRC check
B110.ZIP has errors!
Yep, there’s definitely a problem here. PKZIP 1.10 can’t correctly decompress a file that it compressed.
I get the same result (failed CRC check) when testing the compressed file with PK(UN)ZIP 1.01, 1.02, and 2.04g. I get a similar result with Info-ZIP Unzip, and other modern software.
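For readers unfamiliar with what a “CRC check” amounts to, here is a minimal Python sketch of the idea: the ZIP file stores a CRC-32 of the original data, and a tester recomputes it over whatever the decompressor produced. This is only an illustration under assumptions. As far as I know, Python’s `zipfile` module does not decode Implode, so the sketch uses a stand-in archive with the uncompressed “stored” method, and the `crc_matches` helper and the sample contents are hypothetical.

```python
import io
import zipfile
import zlib


def crc_matches(stored_crc: int, extracted: bytes) -> bool:
    """Return True if the CRC-32 of the extracted bytes equals the stored value."""
    return (zlib.crc32(extracted) & 0xFFFFFFFF) == stored_crc


# Build a small stand-in archive in memory (hypothetical contents, "stored" method).
original = b"stand-in contents, not the real BAD4ZIP.DOC\r\n" * 20
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("BAD4ZIP.DOC", original)

# Read back the CRC-32 that was recorded when the member was added.
with zipfile.ZipFile(buf) as zf:
    stored_crc = zf.getinfo("BAD4ZIP.DOC").CRC

# Simulate a decoder that returns the right number of bytes, two of them wrong.
damaged = bytearray(original)
damaged[2:4] = b"09"

print(crc_matches(stored_crc, original))        # intact data
print(crc_matches(stored_crc, bytes(damaged)))  # damaged data
```

The check cannot say which side is at fault. It only reports that the bytes coming out differ from the bytes that went in.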
The scope of the bug
As discussed in my previous articles, there are four different “modes” of Implode compression. PKZIP 1.10 chose the “i8:3” mode for this file.
$ zipinfo B110.ZIP
... 7872 t- i8:3 ... BAD4ZIP.DOC
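The mode label can be derived from two bits of the ZIP “general purpose bit flag” field. For Implode, the ZIP specification uses bit 1 to select an 8K (instead of 4K) sliding dictionary, and bit 2 to select three (instead of two) Shannon-Fano trees. A small sketch of that mapping follows. The function name `implode_mode` is my own, and the label format simply imitates what zipinfo prints.

```python
def implode_mode(flags: int) -> str:
    """Format Implode options in the style of zipinfo labels, e.g. 'i8:3'."""
    dict_kb = 8 if flags & 0x0002 else 4  # bit 1: 8K vs. 4K sliding dictionary
    trees = 3 if flags & 0x0004 else 2    # bit 2: 3 vs. 2 Shannon-Fano trees
    return f"i{dict_kb}:{trees}"


# The four possible combinations of the two option bits.
for flags in (0x0000, 0x0002, 0x0004, 0x0006):
    print(f"flags=0x{flags:04x} -> {implode_mode(flags)}")
```

With both bits set, the result is `i8:3`, the mode reported for B110.ZIP.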
There are two other PKZIP versions from the v1.x era: 1.01 and 1.02. I was a little surprised to find that they do not have this problem.
C:\ZIPTEST> 102\PKZIP.EXE B102.ZIP BAD4ZIP.DOC
Creating ZIP: B102.ZIP
Adding: BAD4ZIP.DOC imploding (94%), done.
C:\ZIPTEST> 102\PKUNZIP.EXE -t B102.ZIP
Searching ZIP: B102.ZIP
Testing: BAD4ZIP.DOC OK
I note that PKZIP 1.02 uses the same Implode mode, i8:3, as does v1.10. So it’s not that it sidesteps the bug by using a different algorithm. It simply doesn’t have the bug.
Generally speaking, when a compressed file fails to decompress correctly, it could be:
- A compression bug
- A decompression bug
- A design error in the specification (or, the lack of a specification) that at least makes it unclear whether the compressor, or the decompressor, is at fault
All signs point to this being strictly a compression bug.
Versions of PKZIP after 1.10 do not support compressing with the Implode algorithm (they do Implode decompression only), so they cannot have any Implode compression bugs. That’s also the case for most ZIP programs made after the early 1990s.
Since I had it handy, I also tested the old PAK v2.51 program by NoGate Consulting. It’s something that I know can do Implode compression. I compressed BAD4ZIP.DOC using both of PAK’s ZIP modes (/z and /bugs). Both resulting ZIP files used the i8:3 mode, and both were apparently valid, and could be decompressed by PK(UN)ZIP 1.10 and modern unzip software.
I have not really figured out what causes the bug, but I did investigate it a little more.
The B102.ZIP (good) and B110.ZIP (bad) files are very similar, differing only in the version-made-by field, and in one small section of the compressed data:
< 0000192 68 47 ba ef d1 8e e8 00 01 f5 f0 3d da 21 70 d4
> 0000192 68 47 ba ef d1 8e e8 00 01 f5 10 3e fa 21 70 94
Four bytes out of a sequence of six, starting at offset 202 in the ZIP file, are different. The compressed data starts at offset 41, so that’s offset 161 in the compressed data.
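The offset arithmetic can be checked with a short Python sketch. A ZIP local file header has a 30-byte fixed part, followed by the filename and an optional extra field, and the compressed data begins right after that. The header below is synthetic: I am assuming an 11-byte filename and no extra field, which is what an offset of 41 implies. The helper names are hypothetical, and the two 16-byte rows are copied from the comparison above.

```python
import struct

LOCAL_HEADER_FMT = "<4sHHHHHIIIHH"  # fixed part of a ZIP local file header
LOCAL_HEADER_SIZE = struct.calcsize(LOCAL_HEADER_FMT)  # 30 bytes


def data_offset(zip_bytes: bytes, header_pos: int = 0) -> int:
    """Return the offset at which a member's compressed data begins."""
    fields = struct.unpack_from(LOCAL_HEADER_FMT, zip_bytes, header_pos)
    if fields[0] != b"PK\x03\x04":
        raise ValueError("not a local file header")
    name_len, extra_len = fields[9], fields[10]
    return header_pos + LOCAL_HEADER_SIZE + name_len + extra_len


def differing_offsets(a: bytes, b: bytes, base: int = 0) -> list:
    """Return the absolute offsets at which two equal-length byte runs differ."""
    return [base + i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Synthetic header (assumption: no extra field; the other fields are placeholders).
name = b"BAD4ZIP.DOC"
header = struct.pack(LOCAL_HEADER_FMT, b"PK\x03\x04", 10, 0x0006, 6,
                     0, 0, 0, 0, 0, len(name), 0) + name
start = data_offset(header)

# The two rows at file offset 192, from the good and bad ZIP files.
good = bytes.fromhex("68 47 ba ef d1 8e e8 00 01 f5 f0 3d da 21 70 d4")
bad = bytes.fromhex("68 47 ba ef d1 8e e8 00 01 f5 10 3e fa 21 70 94")
diffs = differing_offsets(good, bad, base=192)

print(start)             # 41
print(diffs)             # [202, 203, 204, 207]
print(diffs[0] - start)  # 161
```

The differing bytes sit at offsets 202, 203, 204, and 207, which is where “four bytes out of a sequence of six” comes from.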
Imploded data starts with a “Huffman tree definition” section, followed by data that more directly encodes the actual file data. In these files, the tree definition section is about 121 bytes in size, and that’s less than 161, so the bug is not in the tree definitions; it occurs later.
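To show how the size of the tree definition section can be measured, here is a sketch based on my reading of the ZIP specification: each tree definition begins with one byte holding the number of run-length-encoded bytes that follow, minus 1. The sample bytes are synthetic (three trees in which every code has the same length), not taken from B110.ZIP, and `tree_section_size` is a hypothetical helper.

```python
def tree_section_size(data: bytes, num_trees: int = 3) -> int:
    """Return the total size in bytes of the leading tree definitions."""
    pos = 0
    for _ in range(num_trees):
        if pos >= len(data):
            raise ValueError("truncated tree definitions")
        rle_len = data[pos] + 1  # the count byte stores (length - 1)
        pos += 1 + rle_len       # skip the count byte and the RLE bytes
    if pos > len(data):
        raise ValueError("truncated tree definitions")
    return pos


# Synthetic trees. Each RLE byte is (number of values - 1) << 4 | (bit length - 1).
literal_tree = bytes([15]) + b"\xf7" * 16  # 256 values, all 8 bits long
length_tree = bytes([3]) + b"\xf5" * 4     # 64 values, all 6 bits long
distance_tree = bytes([3]) + b"\xf5" * 4   # 64 values, all 6 bits long
sample = literal_tree + length_tree + distance_tree + b"rest of the stream"

print(tree_section_size(sample))  # 27
```

For the real files, any result below 161 confirms that the differing bytes lie past the tree definitions.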
Comparing the original BAD4ZIP.DOC file to the bad one we get after compression+decompression, essentially just a couple of bytes are wrong:
< 0000864 31 30 31 31 31 31 31 31 31 31 31 31 31 31 31 31
> 0000864 31 30 30 39 31 31 31 31 31 31 31 31 31 31 31 31
< 0001824 32 30 31 31 31 31 31 31 31 31 31 31 31 31 31 31
> 0001824 32 30 30 39 31 31 31 31 31 31 31 31 31 31 31 31
...
The wrong bytes get replicated a few times, which is not surprising given how the compression algorithm works. But the decompressor doesn’t seem to go off the rails, and the number of decompressed bytes is correct.
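Since the affected bytes are ASCII digits, it may help to see them as text. This small Python sketch decodes the four rows shown above and lists the offsets that differ. Nothing here comes from the rest of the file, and the `describe` helper is hypothetical.

```python
def describe(offset: int, original_hex: str, decoded_hex: str):
    """Return both rows as ASCII text, plus the absolute offsets that differ."""
    original = bytes.fromhex(original_hex)
    decoded = bytes.fromhex(decoded_hex)
    wrong = [offset + i
             for i, (x, y) in enumerate(zip(original, decoded)) if x != y]
    return original.decode("ascii"), decoded.decode("ascii"), wrong


rows = [
    (864,
     "31 30 31 31 31 31 31 31 31 31 31 31 31 31 31 31",
     "31 30 30 39 31 31 31 31 31 31 31 31 31 31 31 31"),
    (1824,
     "32 30 31 31 31 31 31 31 31 31 31 31 31 31 31 31",
     "32 30 30 39 31 31 31 31 31 31 31 31 31 31 31 31"),
]

for offset, original_hex, decoded_hex in rows:
    print(describe(offset, original_hex, decoded_hex))
```

In both rows the same two bytes change, from “11” to “09”, at offsets 866–867 and 1826–1827.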
PKZIP 1.10, and apparently only that version, has a rare but serious Implode compression bug. Some files are silently compressed incorrectly.
Here’s a table of the three Implode bugs I’ve investigated, showing which (DOS) versions of PKZIP they occur in.
| Bug | 1.01 | 1.02 | 1.10 | After 1.10 |
|---|---|---|---|---|
| #1 (MML bug) | buggy | buggy | OK | OK |
| #2 (v1.01 literal tree bug) | buggy | OK | OK | OK |
| #3 (v1.10 compression bug) | OK | OK | buggy | N/A |