When I was researching old versions of PKZIP, I found that modern unzip programs aren’t able to unzip the PKZIP v1.01 distribution file. Three of the member files inside the self-extracting ZIP file fail to decompress correctly.
/cygdrive/c/dosprogs/ZIPTEST/101 $ unzip ../DIST/PKZ101.EXE
exploding: README.DOC
  warning: 475 bytes required to uncompress to 873 bytes; supposed to require 537 bytes.
  bad CRC 1180acd5 (should be de477677)
exploding: MANUAL.DOC
exploding: DEDICATE.DOC
exploding: LICENSE.DOC
exploding: ORDER.DOC
  warning: 1196 bytes required to uncompress to 4607 bytes; supposed to require 1405 bytes.
  bad CRC 54314dea (should be 7e529a01)
exploding: APPNOTE.TXT
exploding: OMBUDSMN.ASP
  warning: 368 bytes required to uncompress to 595 bytes; supposed to require 428 bytes.
  bad CRC c8188b41 (should be 167904ac)
exploding: PKZIP.EXE
exploding: PKUNZIP.EXE
exploding: MAKESFX.COM
exploding: ZIP2EXE.EXE
exploding: PKZIPFIX.EXE
extracting: REZIP.ZIP
That’s okay… PKZ101.EXE is an EXE file that seems to also be a ZIP file, but it doesn’t have to be a valid ZIP file. It just has to be able to install PKZIP when you run it, which it does. But this cries out for an explanation. It could be a bug in the modern unzip software. That would be relevant to me, because I’ve recently started maintaining a mini-library for decompressing files of this type.
It turns out that the installed PKZIP v1.01 can unzip its own PKZ101.EXE distribution file. V1.02 can also unzip PKZ101.EXE. But the next version, v1.10, fails like modern software does. I’m pretty sure all later versions of PKZIP also fail.
There’s no such problem with the v1.02 distribution file (PKZ102.EXE), though. All the software mentioned can unzip it just fine.
Here’s a summary of the files in PKZ101.EXE, produced by the “zipinfo” utility:
$ zipinfo PKZ101.EXE
Archive: PKZ101.EXE
Zip file size: 131517 bytes, number of entries: 13
-rw-a--  1.0 fat    873 t- i4:3 89-Jul-21 01:01 README.DOC
-rw-a--  1.0 fat 140355 t- i8:3 89-Jul-21 01:01 MANUAL.DOC
-rw-a--  1.0 fat    720 t- i4:2 89-Jul-21 01:01 DEDICATE.DOC
-rw-a--  1.0 fat   8959 t- i8:3 89-Jul-21 01:01 LICENSE.DOC
-rw-a--  1.0 fat   4607 t- i4:3 89-Jul-21 01:01 ORDER.DOC
-rw-a--  1.0 fat  25662 t- i8:3 89-Jul-21 01:01 APPNOTE.TXT
-rw-a--  1.0 fat    595 t- i4:3 89-Jul-21 01:01 OMBUDSMN.ASP
-rwxa--  1.0 fat  31342 b- i4:2 89-Jul-21 01:01 PKZIP.EXE
-rwxa--  1.0 fat  21440 b- i4:2 89-Jul-21 01:01 PKUNZIP.EXE
-rwxa--  1.0 fat    896 t- i4:2 89-Jul-21 01:01 MAKESFX.COM
-rwxa--  1.0 fat   6898 b- i4:2 89-Jul-21 01:01 ZIP2EXE.EXE
-rwxa--  1.0 fat   8926 b- i4:2 89-Jul-21 01:01 PKZIPFIX.EXE
-rw-a--  1.0 fat  14592 b- stor 89-Jul-21 01:01 REZIP.ZIP
13 files, 265865 bytes uncompressed, 114641 bytes compressed: 56.9%
For comparison, here is PKZ102.EXE:
$ zipinfo PKZ102.EXE
Archive: PKZ102.EXE
Zip file size: 136192 bytes, number of entries: 15
-rw-a--  1.0 fat   5837 t- i8:3 89-Oct-01 01:02 WHATSNEW.102
-rwxa--  1.0 fat    295 t- shrk 89-Oct-01 01:02 BIOSFIX.COM
-rw-a--  1.0 fat    873 t- i4:2 89-Oct-01 01:02 README.DOC
-rw-a--  1.0 fat 140355 t- i8:3 89-Jul-21 01:01 MANUAL.DOC
-rw-a--  1.0 fat    720 t- i4:2 89-Jul-21 01:01 DEDICATE.DOC
-rw-a--  1.0 fat   8959 t- i8:3 89-Jul-21 01:01 LICENSE.DOC
-rw-a--  1.0 fat   4607 t- i4:2 89-Jul-21 01:01 ORDER.DOC
-rw-a--  1.0 fat  25662 t- i8:3 89-Jul-21 01:01 APPNOTE.TXT
-rw-a--  1.0 fat    595 t- i4:2 89-Jul-21 01:01 OMBUDSMN.ASP
-rwxa--  1.0 fat  31408 b- i4:2 89-Oct-01 01:02 PKZIP.EXE
-rwxa--  1.0 fat  22022 b- i4:2 89-Oct-01 01:02 PKUNZIP.EXE
-rwxa--  1.0 fat    896 t- i4:2 89-Oct-01 01:02 MAKESFX.COM
-rwxa--  1.0 fat   6906 b- i4:2 89-Oct-01 01:02 ZIP2EXE.EXE
-rwxa--  1.0 fat   8926 b- i4:2 89-Jul-21 01:01 PKZIPFIX.EXE
-rw-a--  1.0 fat  14592 b- stor 89-Oct-01 01:02 REZIP.ZIP
15 files, 272653 bytes uncompressed, 118660 bytes compressed: 56.5%
All the files that fail, and none of the files that succeed, have the code “i4:3” in one of the columns. That’s not a coincidence.
The “iX:X” codes mean the file uses a compression method named Implode. Implode has four variants (sets of parameters), which zipinfo names i4:2, i4:3, i8:2, and i8:3. I’ll adopt zipinfo’s names for the Implode variants.
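For anyone who wants to find these variants without zipinfo, here is a minimal Python sketch of where the names come from. In the ZIP format documentation, for compression method 6 (Implode), bit 1 of the general purpose bit flag selects an 8K sliding dictionary (otherwise 4K), and bit 2 says that 3 Shannon-Fano trees are used (otherwise 2). The function names `implode_variant` and `list_implode_members` are my own, made up for illustration.

```python
import zipfile

IMPLODE = 6  # ZIP compression method number for Implode


def implode_variant(flag_bits: int) -> str:
    """Return a zipinfo-style name (i4:2, i4:3, i8:2, i8:3) for an Implode member."""
    dict_kb = 8 if flag_bits & 0x0002 else 4  # bit 1: 8K vs. 4K sliding dictionary
    trees = 3 if flag_bits & 0x0004 else 2    # bit 2: 3 vs. 2 Shannon-Fano trees
    return f"i{dict_kb}:{trees}"


def list_implode_members(zip_file):
    """List (filename, variant) for every Implode-compressed member.

    Only the central directory is read, so it does not matter that
    Python's zipfile module cannot decompress Implode itself.
    """
    with zipfile.ZipFile(zip_file) as zf:
        return [
            (info.filename, implode_variant(info.flag_bits))
            for info in zf.infolist()
            if info.compress_type == IMPLODE
        ]
```

Calling `list_implode_members("PKZ101.EXE")` should report the same codes as the zipinfo listing, assuming `zipfile` copes with the self-extracting stub in front of the ZIP data. I haven't verified that for this particular file.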
Implode compression was used by the 1.x versions of PKZIP. It can still be decompressed by later versions, but they don’t use it to compress new files.
i4:2 and i8:3 were frequently used, but i4:3 and i8:2 are rarer. In fact, I don’t know of any way to get PKZIP to use them. But there are a fair number of ZIP files in the wild that use them, so some zip program must have created them.
I tested some files that use i4:3 and i8:2 compression, and none of them could be decompressed by PKZIP 1.01 or 1.02. They work fine in 1.10 and later. Apparently, 1.01 and 1.02 use different and incompatible algorithms for i4:3 and i8:2. But what exactly are the differences?
I didn’t find anything in the 1.10 documentation that admits such a change. I tried a few web searches, but came up empty. (Still, I’d assume that this issue was known, back in the day.)
Figuring out the differences myself is something that could easily be beyond my ability. Then again, it could turn out to be pretty easy. It’s worth taking a look.
My first idea was that v1.01/1.02 doesn’t actually support the i4:3/i8:2 algorithms, and instead just uses one of the other algorithms. But no, a little testing proved that was not the case.
I noted that, when fed an i4:3 compressed file from PKZ101.EXE, a modern unzip program doesn’t just go off the rails. It thinks everything is fine, until the final integrity checks. The resulting incorrect uncompressed file isn’t complete garbage, either; it’s just that some of the bytes are duplicated or missing.
It didn’t take much more debugging to guess that the “length” codes in the compressed data were being misinterpreted. These codes tell the decompressor how many previously seen bytes to copy again.
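To see why a misread length gives output that is almost right, here is a toy sliding-dictionary decoder in Python. It is a simplified sketch, not the real Implode bitstream: the token list, `copy_match`, and `decode` are invented for illustration, and the only thing it models is a constant being added to every stored length.

```python
def copy_match(out: bytearray, distance: int, length: int) -> None:
    """Append `length` bytes, copied from `distance` bytes back in the output."""
    start = len(out) - distance
    for i in range(length):
        # Copy one byte at a time so overlapping matches behave correctly.
        out.append(out[start + i])


def decode(tokens, length_bias: int) -> bytes:
    """Decode a toy token stream.

    An int token is a literal byte. A (distance, stored_length) tuple is a
    match whose real length is stored_length + length_bias.
    """
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)
        else:
            distance, stored_length = tok
            copy_match(out, distance, stored_length + length_bias)
    return bytes(out)


# "abcd", then a match 4 bytes back with stored length 1, then "!".
tokens = [*b"abcd", (4, 1), ord("!")]

right = decode(tokens, 3)  # b"abcdabcd!"
wrong = decode(tokens, 2)  # b"abcdabc!"  (one byte missing)
```

With the bias off by one, every match comes out one byte too short or one byte too long, and the decoder has no reason to notice until the size and CRC are checked at the end.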
Let’s check the ZIP format documentation. (The relevant documentation is the same in all versions.) One part of the algorithm uses something called the “Minimum Match Length” when decoding the length.
Length <- Length + Minimum Match Length
Elsewhere in the documentation:
When [the Literal Shannon-Fano tree] is present, the Minimum Match Length for the sliding dictionary is 3. If this tree is not present, the Minimum Match Length is 2.
The “Literal Shannon-Fano tree” is present for just the “:3” implode variants (i4:3 and i8:3).
Here’s the answer, as far as I can determine: The difference is that v1.01/v1.02 thinks the Minimum Match Length is 2 for i4:3, though the documentation says it’s 3. And it thinks the Minimum Match Length is 3 for i8:2, though the documentation says it’s 2. Later versions of PKZIP, and other zip software, work according to the documentation.
I’ve confirmed this by modifying a decompressor to use the wrong match length in these cases, and verifying that it decompresses files in the same way (the same wrong way, in most tests) that PKZIP v1.01 does.
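The modification amounts to something like the following Python sketch. The function and its `pkzip_v10x` switch are hypothetical names, not taken from any real decompressor. The v1.0x branch encodes what I observed: i4:3 behaves as 2, i8:2 behaves as 3, and i4:2 and i8:3 are unchanged. Across the four variants that happens to line up with the dictionary size, but whether that is how PKZIP actually made the decision is only a guess.

```python
def min_match_length(dict_kb: int, trees: int, pkzip_v10x: bool = False) -> int:
    """Minimum Match Length for an Implode variant.

    dict_kb: sliding dictionary size in KB (4 or 8).
    trees:   number of Shannon-Fano trees (3 means the literal tree is present).
    pkzip_v10x: emulate the observed PKZIP v1.01/v1.02 behavior (assumption:
                modeled here as depending on dictionary size only).
    """
    if dict_kb not in (4, 8) or trees not in (2, 3):
        raise ValueError("not an Implode variant")
    if pkzip_v10x:
        return 3 if dict_kb == 8 else 2
    # Documented rule: 3 if the literal tree is present, otherwise 2.
    return 3 if trees == 3 else 2


# Print the variants where the two behaviors disagree.
for dict_kb, trees in [(4, 2), (4, 3), (8, 2), (8, 3)]:
    spec = min_match_length(dict_kb, trees)
    old = min_match_length(dict_kb, trees, pkzip_v10x=True)
    if spec != old:
        print(f"i{dict_kb}:{trees}: spec {spec}, v1.0x {old}")
```

The decoder then applies the documented step `Length <- Length + Minimum Match Length` with whichever value this returns.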
To summarize the Minimum Match Length issue:
|Variant|Specs say|PKZIP v1.0x|PKZIP v1.10+|
|---|---|---|---|
|i4:2|2|2|2|
|i4:3|3|2|3|
|i8:2|2|3|2|
|i8:3|3|3|3|
In conclusion, the PKZIP v1.01/1.02 software does not match the documentation. In v1.10, the software was changed to match the documentation. That doesn’t necessarily mean it was the software that was wrong. It could have originally been a documentation error, for which changing the software was deemed the least-bad solution. I don’t know.
Other open questions:
- Are there any bad i4:3 and i8:2 files in existence, other than the three in PKZ101.EXE?
- Is there an easy way to detect such bad compressed files, so that the right Implode parameters can be selected?
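I don’t have an answer to the second question, but a brute-force fallback is at least possible: decompress with the documented Minimum Match Length, and if the CRC-32 doesn’t match, try again with the other value. Here is a rough Python sketch. The `explode` argument is a hypothetical callable standing in for a real Implode decompressor, and its signature is my own invention.

```python
import zlib


def explode_with_fallback(compressed, expected_crc, dict_kb, trees, explode):
    """Try the documented Minimum Match Length first, then the other one.

    `explode(compressed, dict_kb, trees, min_match_length) -> bytes` is a
    hypothetical decompressor supplied by the caller.
    Returns (data, min_match_length_that_worked).
    """
    spec_mml = 3 if trees == 3 else 2  # the documented rule
    other_mml = 5 - spec_mml           # 2 <-> 3
    for mml in (spec_mml, other_mml):
        try:
            data = explode(compressed, dict_kb, trees, mml)
        except Exception:
            continue  # this guess produced an undecodable stream
        if zlib.crc32(data) & 0xFFFFFFFF == expected_crc:
            return data, mml
    raise ValueError("CRC mismatch with both Minimum Match Length values")
```

This costs a second full decompression for every affected file, and it only notices the problem after the fact, so it doesn’t settle whether a cheaper up-front check exists.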
It might not be easy to find ZIP files that use a given Implode variant, so here are two sample files, found in the wild, that use all four variants: MYSTIC.ZIP, MODED301.ZIP. They are compatible with PKZIP 1.10, and incompatible with 1.01/1.02.
The relevant PKZIP versions can be found here.