PKZIP “Implode” compression oddity

When I was researching old versions of PKZIP, I found that modern unzip programs aren’t able to unzip the PKZIP v1.01 distribution file. Three of the member files inside the self-extracting ZIP file fail to decompress correctly.

/cygdrive/c/dosprogs/ZIPTEST/101 $ unzip ../DIST/PKZ101.EXE
  exploding: README.DOC
  warning:  475 bytes required to uncompress to 873 bytes;
            supposed to require 537 bytes.
 bad CRC 1180acd5  (should be de477677)
  exploding: MANUAL.DOC
  exploding: DEDICATE.DOC
  exploding: LICENSE.DOC
  exploding: ORDER.DOC
  warning:  1196 bytes required to uncompress to 4607 bytes;
            supposed to require 1405 bytes.
 bad CRC 54314dea  (should be 7e529a01)
  exploding: APPNOTE.TXT
  exploding: OMBUDSMN.ASP
  warning:  368 bytes required to uncompress to 595 bytes;
            supposed to require 428 bytes.
 bad CRC c8188b41  (should be 167904ac)
  exploding: PKZIP.EXE
  exploding: PKUNZIP.EXE
  exploding: MAKESFX.COM
  exploding: ZIP2EXE.EXE
  exploding: PKZIPFIX.EXE
 extracting: REZIP.ZIP

That’s okay in itself. PKZ101.EXE is an EXE file that also appears to be a ZIP file, but it doesn’t have to be a valid one. It only has to install PKZIP when you run it, which it does. Still, this cries out for an explanation. It could be a bug in the modern unzip software, which would matter to me, because I’ve recently started maintaining a mini-library for decompressing files of this type.

It turns out that the installed PKZIP v1.01 can unzip its own PKZ101.EXE distribution file. V1.02 can also unzip PKZ101.EXE. But the next version, v1.10, fails like modern software does. I’m pretty sure all later versions of PKZIP also fail.

There’s no such problem with the v1.02 distribution file (PKZ102.EXE), though. All the software mentioned can unzip it just fine.

Here’s a summary of the files in PKZ101.EXE, produced by the “zipinfo” utility:

$ zipinfo PKZ101.EXE
Archive:  PKZ101.EXE
Zip file size: 131517 bytes, number of entries: 13
-rw-a--     1.0 fat      873 t- i4:3 89-Jul-21 01:01 README.DOC
-rw-a--     1.0 fat   140355 t- i8:3 89-Jul-21 01:01 MANUAL.DOC
-rw-a--     1.0 fat      720 t- i4:2 89-Jul-21 01:01 DEDICATE.DOC
-rw-a--     1.0 fat     8959 t- i8:3 89-Jul-21 01:01 LICENSE.DOC
-rw-a--     1.0 fat     4607 t- i4:3 89-Jul-21 01:01 ORDER.DOC
-rw-a--     1.0 fat    25662 t- i8:3 89-Jul-21 01:01 APPNOTE.TXT
-rw-a--     1.0 fat      595 t- i4:3 89-Jul-21 01:01 OMBUDSMN.ASP
-rwxa--     1.0 fat    31342 b- i4:2 89-Jul-21 01:01 PKZIP.EXE
-rwxa--     1.0 fat    21440 b- i4:2 89-Jul-21 01:01 PKUNZIP.EXE
-rwxa--     1.0 fat      896 t- i4:2 89-Jul-21 01:01 MAKESFX.COM
-rwxa--     1.0 fat     6898 b- i4:2 89-Jul-21 01:01 ZIP2EXE.EXE
-rwxa--     1.0 fat     8926 b- i4:2 89-Jul-21 01:01 PKZIPFIX.EXE
-rw-a--     1.0 fat    14592 b- stor 89-Jul-21 01:01 REZIP.ZIP
13 files, 265865 bytes uncompressed, 114641 bytes compressed:  56.9%

For comparison, here is PKZ102.EXE:

$ zipinfo PKZ102.EXE
Archive:  PKZ102.EXE
Zip file size: 136192 bytes, number of entries: 15
-rw-a--     1.0 fat     5837 t- i8:3 89-Oct-01 01:02 WHATSNEW.102
-rwxa--     1.0 fat      295 t- shrk 89-Oct-01 01:02 BIOSFIX.COM
-rw-a--     1.0 fat      873 t- i4:2 89-Oct-01 01:02 README.DOC
-rw-a--     1.0 fat   140355 t- i8:3 89-Jul-21 01:01 MANUAL.DOC
-rw-a--     1.0 fat      720 t- i4:2 89-Jul-21 01:01 DEDICATE.DOC
-rw-a--     1.0 fat     8959 t- i8:3 89-Jul-21 01:01 LICENSE.DOC
-rw-a--     1.0 fat     4607 t- i4:2 89-Jul-21 01:01 ORDER.DOC
-rw-a--     1.0 fat    25662 t- i8:3 89-Jul-21 01:01 APPNOTE.TXT
-rw-a--     1.0 fat      595 t- i4:2 89-Jul-21 01:01 OMBUDSMN.ASP
-rwxa--     1.0 fat    31408 b- i4:2 89-Oct-01 01:02 PKZIP.EXE
-rwxa--     1.0 fat    22022 b- i4:2 89-Oct-01 01:02 PKUNZIP.EXE
-rwxa--     1.0 fat      896 t- i4:2 89-Oct-01 01:02 MAKESFX.COM
-rwxa--     1.0 fat     6906 b- i4:2 89-Oct-01 01:02 ZIP2EXE.EXE
-rwxa--     1.0 fat     8926 b- i4:2 89-Jul-21 01:01 PKZIPFIX.EXE
-rw-a--     1.0 fat    14592 b- stor 89-Oct-01 01:02 REZIP.ZIP
15 files, 272653 bytes uncompressed, 118660 bytes compressed:  56.5%

All the files that fail, and none of the files that succeed, have the code “i4:3” in one of the columns. That’s not a coincidence.

The “iX:X” codes mean the file uses a compression method named Implode. Implode has four variants (sets of parameters), which zipinfo names i4:2, i4:3, i8:2, and i8:3. I’ll adopt zipinfo’s names for the Implode variants.
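As a rough illustration of where these names come from: in the ZIP format, Implode is compression method 6. As I read the general-purpose bit flag description in APPNOTE, bit 1 selects an 8K (set) or 4K (clear) sliding dictionary, and bit 2 selects 3 (set) or 2 (clear) Shannon-Fano trees. The sketch below uses Python's standard zipfile module to produce zipinfo-style names. The helper names implode_variant and describe_members are made up for this example and are not part of any tool.

```python
import zipfile

IMPLODE_METHOD = 6  # ZIP compression method number for Implode


def implode_variant(flag_bits: int) -> str:
    """Build a zipinfo-style name such as 'i4:3' from the general-purpose flags."""
    dict_kb = 8 if flag_bits & 0x0002 else 4   # bit 1: 8K vs. 4K sliding dictionary
    trees = 3 if flag_bits & 0x0004 else 2     # bit 2: 3 vs. 2 Shannon-Fano trees
    return f"i{dict_kb}:{trees}"


def describe_members(archive):
    """Return (filename, label) pairs for every member in the central directory."""
    with zipfile.ZipFile(archive) as zf:
        result = []
        for info in zf.infolist():
            if info.compress_type == IMPLODE_METHOD:
                label = implode_variant(info.flag_bits)
            else:
                label = f"method {info.compress_type}"
            result.append((info.filename, label))
        return result
```

Listing only reads the central directory, so this does not require Python to be able to decompress Imploded data (which, as far as I know, it cannot).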

Implode compression was used by the 1.x versions of PKZIP. It can still be decompressed by later versions, but they don’t use it to compress new files.

i4:2 and i8:3 were frequently used, but i4:3 and i8:2 are rarer. In fact, I don’t know of any way to get PKZIP to use them. But there are a fair number of ZIP files in the wild that use them, so some zip program must have created them.

I tested some files that use i4:3 and i8:2 compression, and none of them could be decompressed by PKZIP 1.01 or 1.02. They work fine in 1.10 and later. Apparently, 1.01 and 1.02 use different and incompatible algorithms for i4:3 and i8:2. But what exactly are the differences?

I didn’t find anything in the 1.10 documentation that admits such a change. I tried a few web searches, but came up empty. (Still, I’d assume that this issue was known, back in the day.)

Figuring out the differences myself is something that could easily be beyond my ability. Then again, it could turn out to be pretty easy. It’s worth taking a look.

My first idea was that v1.01/1.02 doesn’t actually support the i4:3/i8:2 algorithms, and instead just uses one of the other algorithms. But no, a little testing proved that was not the case.

I noted that, when fed an i4:3 compressed file from PKZ101.EXE, a modern unzip program doesn’t just go off the rails. It thinks everything is fine, until the final integrity checks. The resulting incorrect uncompressed file isn’t complete garbage, either; it’s just that some of the bytes are duplicated or missing.

It didn’t take much more debugging to guess that the “length” codes in the compressed data were being misinterpreted. These codes tell the decompressor how many of some previously-seen bytes to make an additional copy of.
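To see why a misread length produces duplicated or missing bytes rather than garbage, here is a toy sliding-dictionary expander. It is not the real Implode bitstream. The tokens are already parsed: a plain int is a literal byte, a (distance, length_code) tuple is a match, and distance counts back from the end of the output starting at 1. All of those conventions are assumptions made for illustration.

```python
def expand(tokens, min_match_length):
    """Expand a pre-parsed token list using the given Minimum Match Length."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, int):
            out.append(token)               # literal byte
        else:
            distance, length_code = token
            length = length_code + min_match_length
            for _ in range(length):         # byte by byte, so overlapping copies work
                out.append(out[-distance])
    return bytes(out)


# "abcd", then a match 4 bytes back with length code 1, then "!"
tokens = list(b"abcd") + [(4, 1)] + list(b"!")

print(expand(tokens, 3))  # b'abcdabcd!'
print(expand(tokens, 2))  # b'abcdabc!'
```

In this toy model, a decompressor that assumes a smaller minimum than the compressor did leaves every match one byte short, and one that assumes a larger minimum adds an extra byte to every match. The literals around the matches still come out intact.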

Let’s check the ZIP format documentation. (The relevant documentation is the same in all versions.) One part of the algorithm uses something called the “Minimum Match Length” when decoding the length.

Length <- Length + Minimum Match Length

Elsewhere in the documentation:

When [the Literal Shannon-Fano tree] is present, the Minimum
Match Length for the sliding dictionary is 3.  If this tree is
not present, the Minimum Match Length is 2.

The “Literal Shannon-Fano tree” is present for just the “:3” implode variants (i4:3 and i8:3).
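Put as code, the documented rule is tiny. This sketch only restates the two excerpts above. The function names are invented for the example, and length_code stands for the length value already decoded from the compressed stream.

```python
def documented_min_match_length(has_literal_tree: bool) -> int:
    """Minimum Match Length as the ZIP documentation states it."""
    return 3 if has_literal_tree else 2


def match_length(length_code: int, has_literal_tree: bool) -> int:
    """Apply 'Length <- Length + Minimum Match Length' to a decoded length value."""
    return length_code + documented_min_match_length(has_literal_tree)
```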

Here’s the answer, as far as I can determine: The difference is that v1.01/v1.02 thinks the Minimum Match Length is 2 for i4:3, though the documentation says it’s 3. And it thinks the Minimum Match Length is 3 for i8:2, though the documentation says it’s 2. Later versions of PKZIP, and other zip software, work according to the documentation.

I’ve confirmed this by modifying a decompressor to use the wrong match length in these cases, and verifying that it decompresses files in the same way (the same wrong way, in most tests) that PKZIP v1.01 does.

To summarize the Minimum Match Length issue:

Variant   Specs say   PKZIP v1.0x   PKZIP v1.10+
i4:2      2           2             2
i4:3      3           2             3
i8:2      2           3             2
i8:3      3           3             3
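
The table can be captured in a small lookup. In the v1.0x column, the values happen to line up with the dictionary size (4K gives 2, 8K gives 3) rather than with the number of trees. That is only a pattern in the table, not a claim about how PKZIP's code is actually written. The function name and the pkzip_1_0x flag are hypothetical.

```python
def min_match_length(dict_kb: int, trees: int, pkzip_1_0x: bool = False) -> int:
    """Look up the Minimum Match Length for an Implode variant such as i4:3."""
    if dict_kb not in (4, 8) or trees not in (2, 3):
        raise ValueError("expected dict_kb in (4, 8) and trees in (2, 3)")
    if pkzip_1_0x:
        # Observed v1.01/v1.02 behavior: the value tracks the dictionary size.
        return 3 if dict_kb == 8 else 2
    # Documented behavior (and v1.10+): the value tracks the literal tree.
    return 3 if trees == 3 else 2
```

A decompressor that takes this value as a parameter could then handle both kinds of file with the same code path.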

In conclusion, the PKZIP v1.01/1.02 software does not match the documentation. In v1.10, the software was changed to match the documentation. That doesn’t necessarily mean it was the software that was wrong. It could have originally been a documentation error, for which changing the software was deemed the least-bad solution. I don’t know.

Other open questions:

  • Are there any bad i4:3 and i8:2 files in existence, other than the three in PKZ101.EXE?
  • Is there an easy way to detect such bad compressed files, so that the right Implode parameters can be selected?
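
Short of a real detection method, one brute-force fallback would be to decompress with the documented Minimum Match Length and, if the size or CRC-32 doesn't match, retry the two affected variants with the other value. The explode argument below is a hypothetical callable explode(data, dict_kb, trees, min_match_length) that returns bytes. Apart from zlib.crc32, nothing here is an existing library API.

```python
import zlib


def explode_with_fallback(explode, data, dict_kb, trees, expected_size, expected_crc):
    """Return (output, min_match_length) for the first attempt that passes, else None."""
    documented = 3 if trees == 3 else 2
    candidates = [documented]
    if (dict_kb, trees) in ((4, 3), (8, 2)):
        candidates.append(5 - documented)    # the value PKZIP v1.0x appears to use
    for mml in candidates:
        try:
            out = explode(data, dict_kb, trees, mml)
        except Exception:
            continue                         # treat a decoder failure as a mismatch
        if len(out) == expected_size and zlib.crc32(out) == expected_crc:
            return out, mml
    return None
```

This only identifies a bad file after the fact, and it costs a second decompression pass for every file that fails the first check.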

Resources

It might not be easy to find ZIP files that use a given Implode variant, so here are two sample files, found in the wild, that use all four variants: MYSTIC.ZIP, MODED301.ZIP. They are compatible with PKZIP 1.10, and incompatible with 1.01/1.02.

The relevant PKZIP versions can be found here.
