An un-unzippable ZIP file

Apparently, I’m writing a series of articles about the ZIP format.

Here’s a curiosity. I’m going to use the Info-ZIP zip program to make a ZIP file that the Info-ZIP unzip program cannot unzip. I’ll only do things that should be benign, i.e. things that a normal user might conceivably do.

This article assumes you have at least some awareness of Unix/Linux shell commands.

Preliminary stuff

First, I’ll do some testing to figure out how much extra data (platform-specific file attributes) the zip program is likely to put in my file.

I also need to choose a consistent filename length for the member file that goes inside the ZIP file. I’ll say 6 characters.

I’ll make a little test file with some distinctive text in it, giving it a name with 6 characters.

$ echo QWERTY > aa.aaa

Now I’ll create a ZIP file containing that file. I’ll specify no compression (-0).

$ zip -0 aa.zip aa.aaa

I’ll look inside the new ZIP file, for my distinctive text.

$ od -t x1z -A d aa.zip
...
0000064 51 57 45 52 54 59 0a 50 4b 01 02 1e 03 0a 00 00  >QWERTY.PK.......<
...

So the text begins at offset 64 (decimal). Subtract that offset from the magic number 101010256 (explained later), and I get 101010192. That’s the file size I need.
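The arithmetic can be double-checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the one fact not taken from the od output is that the fixed part of a ZIP local file header is 30 bytes.

```python
# Numbers taken from the text: the target offset, and where the member
# data started in the small test archive.
TARGET_OFFSET = 101010256
DATA_START = 64

# The fixed part of a local file header is 30 bytes, and the member
# filename is 6 characters. Whatever is left is the extra data that
# zip added on this platform.
LOCAL_HEADER_FIXED = 30
FILENAME_LEN = 6
extra_len = DATA_START - LOCAL_HEADER_FIXED - FILENAME_LEN

# The member file must fill the gap between DATA_START and the target.
member_size = TARGET_OFFSET - DATA_START

print(extra_len)     # 28
print(member_size)   # 101010192
```

If zip on your system writes a different amount of extra data, the text will start at a different offset, and the file size has to change to match.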

Making the file

I’ll create a file with exactly that many bytes, again using a 6-character filename.

$ head -c 101010192 /dev/zero > zz.zzz

There’s nothing nefarious about this file. It’s just a lot of zeroes.

As before, I’ll add it, uncompressed, to a new ZIP file. But this time, I’ll use the -z option to add a zipfile comment. ZIP file comments are not used very often these days, but it’s not an exotic feature.

It doesn’t really matter what I type for the comment, but it must have at least 16 characters. If you’re following along, and want to get exactly the same results as me, use “a” for the first 16 characters.

$ zip -0 -z zz.zip zz.zzz
  adding: zz.zzz (stored 0%)
enter new zip file comment (end with .):
aaaaaaaaaaaaaaaa
.
$

It thinks it worked…

Testing the file

… But there’s a problem. We can’t even list the contents of the ZIP file, let alone extract our zz.zzz file:

$ unzip -l zz.zip
Archive:  zz.zip
caution:  zipfile comment truncated
warning [zz.zip]:  zipfile claims to be last disk of a multi-part archive;
  attempting to process anyway, assuming all parts have been concatenated
  together in order.  Expect "errors" and warnings...true multi-part support
  doesn't exist yet (coming soon).
error [zz.zip]:  missing 3166533398 bytes in zipfile
  (attempting to process anyway)
error [zz.zip]:  attempt to seek before beginning of zipfile
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)

Don’t bother investigating the error messages. They’re nonsense.

Explanation

The ZIP specification has a wealth of information, but as far as I can tell, it never gets around to actually explaining how to read a ZIP file. The implied method requires you to first find the “end of central directory record”, which should be near the end of the file.

The implied method for finding this record is to find the last occurrence of a particular 4-byte signature (0x50 0x4B 0x05 0x06), except that the last 18 bytes of the file should not be searched. The record is at least 22 bytes long, so its signature cannot start there.
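Here is a minimal Python sketch of that implied search, assuming the whole file has been read into memory. The function name is hypothetical, and this is not the code of any real unzip program, just the simplest version of the idea.

```python
EOCD_SIGNATURE = b"PK\x05\x06"   # 0x50 0x4B 0x05 0x06
EOCD_MIN_SIZE = 22               # size of the record when the comment is empty


def find_eocd_naive(data: bytes) -> int:
    """Return the offset of the last signature match, or -1 if there is none.

    A signature cannot start in the last 18 bytes, because a record needs
    at least 22 bytes from its signature to the end of the file.
    """
    last_start = len(data) - EOCD_MIN_SIZE
    if last_start < 0:
        return -1
    # rfind's end argument is exclusive and the match must fit inside it,
    # so this allows matches that start at last_start but no later.
    return data.rfind(EOCD_SIGNATURE, 0, last_start + len(EOCD_SIGNATURE))
```

For example, an empty ZIP file is just the signature followed by 18 zero bytes, so `find_eocd_naive(b"PK\x05\x06" + bytes(18))` returns 0.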

But, in some rare situations, it is possible for this byte sequence to accidentally (or “accidentally”) occur after the real signature, and before the last 18 bytes of the file. If that happens, there’s a good chance an unzip program will use the false signature, and go off the rails.

One of the fields after the signature is a pointer to the “central directory”. I maneuvered the zip program into putting the central directory at exactly offset 101010256, which is hex 0x06054B50. ZIP stores integers in little-endian order, so that offset is encoded in the file as the byte sequence 0x50 0x4B 0x05 0x06. The unzip program thinks that’s the signature, and blows up.
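A quick way to see the coincidence is to encode the offset the way a ZIP file would, as a 32-bit little-endian integer. This is a small Python check using the standard struct module; the variable names are only for illustration.

```python
import struct

CENTRAL_DIR_OFFSET = 101010256   # 0x06054B50

# "<I" means an unsigned 32-bit integer, least significant byte first.
encoded = struct.pack("<I", CENTRAL_DIR_OFFSET)

print(encoded)        # b'PK\x05\x06'
print(encoded.hex())  # 504b0506
```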

Without the ZIP file comment, the false signature would be in the last 18 bytes of the file, where most unzip programs won’t look for it. The comment is just a way to add some bytes to the end of the file, to work around this problem.
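The 16-character minimum for the comment falls out of the record layout. In the end of central directory record, the central directory offset field starts 16 bytes in, and the fixed part of the record is 22 bytes long. The sketch below works through that arithmetic in Python, with hypothetical names, assuming the comment is the only thing after the fixed part of the record.

```python
EOCD_FIXED_SIZE = 22    # size of the record without its comment
OFFSET_FIELD_POS = 16   # where the central directory offset sits in the record


def false_signature_is_searched(comment_len: int) -> bool:
    """True if a signature search would look at the offset field."""
    # Bytes from the start of the offset field to the end of the file.
    distance_from_end = (EOCD_FIXED_SIZE - OFFSET_FIELD_POS) + comment_len
    # A signature is only considered if a whole 22-byte record could
    # follow it, which is the same as staying out of the last 18 bytes.
    return distance_from_end >= EOCD_FIXED_SIZE


for n in (0, 15, 16):
    print(n, false_signature_is_searched(n))
# 0 False
# 15 False
# 16 True
```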

As an aside, one could certainly construct a ZIP file that has a fake signature, or an entire fake ZIP file for that matter, in the comment. The ZIP specification doesn’t explicitly forbid it, but I think it’s implied that that’s not allowed. And because 0x05 and 0x06 aren’t codes for normal text characters, they won’t occur in an innocent comment, so a fake signature is not going to happen accidentally.

A zip program could work around this particular exploit by never putting the central directory at exactly offset 101010256. That’s easy; it could put it at 101010257 instead. But other similar exploits are possible, which might not be so easy to cleanly work around.

Some unzip programs are able to handle my “bad” file. That could be because they’re sophisticated enough to figure out that something seems wrong, and correct for it (at least with high probability). (But this is more difficult than it seems at first, because some invalid files should be corrected for in a different way.) Or it could be because they’re very unsophisticated, and don’t parse ZIP files in the normal way.
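As a rough illustration of what “figuring out that something seems wrong” might look like, here is one possible heuristic in Python: accept a signature match only if the comment length stored in the record accounts for exactly the bytes that follow it. This is a sketch with hypothetical function names, not a description of what any particular unzip program does, and it would wrongly reject some files, such as ones with junk appended.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_FIXED_SIZE = 22


def plausible_eocd(data: bytes, pos: int) -> bool:
    """Heuristic: does the record at pos look like a real EOCD record?"""
    if pos < 0 or pos + EOCD_FIXED_SIZE > len(data):
        return False
    if data[pos:pos + 4] != EOCD_SIGNATURE:
        return False
    # The last field of the fixed part is the length of the comment.
    (comment_len,) = struct.unpack_from("<H", data, pos + 20)
    # In a well-formed file, the comment runs exactly to the end of the file.
    return pos + EOCD_FIXED_SIZE + comment_len == len(data)


def find_eocd_checked(data: bytes) -> int:
    """Scan backward over signature matches; return the first plausible one."""
    end = len(data) - EOCD_FIXED_SIZE + len(EOCD_SIGNATURE)
    while end >= len(EOCD_SIGNATURE):
        pos = data.rfind(EOCD_SIGNATURE, 0, end)
        if pos < 0:
            return -1
        if plausible_eocd(data, pos):
            return pos
        # Move the window so the next match must start before this one.
        end = pos + len(EOCD_SIGNATURE) - 1
    return -1
```

With a record like the one in my file (central directory offset 101010256, followed by a 16-character comment of “a”), the false signature fails the check because the two bytes read as its comment length are really part of the comment, so the search moves on to the real record.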
