I’ve decided to postpone writing more about the messy details of PKLITE EXE files, and instead discuss PKLITE COM format.
This post is part of a series. For a list of other posts, see Part 1.
You may know that the DOS operating system supports two binary executable file formats: EXE and COM. The PKLITE executable compression software supports both of them. The compressed file will have the same format, EXE or COM, as the original file. This does not go without saying — it’s pretty easy to transform a COM file into an EXE file, and some executable compression utilities do just that.
COM is a very simple format, really just a blob of machine code.
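Because a COM file has no header, the practical way to tell the two formats apart is the two-byte signature that every EXE file starts with: "MZ" (DOS also accepts "ZM"). DOS goes by this signature, not by the filename extension. Here's a minimal Python sketch of that check. The function name `dos_executable_kind` is just for illustration.

```python
def dos_executable_kind(data: bytes) -> str:
    """Classify a DOS executable by its first two bytes.

    A file starting with the "MZ" (or "ZM") signature is loaded as an
    EXE, regardless of its extension. Anything else is treated as a raw
    COM image: a headerless blob of machine code.
    """
    if data[:2] in (b"MZ", b"ZM"):
        return "EXE"
    return "COM"
```

For example, `dos_executable_kind(b"MZ\x90\x00")` gives `"EXE"`, while a file starting with an ordinary instruction byte such as `0xB8` gives `"COM"`.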
I’ve studied PKLITE’s compressed COM format as best I can, and it seems to be relatively easy to understand. This is partly due to the simplicity of COM format, and partly because there are very few differences between different PKLITE-compressed COM files. It really is a breath of fresh air, compared to the huge hassle that is PKLITE-compressed EXE.
The main idea here is that we want to understand the format well enough to decompress any compressed file (“statically”, without executing it), thereby recovering the original uncompressed file.
To be clear, there’s nothing new about being able to decompress these files. PKLITE itself can do it, if run with the “-x” option. At least, it can decompress files that were made by the same version or an older version of PKLITE, and that have not been modified after their creation.
Some observations about PKLITE COM format:
- The compression scheme is the same as one of the LZ77 with Huffman coding schemes used in EXE format — specifically, the one known as “small” compression. I wish I could be completely sure that this is always the case; all I can say is that I tried to find a file that would trigger “large” compression mode, and it never happened.
- It never uses so-called “extra compression”. The -e option is ignored.
- As far as I can tell, there are no options that have any effect on the resulting compressed file.
- It is always possible to recover the original file exactly, byte for byte.
The compressed data format is covered in Part 3. It’s the “small” mode of “code image” compression.
What this means is that, assuming you already have a decompressor for the LZ77+Huffman coding format, the only thing you need to decompress a PKLITE COM file is the offset at which the compressed data starts. While there’s no easy and direct way to get this information, there are only a few possible offsets.
For future reference, here’s an annotated partial hex dump of a typical compressed COM file:
For decompression purposes, you can pretend that there are just four different versions of PKLITE COM format, and that’s including the unofficial beta version of PKLITE. The first 10 bytes of the file are sufficient to distinguish them. You can use the following table:
| Bytes at start of file | Version | ver. |
I am not saying that fingerprinting the first 10 or so bytes of a mystery file is the best way to detect PKLITE COM format. It’s possible to be much more discriminating, by looking at bytes later in the file. Something you should not do is trust the version info bytes, the copyright message, or the “Not enough memory” message. It was common for PKLITE users to erase these things.
It is also possible (without using the version info or copyright message) to distinguish version ranges 1.00–1.03 and 1.05–1.14, by looking at the bytes around offset 260. But that’s not needed for decompression.
The “??” bytes are different in different files, and should be treated as wildcards.
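Matching a fingerprint with “??” wildcards against the start of a file takes only a few lines. Here's a minimal Python sketch. The pattern syntax (space-separated hex bytes, “??” for any byte) and the function name `matches_signature` are my own choices for illustration, and the example pattern used below is made up. It is not one of the real PKLITE fingerprints.

```python
def matches_signature(data: bytes, pattern: str) -> bool:
    """Return True if `data` starts with the bytes described by `pattern`.

    `pattern` is a string of space-separated hex bytes, where "??"
    matches any single byte (a wildcard).
    """
    tokens = pattern.split()
    if len(data) < len(tokens):
        return False  # file too short to contain the whole fingerprint
    for byte, token in zip(data, tokens):
        if token == "??":
            continue  # wildcard: this byte varies from file to file
        if byte != int(token, 16):
            return False
    return True
```

With a hypothetical pattern like `"b8 ?? ?? ba ?? ??"`, the file bytes `b8 12 34 ba 56 78` match, but `b8 12 34 bb 56 78` do not. A real detector would hold one such pattern per row of the table and report the first row that matches.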
The two “??” bytes following the “ba” byte seem to be the size in bytes of the compressed data, rounded up to the next even number. This is not needed for static decompression, because the compressed data ends with a special “stop” code. But it could be useful as an integrity check.
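On x86, `ba` is the opcode for `MOV DX, imm16`, so the two bytes after it are a 16-bit immediate operand stored little-endian. Here's a sketch of the integrity check, assuming the caller already knows where the `ba` byte is, and knows how many bytes of compressed data the decompressor consumed before hitting the stop code. Both inputs, and the name `check_compressed_size`, are hypothetical. They are not taken from any existing tool.

```python
import struct


def check_compressed_size(data: bytes, ba_offset: int, compressed_len: int) -> bool:
    """Compare the stored size field with the real compressed-data size.

    Assumes `data[ba_offset]` is the 0xBA opcode (MOV DX, imm16) and that
    its little-endian 16-bit operand is the compressed-data size rounded
    up to an even number. `compressed_len` is the number of bytes the
    decompressor actually read, up to and including the stop code.
    """
    if data[ba_offset] != 0xBA:
        raise ValueError("expected 0xBA opcode at ba_offset")
    (stored,) = struct.unpack_from("<H", data, ba_offset + 1)
    expected = (compressed_len + 1) & ~1  # round up to the next even number
    return stored == expected
```

For example, a stored value of 10 is consistent with 9 or 10 bytes of compressed data, but not with 11.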
I suspect the bytes following the “b8” byte reflect how much memory the run-time decompressor needs. They’re probably of no use to a static PKLITE decompressor.
Recall that for a pristine PKLITE-compressed EXE file, two bytes I call version info appear at file offset 28. These bytes appear in COM files as well, but unfortunately, they are not in the same place in all versions. As shown in the table, they can be at offset 36, 44, or 46. The PKLITE copyright message always starts immediately afterward.
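Because the copyright message always starts right after the version info, one way to find the version info is to try each of the three offsets and accept the first one followed by “PK”. Every copyright message I know of begins that way. Here's a Python sketch (the names are mine). Keep in mind that users often erased these bytes, so a result of `None` is common, and this is no substitute for fingerprinting the start of the file.

```python
from typing import Optional, Tuple

# Offsets at which the two version-info bytes have been seen in COM files.
CANDIDATE_VINFO_OFFSETS = (36, 44, 46)


def find_version_info(data: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (offset, raw_version_bytes), or None if not found.

    An offset is accepted only if a copyright message starting with
    "PK" immediately follows the two version-info bytes.
    """
    for off in CANDIDATE_VINFO_OFFSETS:
        if data[off + 2 : off + 4] == b"PK":
            return off, data[off : off + 2]
    return None
```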
Here are the actual copyright messages:
| Version | V.info | Copyright message begins |
|---|---|---|
| 1.00beta | 00 01 | PK Copyr. 1990 PKWARE |
| 1.00 | 00 01 | PKlite Copr. 1990 PKWARE |
| 1.03 | 03 01 | PKLITE Copr. 1990 PKWARE |
| 1.05 | 05 01 | PKLITE Copr. 1991 PKWARE |
| 1.12 | 0c 01 | PKLITE Copr. 1991 PKWARE |
| fake v1.20 | 0c 01 | PKLITE Copr. 1991 PKWARE |
| 1.13 | 0d 01 | PKLITE Copr. 1991 PKWARE |
| 1.14 | 0e 01 | PKLITE Copr. 1992 PKWARE |
| 1.15 | 0e 01 | PKLITE Copr. 1992 PKWARE |
| 1.50 | 32 01 | PKLITE Copr. 1990-1995 PKWARE |
| 2.01 | 01 02 | PKLITE Copr. 1990-1995 PKWARE |
The copyright message always concludes “PKWARE Inc. All Rights Reserved”. It’s possible that there are registered versions of PKLITE that do something not listed in this table, but I haven’t encountered any such files.
Unlike with EXE format, it’s not possible to distinguish files produced by the fake “v1.20” version from the real v1.12 software.
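The collisions are easy to see if the copyright-message table is turned into a lookup from version info to the releases that write it. Here's a Python sketch transcribed from that table. The dictionary and function names are just for illustration. Note that some values, such as 0x010c and 0x010e, map to more than one release.

```python
# Version-info values (read as little-endian 16-bit integers) mapped to
# the PKLITE releases known to write them in COM files.
VINFO_TO_VERSIONS = {
    0x0100: ("1.00beta", "1.00"),
    0x0103: ("1.03",),
    0x0105: ("1.05",),
    0x010C: ("1.12", "fake v1.20"),
    0x010D: ("1.13",),
    0x010E: ("1.14", "1.15"),
    0x0132: ("1.50",),
    0x0201: ("2.01",),
}


def versions_for_vinfo(raw: bytes) -> tuple:
    """Return the candidate releases for two version-info bytes."""
    value = int.from_bytes(raw[:2], "little")
    return VINFO_TO_VERSIONS.get(value, ())  # empty tuple if unknown
```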
In the hex dump shown above, the version info bytes are 0E 01, which is interpreted as 0x010e, which decodes to version 1.14. But according to my table, it’s a version 1.15 file. That’s because PKLITE v1.15 has a bug: it writes a version info of 0x010e (1.14), when it should write 0x010f (1.15). The PKLITE COM format changed between v1.14 and v1.15, so I think it’s fair to call this a bug, rather than a case where the number gives the format version instead of the software version.
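The decoding works like this: the two bytes are read as a little-endian 16-bit value, the high byte is the major version, and the low byte is the minor version in decimal. Here's a Python sketch consistent with every entry in the copyright-message table. The name `decode_version_info` is mine.

```python
def decode_version_info(raw: bytes) -> str:
    """Turn two version-info bytes into a "major.minor" string.

    Example: bytes 0E 01 -> 0x010e -> major 1, minor 14 -> "1.14".
    """
    value = int.from_bytes(raw[:2], "little")
    major, minor = value >> 8, value & 0xFF
    return f"{major}.{minor:02d}"
```

So `decode_version_info(b"\x0e\x01")` gives `"1.14"`, even for a file made by v1.15, and `b"\x32\x01"` gives `"1.50"`.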
That’s about all I know about PKLITE-compressed COM files.