Notes on PKLITE format, Part 2

For an introduction to PKLITE, and a list of the other posts in this series, see Part 1.

This post uses some of the EXE jargon defined in my post on DOS EXE format.

Structure of PKLITE-compressed files

Here’s a bird’s eye view of how PKLITE transforms a DOS EXE file when it compresses it:

Notes:

  • Files with “extra compression” omit the segments labeled “other 26 bytes of orig header” and “orig. custom-data-1”.
  • If the file contains the original DOS header, you can use it to deduce and recreate the exact layout of the original file.
  • Files produced by v1.00-beta are a little different: the code image segment starts with the compressed code image, and the decompressor comes after it.
  • As mentioned previously, the original custom-data-2 segment is discarded.
  • The only way to find the compressed relocation table, or the footer, is to find the end of the previous segment. The only way to find the end of the previous segment is to decompress it.

The “footer” is usually 8 bytes in size. Sometimes it’s larger — I don’t know why.

The first 8 bytes of the footer contain four of the fields from the original file’s DOS header:

Offset  Field
+0      SS
+2      SP
+4      CS
+6      IP

This information is redundant for pristine files that do not use “extra compression”, as it also appears in the copy of the original DOS header.
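As a rough illustration of the footer layout above, here is a minimal Python sketch that decodes the first 8 bytes. The names PkliteFooter and parse_footer are hypothetical, and it assumes the fields are unsigned 16-bit little-endian words, as they are in a DOS EXE header. Locating the footer in the first place is a separate problem that this sketch does not attempt.

```python
import struct
from typing import NamedTuple


class PkliteFooter(NamedTuple):
    """Hypothetical container for the four fields stored in the footer."""
    ss: int
    sp: int
    cs: int
    ip: int


def parse_footer(footer: bytes) -> PkliteFooter:
    """Decode the first 8 bytes of a footer as SS, SP, CS, IP.

    Assumption: each field is an unsigned 16-bit little-endian word.
    Any bytes beyond the first 8 are ignored, since the footer is
    sometimes larger than 8 bytes.
    """
    if len(footer) < 8:
        raise ValueError("footer must be at least 8 bytes")
    ss, sp, cs, ip = struct.unpack_from("<4H", footer, 0)
    return PkliteFooter(ss, sp, cs, ip)
```

For example, parse_footer(bytes([0x34, 0x12, 0x00, 0x01, 0xF0, 0xFF, 0x00, 0x00])) yields SS=0x1234, SP=0x0100, CS=0xFFF0, IP=0.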

The compressed code image offset

Figuring out the offset of the compressed code image, relative to the start of the (new) code image segment, is a frustrating challenge. I don’t know why the format designers made it so difficult.

[Update: In Part 5, I give a completely different, hopefully more robust, way of finding the compressed code offset.]

For early versions of PKLITE, the offset seems to always be the same for a given version and flags. The version codes used in this table are the bytes at offset 29 and 28, in that order.

Version/flags   Offset (decimal)
(v1.00-beta)    0
0100-010f       464
1100-110d       480
110e-110f       512
2100-210f       656
3103-3105       672
310c-310d       656
310e-310f       704

These offsets are relative to the start of the “code image” segment, not the start of the file.
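The table above could be turned into a lookup along these lines. This is only a sketch: it assumes each row is an inclusive range of version codes, and the function names and the is_beta parameter are hypothetical. How to recognize a v1.00-beta file is not covered here, so the caller has to supply that flag.

```python
from typing import Optional

# (first code, last code, offset) rows copied from the table.
# Treating each row as an inclusive range is an assumption of this sketch.
_OFFSET_TABLE = [
    (0x0100, 0x010F, 464),
    (0x1100, 0x110D, 480),
    (0x110E, 0x110F, 512),
    (0x2100, 0x210F, 656),
    (0x3103, 0x3105, 672),
    (0x310C, 0x310D, 656),
    (0x310E, 0x310F, 704),
]


def version_code(exe: bytes) -> int:
    """Combine the bytes at file offsets 29 and 28, in that order."""
    return (exe[29] << 8) | exe[28]


def compressed_code_offset(code: int, is_beta: bool = False) -> Optional[int]:
    """Return the offset relative to the start of the code image segment.

    Returns None for codes not in the table (for example, versions after
    v1.15), where a fixed offset cannot be assumed.
    """
    if is_beta:
        return 0
    for first, last, offset in _OFFSET_TABLE:
        if first <= code <= last:
            return offset
    return None
```

Returning None rather than guessing keeps the caller honest about the later versions, where the offset is not fixed for a given version and flags.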

For PKLITE versions after v1.15, the offset is not always the same for a given version+flags. I’ve worked out some algorithms that seem to work (see the Deark source code if you want), but they haven’t been well tested. This is a crucial piece of the PKLITE puzzle, and I wish I had better information about it.

Other PKLITE decompressors

Switching gears a bit, let’s quickly look at some of the existing PKLITE decompressors.

There are two main conceptual ways to decompress a self-decompressing file like a PKLITEd file:

  1. The same way you would decompress any other compressed file: Learn the compression format, and write a computer program to read the compressed data and decompress it. I call this “static” decompression.
  2. Run the program in a controlled way, let it decompress itself, then make a copy of the decompressed data. I call this “dynamic” decompression.

A dynamic decompression utility usually runs only on the same computer platform that the compressed file runs on, though in theory it could use an emulator. It’s also possible for the utility to implement a minimal interpreter/emulator that can do just enough to get the job done.

A fair number of dynamic PKLITE decompressors were written, back in the day. I’ve only looked at a few of them. Since I’m writing a static decompressor, they’re only of moderate interest to me.

UNP

UNP is an example of a dynamic decompressor for DOS. It supports a lot of formats, including PKLITE. It seems to be one of the more highly regarded such programs, and in my experience, it does a pretty good job with PKLITE format. [Edit: I should have said that UNP’s source code is available, though it’s all in assembly language, so maybe not easy to read.]

One place to get UNP is the SAC archive.

DISLITE

DISLITE (link) is a PKLITE-only dynamic decompressor for DOS. What’s interesting is that the source code was released, which is sadly quite rare for DOS utilities. The source code incorporates some useful knowledge about PKLITE format.

I find it surprising how big the DISLITE source code is. I would have thought that a dynamic decompressor, while high-tech, would not require a lot of code. That is not the case here. Apparently, it’s not easy to do this well, at least for PKLITE format.

depklite

depklite (search for it, and/or refer to ModdingWiki) is a small open source C utility that does static decompression. There are several variants of it. It’s quite incomplete, but may still be useful, at least for research purposes.

mz-explode

mz-explode is an open source C++ program for static decompression of PKLITE files. It was evidently derived in part from disassembling PKLITE.

A warning about mz-explode: I’ve seen it argued that portions of it appear to have been created in such a direct manner that it might constitute some form of copyright infringement. Therefore, if you’re a programmer, I have to advise against looking at its source code, unless you understand the legal ramifications.

PKLITE

As previously mentioned, PKLITE itself can decompress some PKLITEd files, using the -x option. I assume its decompression would be classified as “static”. But it has significant limitations, and is not open source.

Parameters required for decompression

To decompress the code image segment, you need several pieces of information:

  • The file position of the start of the compressed data.
  • Does the file use “large” or “small” compression? (usually given by the 0x20 bit of the byte at offset 29)
  • Does the file use “extra” (-e) compression? (usually given by the 0x10 bit of the byte at offset 29)

That’s usually all you need; however:

  • This does not seem to suffice to decompress special “version 1.20” files. I haven’t figured out how to decompress them.
  • Version 2.01 has an “uncompressed region” feature (-g), which almost certainly requires special handling. I don’t know how to support it. mz-explode purports to be able to detect whether the feature was used, but it doesn’t actually support it.
  • There could be other subtleties, or obscure features, that I’m not aware of.
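The two flag checks from the first list can be sketched as below. The names PkliteFlags and read_flags are hypothetical, and the sketch assumes that a set 0x20 bit means “large” and a set 0x10 bit means “extra”. Since these bits are only usually reliable, treat the result as a starting guess.

```python
from typing import NamedTuple


class PkliteFlags(NamedTuple):
    """Hypothetical container for the two compression flags."""
    large: bool
    extra: bool


def read_flags(exe: bytes) -> PkliteFlags:
    """Read the compression flags from the byte at file offset 29.

    Assumption: bit 0x20 set means "large" compression, and
    bit 0x10 set means "extra" compression.
    """
    if len(exe) < 30:
        raise ValueError("file is too short to contain the flags byte")
    flags_byte = exe[29]
    return PkliteFlags(
        large=bool(flags_byte & 0x20),
        extra=bool(flags_byte & 0x10),
    )
```

The third parameter, the file position of the compressed data, has to come from elsewhere, and none of this addresses the special cases listed above.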

The good news is that the compression format was quite stable across different versions of PKLITE. You don’t need a different decompression algorithm for each version of PKLITE.

The next post in this series covers the details of the compressed data format, and decompression algorithm.
