Notes on PKLITE format, Part 1

I’ve been writing (as a feature of Deark) a decompressor for DOS EXE files compressed with PKLITE. Before I forget it all, I decided to write down some of the things I’ve learned about PKLITE.

This is the first post in a series. Other posts:

There is a fair amount of information about PKLITE on the internet, but I haven’t found anything that I’d consider to be of the highest quality in terms of completeness and accuracy. If you want to see what others have written about it, I suggest starting with ModdingWiki’s page on PKLITE.

What is PKLITE?

PKLITE is an “executable compression” utility. It is computer software that converts an executable file into a smaller one which — one hopes — still functions the same.

It was developed by PKWARE, the makers of PKZIP. It was first released in 1990.

Screenshot of PKLITE’s “usage” message

PKLITE was popular. I think it’s safe to say that, in its heyday, it was the most commonly used executable compression utility for DOS, other than possibly EXEPACK (a primitive compressor that was a standard part of some development systems).

I scanned the EXE files (excluding self-extracting ZIP files) in a late version of the Simtel DOS archive, and found that over 10% of them are compressed with PKLITE. The point is, there are a lot of PKLITE-compressed files in existence.

Supported platforms

PKLITE compresses DOS EXE and DOS COM formats. The last(?) version of PKLITE, v2.01, also supports Windows 3.x EXE files. There was also a separate product named PKLITE32, which compresses 32-bit Windows executables.

I don’t have much to say about the Windows formats. I don’t think they were very popular, but I don’t know.

I don’t have much to say about the COM format, either, though I plan to research it someday.

Except where noted, this post is about the DOS EXE format. I will be referring to some of the fields and segment names from my previous post on EXE files, so please use that for reference if you’re interested.

Post-processed PKLITE files

A word of caution: In this post, I’m only describing pristine PKLITE files. That is, files that were directly produced by PKLITE.

Unfortunately, some PKLITE users, for one reason or another, wanted to disguise the fact that they used PKLITE, and/or to make it harder to decompress the file. And it’s not very difficult to do that.

Something to note is that nothing in the custom-data-1 or custom-data-2 segments (refer to my DOS EXE post) is used when executing a PKLITE file. Users figured this out, and sometimes they modified or deleted those segments.

Also, inside the decompression routine, PKLITE files contain the error message “Not enough memory”. This is another thing that should not be trusted to be present, if you’re trying to identify PKLITE files. Some users would overwrite it with spaces, or otherwise modify it.

In practice, robust identification and decompression of PKLITE files requires that you look at the parts of the file that are hard to mess with.

Versions of PKLITE

There were versions of PKLITE that were freely-distributable: “free for noncommercial use”. I’ll call them the “free” versions. There were also “registered” (or “professional”) versions that you could purchase.

PKLITE-compressed files contain a version number label, in the bytes at offset 28 and 29. The low 4 bits of byte 29 are the major version (1 or 2). Byte 28 is the minor version. For example, bytes 0x32 0x21 would translate to version 1.50: byte 28 is 0x32, which is 50 in decimal, and the low 4 bits of 0x21 are 1. (The high digit of the second byte, “2”, holds flags that will be explained later.)
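As a rough illustration of that layout, here is a minimal Python sketch that pulls the version label out of the start of a file. The names parse_version_label and format_version are made up for this example, and the code assumes it is handed a pristine PKLITE-compressed EXE; it makes no attempt to verify that.

```python
def parse_version_label(data: bytes) -> tuple[int, int]:
    """Return (major, minor) from the version label at offsets 28 and 29.

    Hypothetical helper: assumes `data` holds the start of a
    PKLITE-compressed EXE file and does not check that it really is one.
    """
    if len(data) < 30:
        raise ValueError("need at least 30 bytes")
    minor = data[28]         # byte 28: minor version, as a plain number
    major = data[29] & 0x0F  # low 4 bits of byte 29: major version
    return major, minor


def format_version(major: int, minor: int) -> str:
    """Format the label as text, e.g. (1, 50) becomes '1.50'."""
    return f"{major}.{minor:02d}"
```

With the example bytes 0x32 0x21 at offsets 28 and 29, this yields major 1 and minor 50.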

As best as I can determine, there are nine legitimate free versions:

Version      Date          Typical distribution filename
-------      ----          -----------------------------
1.00         1990-12-01    PKLTE10.EXE
1.03         1990-12-20    PKLTE103.EXE
1.05         1991-03-20    PKLTE105.EXE
1.12         1991-06-15    PKLTE112.EXE
1.13         1991-08-01    PKLTE113.EXE
1.14         1992-06-01    PKLTE114.EXE
1.15         1992-07-30    PKLTE115.EXE
1.50         1995-04-10    PKLTE150.EXE
2.01         1996-03-15    PKLTS201.EXE

Every free version consistently writes its version number to the files it creates. You may think that’s obvious, but it’s not the sort of thing one can just safely assume.

The number and identity of the registered versions are unfortunately not known to me. There might be more registered versions than free versions. Some of them might be classified as “beta”. I don’t know whether the version number of the software always matches the version number label in the generated files.

The only registered versions that I’ve seen on the internet are 1.12, 1.13, and 1.15.

There are also two notable unauthorized versions that are readily found on the internet:

  • v1.00-beta – 1990-05-29 – PKLITE.ZIP – This seems to be a private beta version that was leaked.
  • Fake “v1.20” – “1992-08-20” – PKLT120R.* – This is actually a lightly-hacked copy of v1.12-registered.

There may also have been internal versions of PKLITE, used only by PKWARE for its own software.

Not all PKLITE-compressed files were produced directly by PKLITE. Notably, the ZIP2EXE utility from PKZIP versions 2.x produces self-extracting archives whose unzip routine appears to have been compressed with PKLITE v1.20-registered (whatever that is).

The “flags” bits

As previously mentioned, the low 4 bits of the byte at offset 29 of a PKLITE-compressed EXE file contain the major version number. The high 4 bits are for flags:

  • The 0x10 bit, if set, means “extra” compression — an option (-e) only supported by the registered versions.
  • The 0x20 bit tells which of two variants of the compression algorithm was used. Technically, it tells which set of predefined Huffman codebooks was used. If set, the file is sometimes said to use “large” compression. Which variant to use seems to be automatically decided by PKLITE.
  • The 0x40 bit is, as far as I know, used only by the 1.00-beta version. If set, it means the -l (“load-high”) option was used when creating the file. This feature does not exist in the official releases.
  • The 0x80 bit is unused.
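The flag bits listed above can be unpacked with a few mask operations. This is only a sketch: the function name decode_flags and the dictionary keys are labels I chose for illustration, not names used by PKLITE or its documentation.

```python
def decode_flags(byte29: int) -> dict[str, bool]:
    """Split the high 4 bits of the byte at offset 29 into named flags.

    The key names are hypothetical labels chosen for this example.
    """
    flags = byte29 & 0xF0  # the low 4 bits are the major version; drop them
    return {
        "extra": bool(flags & 0x10),        # "extra" compression (-e)
        "large": bool(flags & 0x20),        # which codebook set ("large")
        "load_high": bool(flags & 0x40),    # -l option, seen in 1.00-beta
        "unused_0x80": bool(flags & 0x80),  # not known to be used
    }
```

For a byte of 0x21, only the “large” flag comes back set.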

At offset 30 of a pristine PKLITE-compressed EXE file is an identification string and copyright message. For what it’s worth, here are the messages used by the different versions:

Version      Message
-------      -------
1.00         PKLITE Copr. 1990 PKWARE Inc. All Rights Reserved
1.03         PKLITE Copr. 1990 PKWARE Inc. All Rights Reserved
1.05         PKLITE Copr. 1991 PKWARE Inc. All Rights Reserved
1.12         PKLITE Copr. 1990-91 PKWARE Inc. All Rights Reserved
1.13         PKLITE Copr. 1990-91 PKWARE Inc. All Rights Reserved
1.14         PKLITE Copr. 1990-92 PKWARE Inc. All Rights Reserved
1.15         PKLITE Copr. 1990-92 PKWARE Inc. All Rights Reserved
1.50         PKLITE Copr. 1990-1995 PKWARE Inc. All Rights Reserved
2.01         PKlite(R) Copr. 1990-1996 PKWARE Inc. All Rights Reserved

1.00-beta    PKLITE Copr. 1990 PKWARE Inc. All Rights Reserved
1.20-fake    PKLITE Copr. 1990-92 PKWARE Inc. All Rights Reserved

Note that you can apparently distinguish files made by the fake “v1.20” from those made by the real v1.12 by the copyright date: the fake’s message ends its date range with “92”, where the real v1.12 has “91”. (At least, I’m pretty sure there was no 1992 release of v1.12.) That one byte is the only difference between such files.
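To make the comparison concrete, here is a small Python sketch that reads the copyright years out of the message at offset 30. The function name copyright_years and the regular expression are my own, built only from the messages in the table above. It only makes sense for pristine files, since this message is exactly the sort of thing that post-processed files may have altered or removed.

```python
import re
from typing import Optional

# Pattern derived from the messages listed above; an assumption, not a spec.
_ID_RE = re.compile(
    rb"PK(?:LITE|lite\(R\)) Copr\. (\d{4}(?:-\d{2,4})?) PKWARE Inc\."
)


def copyright_years(data: bytes) -> Optional[str]:
    """Return the year or year range from the message at offset 30.

    Hypothetical helper: returns None if the expected message is not
    found at that offset (for example, in a modified file).
    """
    m = _ID_RE.match(data, 30)  # match must start exactly at offset 30
    return m.group(1).decode("ascii") if m else None
```

A result of "1990-92" in a file that is otherwise identical to v1.12 output would be consistent with the fake “v1.20”, while "1990-91" would be consistent with the real v1.12.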

Other observations about the PKLITE software

  • PKLITE is well documented, and the free version’s documentation includes information about features only available in the registered version.
  • PKLITE has a feature (the -x option) to decompress compressed files. So, one might ask, why go to the trouble of writing your own decompressor? My reply is that it has some limitations, the biggest being that it can’t decompress most of the files compressed by the registered versions of PKLITE. And it runs on DOS, which isn’t always convenient nowadays.
  • The PKLITE distribution includes a utility named CHK4LITE.EXE, which will examine a file, and tell you whether it is compressed by PKLITE, and the version of PKLITE that was used. This is potentially useful, especially if we can figure out the algorithm it uses.

What it means to decompress a PKLITE file

Let’s say you have a PKLITE-compressed file, and you want to decompress it. But what do you really want?

Maybe, all you want is a decompressed copy of the compressed part of the file, so you can scan it for interesting text, and things like that.

Beyond that, the obvious goal of decompression is to exactly reproduce the original EXE file, before it was compressed by PKLITE. It turns out that this is sometimes possible, and sometimes not. PKLITE sometimes throws away information deemed to be unimportant, so it can’t be reproduced. But even in those cases, it’s usually possible to create a decompressed file that works the same as the original. Reproducing the original file, as best we can, is the type of decompression I’m aiming for.

Worth noting, though, is that some software developers didn’t want people decompressing and modifying their software, so they tried to make their software only work if it was compressed by PKLITE. So, even if you decompress a file perfectly, it might not actually work. The later versions of PKLITE even have a documented way that programs can check to see if they are PKLITE-compressed. For more information, search the v2.01 documentation for “PSP”. It seems that it is possible, with some wizardry, to create a decompressed file that will trick programs that perform this check into thinking they are compressed. But that’s beyond my knowledge, and I’m not going to try to do it.

What is saved in a PKLITE-compressed file

Here’s a list of the components of a to-be-compressed EXE file, and what PKLITE does with them.

  • The DOS code image segment is saved, compressed using LZ77 with Huffman coding. The specific algorithm is, as far as I know, unique to PKLITE.
  • The 28-byte DOS header is saved. This is optional in the registered version. The most important data in the DOS header is also saved in another way.
  • The custom-data-1 segment is saved. This is optional in the registered version.
  • The custom-data-2 segment is not saved. Any data in this segment will be lost. This is a bit ironic, since PKLITE itself puts data in the custom-data-2 segment.
  • The relocation table is saved, compressed using a specialized algorithm. It can be saved as-is. Or it can be converted to a functionally equivalent form that is more compressible, though this means that the original table might not be recoverable.
  • The custom-data-3 (overlay) segment is retained as-is. There is an option to discard it instead. It is never compressed.

To be continued

I know, I haven’t even gotten to the structure of PKLITE-compressed files, or the compression algorithms, or a robust way to identify PKLITE-compressed files as such. I plan to investigate those things in future posts. But I’ll admit that I don’t know everything. There are a few types of PKLITE files that I still don’t know how to decompress, and some features I don’t fully understand.
