A quick overview of DOS EXE file format

This is a brief introduction to, and a diagram of, the basic structure of the EXE executable file format used by MS-DOS and some other operating systems. A DOS EXE header is still embedded at the start of every file in the PE "Portable Executable" format used by Microsoft Windows today, so this also constitutes preliminary information that could be helpful to …

Testing some LZ77 compression limits

This post is about data compression algorithms that involve LZ77, or a similar kind of compression. It's mainly about old-school compression algorithms and software. There is some information about LZ77 in my post about LZ77 prehistory. I won't explain it in detail here, but here are some things to know about it. Both the compressor …

ARC/PAK’s “Distilled” compression scheme

PAK is an old file compression and archiving program for DOS, developed by NoGate Consulting. (Search the web for "pak251.exe".) It has a number of features, which include some extensions to ARC format. One such ARC extension is compression method #11, named "Distilled". It was introduced in PAK v2.0, released in July 1989. Unlike my …

ARC’s “Trimmed” compression scheme

ARC is a file compression and archiving utility that was in use from the mid-1980s to the early 1990s, mainly on DOS computers. It was developed by a small company named System Enhancement Associates. The last major version of ARC was 7.x, first released in late 1989 or early 1990. In v7, ARC became part …

Making an uncompressed JPEG 2000 file

Challenge: Construct a JPEG 2000 image file that isn't compressed. Also, try to do it without spending any money.
Overview
The flagship feature of the JPEG 2000 suite of graphics formats is the wavelet-based "JPEG 2000 codestream" compression format. This challenge is not to figure out how to make a degenerate form of that format …

Encoding Huffman codebooks

This post will assume you have a basic knowledge of the data compression technique known as Huffman coding. Though maybe, since I'm only concerned about decompression, I should call it something like "bit-oriented prefix codes". Huffman coding is really just one of the algorithms that can produce such a code, but it's the term everybody …

The blocksize field in LHA compression format

This post is about the data compression format I'll call "lh5". It is actually a family of formats that includes the compression methods often named lh{4, 5, 6, 7, 8}. It was most notably used by version 2.x of the old LHA/LZH/LHArc compressed archive format. It was used, often in modified form, in a number …

LZ77 compression prehistory

LZ77 is a widely-used class of data compression algorithms. I'll start with a quick overview of it. Assuming you're compressing a stream of bytes (a "file"), your LZ77 compressed data, at a high level, would contain two possible kinds of instructions for the decompressor:
Emit literal: {byte value=A}
Copy from history: {match-offset=B, match-length=C}
The match-offset may …
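
The two kinds of instructions in that excerpt can be sketched as a tiny decoder. This is only an illustration, not any particular real format: the tuple encoding, the function name `lz77_decode`, and the convention that the match-offset counts bytes backward from the end of the output so far are all assumptions made for the example.

```python
def lz77_decode(instructions):
    """Decode a list of hypothetical LZ77 instructions into bytes.

    Each instruction is either
      ("literal", byte_value)            -- emit one byte
      ("copy", match_offset, match_len)  -- copy from already-decoded output
    Assumption: match_offset is a distance back from the current end of
    the output (1 = the most recently emitted byte).
    """
    out = bytearray()
    for ins in instructions:
        if ins[0] == "literal":
            out.append(ins[1])
        elif ins[0] == "copy":
            _, offset, length = ins
            if offset < 1 or offset > len(out):
                raise ValueError("match-offset points outside the history")
            start = len(out) - offset
            # Copy one byte at a time so a match may overlap the bytes
            # it is producing (match_len greater than match_offset).
            for i in range(length):
                out.append(out[start + i])
        else:
            raise ValueError("unknown instruction kind: %r" % (ins[0],))
    return bytes(out)


if __name__ == "__main__":
    demo = [("literal", ord("a")), ("literal", ord("b")), ("copy", 2, 4)]
    print(lz77_decode(demo))
```

Copying byte by byte is a deliberate choice here: it lets a single copy instruction repeat a short pattern many times, which a bulk slice copy would not handle when the match runs past the end of the existing history.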