ARC/PAK’s “Distilled” compression scheme

PAK is an old file compression and archiving program for DOS, developed by NoGate Consulting. (Search the web for "pak251.exe".) It has a number of features, which include some extensions to ARC format. One such ARC extension is compression method #11, named "Distilled". It was introduced in PAK v2.0, released in July 1989. Unlike my … Continue reading ARC/PAK’s “Distilled” compression scheme

Making an uncompressed JPEG 2000 file

Challenge: Construct a JPEG 2000 image file that isn't compressed. Also, try to do it without spending any money. Overview The flagship feature of the JPEG 2000 suite of graphics formats is the wavelet-based "JPEG 2000 codestream" compression format. This challenge is not to figure out how to make a degenerate form of that format … Continue reading Making an uncompressed JPEG 2000 file

Encoding Huffman codebooks

This post will assume you have a basic knowledge of the data compression technique known as Huffman coding. Though maybe, since I'm only concerned about decompression, I should call it something like "bit-oriented prefix codes". Huffman coding is really just one of the algorithms that can produce such a code, but it's the term everybody … Continue reading Encoding Huffman codebooks

The blocksize field in LHA compression format

This post is about the data compression format I'll call "lh5". It is actually a family of formats that includes the compression methods often named lh{4, 5, 6, 7, 8}. It was most notably used by version 2.x of the old LHA/LZH/LHArc compressed archive format. It was used, often in modified form, in a number … Continue reading The blocksize field in LHA compression format

LZ77 compression prehistory

LZ77 is a widely-used class of data compression algorithms. I'll start with a quick overview of it. Assuming you're compressing a stream of bytes (a "file"), your LZ77 compressed data, at a high level, would contain two possible kinds of instructions for the decompressor: Emit literal: {byte value=A}Copy from history: {match-offset=B, match-length=C} The match-offset may … Continue reading LZ77 compression prehistory

Thoughts on timestamps of computer files

Computer files can have a number of different kinds of timestamps. Some of them are stored in the file's external metadata, alongside the file's name -- I'll call these external timestamps. Others are stored inside the file itself -- I'll call these internal timestamps. I use the term "timestamp" loosely. When something called a "timestamp" … Continue reading Thoughts on timestamps of computer files