Making an uncompressed JPEG 2000 file

Challenge: Construct a JPEG 2000 image file that isn't compressed. Also, try to do it without spending any money. Overview The flagship feature of the JPEG 2000 suite of graphics formats is the wavelet-based "JPEG 2000 codestream" compression format. This challenge is not to figure out how to make a degenerate form of that format … Continue reading Making an uncompressed JPEG 2000 file

Encoding Huffman codebooks

This post will assume you have a basic knowledge of the data compression technique known as Huffman coding. Though maybe, since I'm only concerned about decompression, I should call it something like "bit-oriented prefix codes". Huffman coding is really just one of the algorithms that can produce such a code, but it's the term everybody … Continue reading Encoding Huffman codebooks

The blocksize field in LHA compression format

This post is about the data compression format I'll call "lh5". It is actually a family of formats that includes the compression methods often named lh{4, 5, 6, 7, 8}. It was most notably used by version 2.x of the old LHA/LZH/LHArc compressed archive format. It was used, often in modified form, in a number … Continue reading The blocksize field in LHA compression format

LZ77 compression prehistory

LZ77 is a widely-used class of data compression algorithms. I'll start with a quick overview of it. Assuming you're compressing a stream of bytes (a "file"), your LZ77 compressed data, at a high level, would contain two possible kinds of instructions for the decompressor: Emit literal: {byte value=A}Copy from history: {match-offset=B, match-length=C} The match-offset may … Continue reading LZ77 compression prehistory

Thoughts on timestamps of computer files

Computer files can have a number of different kinds of timestamps. Some of them are stored in the file's external metadata, alongside the file's name -- I'll call these external timestamps. Others are stored inside the file itself -- I'll call these internal timestamps. I use the term "timestamp" loosely. When something called a "timestamp" … Continue reading Thoughts on timestamps of computer files

Notes on WinHelp format, part 3

This post is part of a series about WinHelp file format. Please read the other parts first: Part 1Part 2 With what we learned previously, we can decompress the TOPIC blocks, locate the TOPICLINKs, and stitch each TOPICLINK's fragments together to make each TOPICLINK a contiguous blob of bytes: A defragmented TOPICLINK is composed of … Continue reading Notes on WinHelp format, part 3