Notes on LHARK compression format
LHARK (with a K) is an old compression/archiver utility for DOS. It is related to the popular utility named LHA (formerly LHarc), but should not be confused with it. You should be able to find a copy of LHARK by searching the web for "LHARK04D". LHARK was developed by Kerwin F. Medina around 1995/1996. It …
Category: File formats
The blocksize field in LHA compression format
This post is about the data compression format I'll call "lh5". It is actually a family of formats that includes the compression methods often named lh{4, 5, 6, 7, 8}. It was most notably used by version 2.x of the old LHA/LZH/LHArc compressed archive format. It was used, often in modified form, in a number …
LZ77 compression prehistory
LZ77 is a widely used class of data compression algorithms. I'll start with a quick overview of it. Assuming you're compressing a stream of bytes (a "file"), your LZ77 compressed data, at a high level, would contain two possible kinds of instructions for the decompressor: "Emit literal: {byte value=A}" and "Copy from history: {match-offset=B, match-length=C}". The match-offset may …
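To make the two instruction kinds concrete, here is a minimal decoder sketch. The tuple encoding ("lit", value) and ("copy", offset, length) is hypothetical, made up for illustration, and so is the function name lz77_decode. The excerpt is cut off before it defines the match-offset, so this sketch assumes the offset counts backward from the current end of the output (offset 1 means the most recently emitted byte). Real formats differ on this and many other details.

```python
from typing import List, Tuple, Union

# Hypothetical instruction encoding, for illustration only:
#   ("lit", byte_value)        -> Emit literal
#   ("copy", offset, length)   -> Copy from history
# Assumption: offset counts backward from the current end of the output,
# so offset=1 refers to the most recently emitted byte.
Instruction = Union[Tuple[str, int], Tuple[str, int, int]]


def lz77_decode(instructions: List[Instruction]) -> bytes:
    out = bytearray()
    for ins in instructions:
        if ins[0] == "lit":
            # Emit literal: append one byte value to the output.
            out.append(ins[1])
        elif ins[0] == "copy":
            # Copy from history: re-emit bytes already in the output.
            _, offset, length = ins
            if offset < 1 or offset > len(out):
                raise ValueError("match-offset points outside the history")
            start = len(out) - offset
            # Copy one byte at a time, so a match may overlap the bytes
            # it is producing (e.g. offset=1 repeats the last byte).
            for i in range(length):
                out.append(out[start + i])
        else:
            raise ValueError(f"unknown instruction {ins[0]!r}")
    return bytes(out)
```

For example, two literals "a" and "b" followed by a copy with offset 2 and length 5 decode to b"abababa". The match is longer than its offset, which is why the copy has to proceed byte by byte.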
An obscure polyglot file: LHA/CAR
LHA is a compressed archive file format and compression utility that was, for a long time, a competitor of ZIP. It's also known as LZH format, or LHarc format, but I'll call it LHA. In the course of researching it, I came across an obscure lookalike format created by a program named CAR. CAR is …
Thoughts on timestamps of computer files
Computer files can have a number of different kinds of timestamps. Some of them are stored in the file's external metadata, alongside the file's name -- I'll call these external timestamps. Others are stored inside the file itself -- I'll call these internal timestamps. I use the term "timestamp" loosely. When something called a "timestamp" …
PKZIP Implode bug #3
I've already written about two PKZIP bugs related to the "Implode" compression method. Now I've come across another one, so I guess I'll investigate it as well. Here are the first two: Bug #1 (MML) and Bug #2 (v1.01 literal tree issue). There's an old collection of files called the Pier 1 Shareware CDROM (#1). On it …
Notes on WinHelp format, part 3
This post is part of a series about WinHelp file format. Please read the other parts first: Part 1 and Part 2. With what we learned previously, we can decompress the TOPIC blocks, locate the TOPICLINKs, and stitch each TOPICLINK's fragments together to make each TOPICLINK a contiguous blob of bytes. A defragmented TOPICLINK is composed of …
Notes on WinHelp format, part 2
This post is part of a series about WinHelp file format. Other parts: Part 1 (read this first) and Part 3. The internal TOPIC file (named "|TOPIC") is the business part of the HLP file. It contains the text, and other information. To read the TOPIC file, you need to know the TOPIC "block size", which will be …
Another Implode bug in old PKZIP software
In a previous post, I discussed an old PKZIP bug related to the compression method named "Implode". I'll call that bug the "MML bug", for "Minimum Match Length". [See also a later post: Bug #3.] In this post, I'll discuss another old PKZIP bug related to Implode compression, mainly just to distinguish it from that …
Notes on WinHelp format, part 1
I wanted to write a program to extract the text from WinHelp .HLP files. HLP format was the standard Microsoft Windows help/documentation file format from around 1990 (the start of the Windows 3.x era) through the early 2000s. There are countless old Windows applications that come with an HLP file, but starting with Vista, Windows …