Survey of EXPAND/DECOMP utilities

If you look at old DOS and Windows software distribution disks, you may see a lot of files whose names have the last character replaced by a "_" character, or sometimes a "$" character. For example: mplayer.ex_ mplayer.hl_ mplayer.re_ msacm.dl_ msacm.dr_ msadpcm.ac_ mscomstf.dl_ ... Many such files belong to a family of compressed file formats … Continue reading Survey of EXPAND/DECOMP utilities

What is LZSS compression?

I'm not asking how to implement LZSS. I'm asking how to distinguish things-that-are-LZSS from things-that-are-not-LZSS. It's generally understood that LZSS is a kind of data compression. It's supposedly a derivative of LZ77. But you may struggle to find out anything definitive or verifiable about LZSS. The name LZSS is more than likely derived from Lempel–Ziv–Storer–Szymanski, … Continue reading What is LZSS compression?

Survey of LHarc and LHA versions and names

The LHarc family of software (including LHA, etc.) is an old compression and archiving utility, originally for DOS computers. I've found the LHarc version history to be confusing in a number of ways. In this post, I'll try to explain what's what, to the best of my knowledge. LHarc is comparable to its contemporaries PKZIP … Continue reading Survey of LHarc and LHA versions and names

Win32 I/O character encoding supplement 2 – setlocale enhancement

This is part of a series of posts on using Unicode in Windows command-line applications. Here's the first post. Sometime in 2018, some functions in the Windows 10 C runtime system, and related development SDKs, were enhanced to support UTF-8. This feature is enabled by calling the setlocale function. For reference, Microsoft's current documentation of … Continue reading Win32 I/O character encoding supplement 2 – setlocale enhancement

Encoding Huffman codebooks

This post will assume you have a basic knowledge of the data compression technique known as Huffman coding. Though maybe, since I'm only concerned about decompression, I should call it something like "bit-oriented prefix codes". Huffman coding is really just one of the algorithms that can produce such a code, but it's the term everybody … Continue reading Encoding Huffman codebooks

The Cleveland baseball team

I see that the Cleveland Indians baseball team is finally going to change their nickname. I think that's probably a good thing. For one thing, the word "Indians" is ambiguous, and you wouldn't want to accidentally demean people from South Asia, when you're trying to demean people from North America. They say they haven't chosen … Continue reading The Cleveland baseball team

LZ77 compression prehistory

LZ77 is a widely-used class of data compression algorithms. I'll start with a quick overview of it. Assuming you're compressing a stream of bytes (a "file"), your LZ77 compressed data, at a high level, would contain two possible kinds of instructions for the decompressor:
Emit literal: {byte value=A}
Copy from history: {match-offset=B, match-length=C}
The match-offset may … Continue reading LZ77 compression prehistory
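
The two instruction kinds in the LZ77 excerpt above can be illustrated with a minimal decoder sketch. This is not taken from the post: the tuple layout, the function name lz77_decode, and the convention that match-offset counts backward from the end of the output (1 = the most recently emitted byte) are all assumptions made for illustration. Real formats differ in how they encode these fields.

```python
def lz77_decode(instructions):
    """Apply a list of LZ77-style instructions and return the decoded bytes.

    Hypothetical instruction layout (an assumption, not a real file format):
      ("lit", byte_value)            emit one literal byte
      ("copy", offset, length)       copy `length` bytes from history,
                                     starting `offset` bytes back from the end
    """
    out = bytearray()
    for ins in instructions:
        if ins[0] == "lit":
            out.append(ins[1])
        elif ins[0] == "copy":
            _, offset, length = ins
            if not 1 <= offset <= len(out):
                raise ValueError("match-offset points outside the history")
            # Copy one byte at a time so a match may overlap the bytes it
            # is producing (length > offset), which repeats a short pattern.
            for _ in range(length):
                out.append(out[-offset])
        else:
            raise ValueError("unknown instruction kind")
    return bytes(out)


# Two literals, then a copy that overlaps its own output.
print(lz77_decode([("lit", ord("a")), ("lit", ord("b")), ("copy", 2, 4)]))
```

Copying byte by byte, rather than slicing the history in one step, is what lets a match be longer than its offset under the convention assumed here.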

Thoughts on timestamps of computer files

Computer files can have a number of different kinds of timestamps. Some of them are stored in the file's external metadata, alongside the file's name -- I'll call these external timestamps. Others are stored inside the file itself -- I'll call these internal timestamps. I use the term "timestamp" loosely. When something called a "timestamp" … Continue reading Thoughts on timestamps of computer files

Win32 I/O character encoding supplement 1 – A Cygwin issue

A while back, I wrote a series of posts about using Unicode in Windows console mode programs: Part 1, Part 2, Part 3. In Part 2, I said that programmers should probably not be changing the console code page to UTF-8 (65001). And that if they must, they should change it back when they're done. But now … Continue reading Win32 I/O character encoding supplement 1 – A Cygwin issue

What is the name of libjpeg?

Shortly after the development of the JPEG image format around 1991, an organization named the Independent JPEG Group (IJG) released an open source software package to help people use the format. While the software included a few utilities, such as cjpeg and djpeg, the important part of it was its C library. The library became … Continue reading What is the name of libjpeg?