Win32 I/O character encoding supplement 3: UTF-8 manifest

For a list of other posts in this series, refer to the first post. A relatively recent Windows software development feature, affecting character encoding, is the ability to request a specific "ANSI" character encoding (or "code page"), presumably UTF-8, using a manifest. I decided to investigate what this really does. This "manifest method" is independent … Continue reading Win32 I/O character encoding supplement 3: UTF-8 manifest

Updated survey of LHarc and LHA

Since my first post on DOS versions of LHarc/LHA, I've found a few more versions of the software. Six of them appear to be original/official, and all of those are Japanese-language: 1.13d, 2.05b, 2.13, 2.52, 2.54, and 2.55. And I found quite a few new modified or hacked versions, two of which I'll discuss: "v1.14a" … Continue reading Updated survey of LHarc and LHA

Survey of EXPAND/DECOMP utilities

If you look at old DOS and Windows software distribution disks, you may see a lot of files whose names have the last character replaced by a "_" character, or sometimes a "$" character. For example: mplayer.ex_ mplayer.hl_ mplayer.re_ msacm.dl_ msacm.dr_ msadpcm.ac_ mscomstf.dl_ ... Many such files belong to a family of compressed file formats … Continue reading Survey of EXPAND/DECOMP utilities
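The naming convention in the excerpt above can be illustrated with a minimal sketch. The function name `compressed_name` is hypothetical, and the original filenames used in the example (such as `mplayer.exe`) are assumptions. The excerpt lists only the compressed names.

```python
def compressed_name(name: str, marker: str = "_") -> str:
    """Return the distribution-disk name for a file, assuming the
    convention of replacing the last character with a marker.

    Hypothetical helper for illustration only. It covers just the
    simple case described in the excerpt.
    """
    if not name:
        raise ValueError("empty filename")
    # Drop the final character and substitute the marker ("_" or "$").
    return name[:-1] + marker


print(compressed_name("mplayer.exe"))      # assumed original name
print(compressed_name("msacm.dll", "$"))   # the less common "$" variant
```

The mapping is not reversible from the name alone, because the replaced character is discarded.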

What is LZSS compression?

I'm not asking how to implement LZSS. I'm asking how to distinguish things-that-are-LZSS from things-that-are-not-LZSS. It's generally understood that LZSS is a kind of data compression. It's supposedly a derivative of LZ77. But you may struggle to find out anything definitive or verifiable about LZSS. The name LZSS is more than likely derived from Lempel–Ziv–Storer–Szymanski, … Continue reading What is LZSS compression?

Survey of LHarc and LHA versions and names

[See this post for some updates to the information here.] The LHarc family of software (including LHA, etc.) is an old compression and archiving utility, originally for DOS computers. I've found the LHarc version history to be confusing in a number of ways. In this post, I'll try to explain what's what, to the best … Continue reading Survey of LHarc and LHA versions and names

Win32 I/O character encoding supplement 2 – setlocale enhancement

This is part of a series of posts on using Unicode in Windows command-line applications. Here's the first post. Sometime in 2018, some functions in the Windows 10 C runtime system, and related development SDKs, were enhanced to support UTF-8. This feature is enabled by calling the setlocale function. For reference, Microsoft's current documentation of … Continue reading Win32 I/O character encoding supplement 2 – setlocale enhancement

Encoding Huffman codebooks

This post will assume you have a basic knowledge of the data compression technique known as Huffman coding. Though maybe, since I'm only concerned about decompression, I should call it something like "bit-oriented prefix codes". Huffman coding is really just one of the algorithms that can produce such a code, but it's the term everybody … Continue reading Encoding Huffman codebooks

The Cleveland baseball team

I see that the Cleveland Indians baseball team is finally going to change their nickname. I think that's probably a good thing. For one thing, the word "Indians" is ambiguous, and you wouldn't want to accidentally demean people from South Asia, when you're trying to demean people from North America. They say they haven't chosen … Continue reading The Cleveland baseball team

LZ77 compression prehistory

LZ77 is a widely-used class of data compression algorithms. I'll start with a quick overview of it. Assuming you're compressing a stream of bytes (a "file"), your LZ77 compressed data, at a high level, would contain two possible kinds of instructions for the decompressor: Emit literal: {byte value=A}; Copy from history: {match-offset=B, match-length=C}. The match-offset may … Continue reading LZ77 compression prehistory
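The two instruction kinds in the LZ77 excerpt above can be sketched as a small decoder. This is an illustration under stated assumptions, not any particular format: the tuple representation and the name `lz77_decode` are hypothetical, and the match-offset is assumed to count backward from the end of the output so far (1 means the most recently emitted byte). The excerpt is cut off before it defines the offset.

```python
def lz77_decode(instructions):
    """Decode a list of LZ77-style instructions into bytes.

    Each instruction is a tuple (hypothetical representation):
      ("literal", byte_value)             emit one byte
      ("copy", match_offset, match_length) copy from history
    Assumption: match_offset counts backward from the end of the
    output produced so far, with 1 meaning the most recent byte.
    """
    out = bytearray()
    for ins in instructions:
        if ins[0] == "literal":
            out.append(ins[1])
        elif ins[0] == "copy":
            _, offset, length = ins
            if not 1 <= offset <= len(out):
                raise ValueError("match-offset points outside history")
            start = len(out) - offset
            # Copy one byte at a time so a match may overlap the bytes
            # it is producing (match_length greater than match_offset).
            for i in range(length):
                out.append(out[start + i])
        else:
            raise ValueError(f"unknown instruction: {ins[0]!r}")
    return bytes(out)


# Two literals, then a copy that overlaps its own output.
print(lz77_decode([("literal", 0x61), ("literal", 0x62), ("copy", 2, 4)]))
```

Copying byte by byte, instead of slicing the history once, is what lets a short history repeat into a longer run.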

Thoughts on timestamps of computer files

Computer files can have a number of different kinds of timestamps. Some of them are stored in the file's external metadata, alongside the file's name -- I'll call these external timestamps. Others are stored inside the file itself -- I'll call these internal timestamps. I use the term "timestamp" loosely. When something called a "timestamp" … Continue reading Thoughts on timestamps of computer files