The Cleveland baseball team

I see that the Cleveland Indians baseball team is finally going to change their nickname. I think that's probably a good thing. For one thing, the word "Indians" is ambiguous, and you wouldn't want to accidentally demean people from South Asia, when you're trying to demean people from North America. They say they haven't chosen …

The blocksize field in LHA compression format

This post is about the data compression format I'll call "lh5". It is actually a family of formats that includes the compression methods often named lh{4, 5, 6, 7, 8}. It was most notably used by version 2.x of the old LHA/LZH/LHArc compressed archive format. It was used, often in modified form, in a number …

LZ77 compression prehistory

LZ77 is a widely-used class of data compression algorithms. I'll start with a quick overview of it. Assuming you're compressing a stream of bytes (a "file"), your LZ77 compressed data, at a high level, would contain two possible kinds of instructions for the decompressor:
Emit literal: {byte value=A}
Copy from history: {match-offset=B, match-length=C}
The match-offset may …
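As a rough illustration of the two instruction kinds described in this excerpt, here is a minimal decompressor sketch in Python. The tuple encoding and the function name `lz77_decode` are hypothetical, made up for this example. It also assumes that match-offset counts backward from the end of the output produced so far (1 = the most recently emitted byte), and that a copy proceeds one byte at a time so a match may overlap the bytes it is producing. Real formats differ in these details.

```python
def lz77_decode(instructions):
    """Decode a list of LZ77-style instructions into bytes.

    Hypothetical instruction encoding (an assumption for this sketch):
      ("literal", A)  -> emit the byte value A
      ("copy", B, C)  -> copy C bytes starting B bytes back in the output
    """
    out = bytearray()
    for ins in instructions:
        if ins[0] == "literal":
            out.append(ins[1])
        elif ins[0] == "copy":
            _, offset, length = ins
            # Assumed convention: offset 1 refers to the last byte emitted.
            if offset < 1 or offset > len(out):
                raise ValueError("match-offset points outside the history")
            start = len(out) - offset
            # Copy byte by byte so that length > offset (an overlapping
            # match) repeats the bytes this same copy has just produced.
            for i in range(length):
                out.append(out[start + i])
        else:
            raise ValueError("unknown instruction kind: %r" % (ins[0],))
    return bytes(out)
```

For example, two literals followed by one overlapping copy, `[("literal", 97), ("literal", 98), ("copy", 2, 4)]`, decode to `b"ababab"` under these assumptions.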

Alternative to Cantor’s diagonalization argument

How does one prove that there are more real numbers than integers? There are an infinite number of each, but the infinity of the real numbers is, in a strict sense, larger than the infinity of the integers. In math terminology, the set of reals has a larger cardinality. Roughly speaking, it's equivalent to saying …

Thoughts on timestamps of computer files

Computer files can have a number of different kinds of timestamps. Some of them are stored in the file's external metadata, alongside the file's name -- I'll call these external timestamps. Others are stored inside the file itself -- I'll call these internal timestamps. I use the term "timestamp" loosely. When something called a "timestamp" …

Win32 I/O character encoding supplement 1 – A Cygwin issue

A while back, I wrote a series of posts about using Unicode in Windows console mode programs: Part 1, Part 2, Part 3. In Part 2, I said that programmers should probably not be changing the console code page to UTF-8 (65001). And that if they must, they should change it back when they're done. But now …