The blocksize field in LHA compression format

This post is about the data compression format I'll call "lh5". It is actually a family of formats that includes the compression methods often named lh{4, 5, 6, 7, 8}. It was most notably used by version 2.x of the old LHA/LZH/LHArc compressed archive format. It was used, often in modified form, in a number …

LZ77 compression prehistory

LZ77 is a widely-used class of data compression algorithms. I'll start with a quick overview of it. Assuming you're compressing a stream of bytes (a "file"), your LZ77 compressed data, at a high level, would contain two possible kinds of instructions for the decompressor: "Emit literal: {byte value=A}" and "Copy from history: {match-offset=B, match-length=C}". The match-offset may …
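As a rough illustration of those two instruction kinds, here is a minimal Python sketch of a decoder. The tuple representation and the name lz77_decode are made up for this example, and I'm assuming the match-offset counts backward from the end of the output so far (1 = the most recently emitted byte), which is a common convention but not something the excerpt above confirms.

```python
def lz77_decode(instructions):
    """Decode a list of LZ77-style instructions into bytes.

    Each instruction is a tuple (hypothetical representation):
      ("lit", byte_value)                    -- emit one literal byte
      ("copy", match_offset, match_length)   -- copy from already-decoded output
    Assumption: match_offset counts backward from the end of the output,
    with 1 meaning the most recently emitted byte.
    """
    out = bytearray()
    for ins in instructions:
        kind = ins[0]
        if kind == "lit":
            out.append(ins[1])
        elif kind == "copy":
            _, offset, length = ins
            if offset < 1 or offset > len(out):
                raise ValueError("match-offset points outside the history")
            # Copy one byte at a time, so a match is allowed to overlap
            # the bytes it is producing (length greater than offset).
            for _ in range(length):
                out.append(out[-offset])
        else:
            raise ValueError("unknown instruction kind: %r" % (kind,))
    return bytes(out)


if __name__ == "__main__":
    # Two literals, then a 4-byte copy starting 2 bytes back.
    print(lz77_decode([("lit", 97), ("lit", 98), ("copy", 2, 4)]))
```

Copying byte by byte is what lets a short history expand into a long run: with only "ab" in the history, a copy of length 4 at offset 2 keeps re-reading bytes it has just written.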

Alternative to Cantor’s diagonalization argument

How does one prove that there are more real numbers than integers? There are an infinite number of each, but the infinity of the real numbers is, in a strict sense, larger than the infinity of the integers. In math terminology, the set of reals has a larger cardinality. Roughly speaking, it's equivalent to saying …

Thoughts on timestamps of computer files

Computer files can have a number of different kinds of timestamps. Some of them are stored in the file's external metadata, alongside the file's name -- I'll call these external timestamps. Others are stored inside the file itself -- I'll call these internal timestamps. I use the term "timestamp" loosely. When something called a "timestamp" …

Win32 I/O character encoding supplement 1 – A Cygwin issue

A while back, I wrote a series of posts about using Unicode in Windows console mode programs: Part 1, Part 2, Part 3. In Part 2, I said that programmers should probably not be changing the console code page to UTF-8 (65001). And that if they must, they should change it back when they're done. But now …

The recent discovery of an ancient bilaterian fossil

In March 2020, researchers in Australia announced the discovery of a 555 million-year-old fossil of a bilaterian animal, which they named Ikaria wariootia. The way that journalists presented this discovery made it clear that it is significant for… something. But it seemed to be difficult for them to pin down precisely what is significant about …

The World Chess Championship is kinda not so great

As I write this, the chess Candidates Tournament is still expected to be held as scheduled, starting March 17, 2020, in Yekaterinburg, Russia. The tournament is between eight of the top chess players who aren't currently the World Champion. The winner of the tournament will earn the right to play a match against the current …