Detecting MacBinary format

Classic Mac OS (the main Macintosh operating system from 1984-2001) had an interesting way of storing files. Each file could consist of two separate byte streams, known as the data fork and resource fork. Additionally, some important information was stored in the file’s directory entry, using features of the Mac’s native MFS and HFS filesystems. This information included a type code and creator code.

That’s all well and good, but what if you wanted to use Mac-specific files on non-Mac computers, or transfer them to and from such computers? This was well before the Web existed, but there were lots of dial-up BBSes, and online services. Apple apparently did not offer a solution at that time, so a working group was formed, and in 1985 its members designed a format called MacBinary.

MacBinary is ancient history, and was just one of a number of methods used to deal with such portability issues. But, for a while, it was widely used, and a lot of MacBinary files still exist in old collections.

MacBinary is mostly a pretty okay format. It does what it needs to do. It specifies a portable single-streamed container file format that stores:

  • The full Mac-style filename
  • Metadata including the type/creator codes, timestamps, and a few other Macintoshy things
  • The data fork
  • The resource fork

It did have a few issues, which later versions attempted to address. But one problem really stands out to me: How the heck do you tell whether a file is in MacBinary format or not?

The situation I’m thinking of is that you’re given a file whose format is a complete mystery, and you want to know whether it is in MacBinary format. Even if the contained file uses a format unknown to you, you still want to detect the MacBinary wrapper.

For practical purposes, MacBinary usually amounts to an extra 128-byte header at the beginning of the file. It’d be nice if you could examine those 128 bytes and determine, with high reliability, whether they represent a MacBinary header. You could also look at the rest of the file, but ideally you wouldn’t have to.

Another clue you might have is the filename of the maybe-MacBinary file. A lot of filenames end with a short “filename extension”, like “.pdf”. Though the Mac didn’t use filename extensions, MacBinary was intended to be used on foreign computers, so you might expect it to have a conventional extension. But no extension is suggested by the specifications. Somehow, “.bin” came to be the de facto standard extension for MacBinary, but that’s marginally useful at best, because “.bin” is also used for lots of generic “binary” files. Some MacBinary files instead use the conventional extension of the file that they contain, which also isn’t really helpful to us.

A special note about MacPaint

Maybe I’m wrong, but I get the impression that the MacPaint image format has developed a more intimate relationship with MacBinary than most any other format has. It’s like a MacBinary header has become an intrinsic, though optional, part of the MacPaint format itself.

That might be because MacPaint is another difficult format to identify, and adding a MacBinary header (or, not deleting it) can make identification much easier. MacPaint w/MacBinary should have the “PNTG” type code at offset 65. But I have found a few rare exceptions that have a MacBinary header but don’t have the proper type code.
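As a rough illustration, the type code test is a four-byte comparison. This is only a sketch: the function name is made up for this example, and a match says nothing about whether the rest of the header is sane.

```python
def has_macpaint_type(header: bytes) -> bool:
    """Return True if the 4-byte type code at offset 65 is "PNTG".

    Hypothetical helper for illustration. It does not verify that the
    buffer really is a MacBinary header.
    """
    if len(header) < 128:
        return False
    return header[65:69] == b"PNTG"
```

Given the exceptions mentioned above, a failed match shouldn't be treated as proof that the file isn't MacPaint.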

I’ve also found more than a few MacPaint files that have a MacBinary header with incorrect information, notably the “data fork length” field. It’s as if the image was edited at some point, but the MacBinary part wasn’t properly updated to reflect the new image.

What the specs say

Let’s review what the written MacBinary specifications say about detecting the format.

I’m aware of three significant versions of MacBinary: the original version (I’ll call it “V1”), MacBinary II (“V2”), and MacBinary III (“V3”). There was also a MacBinary II+, but that’s a different animal, and I’ll say no more about it.

V1 carved out the 128 bytes for the header, but didn’t use all of it. Later versions made use of some of the reserved space.

The biggest obstacle to MacBinary detection is that 0 is a valid, or at least arguably valid, value for virtually every one of the 128 header bytes. And there are a lot of other file formats that can start with a lot of 0 bytes. You can’t just assume they’re all MacBinary.

V1

The V1 spec has some detection advice: Look for byte value 0 at offset 0, 74, and 82. But it’s imagining a transfer protocol that supports only two kinds of files: MacBinary and plain text. This advice is not very useful in general, and I guess it’s not intended to be. I feel like I’m missing something here — How was a Mac user expected to download a non-text non-Mac file?
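Here's a minimal sketch of that V1 test, mostly to show how little it proves. The function name is invented for this example.

```python
def v1_zero_bytes_ok(header: bytes) -> bool:
    """Apply the V1 spec's advice: bytes 0, 74, and 82 must all be 0.

    Hypothetical helper. On its own this is a very weak test.
    """
    if len(header) < 128:
        return False
    return header[0] == 0 and header[74] == 0 and header[82] == 0
```

Note that 128 zero bytes pass this test, as does any other file that happens to have zeros in those three positions.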

V2

V2 added a 16-bit CRC field containing a “checksum” of the first 124 bytes in the header, which apparently was intended to help with format detection.

The V2 spec also gives some advice on detecting V1. It clarifies that a filename length of 0 is not expected, which if true could mean that the filename length field is the only byte in the header that cannot possibly be 0. It’s probably also safe to assume that none of the bytes making up the filename will be 0.

It says that the V1 fork lengths must be no more than 0x7FFFFF bytes (about 8 MB). But the V1 spec does not mention such a restriction. I guess this comes from a file size limitation of Macs of the V1 era. But there’s nothing stopping someone from creating a valid V1 file with larger forks, so this isn’t a very good thing to test. What it suggests to me is that the V2 spec authors were desperately looking for something, anything, that could be used to help detect V1 format. By the way, I’ve seen this test suggested in third-party documents as a way to detect MacBinary in general. But I’m confident that, at most, it should only apply to files that might be V1.

V3

With V3, the format finally got a proper signature: “mBIN” at offset 102. But V3 was just a maintenance release from late in MacBinary’s useful lifespan. It came too late to be of much help with format detection.

Suggestions

V3 is pretty easy to identify, thanks to its “mBIN” signature, and a few bytes that must be 0.
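A sketch of the V3 check might look like the following. The choice of which "must be 0" bytes to test (offsets 0, 74, and 82) is an assumption made for this example, and the function name is made up.

```python
def looks_like_v3(header: bytes) -> bool:
    """Check for the "mBIN" signature at offset 102, plus a few zero bytes.

    Assumption for this sketch: the zero bytes tested are offsets 0, 74,
    and 82. Adjust the tuple if you want to test different ones.
    """
    if len(header) < 128:
        return False
    if header[102:106] != b"mBIN":
        return False
    return all(header[i] == 0 for i in (0, 74, 82))
```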

V2 is possible to identify quite reliably, if the CRC field is set correctly, and the version number fields at offsets 122-123 are set correctly to 129.

Unfortunately, the CRC field’s existence makes the format much harder to write, and some implementers get lazy and just set the CRC field to 0. Note that the spec does not even tell you what CRC algorithm to use — you have to be a detective to figure it out. (It’s the one sometimes called CRC16-CCITT.) The version numbers alone are still a decent signal, and you can combine that with almost anything that works for V1.
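For what it's worth, here is a sketch of the CRC and version checks. It assumes the XMODEM flavor of CRC16-CCITT (polynomial 0x1021, initial value 0, no bit reflection) and that the CRC is stored big-endian at offsets 124-125, right after the 124 bytes it covers. The function names and the returned dictionary are invented for this example.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16 with polynomial 0x1021, initial value 0, MSB first."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def v2_evidence(header: bytes) -> dict:
    """Collect the V2 signals separately, so a caller can weigh them.

    Assumption: the stored CRC is a big-endian 16-bit value at offset 124.
    """
    if len(header) < 128:
        raise ValueError("need a full 128-byte header")
    stored = int.from_bytes(header[124:126], "big")
    return {
        "crc_ok": crc16_xmodem(header[:124]) == stored,
        "crc_is_zero": stored == 0,  # the "lazy writer" case
        "versions_ok": header[122] == 129 and header[123] == 129,
    }
```

One wrinkle with an initial value of 0: the CRC of 124 zero bytes is itself 0, so a matching CRC on an all-zero header is not evidence of anything. That's one reason to report the signals separately instead of returning a single yes/no.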

Distinguishing V1 (and some invalid V2) from other formats remains difficult.

Check that the filename length field is between 1 and 63. You can then test that many bytes to make sure they are sensible filename characters, say byte values 32 to 255. But no promises. My research suggests that almost anything could technically be legal.
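A minimal sketch of the filename test, assuming the length byte is at offset 1 and the name itself starts at offset 2 (that layout is from my reading of the specs, not stated above). The function name is made up.

```python
def filename_plausible(header: bytes) -> bool:
    """Check the filename length (1 to 63) and the filename bytes (32 to 255).

    Assumptions for this sketch: length byte at offset 1, name at offset 2.
    """
    if len(header) < 128:
        return False
    name_len = header[1]
    if not 1 <= name_len <= 63:
        return False
    name = header[2:2 + name_len]
    # Reject control characters. Bytes 128-255 are allowed, since Mac
    # filenames routinely used non-ASCII characters.
    return all(b >= 32 for b in name)
```

Since almost anything could technically be legal, a failure here is better treated as a reduction in confidence than as a hard rejection.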

The type and creator codes are (I think?) supposed to be four ASCII characters each, but either or both might be set to all 0 bytes. I guess you could test whether the codes are one of those two things, but that’s not really very discriminating.
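If you do want that test, it could be sketched like this. It assumes the creator code immediately follows the type code (so, offset 69), and it interprets "ASCII characters" as printable bytes 32 to 126. Both are assumptions, and the names are invented.

```python
def code_plausible(code: bytes) -> bool:
    """Accept four printable ASCII bytes, or four zero bytes."""
    if len(code) != 4:
        return False
    if code == b"\x00\x00\x00\x00":
        return True
    return all(32 <= b <= 126 for b in code)


def type_creator_plausible(header: bytes) -> bool:
    """Test the type code (offset 65) and creator code (assumed offset 69)."""
    if len(header) < 128:
        return False
    return code_plausible(header[65:69]) and code_plausible(header[69:73])
```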

Other than the bytes at offsets 74 and 82, you cannot rely on any reserved/unused/padding bytes being 0. It can be a helpful signal, but it’s not always correct. Even when the spec clearly states “zero fill” or “padded with nulls”, some files have random garbage in these bytes. Especially annoying is that the V2 version and CRC fields might contain garbage.

If you know the file size, that can help. At least if the “fork length” fields are correct. Which they aren’t always, especially the resource fork length. But assuming they are correct, you can calculate two sensible file sizes: one if the last fork in the file is padded to the next 128-byte boundary, and one if it is not. If the actual file size is one of those two sizes, that’s a strong indication that it is MacBinary format.
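Here's a sketch of that size calculation. It assumes the fork lengths are big-endian 32-bit values at offsets 83 (data) and 87 (resource), that the data fork is padded to a 128-byte boundary whenever a resource fork follows it, and that there is no secondary header or trailing comment. The function names are made up.

```python
def pad128(n: int) -> int:
    """Round n up to the next multiple of 128."""
    return (n + 127) // 128 * 128


def expected_sizes(header: bytes) -> set:
    """Return the two plausible total file sizes implied by the header.

    Assumptions: data fork length at offset 83, resource fork length at
    offset 87, both big-endian; no secondary header or comment.
    """
    if len(header) < 128:
        raise ValueError("need a full 128-byte header")
    data_len = int.from_bytes(header[83:87], "big")
    rsrc_len = int.from_bytes(header[87:91], "big")
    if rsrc_len > 0:
        # The resource fork is last; the data fork before it is padded.
        unpadded = 128 + pad128(data_len) + rsrc_len
    else:
        # The data fork is last (or the file is empty).
        unpadded = 128 + data_len
    return {unpadded, pad128(unpadded)}
```

A caller would then test something like `actual_size in expected_sizes(header)`. If the last fork's length is already a multiple of 128, the two sizes coincide and the set has just one element.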

Even if you don’t know the file size, since fields-that-can’t-be-0 are in such short supply, you might want to require that at least one of the forks’ lengths be nonzero. If they’re both 0, then even if it is MacBinary, it’s an empty file, and maybe not so important.
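That requirement is a one-liner, sketched here with the same caveat as before: the offsets (83 for the data fork length, 87 for the resource fork length, both big-endian) are my reading of the specs, and the function name is invented.

```python
def has_nonempty_fork(header: bytes) -> bool:
    """Return True if at least one fork length field is nonzero.

    Assumptions: data fork length at offset 83, resource fork length at
    offset 87, each a big-endian 32-bit value.
    """
    if len(header) < 128:
        return False
    data_len = int.from_bytes(header[83:87], "big")
    rsrc_len = int.from_bytes(header[87:91], "big")
    return data_len != 0 or rsrc_len != 0
```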
