Ideas for ISO 9660 CD-ROM format shenanigans

ISO 9660 is the filesystem format used on most CD-ROMs. It isn’t restricted to CD-ROMs, though. It can also be found in the form of an “.iso” image file, which might have been “ripped” from a CD-ROM (or other media format), or created with the intention of “burning” it to a CD-ROM.

I researched the format recently, and had some ideas for how to play stupid games with it, to make a CD-ROM or ISO file that might be interpreted differently by different ISO 9660 decoders, or that would also be a valid file of some other format.

That’s no challenge, I admit. ISO 9660 is designed to be flexible, and it’s also somewhat quirky. It’s probably more difficult to make a CD-ROM that works the same on different systems. But… I’m going to write about it anyway.

I’ll use the word “decoder” to mean any software that reads ISO 9660 filesystems. It could be an operating system like Windows or Linux, or an un-archiver application like 7-Zip or bsdtar.

Multiple non-identical PVDs

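Each ISO file has one or more “volume descriptors”, stored one per 2048-byte sector starting at sector 16 (byte offset 32768). A volume descriptor contains pointers to the root directory and the path tables, and through them to all the rest of the data.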

To decode an ISO file, a decoder starts by looking at the volume descriptors, and choosing its favorite one. If there are multiple volume descriptors, they usually all resolve to the same set of files. But there is nothing that forces them to. Two volume descriptors could point to completely different sets of files.

A lot of ISOs contain two identical copies of the primary volume descriptor, I guess in case one gets damaged. Well, they’re supposed to be identical. If you break the rules and make them different, maybe some decoders will use the first one, and others will use the second.
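To make the scan concrete, here is a minimal Python sketch of how a decoder might enumerate the volume descriptors and collect every primary one. The function names are mine, not from any real decoder, and the layout assumed is the standard one: 2048-byte sectors, descriptors starting at sector 16, a type byte followed by the “CD001” signature, and type 255 as the terminator.

```python
import io

SECTOR = 2048
FIRST_DESCRIPTOR_SECTOR = 16  # the first 16 sectors are the system area


def list_volume_descriptors(f):
    """Yield (sector, type_code, raw_bytes) for each volume descriptor."""
    sector = FIRST_DESCRIPTOR_SECTOR
    while True:
        f.seek(sector * SECTOR)
        raw = f.read(SECTOR)
        # Stop on a short read or a missing "CD001" signature.
        if len(raw) < SECTOR or raw[1:6] != b"CD001":
            break
        yield sector, raw[0], raw
        if raw[0] == 255:  # volume descriptor set terminator
            break
        sector += 1


def primary_descriptors(f):
    """Return [(sector, raw_bytes)] for every type-1 (primary) descriptor."""
    return [(s, raw) for s, t, raw in list_volume_descriptors(f) if t == 1]
```

A decoder that takes `primary_descriptors(f)[0]` and one that takes `primary_descriptors(f)[-1]` would disagree on any image where the copies differ.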

Joliet detection

Joliet is an extension to support advanced filenames, and it’s implemented in the form of a special volume descriptor. Therefore, it’s trivial to make an ISO that looks completely different depending on whether or not Joliet is supported and enabled.

Even if two decoders both support Joliet, identifying the Joliet descriptor is just complex enough that I bet different decoders use different algorithms. They have to look for one of a set of special byte sequences in the “escape sequences” field of the volume descriptor.

What if the sequence is not the first thing in that field? What if you use a code for UTF-16, instead of the usual UCS-2 codes? What if you put the sequence in the primary volume descriptor, instead of just the Joliet descriptor like you’re supposed to?
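Here is a rough Python sketch of two detection policies a decoder could plausibly use. Both functions are hypothetical, not taken from any real decoder. The assumptions: the escape sequences field occupies bytes 88 to 119 of the descriptor, and the three Joliet sequences are `%/@`, `%/C`, and `%/E` (UCS-2 levels 1 to 3).

```python
JOLIET_ESCAPES = (b"%/@", b"%/C", b"%/E")  # UCS-2 levels 1, 2, 3


def escape_field(vd):
    """Return the 32-byte escape sequences field of a volume descriptor."""
    return vd[88:120]


def is_joliet_strict(vd):
    """Type-2 descriptor whose field *starts with* a Joliet sequence."""
    return vd[0] == 2 and escape_field(vd)[:3] in JOLIET_ESCAPES


def is_joliet_lenient(vd):
    """Any descriptor with a Joliet sequence *anywhere* in the field."""
    field = escape_field(vd)
    return any(seq in field for seq in JOLIET_ESCAPES)
```

A descriptor with the sequence shifted by one byte, or placed in a primary (type-1) descriptor, passes the lenient check but not the strict one.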

Path tables that lie

There are two very different ways to find the files referred to by a volume descriptor. One is to start in the root directory, and recursively read the subdirectories you encounter. The other is to read the “path table”, which is a simple list of pointers to all the directories, each tagged with the number of its parent directory.

If the path table is inconsistent with the actual directory tree, a decoder might figure that out, and start reporting errors. Still, it probably wouldn’t fail completely every time. You could try omitting some directories from the path table. Or you could put a secret directory in the path table that’s unreachable from the root directory, so only decoders that use the path table can find it.

What’s more, each volume descriptor can have multiple path tables, for redundancy, and so that the decoder can choose its favorite one.
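For reference, a path table is easy to parse. The sketch below assumes the standard record layout: a one-byte name length, a one-byte extended attribute length, a 4-byte extent location, a 2-byte parent directory number, the name, and a padding byte if the name length is odd. The function name is my own. Comparing its output against a recursive walk of the directory tree is one way a decoder (or a curious person) could spot a path table that lies.

```python
import struct


def parse_path_table(data, big_endian=False):
    """Parse raw path table bytes into [(name, extent, parent_number)].

    Use big_endian=True for the "type M" table; the default is "type L".
    """
    fmt = ">IH" if big_endian else "<IH"
    entries = []
    pos = 0
    while pos + 8 <= len(data):
        name_len = data[pos]
        if name_len == 0:  # zero padding after the last record
            break
        # data[pos + 1] is the extended attribute record length; ignored here.
        extent, parent = struct.unpack_from(fmt, data, pos + 2)
        name = data[pos + 8 : pos + 8 + name_len]
        entries.append((name, extent, parent))
        # Records are padded to an even length.
        pos += 8 + name_len + (name_len & 1)
    return entries
```

The root directory shows up as the first entry, with a one-byte name of `\x00` and itself as parent.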

Numbers with “both-byte orders”

Many numbers in an ISO file are encoded in a palindromic format consisting of a little-endian integer, followed by a big-endian representation of the same integer. For example, in a 32-bit field, the number 1000 (hex 0x3E8) would be stored as

E8 03 00 00 00 00 03 E8

(Really.) If the two halves of the number are inconsistent, I assume that some decoders will use the first half, and others the second (and some will probably notice the problem and fail). For example, by corrupting the “file size” field, you could make a file that is larger on systems that prefer big-endian, than it is on systems that prefer little-endian.
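A small Python sketch of decoding such a field, returning both halves so the caller can see whether they agree. The function name is made up for illustration, and it only handles the 8-byte (32-bit) variant.

```python
import struct


def decode_both_endian_32(field):
    """Return (little_endian_value, big_endian_value) from an 8-byte field."""
    if len(field) != 8:
        raise ValueError("expected exactly 8 bytes")
    (le,) = struct.unpack_from("<I", field, 0)
    (be,) = struct.unpack_from(">I", field, 4)
    return le, be


consistent = bytes.fromhex("E8 03 00 00 00 00 03 E8")
print(decode_both_endian_32(consistent))  # (1000, 1000)

# A deliberately inconsistent "file size": 1000 for decoders that read the
# little-endian half, 2000 for decoders that read the big-endian half.
mismatched = struct.pack("<I", 1000) + struct.pack(">I", 2000)
print(decode_both_endian_32(mismatched))  # (1000, 2000)
```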

Rock Ridge technicalities

Rock Ridge (together with its companion format, System Use Sharing Protocol) is an extension that adds features to ISO 9660, such as additional file attributes, and longer filenames.

Rock Ridge is specified fairly strictly, but most decoders interpret it more liberally than they’re supposed to. (And I can’t really blame them, because its strictness reduces backward compatibility, which makes users angry.)

It’s probably not much of an exaggeration to suggest that no two Rock Ridge decoders work quite the same. To give an example, it looks to me like bsdtar respects the “ST” terminator entry, while Linux does not. If you were to put a long-filename entry after an ST entry, Linux would use the long filename, while bsdtar would use the original short filename.
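To illustrate how much that one decision matters, here is a toy Python walker for a System Use area. It is not modeled on the actual bsdtar or Linux code; the `honor_st` flag just stands in for the two policies. It assumes the usual entry layout (2-byte signature, 1-byte total length, 1-byte version, then data) and that an “NM” entry holds a flags byte followed by the name. As a simplification, it concatenates every NM entry it sees instead of checking the continuation flag.

```python
def rock_ridge_name(system_use, honor_st=True):
    """Return the alternate name found in a System Use area, or None."""
    pos = 0
    name = None
    while pos + 4 <= len(system_use):
        sig = system_use[pos : pos + 2]
        length = system_use[pos + 2]
        # A malformed length would make us loop forever or read past the end.
        if length < 4 or pos + length > len(system_use):
            break
        if sig == b"ST":
            if honor_st:
                break  # strict policy: nothing after ST counts
        elif sig == b"NM" and length >= 5:
            # Skip the 4-byte header and the flags byte.
            name = (name or b"") + system_use[pos + 5 : pos + length]
        pos += length
    return name
```

With an “ST” entry followed by an “NM” entry, the strict policy returns `None` (so the decoder falls back to the short ISO 9660 name), while the lenient policy returns the long name.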

Hybrid files

ISO 9660 is deliberately designed to make certain hybrid formats possible. One common use of this flexibility is that Macintosh CD-ROMs often look like both an ISO 9660 CD-ROM, and a Macintosh-formatted disk with an HFS partition (HFS is a Macintosh-specific filesystem).

And if you’re just talking about ISO image files, there are unlimited possibilities.

  • You can put literally anything you want in the first 32 KB of the file. So, any self-terminating format ≤32 KB will work. And in some cases, you can work around the “self-terminating” and “≤32 KB” restrictions.
  • You could put a small JPEG image file there, for example, so your file would look like a JPEG file to some decoders, and like an unrelated CD-ROM image to others. Supporting JPEGs larger than 32 KB is possible as well, though quite a bit more difficult.
  • Making a ZIP hybrid would be pretty simple. If you don’t use compression, you could even have the ZIP and ISO formats share the same member file data. I think this would cost you an extra 2 KB per member file, but still, such a format might actually be useful for something.
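The simplest version of the first idea can be sketched in a few lines of Python. The helper below is hypothetical. It takes an existing ISO image as bytes and overwrites the first 32 KB with some other payload, such as a small JPEG. Note that it zero-fills the rest of that area, so anything already stored there (boot code, for instance) is lost.

```python
SYSTEM_AREA_SIZE = 16 * 2048  # the first 32 KB, unused by ISO 9660 itself


def embed_in_system_area(iso, payload):
    """Return a copy of `iso` with `payload` placed at the start of the file."""
    if len(payload) > SYSTEM_AREA_SIZE:
        raise ValueError("payload does not fit in the first 32 KB")
    # Sanity check: a volume descriptor signature should follow the system area.
    if iso[SYSTEM_AREA_SIZE + 1 : SYSTEM_AREA_SIZE + 6] != b"CD001":
        raise ValueError("input does not look like an ISO 9660 image")
    return payload.ljust(SYSTEM_AREA_SIZE, b"\x00") + iso[SYSTEM_AREA_SIZE:]
```

Whether the result is accepted as a JPEG depends on the JPEG decoder tolerating trailing data after the end-of-image marker, which is the “self-terminating” requirement mentioned above.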
