Survey of RAR comment formats

This post is about RAR, a compression and archiving file format in the same category as ZIP. It’s known for its association with the WinRAR software, but there are also command-line and text-mode versions of the software that are just named RAR.

RAR supports two kinds of comments:

  1. A main “archive comment”, for the archive file as a whole
  2. “File comments”, one for each member file stored inside the archive file

This is typical of such formats. Comments usually have no significant effects, but they might be shown to the user in certain situations, or might be used to construct a table of contents for a collection of archive files.

I have no particular reason to care about RAR comments, but I was studying RAR, and thought it was interesting how many different comment formats there are.

Though this is a lot of writing for such a tiny topic, it does not come close to being a complete study of RAR comments. I’ll just discuss basic file format issues, compression modes, and apparent size limits. Among the things I won’t cover:

  • The various ways of creating comments
  • Character encoding issues
  • Issues with how line endings are encoded
  • The fact that some self-extracting RAR files put an installation script in the archive comment

Notes

Based on my testing of RAR software, I believe these things to always be true:

  • There are no options that let the user choose whether a comment is compressed, or the compression parameters. The RAR software decides for you.
  • RAR does not optimize the compression. A given version of RAR always uses the same kind of compression for a given type of comment, regardless of how suitable it is for a particular comment.

Format classes

After some research and testing, I decided it made sense to put RAR comment formats into five classes, based on the version of the software that creates them.
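For orientation, the five classes sit on top of three on-disk format families, which can be told apart by the signature at the start of the file. The sketch below is only an illustration: the signature bytes come from general RAR format documentation rather than from the testing described in this post, and `classify_rar` is a hypothetical helper name. It cannot separate phase 1 from phase 2, or phase 3 from phase 4, and it ignores self-extracting archives, where the signature does not sit at offset 0.

```python
from typing import Optional

# Signature bytes as commonly documented for each RAR format family.
# These values are not taken from the post itself.
SIGNATURES = [
    (b"Rar!\x1a\x07\x01\x00", "RAR 5.0 format (phase 5)"),
    (b"Rar!\x1a\x07\x00", "RAR 1.5-4.x format (phases 3 and 4)"),
    (b"RE~^", "pre-1.50 format (phases 1 and 2)"),
]


def classify_rar(data: bytes) -> Optional[str]:
    """Return a label for the format family, or None if no signature matches."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None
```

Telling the phases apart within a family needs more than the signature, for example looking at how the comment itself is stored.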

Phase 1: v?.?? through v1.3x

I’ll start with the format used by the (beta) versions of RAR prior to v1.40, with v1.36 as the representative version. The early history of RAR was not so well preserved, so I don’t know exactly which public release was the first to use this format. It wasn’t v1.36.

The archive comment is stored in a special field in the “archive header” segment. There is a flag telling whether it is present. It is always non-compressed.

The file comment is stored in a special field in that member file’s “file header” (which is not highlighted in the diagram, but it’s the first part of the member file block). Other characteristics are the same as for the archive comment.

Size limits: The documentation (RAR.DOC) says that the size of a comment is limited to 16 KB. If you disregard that documented limit, the theoretical limit is set by the maximum size of the archive or file header, which can be at most 64 KB. A longer filename, for example, would leave less room for a file comment.

Phase 2: v1.4x

The next set of RAR versions is from v1.40, through whatever the last version was before v1.50.

The archive comment can now be compressed. I assume the compression scheme is the same as that used for data files. There is a header flag that tells whether it is compressed. The software supports reading non-compressed archive comments, but does not have a way to create them.

File comment: No change. Always non-compressed.

Size limits: The size limits are basically the same as in the previous version. The theoretical limit for a compressed archive comment would be based on its compressed size, as there is no field for the comment length after decompression.

Note: You may notice that the “compressed” comment is larger than it would be if it weren’t compressed. Yep, that’s the way it is sometimes.
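The effect is easy to reproduce with any general-purpose compressor. The sketch below uses Python's `zlib`, which is not the algorithm RAR uses, purely to show that fixed overhead can outweigh the savings on a short text. The function name is made up for this example.

```python
import zlib


def stored_and_compressed_sizes(comment: bytes):
    """Return (raw size, zlib-compressed size) for a comment.

    zlib stands in for RAR's own compressor here; only the general
    effect (overhead dominating on tiny inputs) carries over.
    """
    return len(comment), len(zlib.compress(comment, 9))


raw, packed = stored_and_compressed_sizes(b"A short comment.\r\n")
print(raw, packed)  # the second number is the larger one for this input
```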

Note: Though I performed the same actions to create the comments as in the previous version, the comments made by the previous version have an extra newline (0D 0A) at the end that is no longer present.

Phase 3: v1.5x through v2.xx

At RAR v1.50, a new and incompatible RAR format was introduced. A file is composed of segments called “blocks”, each with a type identifier.

A comment is stored in a block nested inside the archive header block, or a file header block. (Note: The “file header” block seems misnamed to me, because it’s best interpreted as containing the file’s data as well. Think of it as the file’s main block.) This might have been a regrettable design decision, as by v1.54 a more flexible way of storing large metadata items was introduced, using non-nested “subblocks”.

Archive and file comments can be compressed with any of RAR’s compression methods, or non-compressed. In practice, archive comments are compressed (with method 0x34: “good” compression), and file comments are non-compressed (method 0x30: “storing”).

Size limits: The documentation says a comment is limited to 16384 bytes. Theoretically, the comment can be up to 65535 bytes after decompression, and the compressed size can be at most a little less than 64 KB.
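A minimal sketch of reading such a nested comment block, assuming the 13-byte header layout given in the technical note shipped with RAR of that era (header CRC, type 0x75, flags, header size, unpacked size, unpack version, method, comment CRC). The field offsets and the method names other than 0x30, 0x33 and 0x34 are taken from that note, not from this post, and `parse_nested_comment` is a hypothetical name. The comment data is returned as stored, without decompression.

```python
import struct

# Method bytes; 0x30 and 0x34 are the ones seen in practice for comments.
METHOD_NAMES = {
    0x30: "storing", 0x31: "fastest", 0x32: "fast",
    0x33: "normal", 0x34: "good", 0x35: "best",
}

# Assumed layout: HEAD_CRC, HEAD_TYPE, HEAD_FLAGS, HEAD_SIZE,
# UNP_SIZE, UNP_VER, METHOD, COMM_CRC (13 bytes, little-endian).
COMMENT_HEAD = struct.Struct("<HBHHHBBH")
COMMENT_TYPE = 0x75


def parse_nested_comment(buf: bytes, offset: int = 0) -> dict:
    """Parse a comment block that starts at `offset` inside its parent block."""
    (_head_crc, head_type, _flags, head_size,
     unp_size, unp_ver, method, _comm_crc) = COMMENT_HEAD.unpack_from(buf, offset)
    if head_type != COMMENT_TYPE:
        raise ValueError("not a comment block")
    if head_size < COMMENT_HEAD.size or offset + head_size > len(buf):
        raise ValueError("bad or truncated comment block")
    return {
        "unpacked_size": unp_size,   # 16-bit field, so at most 65535
        "unpack_version": unp_ver,
        "method": METHOD_NAMES.get(method, hex(method)),
        # HEAD_SIZE covers the 13 header bytes plus the stored comment data.
        "stored_data": buf[offset + COMMENT_HEAD.size:offset + head_size],
    }
```

Under this layout both size fields are 16 bits wide, which would account for the theoretical limits just mentioned: 65535 bytes after decompression, and 65535 minus the header overhead for the stored data.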

Phase 4: v3.00 through v4.xx

At v3.00, there were significant changes to the RAR file format, though it is structurally compatible with the previous version. The way that comments are stored is quite different.

A comment is stored in a “new subblock” that appears after the archive header, or after the file’s main block. The new subblock format is not documented very well, but it seems to be roughly the same as that of a file’s main block. Instead of a filename, a comment subblock has the name “CMT”.

Archive and file comments can be compressed with any of RAR’s compression methods, or non-compressed. In practice, archive and file comments are both compressed, now with method 0x33: “normal” compression.

Size limits: The documentation says that an archive comment can be up to 62000 bytes, and a file comment up to 32767 bytes. Theoretically, at least 2 GB or 4 GB is possible, and maybe far larger, using the “large file” feature.
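Locating these comment subblocks can be sketched as a walk over the top-level blocks. This assumes the block header layout from the RAR 3.x technical note, and it assumes, in line with the observation above, that a new subblock (type 0x7A) uses the same fixed fields as a file block, with the name where the filename would be. All offsets, constants and the function name are assumptions for illustration. The payload is returned as stored; no decompression is attempted.

```python
import struct

MARKER = b"Rar!\x1a\x07\x00"   # 7-byte marker block of the RAR 1.5-4.x format
ARCHIVE_HEAD, FILE_HEAD, NEW_SUB_HEAD = 0x73, 0x74, 0x7A
LONG_BLOCK = 0x8000            # a 4-byte data size follows the 7-byte header
LARGE_FILE = 0x0100            # two extra 4-byte fields hold the high size bits


def find_cmt_subblocks(data: bytes) -> list:
    """Return one dict per new subblock named "CMT", in file order."""
    if not data.startswith(MARKER):
        raise ValueError("no RAR 1.5-4.x marker at offset 0")
    found = []
    owner = None               # "archive" or "file": last non-subblock block seen
    pos = len(MARKER)
    while pos + 7 <= len(data):
        _crc, head_type, flags, head_size = struct.unpack_from("<HBHH", data, pos)
        file_style = head_type in (FILE_HEAD, NEW_SUB_HEAD)
        large = file_style and bool(flags & LARGE_FILE)
        fixed = (40 if large else 32) if file_style else 7
        if head_size < fixed or pos + head_size > len(data):
            break              # corrupt or truncated; stop instead of looping
        data_size = 0
        if file_style or flags & LONG_BLOCK:
            if head_size < 11:
                break
            data_size = struct.unpack_from("<I", data, pos + 7)[0]
        if large:
            data_size |= struct.unpack_from("<I", data, pos + 32)[0] << 32
        if head_type == NEW_SUB_HEAD:
            name_size = struct.unpack_from("<H", data, pos + 26)[0]
            name = data[pos + fixed:pos + fixed + name_size]
            if name == b"CMT":
                start = pos + head_size
                found.append({
                    "owner": owner,
                    "method": data[pos + 25],
                    "payload": data[start:start + data_size],
                })
        elif head_type == ARCHIVE_HEAD:
            owner = "archive"
        elif head_type == FILE_HEAD:
            owner = "file"
        pos += head_size + data_size
    return found
```

The `owner` value relies on the ordering described above: a comment subblock belongs to whichever archive header or file block most recently preceded it.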

Phase 5: v5.00 and up

RAR 5.00 introduced a new “RAR 5.0” format. While similar in overall structure to the previous version, it is completely incompatible. As I write this, the current software version is 6.11, which still uses the 5.0 format.

In RAR 5.0, the file comment feature was removed. If you really wanted to, you could implement file comments in a logical way, by analogy to how the previous version does it. But no other software would support them.

The archive comment is normally stored non-compressed. The format makes it possible to compress it, but I don’t think that’s supposed to be allowed.

Size limits: The documentation for v5.00 continues to say that the archive comment is limited to 62000 bytes. But from version 5.40 on, that’s changed to 256 KB. There is no theoretical limit to the comment length.

Note: The comment stored in the file seems to end with an extra zero-valued byte. I don’t know why.
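A reader that wants just the comment text could drop that byte defensively. This is a small sketch, assuming the comment bytes have already been extracted; the helper name is hypothetical, and it removes at most one trailing zero byte so that nothing else is altered.

```python
def strip_trailing_nul(raw: bytes) -> bytes:
    """Remove a single trailing zero byte, if present."""
    return raw[:-1] if raw.endswith(b"\x00") else raw
```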
