Thoughts on timestamps of computer files

Computer files can have a number of different kinds of timestamps. Some of them are stored in the file’s external metadata, alongside the file’s name — I’ll call these external timestamps. Others are stored inside the file itself — I’ll call these internal timestamps.

I use the term “timestamp” loosely. Strictly speaking, calling something a “timestamp” implies that whenever it is set, it is set to the current time. But a few of the date/time data elements I’ll discuss don’t always work that way.

Kinds of timestamps

Last-modified time

The last-modified timestamp tells when the file’s contents were last modified or appended to.

This is the most universally supported timestamp, and the most likely one to have a correct and meaningful value.
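As a small illustration, here’s a sketch of reading a file’s last-modified time in Python with os.stat. The helper name last_modified and the throwaway demo file are my own inventions for this example; the demo sets the timestamp explicitly so the result doesn’t depend on when it runs.

```python
import os
import tempfile
from datetime import datetime, timezone

def last_modified(path):
    """Return the file's last-modified time as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)

# Create a throwaway file and set its timestamps to a known value.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name

# os.utime takes (access time, modification time), in seconds since the epoch.
os.utime(demo_path, (1_000_000_000, 1_000_000_000))

demo_mtime = last_modified(demo_path)
os.remove(demo_path)
print(demo_mtime.isoformat())
```

Note that the same os.utime call that makes this demo repeatable is also why a last-modified time can’t be fully trusted: any program with write access can set it to whatever it likes.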

Creation time

The creation time is when the file was originally created. Merely modifying the file’s contents doesn’t change this.

Creation time is kind of a strange idea. You could change everything about a file, probably even its name, and it will keep the same creation time.

It reminds me of the Ship of Theseus problem. That’s a thought experiment in which, over time, a famous object has every single one of its parts replaced. The philosophical question is whether the object is still the same object.

But it’s not quite the same as the Ship of Theseus problem. With digital files, there may be no meaningful distinction between the original and a copy. And the modified file might not resemble the original in any way.

Access time

The access timestamp usually tells when the file’s contents were last read from or written to. A possible use for it is to help identify files that could be moved to a different (e.g. faster or slower) storage system.

Generally speaking, tracking every file’s access time is a terrible idea. It turns every read into a metadata write, which slows down and wears out your computer for very little benefit.

Modern operating systems that maintain access times might do so only at a very low resolution. For example, they might update it only if it’s wrong by at least one day.
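To make the low-resolution idea concrete, here’s a toy sketch of such a rule in Python. The function name, the one-day threshold, and the rule itself are assumptions made for illustration; this is not the actual policy of any particular operating system, which may consider other things as well.

```python
ONE_DAY = 24 * 60 * 60  # seconds

def should_update_atime(stored_atime, now, threshold=ONE_DAY):
    """Hypothetical rule: rewrite the stored access time only if it is
    stale by at least `threshold` seconds. Times are seconds since the epoch."""
    return now - stored_atime >= threshold

# A file read twice within the same hour costs at most one metadata write.
print(should_update_atime(stored_atime=0, now=3600))        # stale by one hour
print(should_update_atime(stored_atime=0, now=2 * ONE_DAY)) # stale by two days
```

With a rule like this, the stored access time can be wrong by up to the threshold, which is the price paid for skipping most of the writes.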

Attribute change time

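Traditional Unix filesystems, and not a lot of other things, have an attribute change time. It tells when the file’s metadata (e.g. its access permissions or its name) was last changed. Modifying the file’s contents also updates it, since that changes metadata such as the last-modified time.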

It has the unfortunate abbreviation “ctime”, causing some people to assume it means “creation time” (which traditional Unix does not have).
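Here’s a sketch in Python that shows the difference. The helper describe_times is a name I made up. It relies on two things I’m fairly sure of: on Unix-like systems, st_ctime is the attribute change time, and st_birthtime only exists on platforms that report a creation time (hence the getattr). On Windows, Python has historically reported the creation time in st_ctime instead, so the behavior demonstrated here is Unix-specific.

```python
import os
import tempfile

def describe_times(path):
    """Collect a file's external timestamps, as seconds since the epoch."""
    st = os.stat(path)
    return {
        "mtime": st.st_mtime,
        "ctime": st.st_ctime,
        # Creation time, where the platform exposes it; None otherwise.
        "birthtime": getattr(st, "st_birthtime", None),
    }

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"data")
path = tmp.name

# Backdate the last-modified time. Doing so is itself a metadata change,
# so on Unix the attribute change time is bumped to "now".
os.utime(path, (1_000_000_000, 1_000_000_000))
times = describe_times(path)
os.remove(path)

print(times)
```

On a Unix-like system, the printed ctime should be roughly the current time even though mtime claims the year 2001, which is a handy reminder that ctime is not a creation time.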

Backup time

A backup timestamp could presumably be used by backup software, to help decide whether a file ought to be included in an incremental backup. It exists on some Macintosh filesystems.

In a sense, this exists on DOS/Windows; but it’s reduced to a 1-bit “archive” attribute that just tells whether the file has been modified since it was last backed up.
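As a sketch, testing that bit in Python might look like the following. The function name needs_backup is my own. The attribute value would normally come from os.stat(path).st_file_attributes, which is only available on Windows, so the function takes the raw attribute bits as an argument to stay portable. The fallback value 0x20 is the archive bit as defined by Windows.

```python
import stat

# Prefer the named constant from the stat module; fall back to the raw value.
ARCHIVE_BIT = getattr(stat, "FILE_ATTRIBUTE_ARCHIVE", 0x20)

def needs_backup(file_attributes):
    """Return True if the archive bit is set, i.e. the file has been
    modified since backup software last cleared the bit."""
    return bool(file_attributes & ARCHIVE_BIT)

# Example attribute values: 0x01 is the read-only bit.
print(needs_backup(0x20 | 0x01))  # archive + read-only
print(needs_backup(0x01))         # read-only only
```

Unlike a real backup timestamp, the bit can’t tell you when the last backup happened, and it only works if there is a single backup program clearing it.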

Effective and expiration time

The ISO 9660 filesystem (which was commonly used on CD-ROMs) allows each file to have an effective time and an expiration time, though this requires a rarely-used feature (Extended Attribute Records).

I don’t know why. Bits and bytes don’t normally expire. Though plenty of files contain time-sensitive data, with associated validity periods, external filesystem attributes are probably not a good place to put such information.

Incidentally, every ISO 9660 CD-ROM also has an expiration time field for the disc as a whole. It’s usually left blank, but I have seen some CD-ROMs that “expired”, say, 10 years after they were created. (I assume that ISO 9660 filesystem drivers generally don’t enforce this.)

More about internal timestamps

Access time, attribute-change time, and backup time usually don’t make sense as internal timestamps.

Creation time could actually work well as an internal timestamp, though only if the author maintains it manually. For a document that has multiple revisions, the creation time could indicate when the original document was created or released.

Last-modified time works fine as an internal timestamp. Sometimes it’s named “last saved time”. It doesn’t necessarily have to refer to the specific bits and bytes in the file; it could refer to its higher-level content.

While external timestamps need to be somewhat standardized, the possible kinds of internal timestamps are limited only by software designers’ imaginations. For example, some Microsoft Office formats contain the time the document was last printed (I don’t know why).

Archive formats

External timestamps can be found inside certain types of files, mainly those that can be classified as “archive” formats, meaning files that are designed to contain other files. An example is ZIP format.

The fact that such timestamps are inside a file does not turn them into internal timestamps. They are instead copies of external timestamps. When the internal files are extracted, their copied timestamps might be turned back into real external timestamps.
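For example, here’s a sketch using Python’s zipfile module to store a member with a chosen last-modified time and read it back. The member name notes.txt and the date are arbitrary. ZIP’s basic timestamp field has no time zone and only two-second resolution, so I picked an even number of seconds.

```python
import io
import zipfile

buf = io.BytesIO()  # build the archive in memory

# Store one member, with its copied "external" last-modified time.
with zipfile.ZipFile(buf, "w") as zf:
    info = zipfile.ZipInfo("notes.txt", date_time=(2001, 9, 9, 1, 46, 40))
    zf.writestr(info, "hello")

# Reopen the archive and read the stored timestamp back.
with zipfile.ZipFile(buf) as zf:
    stored = zf.getinfo("notes.txt").date_time

print(stored)
```

Whether an extraction tool turns that stored value back into a real external timestamp is up to the tool.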

Copying files

When a program (or operating system facility) makes a copy of a file, it has to decide which external timestamps the copy will inherit from the original. There is no universal standard for this, though the operating system may have its own conventions.

Traditionally, on DOS/Windows, copying a file also copies the last-modified time. At least, I think that’s how DOS usually worked.

Traditionally, on Unix, when you copy a file, the copy gets a new last-modified time, reflecting the time the copy was made. The Unix way has the advantage that it works well with build systems that rely on the last-modified timestamps to know which files are out of date. But it has the disadvantage that it throws away information that might be useful.

On modern Windows, the copy keeps the original’s last-modified time, but its creation and access times are set to the time the copy was made. So, the copy’s timestamps will likely suggest that it was last modified long before it was created. That seems a little odd, though I think it’s probably the best thing to do.
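Python’s standard library happens to offer both last-modified conventions, which makes for a compact demonstration. In this sketch, shutil.copy copies the contents and permission bits only, so the copy gets a fresh last-modified time, while shutil.copy2 also tries to preserve the metadata, including the last-modified time. The file names and the backdated timestamp are made up for the example.

```python
import os
import shutil
import tempfile

OLD = 1_000_000_000  # a time in 2001, in seconds since the epoch

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "original.txt")
    with open(src, "w") as f:
        f.write("hello")
    os.utime(src, (OLD, OLD))  # backdate the original

    # The copy gets a new last-modified time.
    fresh = shutil.copy(src, os.path.join(d, "fresh_copy.txt"))
    # The copy inherits the original's last-modified time.
    kept = shutil.copy2(src, os.path.join(d, "kept_copy.txt"))

    fresh_mtime = os.stat(fresh).st_mtime
    kept_mtime = os.stat(kept).st_mtime

print(fresh_mtime > OLD, kept_mtime == OLD)
```

A build system that compares last-modified times would treat the first copy as newly changed and the second as old, which is the tradeoff described above.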

Format conversion

If you’re writing a program to convert a file from one format to another, you have to decide which timestamps to copy, and in what way.

I say you should probably not copy any of the external timestamps, at least not as external timestamps.

Internal timestamps probably should be copied/translated when possible, if it makes sense to do so.

Some file formats have only a single internal timestamp, documented in a way that doesn’t make it clear whether it’s supposed to behave like a last-modified time, or a creation time. Absent information to the contrary, I’d assume that such a timestamp behaves like a last-modified time, even if it’s documented as something like “the time this file was created”.

Suppose you’re converting a file without internal timestamps to a format that supports internal timestamps. Would it be correct to copy the source file’s external timestamps to the destination file’s internal timestamps? I don’t have a clear answer to that, but I would think twice about it, unless you have good reason to believe the external timestamps are accurate and relevant.
