cpio is an archive file format, similar in concept to ZIP or tar. It combines multiple files (which I’ll call the “member files”) together in a single .cpio archive file. It is an old format, designed to be useful with magnetic tape drives.
Here’s a demonstration, using a Unix shell, and the GNU cpio utility. We’ll create some files for testing:
$ echo x > TA
$ echo x > TB
$ echo x > TZ
Make a filename list:
$ ls T* > filelist
Create the archive:
$ cpio -o --verbose < filelist > gnu.cpio
TA
TB
TZ
1 block
Just to verify, let’s list the contents of the archive:
$ cpio -it < gnu.cpio
TA
TB
TZ
1 block
Now we’ll do the same thing, but with an additional member file named “TRAILER!!!”, which in our filename list will sort between TB and TZ.
$ echo x > 'TRAILER!!!'
$ ls T* > filelist2
$ cpio -o --verbose < filelist2 > gnu2.cpio
TA
TB
TRAILER!!!
TZ
1 block
So far, so good. But when we list the contents of the archive:
$ cpio -it < gnu2.cpio
TA
TB
1 block
the last two files have disappeared! We won’t be able to extract them, either. The “TRAILER!!!” file and all the files after it do exist in the gnu2.cpio file, but they are invisible to the cpio utility.
I also tested some other implementations of cpio, including afio, and the bsdcpio utility from the libarchive software. They have the same issue, in that they are unable to read certain archives that they themselves write.
The problem is that cpio format uses a special pseudo-file with the sentinel name “TRAILER!!!” to mark the end of the archive. Some sort of end marker is important, but I think it’s fair to say that this is a pretty dumb way to do it.
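To make the failure concrete, here is a minimal Python sketch of a reader that trusts the sentinel name. It assumes the "newc" (SVR4 ASCII) flavor of cpio, which is only one of several flavors and not necessarily the one the commands above wrote by default. The helper names (make_entry, make_archive, naive_list) are made up for this illustration, and the real utilities are certainly more involved than this.

```python
MAGIC = b"070701"        # magic number of the "newc" flavor (assumed here)
TRAILER = "TRAILER!!!"   # sentinel name that marks the end of the archive
HEADER_LEN = 110         # 6-byte magic + 13 fields of 8 hex digits each


def pad4(n):
    """Number of NUL bytes needed to round n up to a multiple of 4."""
    return (-n) % 4


def make_entry(name, data=b"", ino=0, mode=0):
    """Encode one newc record: header, NUL-terminated name, then data."""
    name_bytes = name.encode() + b"\0"
    # Field order: ino, mode, uid, gid, nlink, mtime, filesize,
    # devmajor, devminor, rdevmajor, rdevminor, namesize, check.
    fields = [ino, mode, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(name_bytes), 0]
    out = MAGIC + b"".join(b"%08X" % f for f in fields) + name_bytes
    out += b"\0" * pad4(len(out))            # header + name padded to 4 bytes
    return out + data + b"\0" * pad4(len(data))


def make_archive(names):
    """Store each name as a small regular file, then append the trailer."""
    body = b"".join(
        make_entry(n, b"x\n", ino=i + 1, mode=0o100644)
        for i, n in enumerate(names)
    )
    return body + make_entry(TRAILER)


def naive_list(archive):
    """List member names, stopping at the first record named TRAILER!!!."""
    names, pos = [], 0
    while pos + HEADER_LEN <= len(archive):
        header = archive[pos:pos + HEADER_LEN]
        if header[:6] != MAGIC:
            raise ValueError("bad magic at offset %d" % pos)
        fields = [int(header[6 + 8 * i:14 + 8 * i], 16) for i in range(13)]
        filesize, namesize = fields[6], fields[11]
        name = archive[pos + HEADER_LEN:pos + HEADER_LEN + namesize - 1].decode()
        if name == TRAILER:
            break                            # everything after this is ignored
        names.append(name)
        # Records start on 4-byte boundaries, so absolute alignment works.
        pos += HEADER_LEN + namesize
        pos += pad4(pos)
        pos += filesize
        pos += pad4(pos)
    return names


print(naive_list(make_archive(["TA", "TB", "TZ"])))                # ['TA', 'TB', 'TZ']
print(naive_list(make_archive(["TA", "TB", "TRAILER!!!", "TZ"])))  # ['TA', 'TB']
```

The second archive still contains the records for “TRAILER!!!” and TZ. The reader simply never looks at them.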
This issue could theoretically have security implications. Imagine that on a server, an untrusted user creates a file named “TRAILER!!!” that messes up the server’s backups. But it’s unlikely to be exploitable in reality, because:
- In a real cpio backup, filenames will almost certainly include directory paths. A filename of “/home/alice/TRAILER!!!” or “www/uploads/alice/TRAILER!!!” will not match the sentinel value, and will be harmless (at least with the cpio software I tested).
- The invisible files are safely stored in the archive. It will just take some extra effort to extract them.
- I assume that cpio is very rarely used these days.
It seems quite possible to write a cpio extractor that can heuristically detect whether an item named “TRAILER!!!” is a real file, versus an end-of-archive marker. For example, if the file mode or inode attribute is not zero, it might be a real member file. Or if it’s not the last item in the cpio file, it might be a real member file. But cpio is not a very strict format: there are several different flavors of it, and different cpio utilities write trailer records that are a little different from each other. One heuristic you can’t use is to assume that it must be a real file if its size is nonzero. Some cpio utilities always give the trailer a “file size” of zero, but others put padding data inside it, giving it a nonzero size.
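Here is a rough Python sketch of those heuristics, again assuming the "newc" flavor only. The function names (entry, read_records, heuristic_list) are hypothetical, and the rule "mode and inode are zero in a genuine trailer" is an assumption that may not hold for every cpio utility. Note that the file size is deliberately ignored when classifying the record.

```python
MAGIC = b"070701"        # "newc" magic; other cpio flavors differ
TRAILER = "TRAILER!!!"
HEADER_LEN = 110


def pad4(n):
    """Number of NUL bytes needed to round n up to a multiple of 4."""
    return (-n) % 4


def entry(name, data=b"", ino=0, mode=0):
    """Hypothetical helper: encode one newc record for the demo below."""
    nm = name.encode() + b"\0"
    fields = [ino, mode, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(nm), 0]
    out = MAGIC + b"".join(b"%08X" % f for f in fields) + nm
    out += b"\0" * pad4(len(out))
    return out + data + b"\0" * pad4(len(data))


def read_records(archive):
    """Yield (name, ino, mode, is_last) for each record that parses."""
    pos = 0
    while pos + HEADER_LEN <= len(archive) and archive[pos:pos + 6] == MAGIC:
        hdr = archive[pos:pos + HEADER_LEN]
        f = [int(hdr[6 + 8 * i:14 + 8 * i], 16) for i in range(13)]
        ino, mode, filesize, namesize = f[0], f[1], f[6], f[11]
        name = archive[pos + HEADER_LEN:pos + HEADER_LEN + namesize - 1].decode()
        pos += HEADER_LEN + namesize
        pos += pad4(pos)
        pos += filesize                      # skip data, even in a trailer
        pos += pad4(pos)
        # "Last" means no further record header follows this one.
        is_last = archive[pos:pos + 6] != MAGIC
        yield name, ino, mode, is_last


def heuristic_list(archive):
    """List members, treating TRAILER!!! as real if it looks like a file."""
    names = []
    for name, ino, mode, is_last in read_records(archive):
        if name == TRAILER:
            looks_real = ino != 0 or mode != 0 or not is_last
            if not looks_real:
                break                        # genuine end-of-archive marker
        names.append(name)
    return names


archive = b"".join([
    entry("TA", b"x\n", ino=1, mode=0o100644),
    entry("TB", b"x\n", ino=2, mode=0o100644),
    entry(TRAILER, b"x\n", ino=3, mode=0o100644),   # a real member file
    entry("TZ", b"x\n", ino=4, mode=0o100644),
    entry(TRAILER, b"\0" * 8),                      # trailer with padding data
])
print(heuristic_list(archive))  # ['TA', 'TB', 'TRAILER!!!', 'TZ']
```

A reader like this can still be fooled, for instance by a utility that writes a nonzero mode into its trailer, so it is a best-effort recovery aid rather than a fix for the format.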