In-place editing gets safer in v5.28

In-place editing is getting much safer in v5.28. Before that, in rare circumstances it could lose data. You may never have noticed the problem, and even with all the times I’ve explained in-place editing in a Perl class I hadn’t really thought about it. The problem was reported as early as December 2002, and once we get v5.28 it won’t be a problem anymore.

In the age of Perl one-liners people would write complex text transformations on the command-line. I wasn’t quite into that since I’m not precise enough to type a complicated program left-to-right at a prompt but I’ve seen people who are. The -p switch wraps some default code around the string argument to -e. You can see the difference by deparsing the code. First just a program from the command-line. This program simply prints the input line number when it’s run (but you’re not running it):

$ perl -MO=Deparse -e 'print qq($. )'
print $.;

When you add the -p switch there’s much more code. Now it reads input from the files on the command line and outputs the line numbers to standard output. There’s an attached continue block. That’s always evaluated right before the conditional is checked again. In that block the -p prints the current value of $_:

$ perl -MO=Deparse -p -e 'print qq($. )'
LINE: while (defined($_ = readline ARGV)) {
    print "$. ";
}
continue {
    die "-p destination: $!\n" unless print $_;
}

In-place editing is a nifty feature where the output ends up in the filename that it came from. The -i switch turns on this feature by setting the $^I special variable. With that set, the default output filehandle points at a file with the same name as the input instead of at standard output:

$ perl -MO=Deparse -p -i -e 'print qq($. )' *.txt
BEGIN { $^I = ""; }
LINE: while (defined($_ = readline ARGV)) {
    print "$. ";
}
continue {
    die "-p destination: $!\n" unless print $_;
}

The -i can take an argument, which will be used as the file extension for a backup file:

$ perl -MO=Deparse -p -i.old -e 'print qq($. )' *.txt
BEGIN { $^I = ".old"; }
LINE: while (defined($_ = readline ARGV)) {
    print "$. ";
}
continue {
    die "-p destination: $!\n" unless print $_;
}

Before v5.28, the -i did a few things to make this work. With no argument it opened the file then unlinked the name. That gives it access to the data, but the data are no longer attached to the abstract idea of a file that you’d see in a directory. It then opens a new file with the same name and outputs to that new file. If you put a sleep in your code you’ll have a chance to look at the directory listing before the process ends:

$ ls -si
8593876032 88 pg27609.txt

$ perl5.26.1 -pi -e 'BEGIN { print $^V } sleep 100' *.txt &
v5.26.1

$ ls -si
8593876937 0 pg27609.txt

The -s switch shows the file size in blocks. The -i switch for ls shows the inode number (Don’t know this stuff? I like Inodes – an Introduction). An inode is the metadata for stuff you store on a disk. A filename is a label a directory gives an inode (and more than one directory entry can link to an inode, hence Perl’s link and unlink). I’ll keep saying “label” so you don’t have the bias of “filename”. I’m going to keep saying “inode” even though it’s unix jargon. Other filesystems do similar things with their own jargon.

Notice that the second ls shows that the same name points to a different inode. Its size is also 0; it’s a new and different inode and you haven’t output anything to it yet. So, “in place” editing is a bit of a misnomer.

With an argument the -i relabels the inode (renames the file) based on the original label with that argument added to the end. You can see the inode of the data before the perl one-liner and that the same inode later has the original label with .old added. There’s also a new inode with the original label but no contents yet. The pg27609.txt before and the pg27609.txt.old after have the same inode number:

$ ls -si
8593874285 88 pg27609.txt

$ perl -pi.old -e 'BEGIN { print $^V } sleep 100' *.txt &
v5.26.1

$ ls -si
8593874356 0 pg27609.txt
8593874285 88 pg27609.txt.old

What could go wrong with this approach?

In the first case, what happens if you can’t get through all of the data? Maybe perl panics, your machine loses power, you fill up the disk, or something else bad. You’ve already unlinked the original file. You lose data (unless there’s another link somewhere). In the second case, you still have the backup file but the final file is incomplete. Various problems are noted in RT #19333, RT #57512, and RT #127663.

This problem shows up if your program exits with a true value (so, not the 0 that means normal execution). If you exit by die-ing or by using a true value with exit, the output is incomplete and the original data are gone:

$ ls -si
8593878186 88 pg27609.txt

$ perl5.26.1 -pi -e 'BEGIN { print $^V } die if $. == 10' *.txt
Died at -e line 1, <> line 10.
v5.26.1

$ ls -si
8593878227 0 pg27609.txt

In either case the original label points to incomplete data. Maybe something else freaks out because its incomplete state is invalid XML or something similar. Maybe that incomplete state turns a big number into a small one because the rest of the characters are still in the output buffer. There’s a race condition here; with an empty configuration file I might be allowed to do something that should have been forbidden.

If I were creating such a feature for a production system I would have already thought to completely create the new file, check the new file, then move it into place if it worked out.

In the latest development version, v5.27, the way -p and -i work together has changed. Run the same sort of command again and look at the directory listing. The original label still connects to the original data. There’s a temporary label for the new inode that collects the output. At the end of the process that inode is relabeled with the original label:

$ ls -si
8593874476 88 pg27609.txt

$ perl5.27.7 -pi.old -e 'BEGIN { print $^V } sleep 100' *.txt &
v5.27.7

$ ls -si
8593874538 0 Cr1uEizj
8593874476 88 pg27609.txt
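
To see which behavior your own perl has, you could repeat the die experiment against a throwaway file and compare it to a reference copy. This is only a sketch with invented file names; it reports what happened and doesn’t assume a particular version. With the new behavior I’d expect the original data to survive, and with the old one I’d expect them to be gone:

```shell
tmpdir=$(mktemp -d)
printf 'line %s\n' 1 2 3 4 5 > "$tmpdir/data.txt"
cp "$tmpdir/data.txt" "$tmpdir/reference.txt"   # untouched copy for comparison

# Die partway through the input and remember the non-zero exit status.
status=0
perl -pi -e 'die "giving up\n" if $. == 3' "$tmpdir/data.txt" 2>/dev/null || status=$?

perl -e 'print "perl $^V\n"'
echo "exit status: $status"

if cmp -s "$tmpdir/data.txt" "$tmpdir/reference.txt"; then
    echo "original data intact"
else
    echo "original data lost or truncated"
fi
```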

There are a few gotchas here. If you change directories during this process the relabeling will probably fail. Trying this across threads (or trying anything across threads) will likely fail.


3 Comments.

  1. Amazing how that survived for 30 years

  2. Anthony DeRobertis

    Odd changing directory would make the relabeling fail — shouldn’t this be fairly easily prevented by using renameat/renameat2 instead of rename?

  3. The first deparse seems odd because it’s losing a space character and forcing the expansion in double quotes. This is what I get in 5.20.2:

    $ perl -MO=Deparse -e 'print qq($. )'
    print "$. ";
    -e syntax OK
