Use the /r substitution flag to work on a copy

How many times has this happened to you? You want to modify each element of an array, so you send it through a map (Item 20. Use foreach, map, and grep as appropriate.). However, instead of your expected output, you only get a bunch of numbers or empty strings back. For example, in this case, some digits got into the names of the cats and you want to remove them with a substitution:

my @in  = qw( Bu1s5ter Mi6mi Roscoe Gin98ger El123la );

my @out = map { s/\d+//g } @in;

print "in: @in\nout: @out\n";

The output isn’t a list of cat names though. Not only that, but the input array has changed!

in: Buster Mimi Roscoe Ginger Ella
out: 2 1  1 1

Since your substitution acted on $_, it was really acting on an alias to the original element instead of a copy (Item 114. Know when arrays are modified in a loop.). Not only that, you forgot that s/// returns the number of substitutions it made.
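Here is a minimal sketch that isolates those two effects. The array names @names and @counts are made up for the illustration; the brackets in the output are only there to make the empty string visible:

```perl
use strict;
use warnings;

my @names = qw( Bu1s5ter Roscoe );

# Inside map, $_ is an alias to each element of @names, so s/// edits
# @names itself. What map collects is the return value of s///: the
# number of substitutions, or the empty string when nothing matched.
my @counts = map { s/\d+//g } @names;

for my $i ( 0 .. $#names ) {
	print "$names[$i]: [$counts[$i]]\n";
	}
```

The first name loses both of its digits, so its count is 2. The second name has no digits, so it is left as it was and its "count" is the empty string.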

Don’t feel bad about that though. Almost everyone runs into this problem, to the point where Perl 5.14 is going to add a feature to mitigate it. But since Perl 5.14 isn’t due for another half a year, here’s how you do it correctly now. You assign the current list element to a new variable, $s, perform the substitution on that copy, and finally return the new value. The result of the map block is its last evaluated expression:

my @in  = qw( Bu1s5ter Mi6mi Roscoe Gin98ger El123la );

my @out = map { my $s = $_; $s =~ s/\d+//g; $s } @in;

print "in: @in\nout: @out\n";

Now your output is what you probably wanted. The original array is the same as when it started and the output list has the denumified cat names:

in: Bu1s5ter Mi6mi Roscoe Gin98ger El123la
out: Buster Mimi Roscoe Ginger Ella

That’s all pretty ugly though, and the Perl developers finally got tired of it. Remember, Perl’s motto is to make the common things easy. If everyone’s making the same mistake trying to force an idiom that just doesn’t work, should we change the people or change Perl? Well, good luck with the first one.

Perl 5.14 introduces (will introduce) the /r flag for the substitution operator. Until Perl 5.14 is official, you’ll have to use the experimental track, Perl 5.13, as a preview (Item 110. Compile and install your own perls.).

The /r does everything you did in the correct map without you having to do it yourself. That is, it tells the substitution operator to work on a copy, leave the original bound variable alone, and return the changed value instead of the number of substitutions. Another way to say that is that /r is a non-destructive substitution. Now your code looks almost like the first, albeit incorrect, example, but this time it works. The /r flag needs no feature pragma of its own, but stating the minimum version of Perl (Item 2. Enable new Perl features when you need them.) makes an older perl stop right away with a clear version message. The meaty part is different by one character, the r at the end of the substitution operator:

use 5.013;

my @in  = qw( Bu1s5ter Mi6mi Roscoe Gin98ger El123la );

my @out = map { s/\d+//gr } @in;

print "in: @in\nout: @out\n";

Again, the output is as it should be, with the original array unchanged and the output array with your expected values:

in: Bu1s5ter Mi6mi Roscoe Gin98ger El123la
out: Buster Mimi Roscoe Ginger Ella

You might have this same problem when you need to create a new filename based on an existing one so you can use both of them at the same time, perhaps in a rename. Here’s the pre-Perl 5.14 version, where you have to copy the variable $old to $new first, then perform the substitution on $new:

my @files = qw(
	/Users/buster/images/mice/wooly.JPG
	);
	
foreach my $old ( @files ) {
	( my $new = $old ) =~ s/\.JPG\z/.jpg/;
	print "old: $old\nnew: $new\n\n";
	rename $old => $new;
	}

This is an odd idiom if you’re seeing it for the first time. The parentheses around the assignment ensure that the assignment happens first. The result of the assignment is $new, so that’s the target for the binding operator (=~). The output shows that you still have both versions of the filename:

old: /Users/buster/images/mice/wooly.JPG
new: /Users/buster/images/mice/wooly.jpg
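If the one-statement form is hard to read, it may help to see it next to the two statements it abbreviates. This is only a sketch: the variable names $one_step and $two_step are invented for the comparison, and the filename is shortened:

```perl
use strict;
use warnings;

my $old = 'wooly.JPG';

# One statement: copy $old into $one_step, then bind s/// to the copy
( my $one_step = $old ) =~ s/\.JPG\z/.jpg/;

# The same thing spelled out as two statements
my $two_step = $old;
$two_step =~ s/\.JPG\z/.jpg/;

print "old: $old\n";
print "one: $one_step\n";
print "two: $two_step\n";
```

Both copies end up with the lowercase extension, and $old keeps its original value.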

If you take away the parentheses, things happen in a different order because the binding operator has a higher precedence (see the table in perlop):

my @files = qw(
	/Users/buster/images/mice/wooly.JPG
	);
	
foreach my $old ( @files ) {
	my $new = $old =~ s/\.JPG\z/.jpg/;
	print "old: $old\nnew: $new\n\n";
	rename $old => $new;
	}

The output has the same unexpected results as the map example. You changed the original and got back a number instead:

old: /Users/buster/images/mice/wooly.jpg
new: 1

With Perl 5.14’s /r, you can leave off the parentheses but still get the right value:

use 5.013;

my @files = qw(
	/Users/buster/images/mice/wooly.JPG
	);
	
foreach my $old ( @files ) {
	my $new = $old =~ s/\.JPG\z/.jpg/r;
	print "old: $old\nnew: $new\n\n";
	rename $old => $new;
	}

This might look a little cleaner if you use $_ implicitly (Item 15. Use $_ for elegance and brevity.). That way you can leave off the binding operator:

use 5.013;

my @files = qw(
	/Users/buster/images/mice/wooly.JPG
	);
	
foreach ( @files ) {
	my $new = s/\.JPG\z/.jpg/r;
	print "old: $_\nnew: $new\n\n";
	rename $_ => $new;
	}

You don’t have to use this new feature in a looping construct, though. The /r works the same way without it.
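For instance, here is a small sketch with plain scalars and no loop at all. The variable names are hypothetical, and the version line simply mirrors the earlier examples:

```perl
use 5.013;
use warnings;

my $old   = 'Gin98ger';
my $plain = 'Roscoe';

# /r leaves $old alone and returns the modified copy
my $new = $old =~ s/\d+//gr;

# When nothing matches, /r still returns a string (an unchanged copy),
# not a count or the empty string
my $same = $plain =~ s/\d+//gr;

print "old: $old\nnew: $new\nsame: $same\n";
```

Because /r always hands back a string, you can use the result directly wherever you need the new value.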

Things to remember

  • Changing $_ in a looping construct affects the original values.
  • Normally, the substitution operator returns the number of substitutions (or the empty string).
  • Perl 5.14 introduces the /r flag for the substitution operator.
  • The /r works on a copy of the bound value and returns the changed value.

2 Comments.

  1. The main problem with s///r is that it changes s/// completely, it should really have been made r/// or similar, as it is a different function.

    Probably the easiest way to do without it is something like:

    s/foo/bar/ for my $new = $old;
    s/foo/bar/ for my @new = @old;
    
    • Why create a completely separate operator that shares all of the same behavior except for that one thing? Behind the scenes it would use the same code, so why document it twice and make the programmer think it’s something different? The m// operator already returns different things depending on what you are doing, too.
