Detect regular expression match variables in your code

[UPDATE: this is not a problem in v5.18 and later.]

In Item 33: “Watch out for match variables”, you found out that the match variables $`, $&, and $' come with a performance hit. Because your program may load plenty of module code, you might be using those variables even though you never typed them yourself.

There’s a module that can tell you if anything in your program used one of these nasty variables. The Devel::SawAmpersand module uses the B backend to look for any compiled Perl code that deals with the internal sawampersand variable that, when true, automatically slows down all of your matches and substitutions. By adding a couple of lines at the top of your code, you can tell if Perl ran into sawampersand:

use Devel::SawAmpersand qw(sawampersand);
END { print "saw ampersand => ", sawampersand(), "\n" }

my $string = 'cat bird dog';

$string =~ m/\s*bird\s*/;

print <<"HERE";
\$`  =>  $`
\$&  =>  $&
\$'  =>  $'
HERE

The output shows you that it did indeed see at least one of those variables:

$`  =>  cat
$&  =>   bird 
$'  =>  dog
saw ampersand => 1

That’s not very useful because it doesn’t tell you where it found the variables, and you had to change the source to find out. You really want to do this without changing the source and also have it tell you in which files and on which line numbers those variables appear. That’s what Devel::FindAmpersand does:

my $string = 'cat bird dog';

$string =~ m/\s*bird\s*/;

print <<"HERE";
\$`  =>  $`
\$&  =>  $&
\$'  =>  $'
HERE

Now run the program, loading the Devel::FindAmpersand module on the command line (in this case, it reports the line number where the here-doc string starts):

$ perl5.10.1 -MDevel::FindAmpersand amp.pl
$`  =>  cat
$&  =>   bird 
$'  =>  dog
Found evil variable $` in file amp.pl, line 5
Found evil variable $& in file amp.pl, line 5
Found evil variable $' in file amp.pl, line 5

That’s fine for finding the variables in the same file, but what if they are in a different module? Devel::FindAmpersand only reports what it finds in the main script. Here’s a script that pulls in a library that uses one of the match variables:

require 'uses-amp.pl';

my $string = 'cat bird dog';

$string =~ m/\s*bird\s*/p;

print "Pre-match is ${^PREMATCH}\n";

Here’s the tiny culprit library:

# this is a naughty module

my $matched = $&;

1;

Devel::FindAmpersand doesn’t complain though, since the $& doesn’t appear in the main file:

$ perl -MDevel::FindAmpersand amp.pl
Pre-match is cat

Since you’re really only doing this during development and probably infrequently, you can rig a solution to search through any loaded files. Create a small module that uses an END block to go through all of the files listed in %INC and examine each individually:

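# ScanAmpersand.pm
# At program exit, re-run every loaded file under Devel::FindAmpersand.
END {
	foreach my $file ( values %INC ) {
		# $^X is the path to the perl binary running this program
		system $^X, '-MDevel::FindAmpersand', $file;
		}
	}

1;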

When you use your ScanAmpersand, you get warnings from any files that used one of the variables:

$ perl -MScanAmpersand amp.pl
Pre-match is cat
Found evil variable $& in file uses-amp.pl, line 3
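
If Devel::FindAmpersand isn’t installed, a rough text scan of the files in %INC can serve as a first pass. This is only a heuristic sketch and not what Devel::FindAmpersand does: it looks at source text rather than compiled code, so it can flag comments and strings, and it misses variables that arrive through string evals or aliases. The name find_match_vars is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: list every line of $file whose text mentions
# $`, $&, or $' (a plain text match, not an op-tree inspection).
sub find_match_vars {
    my ($file) = @_;
    my @hits;

    open my $fh, '<', $file or return;
    while ( my $line = <$fh> ) {
        chomp $line;
        # $. holds the current line number of the handle being read
        push @hits, sprintf( '%s:%d: %s', $file, $., $line )
            if $line =~ m/\$[`&']/;
    }
    close $fh;

    return @hits;
}

# Scan everything that was loaded by the time the program exits.
END {
    print "$_\n" for map { find_match_vars($_) } grep { defined } values %INC;
}
```

Treat any hit as a place to look, then confirm it with Devel::FindAmpersand or by reading the code.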

Workarounds

Now that you’ve found the naughty uses of $`, $&, and $', you need to fix up the code to remove them. If you are using Perl 5.10 or later, use the per-match variables ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} with the /p flag instead (Item 33: “Watch out for match variables”). If you are using a version earlier than Perl 5.10, you might modify the regular expression to explicitly capture parts of it. The Devel::SawAmpersand documentation gives you these possibilities:

Naughty              Nice
$` of /pattern/      $1 of /(.*?)pattern/s
$& of /pattern/      $1 of /(pattern)/
$' of /pattern/      $+ of /pattern(.*)/s
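
As a minimal sketch of both workarounds on the same 'cat bird dog' string used earlier: the first sub rolls the three captures into a single pattern (an illustration, not the exact expressions from the Devel::SawAmpersand documentation), and the second uses the /p flag. The sub names are hypothetical, and the capture version assumes the pattern you pass in has no capture groups of its own.

```perl
use strict;
use warnings;

# Explicit captures: works on any Perl 5. (.*?) stands in for $`,
# ($pattern) for $&, and the final (.*) for $'.
# Assumes $pattern contains no capture groups of its own.
sub parts_by_capture {
    my ( $string, $pattern ) = @_;
    return unless $string =~ m/\A(.*?)($pattern)(.*)\z/s;
    return ( $1, $2, $3 );
}

# Per-match variables: needs Perl 5.10 or later and the /p flag.
sub parts_by_p_flag {
    my ($string) = @_;
    return unless $string =~ m/\s*bird\s*/p;
    return ( ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} );
}

print join( '|', parts_by_capture( 'cat bird dog', qr/\s*bird\s*/ ) ), "\n";
print join( '|', parts_by_p_flag('cat bird dog') ), "\n";
```

Both calls should print cat| bird |dog, the same three pieces that $`, $&, and $' produced in the first example.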

Things to remember

  • The match variables $`, $&, and $' carry a performance hit.
  • You don’t have to use them directly to suffer.
  • Use Devel::FindAmpersand to track down their use.

2 thoughts on “Detect regular expression match variables in your code”

  1. I’ve been meaning to do this for a while. Your post has finally spurred me into action… the next version of NYTProf will report the place that it first noticed that the slow match variables had been seen. (Typically that’ll be the file that uses one of them.)
