Use a smart match to match several patterns at once

The smart match operator (Item 23. Make work easier with smart matching) reduces many common comparisons to a few keystrokes, in keeping with Perl’s goal of making the common things easy. You can use the smart match operator to make even less common tasks, such as matching many regular expressions at the same time, just as easy. This Item shows you how to use the smart match to see if at least one of a series of regexes matches a string.

Before Perl 5.10, you had to do quite a bit of work to accomplish this. The perlfaq6 entry for “How do I efficiently match many regular expressions at once?” shows a couple of examples of what you could have done. For instance, you could put all of the patterns that you want to match in an array and then iterate through the array:

my @patterns = qw< foo ba(r|z) quux >;

LINE: while( <INPUT> ) {
	foreach my $pattern ( @patterns ) {
		if( /$pattern/i ) {
			print;
			next LINE;
			}
		}
	}

You use next to skip the rest of the patterns once you find a matching one. The task is slightly different if you want to find which patterns matched. In that case you always have to go through all of them.
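As a minimal sketch of that second task, a grep can try every pattern and keep the ones that match. The matching_patterns name and the sample string are made up for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return every pattern that matches $line.
# grep never short-circuits, so each pattern gets its turn.
sub matching_patterns {
	my( $line, @patterns ) = @_;
	return grep { $line =~ /$_/i } @patterns;
	}

my @patterns = qw< foo ba(r|z) quux >;
my @hits     = matching_patterns( 'A foo walks into a bar', @patterns );

print "Matched: @hits\n";   # prints "Matched: foo ba(r|z)"
```

In scalar context the same call gives you the number of patterns that matched.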

You can improve on this code slightly by making regular expression objects (Item 40. Pre-compile regular expressions):

my @patterns = map { qr/$_/i } qw( foo ba(r|z) quux );

LINE: while( <INPUT> ) {
	foreach my $pattern ( @patterns ) {
		if( /$pattern/ ) {
			print;
			next LINE;
			}
		}
	}

You can even skip the map and write out each regex object yourself, which gives you more control over each pattern:

my @patterns = (qr/foo/, qr/ba(r|z)/, qr/quux/);

You’ll come back to this in a moment.

A note on short circuiting

If you need to try all of the patterns and check side effects at each step, you don’t want to use the smart match code in this Item.

Short-circuiting has consequences. The common case is the values of capture buffers. If more than one pattern can match, only the first one will set capture buffers. If you don’t know which pattern matched, you might have a tough time figuring out where the values in the capture buffers came from.
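If you need the captures, one way around that ambiguity is to keep the explicit loop and hand back the pattern together with whatever it captured. This is only a sketch: the first_match name is hypothetical, and it assumes every pattern has at least one capture group.

```perl
use strict;
use warnings;

# Hypothetical helper: try the patterns in order and stop at the first
# match, returning that pattern along with its captures.
# Assumes each pattern has a capture group; a pattern without one
# returns a lone 1 from a list-context match.
sub first_match {
	my( $string, @patterns ) = @_;

	foreach my $pattern ( @patterns ) {
		my @captures = $string =~ $pattern;
		return( $pattern, @captures ) if @captures;
		}

	return;
	}

my( $pattern, @captures ) = first_match( 'bazaar', qr/(fo+)/i, qr/ba(r|z)/i );
print "Captured: @captures\n" if defined $pattern;   # prints "Captured: z"
```

Because the pattern comes back with its captures, you never have to guess which regex filled them in.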

There is another case, which you may hope is rare (but even the rare, multi-million dollar mistake is still a multi-million dollar mistake). If the patterns have side effects that you care about and not all of the patterns run, some of those side effects never happen.

This series of matches has two patterns that will match foo. The map inserts a (?{...}), the regular expression sequence that evaluates some Perl code during a match:

use 5.010001; # smart matching broken in 5.010000
use re 'eval';

my $matches = 0;

my @patterns = map { 
	state $position = 0;
	$position++;

	qr/$_(?{ print "Matched $_\n"; $matches++ })/i 

	} qw( foo ba(r|z) quux fo);


LINE: while( <INPUT> ) {
	chomp;
	foreach my $pattern ( @patterns ) {
		if( /$pattern/ ) {
			next LINE;
			}
		}
	}

say "Matches are $matches";

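For a line that contains foo, the loop leaves at the first pattern, so the fo pattern never gets a chance to run and $matches counts one match for that line even though two patterns could have matched. That should be a rare case, but that doesn’t make it any less fragile. Again, if you need all the patterns to at least try to match, you don’t want to use the code in this Item.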

Double smart matching

The smart match operator knows how to handle regular expressions and it already knows how to match a scalar against an array. You probably already know that you can see if a scalar is an element of an array using the smart match:

my @strings = qw( Buster Mimi Ginger Ella );
$_ ~~ @strings;

The smart match knows what to do based on its arguments. If there’s a simple scalar on each side, it does either a string or number equality comparison. That’s as far as many people think about it, though. When the smart match looks at the elements in the array, it does a smart match with the scalar and each element in the array. That is, the comparison operator for any element can be different from any other.

Here’s a smart match against an array, @tests, that has several different types of elements, including a string, a regex object, a subroutine reference, and an anonymous array:

use 5.010001;

my $sub = sub {
	say "\tSub argument is $_[0]";
	return 1 if $_[0] eq 'Roscoe';
	return 0;
	};
	
my @tests = (
	qr/Buster/,
	$sub, 
	'Mimi',
	[ qw(Ginger Ella) ],
	);
	
foreach ( qw(Buster Roscoe Gumdrops Ella) ) {
	say "Trying $_";
	when( @tests ) { say "\tMatched!" }
	}

The smart match looks at the first element in @tests, sees that it’s a regex object and tries a pattern match. If that fails, it looks at the second element, which is a subroutine reference. It passes $_ as the argument to the subroutine, which can then do anything it likes. If the subroutine reference returns true, the smart match succeeds. You can see the non-Buster values go through the subroutine on the way to subsequent checks:

Trying Buster
	Matched!
Trying Roscoe
	Sub argument is Roscoe
	Matched!
Trying Gumdrops
	Sub argument is Gumdrops
Trying Ella
	Sub argument is Ella
	Matched!

Notice that Ella matched. The final element in @tests is an array reference, so the smart match goes through the elements of that second-level array, testing each of those items with another series of smart matches. Neat, huh?

If you have Perl 5.12 or later, you can use when as a statement modifier:

use 5.012;

foreach ( qw(Buster Roscoe Gumdrops Ella) ) {
	say "Trying $_";
	say "\tMatched!" when( @tests );
	}

But, back to regexes.

Matching many regexes at once

Putting everything together, to check if any pattern matches, you only need to construct the patterns, put them in an array, and match against them. That’s only a couple of lines of code:

use 5.010001;

my @patterns = map { qr/$_/i } qw( foo ba(r|z) quux );
if( $string ~~ @patterns ) {
	...;
	}

The smart match tries each pattern until it finds one that matches. If it finds a match, it returns true without trying the rest of the regexes.
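The smart match only gives you a true or false answer. If you also need to know which pattern matched, one possible sketch swaps in first from List::Util. This is a different technique from the smart match, although it also stops at the first success. The first_matching_pattern name is made up for this illustration:

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical helper: return the first regex object that matches
# $string, or undef if none of them do. Like the smart match, first
# stops looking once it finds a match.
sub first_matching_pattern {
	my( $string, @patterns ) = @_;
	return first { $string =~ $_ } @patterns;
	}

my @patterns = map { qr/$_/i } qw( foo ba(r|z) quux );

my $hit = first_matching_pattern( 'Bazooka', @patterns );
print "Something matched\n" if defined $hit;
```

The same caveats about short-circuiting apply: patterns after the first match never run.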

Things to remember

  • A smart match against an array can try many types of comparisons
  • A smart match against an array short-circuits

4 Comments.

  1. Just a note (as I found out and posted on perlmonks.org), this doesn’t work for 5.10.0 but does for 5.10.1 and newer.

    Guess I better convince work to upgrade.

    • Yes, smart matching on Perl 5.10.0 is broken. You shouldn’t use smart matching with that version. However, there’s also a wider point for the effective Perler here. Don’t use .0 point releases. It’s not until real people start using these releases that the developer (perl or anything else) see the errors of their ways and fix them with a .1 release. :)

      I’ve updated the use 5.010001 statements to reflect that, which I should have done previously.

  2. I wish the smart operator would test if a key exists before testing equality when given something like $hash{$key} on one side. I still use ‘exists $hash{$key} and $val ~~ $hash{$key}’ instead of just $val ~~ $hash{$key} because it speeds up the case where $hash{$key} doesn’t exist by a significant margin.

    • You can suggest that as an enhancement by filing a bug with the perlbug utility (that sounds like a good topic for another Item). Maybe one of the perl developers will see that and implement it. :)
