Set default regular expression modifiers

Are you tired of adding the same modifiers to all of your regular expressions? For instance, you might always add the /u modifier to turn on Unicode semantics in all of your patterns, including qr//, m//, and s///. Instead of remembering to do that for every pattern, the re pragma that ships with Perl 5.14 lets you set it for all patterns in the current lexical scope. You can also turn off a modifier for the rest of the scope.

You can use any modifier that affects the pattern, but not the modifiers that affect the operator (see Know the difference between regex and match operator flags). Try an example with an easier modifier. The /i modifier makes the pattern case insensitive. Instead of adding that for all of your match operations, you use the re pragma’s flags mode. In this example, you use the Test::More module to experiment with new ideas.

First, write some tests that you expect to fail. Since the pattern is all lowercase, but the target string has an uppercase letter, these should fail:

use 5.014;
use Test::More;

like( 'Buster', qr/buster/, 'Buster matches with case insensitivity' );
ok( 'Buster' =~ m/buster/, 'Buster matches with m//' );
done_testing();

And they do fail. That’s a good thing, because you want to magically make them pass by adding a default modifier:

not ok 1 - Buster matches with case insensitivity
not ok 2 - Buster matches with m//
1..2
#   Failed test 'Buster matches with case insensitivity'
#                   'Buster'
#     doesn't match '(?^u:buster)'
#   Failed test 'Buster matches with m//'
# Looks like you failed 2 tests of 2.

Now, add the default modifiers. You add those through the import list for the re module. The list of modifiers starts with a slash to distinguish it from other imports. To make /i the default, that’s exactly what you import:

use 5.014;
use Test::More;

use re '/i';
like( 'Buster', qr/buster/, 'Buster matches with case insensitivity' );
ok( 'Buster' =~ m/buster/, 'Buster matches with m//' );

done_testing();

Now the tests pass because they are case insensitive:

ok 1 - Buster matches with case insensitivity
ok 2 - Buster matches with m//
1..2
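The introduction says the default also covers s///, which the tests above don't exercise. Here is a minimal sketch of that case, assuming Perl 5.14 or later; the variable names and strings are made up for illustration:

```perl
use 5.014;
use re '/i';

# Hypothetical sample data: the target is uppercase, the pattern lowercase.
my $original = 'BUSTER the cat';

# Copy the string, then substitute on the copy. No /i appears on the
# operator; the default from the pragma supplies it.
( my $renamed = $original ) =~ s/buster/Mimi/;

say $renamed;    # Mimi the cat
```

Without the use re '/i' line the substitution would find nothing to replace and the copy would stay unchanged.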

These default modifiers are only lexically scoped, and that’s how you should use them. You don’t want to change more than you intend, and the next programmer who comes along might not realize that you set the default modifiers at the top of the file. Try it with a lexical scope to check that it’s limited to that scope (see Know what creates a scope):

use 5.014;
use Test::More;

SCOPE: {
	use re '/i';
	like( 'Buster', qr/buster/, 'Buster matches with case insensitivity' );
	ok( 'Buster' =~ m/buster/, 'Buster matches with m//' );
	outside_scope();
	}

sub outside_scope {
	unlike( 'Buster', qr/buster/, 'Buster does not match with case insensitivity' );
	ok( !( 'Buster' =~ m/buster/ ), 'Buster does not match with m//' );
	}

done_testing();

Now you see that the default modifier is limited to the lexical scope:

ok 1 - Buster matches with case insensitivity
ok 2 - Buster matches with m//
ok 3 - Buster does not match with case insensitivity
ok 4 - Buster does not match with m//
1..4
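Lexical scoping decides which flags a pattern is compiled with, not where the compiled pattern can be used. A qr// object built inside the scope keeps its flags when you use it elsewhere. This is a small sketch of that, assuming Perl 5.14 or later; the variable names are illustrative only:

```perl
use 5.014;

my $insensitive;
{
	use re '/i';
	# Compiled here, so /i is baked into the pattern object.
	$insensitive = qr/buster/;
}

# Compiled outside the block, so no default modifier applies.
my $plain = qr/buster/;

say 'scoped pattern: ', ( 'Buster' =~ $insensitive ? 'matches' : 'no match' );
say 'plain pattern:  ', ( 'Buster' =~ $plain       ? 'matches' : 'no match' );
```

The first line should report a match and the second should not, even though both match operations happen outside the block.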

So far, you’ve used only one modifier as the default, but you can stack them just like you would with qr// or m// or s///. Suppose you want to turn on both /i and /s at the same time so you get case insensitivity and let the . match a newline:

use 5.014;
use Test::More;

use re '/is';
like( "Bu\nter", qr/bu.ter/, 'Bu\\nter matches with case insensitivity' );
ok( 'Buster' =~ m/bu.ter/, 'Buster matches with m//' );

done_testing();

Both of those work their magic as default values:

ok 1 - Bu\nter matches with case insensitivity
ok 2 - Buster matches with m//
1..2

You don’t have to stack them, though. You can specify them separately and it works just as well, although each group must start with a /:

use re '/i', '/s';

use re qw(/i /s);
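If you want to convince yourself that the stacked and separate forms behave alike, you can compile the same pattern under each and try both against a string that needs both modifiers. This is only a sketch, assuming Perl 5.14 or later, with made-up variable names:

```perl
use 5.014;

my( $stacked, $separate );

{
	use re '/is';          # both modifiers in one group
	$stacked = qr/bu.ter/;
}

{
	use re qw(/i /s);      # one group per modifier
	$separate = qr/bu.ter/;
}

# The target needs /i for the uppercase B and /s for the newline.
my $target = "Bu\nter";

say 'stacked matches'  if $target =~ $stacked;
say 'separate matches' if $target =~ $separate;
```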

Turn off default modifiers

Once turned on, these modifiers apply to all the patterns in the pragma’s scope, but sometimes you don’t want an enabled modifier in a particular pattern. Suppose, for instance, that one part of the pattern absolutely should not be case insensitive. You can use the (?^:) sequence to turn off modifiers for a subpattern:

use 5.014;
use Test::More;
use re '/i';

foreach my $string ( qw(Buster bUSTER buster BuStEr) ) {
	say "$string matches" if $string =~ /(?^:B)uster/;
	}

The output shows that only the strings starting with an uppercase B match, because (?^:B) resets the modifiers to their defaults for that subpattern, which turns off the /i that the pragma added. Consider (?^:) whenever part of a pattern must not inherit the default flags.
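The (?^:) sequence works on one subpattern. To drop a default modifier for the rest of a scope, as mentioned at the start, you can use no re with the same slash-prefixed flag. If you want to switch off only /i inside a pattern and leave any other defaults alone, the long-standing (?-i:) group is an alternative. This is a sketch of both, assuming Perl 5.14 or later; the array names are illustrative:

```perl
use 5.014;
use re '/i';

my @names = qw(Buster bUSTER buster BuStEr);

# The default /i is in effect: every spelling matches.
my @with_default = grep { /buster/ } @names;

my @without_default;
{
	no re '/i';    # remove /i until the end of this block
	@without_default = grep { /buster/ } @names;
}

# Back under the default /i, but (?-i:B) demands a literal uppercase B.
my @inline_off = grep { /(?-i:B)uster/ } @names;

say "with default:    @with_default";
say "without default: @without_default";
say "inline off:      @inline_off";
```

Only the all-lowercase buster survives inside the no re block, and only Buster and BuStEr pass the last filter.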

Things to remember

  • Set default regular expression modifiers with the re pragma.
  • You can only use modifiers that apply to the pattern, not the operator.
  • You can stack multiple modifiers in a single import string, such as /is.
  • Turn off modifiers for a subpattern with (?^:).