Let perl create your regex stringification

Perl 5.14 changes how regular expression objects stringify. This might not seem like a big deal at first, but it exposes a sort of bug that you may never have considered. It even broke several modules on CPAN. If your code tests against hard-coded stringifications of patterns, Perl 5.14 is probably going to break it.

This happened to Test::Output, which I maintain although I didn’t write it. Fortunately, the fix is simple. The module itself is fine; only its tests need minor adjustments, because they checked for hard-coded stringifications such as (?-xism:out):

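use Test::Tester;    # provides check_test; load it before the module under test
use Test::Output;    # provides stdout_like

check_test(
    sub {
        stdout_like(
            sub { print "TEST OUT\n" },
            qr/out/,
            'Testing STDOUT'
        );
    },
    {
        ok   => 0,
        name => 'Testing STDOUT',
        # hard-coded pre-5.14 stringification of qr/out/
        diag => "STDOUT:\nTEST OUT\n\ndoesn't match:\n(?-xism:out)\nas expected\n",
    },
    'STDOUT not matching failure'
);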

The qr/out/ and the (?-xism:out) in that code represent the same idea, but they are disconnected because they are separate values. Perl doesn’t know that they are logically the same thing in different forms, so if one changes, the other doesn’t.
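A minimal sketch of that disconnect: the hand-typed string is fixed text, while the stringified qr// object depends on whichever perl runs the code. The variable names here are illustrative only.

```perl
use strict;
use warnings;

my $re = qr/out/;

# What the running perl says this pattern looks like
my $from_perl = "$re";

# A string copied by hand from a pre-5.14 perl's output
my $hard_coded = '(?-xism:out)';

print "perl says:  $from_perl\n";
print "hard-coded: $hard_coded\n";
print $from_perl eq $hard_coded
    ? "the strings agree on this perl\n"
    : "the strings differ on this perl\n";

# Both strings still describe the same match when used as patterns
for my $pattern ( $from_perl, $hard_coded ) {
    print "TEST out" =~ /$pattern/ ? "match\n" : "no match\n";
}
```

On Perl 5.14 and later the comparison reports a difference even though both strings match the same text, which is exactly the failure Test::Output’s tests ran into.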

The problem is that Perl 5.14 adds several new pattern modifiers (see Know the difference between regex and match operator flags). Those extra options would change the way regexes stringify, so to insulate future releases from that, Perl 5.14 specifies the options in a new way, with the new (?^:) sequence. It’s available starting with Perl 5.13.6.
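If a test truly needs to know which format it faces, it can ask perl rather than compare version numbers. This is a minimal sketch that assumes all you care about is whether a trivial pattern stringifies with a leading (?^:

```perl
use strict;
use warnings;

# Stringify a trivial pattern and look at how it starts
my $sample     = "" . qr/x/;
my $uses_caret = $sample =~ /\A\(\?\^/ ? 1 : 0;

printf "perl %s stringifies qr/x/ as %s (%s style)\n",
    $], $sample, $uses_caret ? '(?^:...)' : '(?-xism:...)';
```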

Before Perl 5.14, regex stringification spelled out the pattern modifiers you had not enabled as well as the ones you had:

$ perl5.6.2 -le 'print qr/Buster|Mimi/;'
(?-xism:Buster|Mimi)

$ perl5.8.9 -le 'print qr/Buster|Mimi/;'
(?-xism:Buster|Mimi)

$ perl5.10.1 -le 'print qr/Buster|Mimi/;'
(?-xism:Buster|Mimi)

$ perl5.12.2 -le 'print qr/Buster|Mimi/;'
(?-xism:Buster|Mimi)

Perl wrapped the pattern you specified in (?-xism: ). The compiled pattern has to carry its options with it because it might be combined with other regular expressions, and each set of options applies only to its own part of the combined pattern:

my $cat = qr/B u s t e r/x;
my $dog = qr/Addy/;

my $combined = qr/$cat|$dog/;

print "$combined\n";

The combined regex applies different options to different portions of the pattern:

(?-xism:(?x-ism:B u s t e r)|(?-xism:Addy))
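The effect of carrying options along is easier to see with /i than with /x. In this sketch, adapted from the example above by swapping /x for /i, each interpolated qr// object keeps its own case sensitivity inside the combined pattern:

```perl
use strict;
use warnings;

my $cat      = qr/buster/i;    # case-insensitive
my $dog      = qr/addy/;       # case-sensitive
my $combined = qr/$cat|$dog/;  # each piece keeps its own flags

for my $name (qw(BUSTER buster ADDY addy)) {
    printf "%-6s %s\n", $name,
        ( $name =~ $combined ? 'matches' : 'does not match' );
}
```

On any perl, BUSTER matches because the /i travels with $cat, while ADDY does not because $dog was compiled without /i.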

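In that output, the options before the hyphen are on and the ones after it are off: x is on only for the B u s t e r portion, and i, s, and m are off everywhere. Change the regex with some modifiers and you change the stringification, sometimes in odd-looking ways: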

$ perl5.12.2 -le 'print qr/Buster|Mimi/i;'
(?i-xsm:Buster|Mimi)

$ perl5.12.2 -le 'print qr/(?x:Buster|Mimi)/;'
(?-xism:(?x:Buster|Mimi))

$ perl5.12.2 -le 'print qr/(?x:Buster|Mimi)/i;'
(?i-xsm:(?x:Buster|Mimi))

$ perl5.12.2 -le 'print qr/(?x:Buster|Mimi)/ix;'
(?ix-sm:(?x:Buster|Mimi))

$ perl5.12.2 -le 'print qr/(?x:Buster|Mimi) Bean/ix;'
(?ix-sm:(?x:Buster|Mimi) Bean)

The outer set is for the options you set on the entire pattern, and the inner set is for the options you enabled in the pattern itself.
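A short sketch of the difference between the two sets: a (?i:...) group turns on /i for just that group, while a trailing /i applies to the whole pattern. The strings are illustrative.

```perl
use strict;
use warnings;

my $inner_only = qr/(?i:buster) bean/;   # /i applies only inside (?i:...)
my $whole      = qr/buster bean/i;       # /i applies to the entire pattern

for my $string ( 'BUSTER bean', 'BUSTER BEAN' ) {
    printf "%-12s inner: %-3s whole: %s\n", $string,
        ( $string =~ $inner_only ? 'yes' : 'no' ),
        ( $string =~ $whole      ? 'yes' : 'no' );
}

# Let perl show how it records each set of options
print "$inner_only\n$whole\n";
```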

Every one of those pre-5.14 stringifications lists all of the options that version of Perl knew about, even the ones you had not enabled. That means any new option would change the stringification, because the pattern would also have to denote that the new option was not enabled:

$ fakeperl-5.14 -le 'print qr/Buster|Mimi/'
(?-xismudal:Buster|Mimi)

Instead, the caret in (?^:) stands for Perl’s default options, and the stringification lists after it only the options you enabled:

$ perl5.13.6 -le 'print qr/Buster|Mimi/i'
(?^i:Buster|Mimi)

$ perl5.13.6 -le 'print qr/(?x:Buster|Mimi)/i'
(?^i:(?x:Buster|Mimi))

$ perl5.13.6 -le 'print qr/Buster|Mimi/ix'
(?^ix:Buster|Mimi)

$ perl5.13.6 -le  'print qr/(?x:Buster|Mimi)/ix;'
(?^ix:(?x:Buster|Mimi))
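If you need the pieces rather than the whole string, the re pragma’s regexp_pattern function (available since Perl 5.10) hands them back separately, so you never have to parse either stringification format. A minimal sketch:

```perl
use strict;
use warnings;
use re qw(regexp_pattern);

my $re = qr/(?x:Buster|Mimi)/i;

# In list context, regexp_pattern returns the pattern text and its outer modifiers
my ( $pattern, $modifiers ) = regexp_pattern($re);

print "pattern:   $pattern\n";
print "modifiers: $modifiers\n";

# In scalar context it returns the same string as stringifying the qr// object
my $as_string = regexp_pattern($re);
print "string:    $as_string\n";
```

The inline (?x:...) stays part of the pattern text, while only the outer /i shows up among the modifiers.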

You don’t have to care about the particulars of stringification, though, if you let Perl stringify the pattern for you. Interpolate the same qr// object into the expected string, and it doesn’t matter which version of Perl you are using:

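use Test::Tester;    # provides check_test; load it before the module under test
use Test::Output;    # provides stdout_like

my $regex = qr/out/;

check_test(
    sub {
        stdout_like(
            sub { print "TEST OUT\n" },
            $regex,
            'Testing STDOUT'
        );
    },
    {
        ok   => 0,
        name => 'Testing STDOUT',
        # perl fills in its own stringification of $regex
        diag => "STDOUT:\nTEST OUT\n\ndoesn't match:\n$regex\nas expected\n",
    },
    'STDOUT not matching failure'
);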
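To try the idea without installing Test::Output, here is a minimal sketch that uses only the core Test::More module. The failure_message subroutine is a hypothetical stand-in for the diagnostic a module builds; it is not Test::Output’s code.

```perl
use strict;
use warnings;
use Test::More tests => 2;

my $regex = qr/out/;

# Hypothetical stand-in for a module that reports a failed match
sub failure_message {
    my ( $got, $pattern ) = @_;
    return "STDOUT:\n$got\ndoesn't match:\n$pattern\nas expected\n";
}

my $diag = failure_message( "TEST OUT\n", $regex );

# The expected text interpolates the same qr// object, so it tracks perl's format
is( $diag, "STDOUT:\nTEST OUT\n\ndoesn't match:\n$regex\nas expected\n",
    'diag uses whatever stringification this perl produces' );

# The fragile version: a hard-coded pre-5.14 stringification
SKIP: {
    skip 'hard-coded form only matches before Perl 5.13.6', 1 if $] >= 5.013006;
    is( $diag, "STDOUT:\nTEST OUT\n\ndoesn't match:\n(?-xism:out)\nas expected\n",
        'hard-coded stringification' );
}
```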

This is similar to Item 59, “Compare reference types to prototypes,” where you let perl tell you which string values represent the reference types so you never have to know (or care) what the actual values are. Here you let perl tell you what it thinks the regex stringification is, so you don’t get caught out when that representation changes.
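In the same spirit, here is a sketch (ours, not the item’s code) that asks perl for the reference type names instead of typing 'ARRAY' and 'HASH' by hand. The describe subroutine is hypothetical.

```perl
use strict;
use warnings;

# Let perl report the type names instead of hard-coding 'ARRAY' or 'HASH'
my $ARRAY_TYPE = ref [];
my $HASH_TYPE  = ref +{};   # the + makes {} an anonymous hash, not a block

sub describe {
    my ($thing) = @_;
    return 'array reference' if ref $thing eq $ARRAY_TYPE;
    return 'hash reference'  if ref $thing eq $HASH_TYPE;
    return 'something else';
}

print describe( [ 1, 2 ] ), "\n";
print describe( +{ a => 1 } ), "\n";
print describe( \'scalar' ), "\n";
```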
