Use a variable-width lookbehind if it won’t match more than 255 characters

In Ignore part of a substitution’s match, I showed you the match-resetting \K, which is basically a variable-width positive lookbehind assertion. It’s a special feature to work around Perl’s lack of variable-width lookbehinds. However, v5.30 adds an experimental feature to allow a limited version of a variable-width lookbehind.

In that post, I used the example of matching variable whitespace at the beginning of a line but not replacing it:

$_ = "    buster";      # lowercase!
s/\A\s+\K\S+\z/\u$&/;
print "Line |$_|";      # Line |    Buster|

Before v5.30, a variable-width positive lookbehind fails to compile:

$_ = "    buster";      # lowercase!
s/(?<=\A\s+)\S+\z/\u$&/;
print "Line |$_|";      # Line |    Buster|

Running this on v5.28 fails:

$ perl5.28.0 vlb.pl
Variable length lookbehind not implemented
in regex m/(?<=\A\s+)\S+\z/ at ...

Running this on v5.30 gives a different error:

$ perl5.30.0 vlb.pl
Lookbehind longer than 255 not implemented
in regex m/(?<=\A\s+)\S+\z/ at ...

That's the limitation. The regex engine needs to know in advance that the subpattern can't match more than 255 characters. The + has no upper bound, so the length is indeterminate. Instead of the +, use the generalized quantifier so you can specify a maximum number; 255 levels of indent should be enough for anyone:

use v5.30;
$_ = "    buster";      # lowercase!
s/(?<=\A\s{1,255})\S+\z/\u$&/;
print "Line |$_|";      # Line |    Buster|

This works because static analysis can tell that the pattern cannot match more than 255 characters. You do get a warning:

Line |    Buster|
Variable length lookbehind is experimental in regex;
marked by <-- HERE in m/(?<=\A\s{1,255})\S+\z <-- HERE / at ...

Turn off that experimental warning like you would other experimental warnings:

use v5.30;
no warnings qw(experimental::vlb);
$_ = "    buster";      # lowercase!
s/(?<=\A\s{1,255})\S+\z/\u$&/;
print "Line |$_|";      # Line |    Buster|
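
If you want to see where the static analysis draws the line, a small probe like this might help. It's only a sketch: compiles() is a helper name I made up for the illustration, and it builds the pattern from a string so that the regex compiles at run time, where eval can trap the failure.

```perl
use v5.30;
no warnings qw(experimental::vlb);

# compiles() is a hypothetical helper: true if the pattern compiles.
# Interpolating a string defers compilation to run time so that
# eval can catch the error.
sub compiles {
    my( $pattern ) = @_;
    return defined eval { qr/$pattern/ };
}

# try quantifier maximums on both sides of the limit
foreach my $max ( 254, 255, 256 ) {
    my $pattern = "(?<=\\A\\s{1,$max})\\S+\\z";
    say "$max: ", compiles( $pattern ) ? 'compiles' : 'fails';
}
```

On v5.30 or later, only the 256 case should fail, and $@ should then hold the "Lookbehind longer than 255 not implemented" message.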

There's one more thing to consider though. Some characters turn into multiple characters with case folding, as you read in Fold cases properly: ß (U+00DF LATIN SMALL LETTER SHARP S) turns into ss. If you use /i for case insensitivity, Perl knows that this happens and counts the folded characters against the 255 limit. Since the ß counts as two characters, the whitespace quantifier can go up to only 253:

use v5.30;
no warnings qw(experimental::vlb);
$_ = "    buster";      # lowercase!
s/(?<=\A\s{1,253}ß)\S+\z/\u$&/i;
print "Line |$_|";      # Line |    buster| (compiles, but there's no ß to match)
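
To check how the folded length counts against the limit, you could probe the boundary in the same way. This is a sketch under a couple of assumptions: compiles_folded() is a made-up helper name, and I write the ß as \x{DF} so the source file's encoding doesn't matter.

```perl
use v5.30;
no warnings qw(experimental::vlb);

# compiles_folded() is a hypothetical helper: true if the pattern
# compiles under /i. The string pattern defers compilation to run time.
sub compiles_folded {
    my( $pattern ) = @_;
    return defined eval { qr/$pattern/i };
}

# \x{DF} is ß, which can match "ss" under /i and so counts as two
foreach my $max ( 253, 254 ) {
    my $pattern = "(?<=\\A\\s{1,$max}\\x{DF})\\S+\\z";
    say "$max plus sharp s: ", compiles_folded( $pattern ) ? 'compiles' : 'fails';
}
```

If the ß counts as two characters, 253 should compile and 254 should fail because it could match 256 characters.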

All of this works for either positive or negative lookbehinds.
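
Here's a sketch of the negative form, which capitalizes the final word only when it isn't indented. The capitalize_unindented() name is made up for this example, and I added \b and \w+ so the match can't start in the middle of the word:

```perl
use v5.30;
no warnings qw(experimental::vlb);

# capitalize_unindented() is a hypothetical helper for this sketch
sub capitalize_unindented {
    my( $line ) = @_;
    # skip the word if only leading whitespace comes before it
    $line =~ s/(?<!\A\s{1,255})\b\w+\z/\u$&/;
    return $line;
}

say "Line |", capitalize_unindented( $_ ), "|" for "    buster", "buster";
# Line |    buster|
# Line |Buster|
```

Without the \b, the match could start at the u in buster, where the lookbehind no longer sees only whitespace, and you'd get bUster.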

Things to remember

  • Before v5.30, you could not have variable-width lookbehinds
  • v5.30 adds limited support for variable-width lookbehinds
  • The lookbehind subpattern must not be able to match more than 255 characters
  • If you can't determine the length of the subpattern match, you can still use \K