Ignore part of a substitution’s match

Normally, a substitution replaces everything it matched, but v5.10 adds a feature that allows you to ignore part of the match. The \K excludes from $& anything to its left. This feature has already made it into PCRE. It doesn’t have an official name, so I’ll call it the match reset operator because it resets the start of $&.

Start with the match operator. This pattern matches the entire string because it is anchored at the absolute beginning and end of the string:

use v5.10;
$_ = 'Buster and Mimi';
print "Matched <$&>" if /\ABuster and \K\S+\z/;

The output doesn’t show everything the pattern actually matched. You see only the part after the \K:

$ perl slash-k.pl
Matched <Mimi>
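The reset also shows up in the match-offset arrays. In this small sketch (the variable names are only for illustration), $-[0] and $+[0] hold the start and end offsets of $&, so after the match $-[0] points just past the \K rather than at the beginning of the string:

```perl
use v5.10;

$_ = 'Buster and Mimi';
my ( $start, $end, $kept );
if ( /\ABuster and \K\S+\z/ ) {
    # $-[0] and $+[0] are the offsets of $&, not of everything the pattern matched
    ( $start, $end ) = ( $-[0], $+[0] );
    $kept = $&;
    say "Matched <$kept> from $start to $end";     # Matched <Mimi> from 11 to 15
    say 'Skipped <', substr( $_, 0, $start ), '>';  # Skipped <Buster and >
}
```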

Now apply that to the substitution operator, where the replacement side replaces $&. Although this pattern needs Buster and at the front, you don’t want to replace that part; it only helps locate the part of the string you do want to replace:

use v5.10;
$_ = 'Buster and Mimi';
s/\ABuster and \K\S+\z/Ginger/;
print;   # Buster and Ginger
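The same trick works with /g, where each match gets its own \K reset. Here is a hypothetical example (the key=value log line is made up for illustration) that masks every value after password= while leaving the key itself in place:

```perl
use v5.10;

my $line = 'user=buster password=hunter2 retry password=s3cret';

# Only the \S+ after each \K is replaced; "password=" stays because it is left of \K
( my $masked = $line ) =~ s/\bpassword=\K\S+/****/g;
say $masked;   # user=buster password=**** retry password=****
```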

You can also do this with a positive lookbehind assertion. A lookbehind checks the text before the current position without consuming it, so that subpattern doesn’t show up in $&:

$_ = 'Buster and Mimi';
s/(?<=\ABuster and )\S+\z/Ginger/;
print; # Buster and Ginger

The lookbehind has a limitation, though: it can’t be variable width. (Update: v5.30 added experimental support for variable-width lookbehinds, but only up to 255 characters, so an unbounded quantifier such as \S+ is still rejected.)

$_ = 'Buster and Mimi';
s/(?<=\A\S+ and )\S+\z/Ginger/;
print; # never reached: the pattern doesn't compile

This doesn’t work. Perl refuses to compile the pattern and dies with an error:

$ perl slash-k.pl
Variable length lookbehind not implemented in regex m/(?<=\A\S+ and )\S+\z/
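If you want to check what your own perl does with such a pattern, you can compile it at run time inside eval. This is a small sketch; the exact error text depends on the perl version (v5.30 and later complain that the lookbehind is longer than 255 characters instead), so it only records whether each pattern compiles:

```perl
use v5.10;

# Pattern sources as plain strings, so they are compiled at run time and a
# failure can be caught by eval instead of stopping the whole program.
my %pattern = (
    lookbehind => '(?<=\A\S+ and )\S+\z',
    keep       => '\A\S+ and \K\S+\z',
);

my %compiled;
for my $name ( sort keys %pattern ) {
    my $source = $pattern{$name};
    my $re     = eval { qr/$source/ };    # undef, with the error in $@, on failure
    $compiled{$name} = $re ? 1 : 0;
    say $re ? "$name: compiled" : "$name: failed: $@";
}
```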

The \K, however, handles the variable width just fine:

use v5.10;
$_ = 'Buster and Mimi';
s/\A\S+ and \K\S+\z/Ginger/;
print;   # Buster and Ginger
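To see that the part before \K really can be any length, run the same substitution over a few strings whose first names differ in length (the extra names are made up for illustration):

```perl
use v5.10;

my @pairs = ( 'Buster and Mimi', 'Bo and Mimi', 'Maximilian and Mimi' );
my @renamed;
for (@pairs) {
    # Copy first so the substitution doesn't modify @pairs through the $_ alias
    ( my $copy = $_ ) =~ s/\A\S+ and \K\S+\z/Ginger/;
    push @renamed, $copy;
    say $copy;
}
```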

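Now here’s a more likely use for it. Suppose you want to uppercase the first letter of the word after the whitespace at the beginning of a line, no matter how much whitespace there is, without replacing the whitespace itself. Lines that don’t start with whitespace shouldn’t match at all. Without \K, you might capture both parts and put the whitespace back yourself, which means keeping track of $1 and $2: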

$_ = "    buster";
s/\A(\s+)(\S+)\z/$1\u$2/;
print "Line |$_|";      # Line |    Buster|

With \K, both sides are a bit simpler:

use v5.10;
$_ = "    buster";      # lowercase!
s/\A\s+\K\S+\z/\u$&/;
print "Line |$_|";      # Line |    Buster|

Add /x to make it even more apparent:

use v5.10;
$_ = "    buster";      # lowercase!
s/
	\A \s+ \K           # ignore this part for $&
	\S+
	\z
/\u$&/x;
print "Line |$_|";      # Line |    Buster|
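The single-string examples anchor with \A and \z. If you instead have a block of text and want to fix every indented line, a sketch using /m and /g might look like this. It swaps \s+ for \h+ (horizontal whitespace, also new in v5.10) so the prefix can’t run across a newline into the next line; the sample text is made up, and lines without leading whitespace are left alone:

```perl
use v5.10;

my $text = "    buster\nmimi\n\tginger\n";

# /m lets ^ match at the start of every line; /g repeats for each indented line.
# Only the single \S after \K is in $&, so only that character is replaced.
( my $fixed = $text ) =~ s/^\h+\K\S/\u$&/mg;
print $fixed;
```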

Don’t use this in lookarounds

Before v5.32, you could use \K inside lookaround assertions, but its behavior there was undefined, so you might not get what you expect. Since a lookaround was never part of $&, what would it reset?

s/(?<=\s+\Kaaa)Ginger/.../

Since no one could figure out what should happen in that case, v5.32 disallows it.

Things to remember

  • Lookbehinds don't consume characters, but they can't be variable width (v5.30's experimental support stops at 255 characters)
  • The \K resets the start of $&