Use @{^CAPTURE} to get a list of all the capture buffers

Perl v5.26 adds three new special variables related to captures. @{^CAPTURE} is an array of all the capture buffers. %{^CAPTURE} is an alias for %+ and has the labels of the named captures that actually matched as its keys. %{^CAPTURE_ALL} is an alias for %- and stores all the named capture labels with their values, whether they matched or not.


@{^CAPTURE}

I started the first edition of Mastering Perl with an example that used @+ and @-. These variables hold the position offsets for the capture buffers: @- has the starting offsets and @+ has the ending offsets. You can use the length of these arrays to figure out the number of capture buffers. You can also use the offsets with substr to extract the proper substrings. Here’s a pattern made up of two other patterns that you might not know ahead of time:

use v5.26;

$_ = <<~'HERE';
Buster the cat
Nikki the dog
HERE

my $pattern = qr/(.+?) the (.+?)\R/;

if( /$pattern$pattern/ ) {
	my @captures;
	foreach my $i ( 1 .. $#+ ) {
		push @captures, substr( $_,
			$-[$i],
			$+[$i] - $-[$i]
			);
		}
	say "<@captures>";
	}

Notice that the first index in that example is 1. The offsets in position 0 correspond to $&, the overall match. Here’s the output:

<Buster cat Nikki dog>

You might do this with a complicated pattern where you don’t know how many captures there are, or where you don’t want to recount them every time you change the pattern.
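One wrinkle with the offset approach is that a group that took no part in the match has undef offsets, so a bare substr would warn. Here is a minimal sketch of a helper that accounts for that. The name captures_from_offsets and the sample pattern are made up for illustration; they aren't from the original example:

```perl
use v5.26;
use warnings;

# hypothetical helper: rebuild the captures from copies of @- and @+
sub captures_from_offsets {
	my( $string, $starts, $ends ) = @_;
	my @captures;
	foreach my $i ( 1 .. $#$ends ) {
		# a group that took no part in the match has undef offsets
		push @captures, defined $starts->[$i]
			? substr( $string, $starts->[$i], $ends->[$i] - $starts->[$i] )
			: undef;
		}
	return @captures;
	}

my $string = 'ab';
my @captures;
if( $string =~ /(a)(x)?(b)/ ) {
	# copy the offset arrays right away, before another match changes them
	@captures = captures_from_offsets( $string, [ @- ], [ @+ ] );
	}

say join ', ', map { $_ // 'undef' } @captures;
```

The helper loops up to the last index of the @+ copy because that array has a slot for every group in the pattern, even the ones that didn’t match.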

In v5.26 you don’t need to do all of that nonsense. You get an array of all the capture buffers in @{^CAPTURE}:

use v5.26;

$_ = <<~'HERE';
Buster the cat
Nikki the dog
HERE

my $pattern = qr/(.+?) the (.+?)\R/;

if( /$pattern$pattern/ ) {
	say "<@{^CAPTURE}>";
	}

Note that the equivalent of $& isn’t in @{^CAPTURE}, so index 0 of the array corresponds to $1. The output is the same as before.
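Since the overall match is left out, the indices are shifted by one relative to the numbered variables. A small sketch of that, assuming perl v5.26 or later; the @captures copy is a name chosen here, and it’s there because the next successful match replaces the contents of the capture variables:

```perl
use v5.26;

my $text = "Buster the cat\n";

my @captures;
if( $text =~ /(.+?) the (.+?)\R/ ) {
	# copy right away; the next successful match overwrites the captures
	@captures = @{^CAPTURE};

	say "count:  ", scalar @captures;   # number of capture groups
	say "first:  $captures[0] (same as \$1: $1)";
	say "second: $captures[1] (same as \$2: $2)";
	}
```

Working from a copy also means you can pass the list around without worrying about what some other code does with regexes in the meantime.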

%{^CAPTURE}, or, %+

The %+ hash has the named capture labels as keys (and it’s mostly the keys that are important here):

use v5.26;
use Data::Dumper;

$_ = <<~'HERE';
Buster the cat
Nikki the dog
HERE

my $pattern = qr/(?<name>.+?) the (?<animal>.+?)\R/;

if( /$pattern$pattern/ ) {
	say Dumper( \%+ );
	}

The output shows the first match for each of the named labels (you’ll fix that in a minute):

$VAR1 = {
          'name' => 'Buster',
          'animal' => 'cat'
        };

The %+ hash only has keys for the named captures that actually matched. This pattern has a named capture label that won’t match:

use v5.26;
use Data::Dumper;

$_ = <<~'HERE';
Buster the cat
Nikki the dog
HERE

my $pattern = qr/
	(?<cat>.+?)  \s the \s cat \R |
	(?<dog>.+?)  \s the \s dog \R |
	(?<bird>.+?) \s the \s bird \R
	/x;

if( /$pattern$pattern/ ) {
	say Dumper( \%+ );
	}

The output shows only the labels that match (and still only the first value that matched):

$VAR1 = {
          'cat' => 'Buster',
          'dog' => 'Nikki'
        };

With v5.26, you can use a more descriptive name for that special variable:

if( /$pattern$pattern/ ) {
	say Dumper( \%{^CAPTURE} );
	}
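Because only the labels that matched show up as keys, the keys can tell you which branch of an alternation succeeded. Here is a sketch of that idea with the same labels, assuming perl v5.26 or later. The classify subroutine and the sample lines are invented for illustration:

```perl
use v5.26;

my $pattern = qr/
	(?<cat>.+?)  \s the \s cat  |
	(?<dog>.+?)  \s the \s dog  |
	(?<bird>.+?) \s the \s bird
	/x;

# hypothetical helper: returns ( label, value ) for the branch that matched
sub classify {
	my( $line ) = @_;
	return unless $line =~ $pattern;

	# copy the hash so a later match can't change it
	my %named = %{^CAPTURE};
	my( $label ) = keys %named;   # one key, since only one branch matched
	return ( $label, $named{$label} );
	}

foreach my $line ( 'Buster the cat', 'Nikki the dog', 'Mimi the fish' ) {
	my( $label, $name ) = classify( $line );
	say defined $label ? "$name is a $label" : "no match for <$line>";
	}
```

This only works when the branches are mutually exclusive. If more than one label could match at once, you’d have to look at all of the keys.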

%{^CAPTURE_ALL}, or, %-

The %- hash is more expansive than %+. It has keys for every defined label:

use v5.26;
use Data::Dumper;

$_ = <<~'HERE';
Buster the cat
Nikki the dog
HERE

my $pattern = qr/
	(?<cat>.+?)  \s the \s cat \R |
	(?<dog>.+?)  \s the \s dog \R |
	(?<bird>.+?) \s the \s bird \R
	/x;

if( /$pattern$pattern/ ) {
	say Dumper( \%- );
	}

Now the output has entries for every named capture and an array reference for the value. That array has an entry for every place that label could have matched. Where it didn’t match there’s an undef in the array:

$VAR1 = {
          'dog' => [
                     undef,
                     'Nikki'
                   ],
          'cat' => [
                     'Buster',
                     undef
                   ],
          'bird' => [
                      undef,
                      undef
                    ]
        };

Now, instead of %- you can use %{^CAPTURE_ALL}:

if( /$pattern$pattern/ ) {
	say Dumper( \%{^CAPTURE_ALL} );
	}

The longer names make it easier to tell them apart. %{^CAPTURE} has only the named captures that matched, and %{^CAPTURE_ALL} has every label in the pattern.
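To see the per-position values without Data::Dumper, you can walk the hash yourself. This sketch uses %- directly, the variable that %{^CAPTURE_ALL} is described as aliasing above. The %summary hash and the output format are made up for illustration:

```perl
use v5.26;

my $text = "Buster the cat\nNikki the dog\n";

my $pattern = qr/
	(?<cat>.+?)  \s the \s cat \R |
	(?<dog>.+?)  \s the \s dog \R |
	(?<bird>.+?) \s the \s bird \R
	/x;

my %summary;
if( $text =~ /$pattern$pattern/ ) {
	foreach my $label ( sort keys %- ) {
		# each value is an array ref with one slot per use of the label
		my @values = @{ $-{$label} };

		# keep only the slots that actually matched
		$summary{$label} = [ grep { defined } @values ];

		say "$label: ", join ', ', map { $_ // 'undef' } @values;
		}
	}
```

A label such as bird still gets an entry in %summary, but its list is empty because none of its slots matched.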

Things to Remember

  • Perl v5.26 adds one new special variable and two special variable aliases.
  • @{^CAPTURE} is an array of all the capture buffers.
  • %{^CAPTURE} is an alias for %+.
  • %{^CAPTURE_ALL} is an alias for %-.
