“Up to N” matches

This is a chapter in Perl New Features, a book from Perl School that you can buy on LeanPub or Amazon. Your support helps me to produce more content.



Perl’s general regex quantifier, {n,m}, takes a minimum and maximum number of matches. If you leave out the maximum, as in {n,}, the preceding thing must match at least n times and then matches as many more times as it can: the maximum is unbounded.

Before v5.34, the converse {,n} was not a legal quantifier. Now it is. You can match zero to n of the preceding thing. That thing is still optional (zero matches is fine), but if it’s there, the quantifier consumes up to n of them.

Consider these examples. First, quantify a to match 0 to 5 times. This matches every string because every string has at least zero a characters (a trick question we have early on in Learning Perl):

/a{,5}/

This is more useful when it follows or is anchored by something else that must match:

/ba{,5}/    # b optionally followed by up to 5 a
/\Aa{,5}/   # optional zero to 5 a at beginning of string
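To make the difference between those three patterns concrete, here is a minimal sketch that tries each of them against a few strings. The sub name try_patterns and the sample strings are made up for illustration, and the script assumes a perl that is v5.34 or later:

```perl
use v5.34;   # the {,n} quantifier needs v5.34 or later

# Report how each pattern from the text treats one string.
# try_patterns is a hypothetical helper, not part of Perl.
sub try_patterns {
    my( $string ) = @_;
    my $bare    = $string =~ /a{,5}/  ? 1 : 0;   # zero a characters still match
    my $after_b = $string =~ /ba{,5}/ ? 1 : 0;   # needs a literal b first
    my( $leading ) = $string =~ /\A(a{,5})/;     # at most 5, possibly empty
    return ( $bare, $after_b, $leading );
}

for my $string ( '', 'xyz', 'aaaaaaa', 'xbaaay' ) {
    my( $bare, $after_b, $leading ) = try_patterns( $string );
    say "<$string> bare=$bare after_b=$after_b leading=<$leading>";
}
```

The bare pattern reports a match for every string, even the empty one, while the other two only tell you something when the b or the anchor constrains where the match can start.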

Here are a few runs (using the /p flag for the safe match variables, although this really hasn’t been a problem since v5.18). The first input line matches because there is a b with some a characters after it. The next line matches for the same reason if you count zero as “some”. The third line matches the first instance of ba because, like all other Perl patterns, it finds the leftmost match, not the longest. Finally, consider the last line. It matches no a characters because matching a b with zero of them works too. That might surprise you:

% perl5.34.0 -ne '/ba{,5}/p && print qq(${^PREMATCH} <${^MATCH}> ${^POSTMATCH})'
baaaaaaaaaacd
 <baaaaa> aaaaacd
bcd
 <b> cd
babaaa
 <ba> baaa
bbaaa
 <b> baaa
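The same runs can be sketched as a small script, which is handier than a one-liner if you want to experiment. The sub name split_around_match is hypothetical; the inputs are the four strings from the run above, and the script assumes v5.34 or later:

```perl
use v5.34;   # the {,n} quantifier needs v5.34 or later

# Split a string around the first match of /ba{,5}/ using the /p
# variables, as the one-liner does. Returns an empty list on failure.
# split_around_match is a hypothetical helper for this sketch.
sub split_around_match {
    my( $string ) = @_;
    return unless $string =~ /ba{,5}/p;
    my @parts = ( ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} );
    return @parts;
}

for my $string ( qw( baaaaaaaaaacd bcd babaaa bbaaa ) ) {
    my( $pre, $match, $post ) = split_around_match( $string );
    say "$pre <$match> $post";
}
```

Each output line should show the same split as the corresponding line in the one-liner run, with an empty prematch because every sample starts with b.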

But consider what happens with other versions of Perl. Perl v5.22 is the version that deprecated an unescaped left brace. Although the regex parses, it doesn’t handle this syntax. Instead, it sees {,5} as literal characters:

% perl5.22 -ne '/ba{,5}/p && print qq(${^PREMATCH} <${^MATCH}> ${^POSTMATCH})'
Unescaped left brace in regex is deprecated, passed through
in regex; marked by <-- HERE in m/ba{ <-- HERE ,5}/ at ...
ba
baaaaa
ba{,5}
 <ba{,5}>

In v5.26, the unescaped left brace is fatal. It was allowed again in v5.28, then became fatal once more in v5.30:

% perl5.26 -ne '/ba{,5}/p && print qq(${^PREMATCH} <${^MATCH}> ${^POSTMATCH})'
Unescaped left brace in regex is illegal here in regex;
marked by <-- HERE in m/ba{ <-- HERE ,5}/ at ...
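If the same pattern has to run on perls older than v5.34, one option is to write the lower bound explicitly: {0,5} means the same thing as {,5} and has been valid quantifier syntax all along. This is a minimal sketch of that spelling; the sub name first_match and the sample strings are made up for illustration:

```perl
use strict;
use warnings;

# Same idea as /ba{,5}/, but with the lower bound written out so that
# perls before v5.34 also treat the braces as a quantifier.
my $portable = qr/ba{0,5}/;

# first_match is a hypothetical helper: it returns the matched text,
# or undef if the pattern does not match.
sub first_match {
    my( $string ) = @_;
    return $string =~ /($portable)/ ? $1 : undef;
}

for my $string ( 'ba', 'baaaaa', 'ba{,5}', 'xyz' ) {
    my $match = first_match( $string );
    printf "%-8s => %s\n", $string, defined $match ? "<$match>" : 'no match';
}
```

With this spelling the string ba{,5} matches only its leading ba, because the braces in the pattern are a quantifier rather than literal characters.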

Finally, here’s a summary of the quantifiers up to v5.34:

Quantifier   General quantifier   Description
*            {0,}                 zero or more
+            {1,}                 one or more
?            {0,1}                zero or one
(none)       {n,m}                between n and m matches
(none)       {n,}                 at least n, but unlimited-ish
(none)       {,m}                 up to m (new!)
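The equivalences in that summary can be spot-checked with a short script. This is only a sketch under a few assumptions: the sub name same_capture and the sample strings are invented here, every pattern is anchored so the captures are easy to compare, and the script needs v5.34 or later for the {,3} form:

```perl
use v5.34;   # the {,n} quantifier needs v5.34 or later

# Each pair holds a shorthand quantifier and its general form.
my @pairs = (
    [ qr/\A(a*)/,    qr/\A(a{0,})/  ],
    [ qr/\A(a+)/,    qr/\A(a{1,})/  ],
    [ qr/\A(a?)/,    qr/\A(a{0,1})/ ],
    [ qr/\A(a{,3})/, qr/\A(a{0,3})/ ],
);

# same_capture is a hypothetical helper: true if both patterns capture
# the same text from the string, or if both fail to match.
sub same_capture {
    my( $short, $general, $string ) = @_;
    my( $x ) = $string =~ $short;
    my( $y ) = $string =~ $general;
    return ( $x // 'no match' ) eq ( $y // 'no match' );
}

for my $pair ( @pairs ) {
    for my $string ( '', 'a', 'aaaa', 'baa' ) {
        my $verdict = same_capture( @$pair, $string ) ? 'same' : 'DIFFERENT';
        say "$pair->[0] vs $pair->[1] on <$string>: $verdict";
    }
}
```

A few sample strings are not a proof, but a DIFFERENT verdict would be a quick sign that a pair is not interchangeable.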

Although we say that {n,} is unlimited, pedantically it’s not. It’s “unlimited-ish”. Before v5.30, the maximum number was actually 32,766. That likely means “infinite” to you (and if it doesn’t, let me know what you are doing). With v5.30, the maximum doubled to 65,534 (and if that’s still not enough, I really want to know what you are doing).
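To see which explicit counts your own perl accepts, you can compile a pattern inside an eval and check whether it survives. This is a rough sketch: the sub name quantifier_compiles is made up, the counts are just the boundary values mentioned above, and the result depends on the perl version you run it with:

```perl
use strict;
use warnings;

# quantifier_compiles is a hypothetical helper: true if this perl
# accepts a{COUNT} as a pattern, false if the regex compiler rejects it.
sub quantifier_compiles {
    my( $count ) = @_;
    my $pattern = 'a{' . $count . '}';   # built as a string, compiled below
    return defined eval { qr/$pattern/ };
}

for my $count ( 32_766, 32_767, 65_534, 65_535 ) {
    printf "%6d: %s\n", $count,
        quantifier_compiles( $count ) ? 'compiles' : 'rejected';
}
```

The eval traps the fatal compile error, so the script keeps going and reports each count instead of stopping at the first one that is too large.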