Category Archives: 5.10

Use atomic matching for complex non-backtracking

You can sometimes improve the performance of your regular expression by preventing parts of it from backtracking when you know that might be useful. Item 38. Avoid unnecessary backtracking had many techniques for this, although it did not mention atomic matching (a feature added in v5.005).

Ignore part of a substitution’s match

Normally, a substitution replaces everything it matched, but v5.10 adds a feature that allows you to ignore part of the match. The \K excludes from $& anything to its left. This feature has already made it into PCRE. It doesn’t have an official name, so I’ll call it the match reset operator because it resets […]

Define grammars in regular expressions

[ This is the 100th Item we’ve shared with you in the two years this blog has been around. We deserve a holiday and we’re taking it, so read us next year! Happy Holidays.] Perl 5.10 added rudimentary grammar support in its regular expressions. You could define many subpatterns directly in your pattern, use them […]

The \R generic line ending

Perl v5.10 adds a regular expression shortcut \R that matches anything the Unicode specification thinks is a line ending. It looks similar to a character class shortcut but it’s not. It can match the sequence of carriage-return line-feed, but character classes don’t match sequence.

Use the C3 method resolution order in multiple inheritance

Perl 5.10 introduced a flexible method resolution order mechanism. Instead of Perl’s default order (see Understand Perl’s default inheritance model), you can try something less stupid by using the mro pragma to specify which order perl.

Use the > and < pack modifiers to specify the architecture

Byte-order modifiers are one of the Perl 5.10 features farther along in perl5100delta, after the really big features. To any pack format, you can append a < or a > to specify that the format is little-endian or big-endian, respectively. This allows you to handle endianness in the formats that don’t have specify versions for […]

Use a smart match to match several patterns at once

The smart match operator (Item 23. Make work easier with smart matching) reduces many common comparisons to a few keystrokes, keeping with Perl’s goal of making the common things easy. You can use the smart match operator to make even less common tasks, such as matching many regular expressions at the same time, just as […]

Set default values with the defined-or operator.

[This is a mid-week bonus Item since it’s so short] Prior to Perl 5.10, you had to be a bit careful checking a Perl variable before you set a default value. An uninitialized value and a defined but false value both acted the same in the logical || short-circuit operator. The Perl idiom to set […]

Use branch reset grouping to number captures in alternations

Perl’s regular expressions have a simple rule for capturing groups. It counts the order of left parentheses to assign capture variables. Not all capture groups must actually match parts of the string, and Perl doesn’t care if they do. Perl assigns capture groups inside an alternation consecutively, even though it knows that only one branch […]

Match Unicode characters by property value

A Unicode character has properties; it knows things about itself. Perl v5.10 introduced a way to match a character that has certain properties that v5.10 supports. In some cases you can match a particular property value. Now v5.12 allows you can match any Unicode property by its value. The newly-supported ones include Numeric_Value and Age, […]