The new /xx match operator flag lets you add insignificant horizontal whitespace to character classes. You'll have to upgrade to v5.26 to use it, but that release is right around the corner.
The single /x has been around since the first Perl 5 release. It lets you spread out the parts of your pattern and add internal comments because it makes whitespace in the pattern insignificant, but only outside of character classes.
Suppose you started with this compact and dense pattern representing an Apple product number (partially, for brevity):
    use v5.10;

    my $string  = 'FC007LL/A';
    my $pattern = qr/[MFP][A-Z]\d+(?:LL|[FBEJTXYC])/;

    say $string =~ $pattern ? 'Matched' : 'Missed';
You can rewrite that with insignificant space and comments to show what the pieces mean. That helps you remember what you did, lets other people see what you did, and gives you a way to double check your work. Remember, there's no such thing as "self-documenting" code because you only see what you did, not what you were supposed to do. This is also a good reason to construct a pattern with qr// so you don't distract the later code with a bunch of regex lines:
    use v5.10;

    my $string  = 'FC007LL/A';
    my $pattern = qr/
        [MFP]       # M = First sale, F = Refurbished, P = Personalized
        [A-Z]       # probably not all these letters valid here
        \d+
        (?:         # country codes
            LL | [FBEJTXYC]
        )
        \/
        [ABC]       # revision, maybe beyond C
        \z
        /x;

    say $string =~ $pattern ? 'Matched' : 'Missed';
If you try the same thing with the stuff between the square brackets, the character class includes the whitespace that you used. What you think is insignificant actually isn't, and an invalid space in the input can match:
    use v5.10;

    my $string  = ' Z007LL/A'; # shouldn't match, but will in this example
    my $pattern = qr/
        \A
        [ M F P ]       # M = First sale, F = Refurbished, P = Personalized
        [ A-Z ]         # probably not all these letters here
        \d+
        (?:             # country codes
            LL | [ FBEJTXYC ]
        )
        \/
        [ A B C ]       # revision
        \z
        /x;

    say $string =~ $pattern ? 'Matched' : 'Missed'; # Matches, but oops!
You might not have thought to do that, since the innards of a character class are usually not that complicated.
Prior to v5.26 (v5.25.9, really), you could use multiple /x on the same match without additional effect. Using two, three, or even more /x didn't make the whitespace any more insignificant. With v5.26, though, an extra /x allows you to add horizontal whitespace (space and tab) inside a character class. The previous example now fails to match because the spaces inside the character class are meaningless:
    use v5.26; # or v5.25.9

    my $string  = ' Z007LL/A'; # doesn't match now
    my $pattern = qr/
        \A
        [ M F P ]
        [ A-Z ]         # probably not all these letters here
        \d+
        (?:             # country codes
            LL | [ FBEJTXYC ]
        )
        \/
        [ A B C ]       # revision
        \z
        /xx;

    say $string =~ $pattern ? 'Matched' : 'Missed'; # Missed, as it should
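To see the difference side by side, here is a minimal sketch that compiles the same spaced-out pattern once with /x and once with /xx. The pattern is a shortened version of the one above, and the variable names $single and $double are only for illustration:

```perl
use v5.26;    # /xx needs v5.26 or later

# Identical pattern text; only the flags differ.
my $single = qr/ \A [ M F P ] [ A-Z ] \d+ /x;     # spaces in the classes are literal
my $double = qr/ \A [ M F P ] [ A-Z ] \d+ /xx;    # spaces in the classes are ignored

for my $string ( 'FC007', ' Z007' ) {
    printf "%-8s /x: %-8s /xx: %s\n",
        "'$string'",
        ( $string =~ $single ? 'Matched' : 'Missed' ),
        ( $string =~ $double ? 'Matched' : 'Missed' );
}
```

The valid prefix should match under both flags. The string with the leading space should match only under the single /x, where the space counts as a member of the first character class.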
I can see how this will mostly help to emphasize ranges in the character class:
    use v5.26;

    my $pattern = qr/ [ a-t v-z A-T V-Z ] /xx;
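As a quick check that the spacing is purely visual, this sketch compares a compact class with the spaced-out /xx version of the same ranges. The anchors, the + quantifier, and the sample words are my additions for the demonstration:

```perl
use v5.26;    # /xx needs v5.26 or later

my $compact = qr/\A[a-tv-zA-TV-Z]+\z/;
my $spaced  = qr/ \A [ a-t v-z A-T V-Z ]+ \z /xx;

# Both patterns accept strings made only of letters other than u and U.
for my $word ( qw(Perl regex Unix), 'two words' ) {
    printf "%-11s compact: %-8s spaced: %s\n",
        "'$word'",
        ( $word =~ $compact ? 'Matched' : 'Missed' ),
        ( $word =~ $spaced  ? 'Matched' : 'Missed' );
}
```

Both patterns should agree on every input. In particular, 'two words' should miss in both, because the spaces in the /xx class do not add a space to the set of characters it matches.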
What would be more interesting to me is the full use of the usual /x inside the character classes. I could then annotate why each character is there:
    # THIS WILL NOT WORK, BUT IT WOULD BE NICE
    my $pattern = qr/
        \A
        [
            M   # First sale
            F   # Refurbished
            P   # Personalized
        ]
        [A-Z]   # probably not all these letters here
        \d+

        # The country code, many more I omit for brevity
        (?:L
            [
                A   # Colombia, Ecuador, El Salvador, Guatemala, Honduras, Peru
                E   # Argentina
                L   # US
                Z   # Chile, Paraguay, Uruguay
            ]
            |
            [
                F   # France
                B   # Ireland, UK
                E   # Mexico
                J   # Japan
                T   # Italy
                X   # Australia, New Zealand
                Y   # Spain
                C   # Canada
            ]
        )
        \/
        [ABC]   # revision
        \z
        /xx;
Perhaps there will be a future /xxx flag that allows this. I could do the same thing with alternations, but that's a bit sloppy and I'd rather make it easier for the regex engine.
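For comparison, here is a rough sketch of the alternation version, which works with the plain /x and so runs on older perls too. I've cut the country codes down to a handful from the list above to keep it short, so treat it as an illustration and not a complete product number pattern:

```perl
use v5.10;

my $pattern = qr/
    \A
    (?:
        M       # First sale
      | F       # Refurbished
      | P       # Personalized
    )
    [A-Z]       # probably not all these letters valid here
    \d+
    (?:         # country codes, many omitted
        LL      # US
      | F       # France
      | B       # Ireland, UK
      | J       # Japan
      | C       # Canada
    )
    \/
    [ABC]       # revision
    \z
/x;

for my $string ( 'FC007LL/A', ' Z007LL/A' ) {
    say "'$string': ", $string =~ $pattern ? 'Matched' : 'Missed';
}
```

Every alternative gets its own comment, which is what I wanted from the character class. The price is a group with many single-character branches where a single class would do the same job.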