/xx means more meaningless whitespace (and that’s good)

The /xx match operator flag lets you add insignificant horizontal whitespace (spaces and tabs) to character classes. You’ll have to upgrade to v5.26 to use it, but that release is right around the corner.

The /x flag has been around since Perl 5. It lets you spread out the parts of your pattern and add internal comments. Outside of a character class, whitespace (including newlines) is insignificant, and an unescaped # starts a comment that runs to the end of the line.

The flag of Amsterdam, not yet a match operator

Suppose you started with this compact and dense pattern representing an Apple product number (partially, for brevity):

use v5.10;

my $string = 'FC007LL/A';

my $pattern = qr/[MFP][A-Z]\d+(?:LL|[FBEJTXYC])/;

say $string =~ $pattern ? 'Matched' : 'Missed';

You can rewrite that with insignificant whitespace and comments to show what the pieces mean. That’s important for three reasons: you can remember what you did, other people can see what you did, and you can double-check your work. Remember, there’s no such thing as “self-documenting” code, because the code only shows what you did, not what you were supposed to do. This is also a good reason to construct the pattern with qr//, so you don’t clutter the later code with a bunch of regex lines:

use v5.10;

my $string = 'FC007LL/A';

my $pattern = qr/
	[MFP]  # M = First sale, F = Refurbished, P = Personalized
	[A-Z]  # probably not all of these letters are valid here
	\d+
	(?: # country codes
		LL |
		[FBEJTXYC]
	)
	\/
	[ABC] # revision, maybe beyond C
	\z
	/x;

say $string =~ $pattern ? 'Matched' : 'Missed';
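Checking a single string doesn’t exercise much of the pattern. Here is a minimal sketch that runs the same annotated pattern against a few inputs. The extra strings are made-up examples chosen only to hit each piece of the pattern. They are not real Apple part numbers:

```perl
use v5.10;
use strict;
use warnings;

# Same annotated pattern as above
my $pattern = qr/
	[MFP]  # M = First sale, F = Refurbished, P = Personalized
	[A-Z]  # probably not all of these letters are valid here
	\d+
	(?: # country codes
		LL |
		[FBEJTXYC]
	)
	\/
	[ABC] # revision, maybe beyond C
	\z
	/x;

# Hypothetical inputs: 1 means it should match, 0 means it should not
my %expected = (
	'FC007LL/A' => 1,  # the original example
	'MC123B/C'  => 1,  # single-letter country code
	'ZC007LL/A' => 0,  # Z is not a listed first letter
	'FC007LL/D' => 0,  # D is not a listed revision
);

for my $string ( sort keys %expected ) {
	say "$string: ", ( $string =~ $pattern ? 'Matched' : 'Missed' );
}
```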

Under /x, though, you can’t put spaces inside a character class and expect them to be insignificant too. Whitespace inside the brackets is still literal, so each space adds a space character to the class. That lets an invalid leading space in the input match:

use v5.10;

my $string = ' Z007LL/A';  # shouldn't match, but will in this example

my $pattern = qr/
	\A
	[ M  F  P ] # M = First sale, F = Refurbished, P = Personalized
	[ A-Z ]  # probably not all these letters here
	\d+
	(?: # country codes
		LL |
		[ FBEJTXYC ]
	)
	\/
	[ A B C ] # revision
	\z
	/x;

say $string =~ $pattern ? 'Matched' : 'Missed';  # Matches, but oops!

You might not have thought to spread out a character class anyway, since its innards usually aren’t that complicated.

Prior to v5.26 (v5.25.9, really), you could put more than one /x on the same match without any additional effect, although since v5.21 that raised a deprecation warning. Using two, three, or even more /x didn’t make the whitespace any more insignificant (but wouldn’t that be cool!). With v5.26, though, an extra /x lets you add horizontal whitespace (space and tab) inside a character class. The previous example now fails to match because the spaces inside the character classes are meaningless:

use v5.26;  # or v5.25.9

my $string = ' Z007LL/A';  # doesn't match now

my $pattern = qr/
	\A
	[ M  F  P ]
	[ A-Z ]  # probably not all these letters here
	\d+
	(?: # country codes
		LL |
		[ FBEJTXYC ]
	)
	\/
	[ A B C ] # revision
	\z
	/xx;

say $string =~ $pattern ? 'Matched' : 'Missed';  # Missed: the space is no longer in the class
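A smaller sketch isolates the difference. The one-class patterns here are made up for illustration. Under /x the spaces inside the brackets are members of the class. Under /xx only unescaped spaces and tabs are ignored, so a # inside the brackets is still a literal character rather than the start of a comment:

```perl
use v5.26;
use warnings;

my $with_x  = qr/\A[ a b ]\z/x;   # class contains 'a', 'b', and a space
my $with_xx = qr/\A[ a b ]\z/xx;  # class contains only 'a' and 'b'

say "space under /x:  ", ( ' ' =~ $with_x  ? 'Matched' : 'Missed' );  # Matched
say "space under /xx: ", ( ' ' =~ $with_xx ? 'Matched' : 'Missed' );  # Missed

# /xx skips spaces and tabs inside a class, but '#' stays literal there
my $hash_class = qr/\A[ a # b ]\z/xx;  # class contains 'a', '#', and 'b'
say "'#' under /xx:   ", ( '#' =~ $hash_class ? 'Matched' : 'Missed' );  # Matched
```

That last case is why you can’t simply drop comments into a bracketed class, even with /xx.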

I can see how this will mostly help to emphasize ranges in the character class:

use v5.26;

my $pattern = qr/
	[ a-t v-z A-T V-Z ]
	/xx;
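To convince yourself that the spaced-out class means the same thing as the packed one, you could compare them character by character. This sketch assumes you only care about printable ASCII (0x20 through 0x7E) and anchors both classes so each test looks at exactly one character:

```perl
use v5.26;
use warnings;

my $spaced = qr/\A[ a-t v-z A-T V-Z ]\z/xx;  # spaces between the ranges are ignored
my $packed = qr/\A[a-tv-zA-TV-Z]\z/;          # the same class, written densely

# Every printable ASCII character
my @chars = map { chr($_) } 0x20 .. 0x7E;

# Collect any character that one class accepts and the other rejects
my @differ = grep { ( $_ =~ $spaced ) != ( $_ =~ $packed ) } @chars;

say @differ ? "Differ on: @differ" : 'Same matches for printable ASCII';
```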

What would be more interesting to me is the full use of the usual /x inside the character classes. I could then annotate why each character is there and use vertical whitespace:

# THIS WILL NOT WORK, BUT IT WOULD BE NICE
my $pattern = qr/
	\A

	[
	M  # First sale
	F  # Refurbished
	P  # Personalized
	]

	[A-Z]  # probably not all these letters here

	\d+

	# The country code, many more I omit for brevity
	(?:L [
		A   # Colombia, Ecuador, El Salvador, Guatemala, Honduras, Peru
		E   # Argentina
		L   # US
		Z   # Chile, Paraguay, Uruguay
		]
	|
		[
		F   # France
		B   # Ireland, UK
		E   # Mexico
		J   # Japan
		T   # Italy
		X   # Australia, New Zealand
		Y   # Spain
		C   # Canada
		]
	)

	\/
	[ABC] # revision

	\z
	/xx;

Perhaps a future /xxx flag will allow this. I could do the same thing with alternations, but that’s a bit sloppy, and I’d rather make things easier for the regex engine.
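For comparison, here is a rough sketch of that alternation approach. It is one way to do it, not the only one. Under plain /x, each alternative can sit on its own line with its own comment. The cost is that the engine gets an alternation instead of a single character class. Only the country codes from the wished-for version above are included, and the test strings are made-up examples:

```perl
use v5.10;
use strict;
use warnings;

my $pattern = qr/
	\A

	(?:
		M  # First sale
	|	F  # Refurbished
	|	P  # Personalized
	)

	[A-Z]  # probably not all these letters here

	\d+

	# The country code, many more omitted for brevity
	(?:
		L (?:
			A   # Colombia, Ecuador, El Salvador, Guatemala, Honduras, Peru
		|	E   # Argentina
		|	L   # US
		|	Z   # Chile, Paraguay, Uruguay
		)
	|	F   # France
	|	B   # Ireland, UK
	|	E   # Mexico
	|	J   # Japan
	|	T   # Italy
	|	X   # Australia, New Zealand
	|	Y   # Spain
	|	C   # Canada
	)

	\/
	[ABC]  # revision

	\z
	/x;

for my $string ( 'FC007LL/A', 'PC007LZ/B', ' Z007LL/A' ) {
	say "'$string': ", ( $string =~ $pattern ? 'Matched' : 'Missed' );
}
```

Watch the comments when the pattern uses / as its delimiter. A literal / inside a comment would end the qr// early.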


  1. Karl Williamson

    Just to be clear, using /xx, /xxx, etc. has raised a deprecation warning since sometime in the 5.21 series, and had been completely illegal between 5.25.1 and 5.25.8. There were no reports of breakage as a result. So there wasn’t the sudden change that a reader might infer from your post.

    The reason that newlines and comments aren’t allowed by this new construct is that there are some non-obvious edge cases that could give unexpected results.
