/xx means more meaningless whitespace (and that’s good)

The /xx match operator flag lets you add insignificant horizontal whitespace to character classes. You’ll have to upgrade to v5.26 to use it, but that release is right around the corner.

The /x flag has been around since Perl 5. It makes whitespace outside of character classes insignificant, which lets you spread out the parts of your pattern and add internal comments.

The flag of Amsterdam, not yet a match operator

Suppose you started with this compact and dense pattern representing an Apple product number (partially, for brevity):

use v5.10;

my $string = 'FC007LL/A';

my $pattern = qr/[MFP][A-Z]\d+(?:LL|[FBEJTXYC])/;

say $string =~ $pattern ? 'Matched' : 'Missed';

You can rewrite that with insignificant whitespace and comments to show what the pieces mean. That helps you remember what you did, shows other people what you did, and gives you a double check of your work. Remember, there’s no such thing as “self-documenting” code, because the code only shows what you did, not what you were supposed to do. This is also a good reason to construct a pattern with qr//, so you don’t clutter the later code with a bunch of regex lines:

use v5.10;

my $string = 'FC007LL/A';

my $pattern = qr/
	[MFP]  # M = First sale, F = Refurbished, P = Personalized
	[A-Z]  # probably not all these letters valid here
	\d+
	(?: # country codes
		LL |
		[FBEJTXYC]
	)
	\/
	[ABC] # revision, maybe beyond C
	\z
	/x;

say $string =~ $pattern ? 'Matched' : 'Missed';

If you try to space out the inside of a character class the same way, the character class includes the whitespace that you used. What you think is insignificant actually isn’t. That allows an invalid leading space in the input to match:

use v5.10;

my $string = ' Z007LL/A';  # shouldn't match, but will in this example

my $pattern = qr/
	\A
	[ M  F  P ] # M = First sale, F = Refurbished, P = Personalized
	[ A-Z ]  # probably not all these letters here
	\d+
	(?: # country codes
		LL |
		[ FBEJTXYC ]
	)
	\/
	[ A B C ] # revision
	\z
	/x;

say $string =~ $pattern ? 'Matched' : 'Missed';  # Matches, but oops!

You might not have thought to space out a character class anyway, since its innards usually aren’t that complicated.

Prior to v5.26 (v5.25.9, really), you could use multiple /x on the same match without additional effect. Using two, three, or even more /x didn’t make the whitespace any more insignificant. With v5.26, though, an extra /x lets you add insignificant horizontal whitespace (space and tab) inside a character class. The previous example now fails to match because the spaces inside the character class are meaningless:

use v5.26;  # or v5.25.9

my $string = ' Z007LL/A';  # doesn't match now

my $pattern = qr/
	\A
	[ M  F  P ]
	[ A-Z ]  # probably not all these letters here
	\d+
	(?: # country codes
		LL |
		[ FBEJTXYC ]
	)
	\/
	[ A B C ] # revision
	\z
	/xx;

say $string =~ $pattern ? 'Matched' : 'Missed';  # Missed, as it should

I can see how this will mostly help to emphasize ranges in the character class:

use v5.26;

my $pattern = qr/
	[ a-t v-z A-T V-Z ]
	/xx;
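
A quick way to see the difference is to run the same spaced-out character class under /x and under /xx against a single space. This is a small sketch, assuming a perl of v5.26 or later; the variable names are only for illustration.

```perl
use v5.26;

# The same spaced-out character class, once under /x and once under /xx
my $with_x  = qr/\A [ a-t v-z A-T V-Z ] \z/x;
my $with_xx = qr/\A [ a-t v-z A-T V-Z ] \z/xx;

# Under /x the spaces inside the brackets are literal, so a space matches
my $x_space = ' ' =~ $with_x ? 1 : 0;

# Under /xx the spaces inside the brackets are ignored, so a space doesn't
my $xx_space = ' ' =~ $with_xx ? 1 : 0;

# The /xx version still leaves out 'u', which falls between the ranges
my $xx_u = 'u' =~ $with_xx ? 1 : 0;

say "/x  matches a space: ", ( $x_space  ? 'yes' : 'no' );
say "/xx matches a space: ", ( $xx_space ? 'yes' : 'no' );
say "/xx matches a 'u':   ", ( $xx_u     ? 'yes' : 'no' );
```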

What would be more interesting to me is the full use of the usual /x inside the character classes. I could then annotate why each character is there:

# THIS WILL NOT WORK, BUT IT WOULD BE NICE
my $pattern = qr/
	\A

	[
	M  # First sale
	F  # Refurbished
	P  # Personalized
	]

	[A-Z]  # probably not all these letters here

	\d+

	# The country code, many more I omit for brevity
	(?:L [
		A   # Colombia, Ecuador, El Salvador, Guatemala, Honduras, Peru
		E   # Argentina
		L   # US
		Z   # Chile, Paraguay, Uruguay
		]
	|
		[
		F   # France
		B   # Ireland, UK
		E   # Mexico
		J   # Japan
		T   # Italy
		X   # Australia, New Zealand
		Y   # Spain
		C   # Canada
		]
	)

	\/
	[ABC] # revision

	\z
	/xx;

Perhaps a future /xxx flag will allow this. I could do the same thing with alternations, but that’s a bit sloppy, and I’d rather make things easier for the regex engine.
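
For comparison, here is roughly what the alternation version could look like under plain /x: each character becomes its own alternative on its own line, so each one can carry a comment. It is a sketch of the approach mentioned above, using only the country codes already listed, not a complete list of Apple’s codes. Note that no comment can contain a /, since that would end the qr// early.

```perl
use v5.10;

# Each single character is its own alternative so a /x comment can annotate it
my $pattern = qr/
    \A
    (?:
        M    # First sale
      | F    # Refurbished
      | P    # Personalized
    )
    [A-Z]    # probably not all these letters here
    \d+
    (?:      # The country code, many more omitted for brevity
        L (?:
            A    # Colombia, Ecuador, El Salvador, Guatemala, Honduras, Peru
          | E    # Argentina
          | L    # US
          | Z    # Chile, Paraguay, Uruguay
        )
      | F    # France
      | B    # Ireland, UK
      | E    # Mexico
      | J    # Japan
      | T    # Italy
      | X    # Australia, New Zealand
      | Y    # Spain
      | C    # Canada
    )
    \/
    [ABC]    # revision
    \z
    /x;

my %result;
for my $string ( 'FC007LL/A', ' Z007LL/A' ) {
    $result{$string} = $string =~ $pattern ? 'Matched' : 'Missed';
    say "'$string': $result{$string}";
}
```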

v5.26 removes dot from @INC

As of v5.26, the . in @INC is gone by default. When you compile perl, it bakes the default module search path (based on your configure settings) into the binary. These are the paths that perl searches without you adding to @INC with command-line switches or environment variables, and the paths you see when you run perl -V.
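
To check whether the current directory is in your own perl’s search path, you can look for it in @INC directly. This is a minimal sketch using only core Perl; what it reports depends on your perl’s version and build, and on anything you add with -I or environment variables such as PERL5LIB.

```perl
use strict;
use warnings;

# Count how many entries in the module search path are exactly '.'
my $has_dot = grep { $_ eq '.' } @INC;

print "The search path:\n";
print "\t$_\n" for @INC;

my $message = $has_dot ? "'.' is in \@INC\n" : "'.' is not in \@INC\n";
print $message;
```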


Strip leading spaces from here-docs with v5.26

Perl v5.26 allows you to indent here-docs, that special multi-line string quoting mechanism. In v5.24 and earlier, the content of a here-doc included each entire line, along with any leading whitespace that might be there. That meant you typed here-docs in a way that broke the indentation of the block that contained them. Perl 6 envisioned a way that this could work, and now Perl 5 has stolen some of that.
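
Before v5.26, a common workaround was to indent the here-doc body anyway and strip the leading whitespace yourself with a substitution. Here’s a sketch of that idiom (the subroutine name is just for illustration). The closing delimiter still has to start at the beginning of its line, and because the substitution removes all leading whitespace, this version can’t preserve any deliberate indentation.

```perl
use strict;
use warnings;

sub indented_text {
    # Assign the here-doc, then remove leading spaces and tabs from every
    # line. The /m flag lets ^ match at the start of each line.
    (my $string = <<'HERE') =~ s/^[ \t]+//gm;
        This line is not indented
        Neither is this line
        But the delimiter still starts at the beginning of the line
HERE
    return $string;
}

print indented_text();
```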

In this example, I have a here-doc in a subroutine just to have another scope. The lines of the string start at the beginning of the line rather than the beginning of the indentation:

sub say_something {
	my $string =<<'HERE';
This line is not indented
Neither is this line
And the delimiter is not indented
HERE

	print $string;
	}

say_something();

That's a bit annoying to some people (and not a problem to some others). v5.26 (not yet released) allows you to modify the here-doc to strip leading whitespace. Put a ~ after the <<:

sub say_something {
	my $string =<<~'HERE';
		This line is not indented
		Neither is this line
		And the delimiter is not indented
		HERE

	print $string;
	}

say_something();

The parser looks at the whitespace in front of the closing delimiter and strips that same whitespace from the beginning of each line:

This line is not indented
Neither is this line
And the delimiter is not indented

Any whitespace left over is part of the string:

sub say_something {
	my $string =<<~'HERE';
		This line is not indented
			But this line is indented
		And the delimiter is not indented
		HERE

	print $string;
	}

say_something();

The second line keeps the extra tab beyond the whitespace that was in front of the delimiter:

This line is not indented
	But this line is indented
And the delimiter is not indented
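
The same rule applies to an interpolating here-doc. Here’s a small sketch that compares the stripped result with the string you’d expect; the subroutine and variable names are made up for the example.

```perl
use v5.26;

sub greeting {
    my ($name) = @_;

    # <<~ removes the whitespace in front of the closing EOT (eight spaces
    # here) from the start of every line; the two extra spaces remain
    my $message = <<~"EOT";
        Hello, $name
          This line keeps two extra spaces
        Goodbye
        EOT

    return $message;
}

my $text     = greeting('World');
my $expected = "Hello, World\n  This line keeps two extra spaces\nGoodbye\n";

my $verdict = $text eq $expected ? 'Stripped as expected' : 'Something else happened';
say $verdict;
```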

If any line doesn’t start with that same whitespace, it’s a compile-time error. This won’t work:

use v5.25;
sub say_something {

	my $string =<<~"TAB_BEFORE";
		This is the first line
Another line that is a compilation error
		This is the last line
	TAB_BEFORE

	say $string;
	}

You get an error at compile time (although in Perl 6 it tries to figure it out, which I don't think is a good solution):

Indentation on line 2 of here-doc doesn't match delimiter at 526.pl line 5.

The whitespace before the end delimiter must match exactly at the beginning of each line. You can mix tabs and spaces, but it won't try to figure out how many spaces are in a tab (although the answer is four).

This also means that you have to be careful about de-tabbing or en-tabbing code, or about changing indentation levels. You might accidentally change the whitespace on some lines so it no longer matches the whitespace before the delimiter.
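
Since this is a compile-time error, a string eval is one way to watch it happen without stopping your program. In this sketch the closing delimiter has a leading tab but the only line starts with four spaces, so perl shouldn’t accept it even though a tab might look like four spaces in your editor. The $code string and its contents are made up for the demonstration.

```perl
use v5.26;

# The delimiter line starts with a tab; the content line starts with four spaces
my $code = qq{my \$s = <<~EOT;\n    four spaces\n\tEOT\n1;\n};

# A string eval compiles the code at run time and traps the error in $@
my $ok = eval $code;

if ($ok) {
    say 'Compiled, which is unexpected';
}
else {
    print "Compile error: $@";
}
```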

Things to Remember

  • v5.26 allows insignificant leading whitespace in here-docs.
  • Use <<~ to start the whitespace-stripping version of the here-doc.
  • The here-doc strips exactly the whitespace that precedes the closing delimiter.

Perl v5.24 adds a line break word boundary

Perl v5.24 adds a line break boundary, \b{lb}, to go along with the new word boundaries added in v5.22. This is part of Perl’s increasing conformance with the regular expression requirements in Unicode Technical Standard #18. The Unicode::LineBreak module implements the same thing, although you have to do a lot more work.
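
As a rough illustration, you can split a string at each line break opportunity that \b{lb} finds; those are the places where a line-wrapping routine could break the text. This sketch uses arbitrary sample text and doesn’t claim exact pieces, but because the split pattern is zero-width, joining the pieces should give back the original string.

```perl
use v5.24;

my $text = 'The quick brown fox jumps over the lazy dog.';

# Split at every line break opportunity; no characters are removed
my @pieces = split /\b{lb}/, $text;

say "[$_]" for @pieces;
```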

Perl v5.22 adds fancy Unicode word boundaries

Perl v5.22’s regexes added four Unicode boundaries to go along with the vanilla “word” boundary, \b, that you’ve been using for years. These new assertions aren’t going to match perfectly with your expectations of human languages (the holy grail of natural language processing), but they do okay-ish. Although these appear in v5.22.0, as a late addition to the language they were partially broken in the initial release. They were fixed for v5.22.1.
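
Here’s a small comparison of \b and \b{wb}, assuming v5.22.1 or later. The plain \b treats the apostrophe in a contraction as a non-word character, while the Unicode word boundary rules keep an apostrophe between two letters inside the word. The grep just throws away the pieces that are only spaces or punctuation.

```perl
use v5.22;

my $text = "Don't stop";

# The plain \b breaks the contraction around the apostrophe
my @plain = grep { /\w/ } split /\b/, $text;

# \b{wb} follows the Unicode word boundary rules instead
my @unicode = grep { /\w/ } split /\b{wb}/, $text;

say "\\b:     @plain";
say "\\b{wb}: @unicode";
```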

Perl v5.26 new features

Perl v5.26 isn’t out yet, but here’s what’s interesting in its development version, v5.25, so far.

Lexical $_ and autoderef are gone in v5.24

Two features that I have previously discouraged are now gone from Perl: the lexical $_ and auto-dereferencing.

The lexical $_ was a consequence of the way Perl wanted smart match to work. In a given-when, instead of aliasing $_ like foreach does, the block had an implicit my $_ = .... This interfered with the package version, as I wrote about in Use for() instead of given() and Perl v5.16 now sets proper magic on lexical $_.

Postfix dereferencing is stable in v5.24

Perl’s dereferencing syntax might be, or even should be, responsible for people’s disgust at the language. In v5.20, Perl added the experimental postfix dereferencing syntax that made this analogous to method chaining. This is one of the most pleasing Perl features I’ve encountered in years.
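
Here’s a short sketch of the postfix forms next to the circumfix one, using a made-up data structure. From v5.24 on, postfix dereferencing works without enabling a feature or silencing an experimental warning.

```perl
use v5.24;

my $data = {
    names => [ qw(Amelia Buster Cleo) ],
    ages  => { Amelia => 37, Buster => 5 },
};

# Circumfix syntax: read it from the inside out
my @names_old = @{ $data->{names} };

# Postfix syntax: read it left to right, like a method chain
my @names      = $data->{names}->@*;           # the whole array
my $last_index = $data->{names}->$#*;          # last index, like $#array
my @first_two  = $data->{names}->@[ 0, 1 ];    # an array slice
my @who        = sort keys $data->{ages}->%*;  # dereference a hash

say "@names";
say $last_index;
say "@first_two";
say "@who";
```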

No more -no_match_vars

The English module translates Perl’s cryptic variable names to English equivalents. For instance, $_ becomes $ARG. This means that the match variable $& becomes $MATCH. This also means that using the English module triggered the performance issue associated with the match variables $`, $&, and $' even if you didn’t use those variables yourself; the module used them for you. The Devel::NYTProf profiler had a sawampersand feature to tell you when one of those variables appeared in the code. We covered this in Item 33. Watch out for the match variables.
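
Here’s a minimal sketch of the English names for the three match variables. It assumes v5.20 or later, where copy-on-write strings removed the performance penalty that the -no_match_vars import was meant to avoid; on an older perl you’d still want use English qw(-no_match_vars) and no $MATCH.

```perl
use v5.20;
use English;    # imports $PREMATCH, $MATCH, and $POSTMATCH by default

my $string = 'FC007LL/A';
my ( $before, $matched, $after );

if ( $string =~ /\d+/ ) {
    ( $before, $matched, $after ) = ( $PREMATCH, $MATCH, $POSTMATCH );
    say "Before: $before";     # the part before the match, like $`
    say "Match:  $matched";    # the matched text, like $&
    say "After:  $after";      # the part after the match, like $'
}
```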

Perl v5.24 new features

Perl v5.24 may not look like it’s packed full of exciting features (indeed, it removes some of them), but it has lots of improvements under the hood. Here are some of the user-visible features you might like.