Insignificant leading or trailing whitespace in brace constructs

This is a chapter in Perl New Features, a book from Perl School that you can buy on LeanPub or Amazon. Your support helps me to produce more content.



Perl’s coterie of brace constructs become a bit more lenient in v5.34. These things appear in double-quotish constructs, such as \N{CHARNAME} to specify a character by name. And, patterns count as a double-quoted construct (unless you use ' as the delimiter), so these new rules apply to brace constructs such as \k{} (for named backreferences) and the general quantifier, {n,m}.

Specifying characters

These constructs apply to double-quotish interpretation to specify a character by its codepoint or name:

Construct Description/th>

Item
\N{CHARNAME} Character name item
\o{177} Octal code point item
\x{ABCD} Hex code point

There are already loose names for \N{} that ignores whitespace (item), but this feature is a bit different. It ignores horizontal whitespace around a value (but not inside a value):

use v5.10;
use open qw(:std :utf8);

say <<~"HERE";
	Cat face: \N{ BLACK SPADE SUIT }
	Octal:    \o{ 23140 }
	Hex:      \x{ 2660 }
	HERE

This outputs the character we expect:

$ perl5.34.0 whitespace.pl
Spade suit: ♠
Octal:      ♠
Hex:        ♠

If you add space within the value, you don't get the character you want (the \N{} will actually fail):

use v5.34;
use open qw(:std :utf8);

say <<~"HERE";
	Octal:    \o{ 231 40 }
	Hex:      \x{ 26 60 }
	HERE

This discards the cruft once it encounters non-digit characters (just like Perl's string-to-number conversions). This is effectively:

use v5.34;
use open qw(:std :utf8);

say <<~"HERE";
	Octal:    \o{ 231 }
	Hex:      \x{ 26 }
	HERE

It's even worse. You can extra nonsense after the code number and v5.34 will ignore it. Although these have illegal digits (along with the internal space), they still work:

use v5.34;
use open qw(:std :utf8);
use warnings;

say <<~"HERE";
	Octal:    \o{ 231 abc }
	Hex:      \x{ 26 xyz }
	HERE

With trailing tabs or spaces, warnings says that it ignores the cruft and uses what it received so far:

Non-octal character ' ' terminates \o early.  Resolved as "\o{231}" at ...

With leading tabs or spaces, earlier Perls give up right away and uses the null character. The warning from v5.32 is this:

Non-octal character ' ' terminates \o early.  Resolved as "\o{000}" at...

Finally, the whitespace can't be vertical space or other double-quote escapes (it's just literal tabs or spaces). These don't work:

\o{\t231}
\o{
	231 }

In regular expressions, this fails before Perl interprets the pattern, where the /x would be able to handle the vertical whitespace. This would match a null byte because the string-to-number parsing stops at the first newline, returning \000:

m/\o{
	231
	}/x;

In regular expressions

And these constructs apply to regular expression features, and you don't need the /x flag to get this new, insignificant whitespace:

Construct Description Chapter
\b{TYPE} Word boundary Item
\g{N} Numbered backreference Item 31 (book)
\g{NAME} Named backreference Item 31 (book)
\k{NAME} Named backreference Item 31 (book)
\p{PROPNAME} Unicode property name
\P{PROPNAME} Unicode property name
\x{ABCD} Hex code point
{n,m} general quantifier

The rules for these are similar to the same as those from the previous section. Perl ignores the tabs or spaces at the beginning
or the end, but not in the middle (aside from around the , in {n,m}). For example, these all work:

use v5.10;
use open qw(:std :utf8);
use warnings;

$_ = 'aa';

my @patterns = (
	qr/(.)\g{ -1 }/,
	qr/(?.)\g{ first }/,
	qr/(?.)\k{ first }/,
	qr/\b{ sb }(.)/,
	qr/(\o{ 141 })\g{ -1 }/,
	qr/(\p{Letter})\g{ -1 }/,
	qr/(.)\g{ -1 }/,
	qr/(\x{ 61 })\g{ -1 }/,
	);

foreach my $pattern ( @patterns ) {
	say /$pattern/
	};

Specify octal numbers with the 0o prefix

Perl v5.34 allows you to specify octal literals with the 0o prefix, as in 0o123_456. This is consistent with the existing constructs to specify hexadecimal literal 0xddddd and binary literal 0bddddd. The builtin oct() function accepts any of these forms.

Previously, you specified octal with just a leading zero:

chmod 0644, $file;
mkdir 0755, $file;

Now you can do that an extra character that specifies the base:

chmod 0o644, $file;
mkdir 0o755, $file;

This makes it consistent with 0b for binary and 0x for hexadecimal. See “Scalar value constructors” in perldata.

And, remember that v5.14 added the \o{NNN} notation to specify characters by their octal number. We’re still waiting for octal floating point values (we got the hex version in v5.22), but don’t hold your breath.

Perhaps we’ll get 0d sometime so that all the bases.

Undef a scalar to release its memory

When you store a large string in a scalar, perl allocates the memory to store that string and associate it with the scalar. It uses the same memory even if you assign a much shorter value to the same scalar. Use the functional form of undef to let perl reuse that memory for something else. This is important when you want to reuse the variable or the lifetime of the variable is very long.

Continue reading “Undef a scalar to release its memory”

Perl v5.26 now recognizes version control conflict markers

Perl v5.26 can now detect and warn you about a version control conflict markers in your code. In prior versions, the compiler would try to interpret those as code and would complain about a syntax error. You program still fails to compile but you get a better error message. Maybe some future Perl will bifurcate the program, run both versions, and compare the results (don’t hold your breath):

Continue reading “Perl v5.26 now recognizes version control conflict markers”

In-place editing gets safer in v5.28

In-place editing is getting much safer in v5.28. Before that, in rare circumstances it could lose data. You may have never noticed the problem and even with all the times I’ve explained it in a Perl class I haven’t really thought about it. This was first reported as early as December 2002 and after we get v5.28 it won’t be a problem anymore. Continue reading “In-place editing gets safer in v5.28”

Beware of the removal of when in Perl v5.28

[Although I haven’t seen an official notice besides a git commit that reverts the changes, by popular outcry these changes won’t be in v5.28. It’s not that they won’t happen but they won’t be in v5.28. People who depend on Perl should stay vigilant. My advice in the first paragraph stands—change is coming and we don’t know what it is yet.]

Perl v5.28 might do away with when—v5.27.7 already has. Don’t upgrade to v5.28 until you know you won’t be affected by this! This change doesn’t follow the normal Perl deprecation or experimental feature policy. If you are using given-when, stop doing that. If you aren’t using it, don’t start. And everyone should consider if a major change like this on such short notice is comfortable for them. It’s not a democracy but you can still let the core developers know which way you want your favorite language to go.

Continue reading “Beware of the removal of when in Perl v5.28”

keys in scalar context now returns the number of keys

Starting in v5.26, a hash in scalar context evaluates to the number of keys in the hash. You might have thought that it always did that just like an array (not a list!) in scalar context evaluates to the number of items. But nope—it evaluated to a seemingly useless number called the “hash statistics”. Now it’s fixed to do what most people thought it already did. For what it’s worth, keys (or values) in scalar context already provided the count.

Continue reading “keys in scalar context now returns the number of keys”

Make bitwise operators always use numeric context

[This feature is no longer experimental, starting in v5.28. Declaring use 5.28 automatically enables them.]

Most Perl operators force their context on the values. For example, the numeric addition operator, +, forces its values to be numbers. To “add” strings, you use a separate operator, the string concatenation operator, . (which looks odd at the end of a sentence).

The bitwise operators, however, look at the value to determine their context. With a lefthand value that has a numeric component, the bitwise operators do numeric things. With a lefthand value that’s a string, the bit operators become string operators. That’s certainly one of Perl’s warts, which I’ll fix at the end of this article with a new feature from v5.22. Continue reading “Make bitwise operators always use numeric context”