No more false postfix lexical declarations in v5.30

Before Perl v5.10 introduced state variables, people did various things to create persistent lexical variables for a subroutine. With v5.30, one of those constructs is now a fatal error.

Often you want a persistent variable to be scoped and private to a subroutine. But, once you leave that scope, normal lexical variables disappear because their reference count drops to zero. So, no persistence.

Continue reading “No more false postfix lexical declarations in v5.30”

Match only the same Unicode script

Earlier this year, this website was the target of some sort of attack in which a bot sent seemingly random data in its requests. The attack wasn’t that big of a deal since I easily blocked it with Cloudflare, but it was interesting. The apparently random data was actually a mix of Latin, Hangul, and Cyrillic. Domain hacks with unusual Unicode characters shows some of these exploits. Curiously, v5.28 added some regex feature that deals with this sort of nonsense.


Continue reading “Match only the same Unicode script”

Use atomic matching for complex non-backtracking

You can sometimes improve the performance of your regular expression by preventing parts of it from backtracking when you know that might be useful. Item 38. Avoid unnecessary backtracking had many techniques for this, although it did not mention atomic matching (a feature added in v5.005).

Continue reading “Use atomic matching for complex non-backtracking”

Use alpha assertions for more understandable regexes

[This feature stabilizes in Perl v5.32]

Perl v5.28 adds more-readable, alternate spelled-out forms for some of its regular expression extended patterns. Then, to make those slightly less readable, there are very short initialisms for those. Although these might seem superfluous now, the ability to define new syntax without relying on the limited number of ASCII symbols.

Continue reading “Use alpha assertions for more understandable regexes”

Perl v5.30 lets you match more with the general quantifier

Does the {N,} really match infinite repetitions in a Perl regular expression? No, it never has. You’ve been limited to 32,766 repetitions. Perl v5.30 is about to double that for you. And, if you are one of the people who needed more, I’d like to hear your story.

Continue reading “Perl v5.30 lets you match more with the general quantifier”

Use @{^CAPTURE} to get a list of all the capture buffers

Perl v5.26 adds three new special variables related to captures. The @{^CAPTURE} is an array of all the capture buffers. %{^CAPTURE} is a alias for %+ and stores the actually-matched named capture labels as its keys. %{^CAPTURE_ALL} is an alias for %- and stores all the named capture labels and their matched (or not) values.

Continue reading “Use @{^CAPTURE} to get a list of all the capture buffers”

Undef a scalar to release its memory

When you store a large string in a scalar, perl allocates the memory to store that string and associate it with the scalar. It uses the same memory even if you assign a much shorter value to the same scalar. Use the functional form of undef to let perl reuse that memory for something else. This is important when you want to reuse the variable or the lifetime of the variable is very long.

Continue reading “Undef a scalar to release its memory”

Mix assignment and reference aliasing with declared_refs

Perl v5.26 adds the experimental declared_refs feature that expands on the experimental refaliasing feature from v5.22. As with all experimental features, this may change or disappear according to perlpolicy.

Continue reading “Mix assignment and reference aliasing with declared_refs”

Use Unicode 10 in Perl v5.28

Perl v5.28 updates to Unicode 10. There are 8,518 new characters, 7,473 which are in the CJK extension. There are 56 new emojis. And, the Bitcoin symbol, ₿. It adds a T. rex, 🦖, but we’re still waiting for a raptor. To Perl they are just characters like any other so you don’t need anything new to deal with them.

Continue reading “Use Unicode 10 in Perl v5.28”

Find the new emojis in Perl’s Unicode support

Perl v5.26 updates itself to Unicode 9. That’s not normally exciting news but people have been pretty enthusiastic about the 72 new emojis that come. As far as Perl cares, they are just valid code points like all of the other ones.

Continue reading “Find the new emojis in Perl’s Unicode support”