Perl 5.20 introduces “Key/Value Slices”

Perl v5.20 adds the “Key/Value Slice”, which extracts multiple keys and their corresponding values from a container (hash or array). It uses the % sigil in front of a variable name with subscripts after it, which is newly legal syntax:


use v5.20;  # you don't need this for the new syntax

my %smaller_hash = %big_hash{ @keys };
my %index_hash   = %big_array[ @indices ];

As with the @ sigil, you know the variable type you’re dealing with by the indexing syntax after it. Here the % does not signify a hash; it denotes that you are getting each index (or key) along with its value.

Looking at that example, you might mistakenly think that these slices return hashes. They don’t. They return lists which have an index then a value. That’s pairwise like the list representation of a hash, but it can also repeat keys (which hashes can’t do):

use v5.20;  # you don't need this for the new syntax

my %big_hash     = qw( cat Buster dog Addy bird Poppy );

my @array = %big_hash{ qw(cat cat) };

say "@array";

The resulting list duplicates the entries for cat:

cat Buster cat Buster

This new type of slice returns a list, which is just data (i.e. not a variable). That means you can’t use the hash or array operators on the result, which would be a neat trick. You can’t take a reference to the entire result, because that’s the same as taking a reference to a list to get a list of references. The result is not an lvalue, so you can’t assign to it or modify it directly.
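Here is a minimal sketch of those points, assuming perl v5.20 or later. The @big_array data and the variable names are made up for illustration. It shows an array key/value slice, a way to get a hash reference by wrapping the list in an anonymous hash constructor, and what happens when you ask for a key that isn't there:

```perl
use v5.20;

my %big_hash  = qw( cat Buster dog Addy bird Poppy );
my @big_array = qw( zero one two three );   # hypothetical sample data

# A key/value slice of an array returns index, value, index, value
my @pairs = %big_array[ 1, 3 ];
say "@pairs";                               # 1 one 3 three

# The slice is only a list, so build a new anonymous hash from it
# if you want a reference to the subset
my $subset = { %big_hash{ qw(cat dog) } };
say join ',', map { "$_=$subset->{$_}" } sort keys %$subset;   # cat=Buster,dog=Addy

# A key that does not exist still shows up, paired with undef
my @missing = %big_hash{ qw(fish) };
say scalar @missing;                        # 2
```

The anonymous hash copies the data, so changing $subset does not change %big_hash.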

Before v5.20, you could do the same thing with a map:

my @array = map { $_ => $big_hash{$_} } @keys;

That’s not that bad, making this new feature less than compelling. Use it if you need v5.20 for something else, but don’t make this feature the one that forces people to upgrade.


Perl 5.20 optimizes return at the end of a subroutine

Want to save 10 nanoseconds? Perl v5.20 optimizes a return at the end of a subroutine to use two fewer ops in the optimized version. During compilation, a subroutine like this one:


sub some_sub { ...; return $foo }

turns into a subroutine like this one, without the return:

sub some_sub { ...; $foo }

You can see the difference in the output from the B::Concise module (which you can use through the O frontend). Prior to v5.20, there are five steps to return the first argument:

$ perl5.18.0 -MO=Concise,baz,-exec -e 'sub baz { return $_[0] }'
main::baz:
1  <;> nextstate(main 1 -e:1) v
2  <0> pushmark s
3  <$> aelemfast(*_) s
4  <@> return K
5  <1> leavesub[1 ref] K/REFC,1
-e syntax OK

But, prior to v5.20, if you didn’t use the return keyword, there are only three steps, and the pushmark isn’t there:

$ perl5.18.0 -MO=Concise,baz,-exec -e 'sub baz { $_[0] }'
main::baz:
1  <;> nextstate(main 1 -e:1) v
2  <$> aelemfast(*_) s
3  <1> leavesub[1 ref] K/REFC,1
-e syntax OK

You can read about PUSHMARK in the perlcall documentation. It’s a signal to perl to remember where the current stack pointer is.

With v5.20, perl optimizes the return version to have the same steps:

$ perl5.20.0 -MO=Concise,baz,-exec -e 'sub baz { return $_[0] }'
main::baz:
1  <;> nextstate(main 1 -e:1) v
2  <$> aelemfast(*_) s
3  <1> leavesub[1 ref] K/REFC,1
-e syntax OK

$ perl5.20.0 -MO=Concise,baz,-exec -e 'sub baz { $_[0] }'
main::baz:
1  <;> nextstate(main 1 -e:1) v
2  <$> aelemfast(*_) s
3  <1> leavesub[1 ref] K/REFC,1
-e syntax OK

This will be happy news to people who stick to Perl Best Practices, which recommends that you always use an explicit return. Damian’s recommendation is to denote intent, but now you don’t suffer if it’s the last statement in the subroutine.
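If you want to measure the difference on your own perls, something like this sketch with the core Benchmark module should do. The subroutine names are made up for this example, and the numbers depend on your machine and perl version, so there is no expected output here:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Two hypothetical subroutines that differ only in the return keyword
sub with_return    { return $_[0] }
sub without_return { $_[0] }

# Run each for at least one CPU second and print a comparison table
cmpthese( -1, {
    explicit => sub { with_return(1) },
    implicit => sub { without_return(1) },
} );
```

Run it under a perl before v5.20 and again under v5.20 or later. On the newer perl the two rows should be much closer together, although benchmark noise can easily swamp a difference this small.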


Perl 5.20 uses its own random number generator

Prior to v5.20, perl used whatever random number generator the system provided. This meant that the same program could have statistically different results based on the quality of that function. The rand() for Windows had a max of 32,768 (15 bits), while POSIX has drand48 (48 bits). This sort of numerical un-portability has always been a problem with perl since it’s relied on the underlying libc for so much.

Not any more. It’s all internal to perl now. With v5.20 and beyond, you’ll get the same pseudorandom number generator that everyone else with v5.20 and later gets.
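A small sketch of what that means in practice: if you seed the generator yourself with srand, the sequence repeats. The seed 42 is arbitrary. Under v5.20 and later the same seed should give the same sequence on other platforms too, although I don't show the numbers here since you should check that on your own perls:

```perl
use v5.10;

# Seed the generator, then draw five numbers
srand(42);
my @first = map { int rand 100 } 1 .. 5;

# Reseed with the same value and draw again
srand(42);
my @second = map { int rand 100 } 1 .. 5;

say "@first";
say "@first" eq "@second" ? 'repeatable' : 'different';
```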

For the effective programmer though, this doesn’t really matter because you shouldn’t be using the pseudorandom number generator for anything important. We call it rand, but it’s not really. We should have called it fake_random, good_enough_random, or i_wont_install_a_module_so_ill_deal_with_it_random. The name in Perl comes from the name in libc (e.g. the GNU libc function list), just like many of the oddly named functions such as abs, chmod, or getgrent. From the name, we get sloppy talking about its output as “random numbers” instead of the correct “pseudorandom numbers”.

Sinan Ünür examines how well Perl’s rand does with coin flips and concludes it comes up short (Perl 5.20.0 brings a “better” PRNG to Windows). An older presentation from the Wellington Perl mongers goes through some serious math to talk about better pseudorandom numbers. The documentation for Math::Random::Secure has more interesting details. Several other modules provide rand replacements.

There are ways to get real random numbers. Atmospheric noise, nuclear decay, and other processes are random and their measurement can supply the numbers. The random.org website, for one, can supply these, and the Net::Random module makes the connection for you.

Even though rand still isn’t random, at least everyone can use the same thing without any extra work. I like it any time perl can bring this stuff inside to make it more portable.


Perl 5.20 new features

Perl 5.20 is out and there are some nice syntax changes that make life easier for Perlers, along with some improvements that don’t require any work from you. Some of the features are experimental, so be careful that you don’t create problems by overusing them until they settle down.

You can download the Perl source from CPAN. For Windows, Strawberry Perl 5.20 is available now.


  • Subroutine signatures (experimental)
  • Postfix dereferencing (experimental)
  • Slices that return the indices too (“Key/Value Hash Slices”, which is going to be confusing to distinguish from “hash slices” when we talk about them)
  • Some taint improvements to close more loopholes
  • The Standard Library is trimmed (CGI, Module::Build, find2perl, s2p, and a2p are gone)
  • A return at the end of a subroutine is optimized away
  • Adjacent my declarations are combined
  • On the command line, -F implies -a, and -a implies -n
  • rand has an internal generator so it’s not platform or library dependent
  • locale improvements


Experimental features now warn (reaching back to v5.10)

Perl 5.18 provides a new way to introduce experimental features in a program, augmenting the feature pragma that v5.10 added. This change marks certain broken v5.10 features as experimental with an eye toward possible removal from the language.

Smart matching in v5.10 led to several broken and conflated features. The given used a lexical version of $_, which broke many other common uses of that variable inside the given, which I explain in Use for() instead of given() and you can see in given/when and lexical $_ ….

Under v5.18, when you use given, when, or ~~, you get a warning, even if there is no smart match involved:

# given_warning.pl
use v5.10; # earliest occurrence of feature
for( 'Buster' ) {
	when( 1 == 1 ) { say "Hello" }
	}

These warnings might cause test suites to fail when people try to install modules on the new perl, as they do for Unicode::Tussle.

% perl5.10.1 given_warning.pl
Hello
% perl5.18.0 given_warning.pl
when is experimental at given_warning.pl line 4.
Hello

Using the diagnostics pragma shows the sort of warning it is:

% perl5.18.0 -Mdiagnostics given_warning.pl
when is experimental at -e line 1 (#1)
    (S experimental::smartmatch) when depends on smartmatch, which is
    experimental.  Additionally, it has several special cases that may
    not be immediately obvious, and their behavior may change or
    even be removed in any future release of perl.
    See the explanation under "Experimental Details on given and when"
    in perlsyn.

Hello

To get rid of this warning, you do the same thing you do with other warnings. Take the category of the warning and turn it off with no (Item 100: Use lexical warnings to selectively turn on or off complaints):

# given_warning.pl
use v5.10; # earliest occurrence of feature
no warnings 'experimental::smartmatch';
for( 'Buster' ) {
	when( 1 == 1 ) { say "Hello" }
	}

The lexical $_ is another broken feature that’s now marked as experimental.

# lexical_.pl
use v5.10;

sub cat { my $_ }

Any use in v5.18 gives a warning:

% perl5.18.0 lexical_.pl
Use of my $_ is experimental at lexical_.pl line 3.

The category is different:

% perl5.18.0 -Mdiagnostics lexical_.pl
Use of my $_ is experimental at lexical_.pl line 4 (#1)
    (S experimental::lexical_topic) Lexical $_ is an experimental
    feature and its behavior may change or even be removed in any
    future release of perl. See the explanation under "$_" in perlvar.

That takes care of the two retro features. Perl v5.18 introduces two new experimental features, set logic in character classes (for complete Unicode Level 1 regular expression compliance), and lexical subroutines, which I’ll cover in other items.

# regex.pl
use v5.18;

print "Match" if 'foo' =~ /(?[ \p{Thai} & \p{Digit} ])/;

Without turning off the warning, perl knows about the feature and points it out:

% perl5.18.0 regex.pl
The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ <-- HERE  \p{Thai} & \p{Digit} ])/ at regex.pl line 4.

In this case, diagnostics is not any help:

% perl5.18.0 -Mdiagnostics regex.pl
The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[
        <-- HERE  \p{Thai} & \p{Digit} ])/ at regex.pl line 3 (#1)
The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ <-- HERE  \p{Thai} & \p{Digit} ])/ at regex.pl line 3.

For lexical named subroutines, you have to explicitly enable the feature, and then you also have to explicitly turn off its warnings.

# lexical_sub.pl
use v5.18;
no warnings 'experimental::lexical_subs';
use feature "lexical_subs";

my sub foo { say "Hello" }
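As a quick sketch of what “lexical” buys you (the names greet and $greeting are made up for this example), the subroutine exists only inside the block where you declare it and never lands in the package symbol table:

```perl
use v5.18;
no warnings 'experimental::lexical_subs';
use feature 'lexical_subs';

my $greeting;
{
    # greet is visible only until the end of this block
    my sub greet { "Hello, $_[0]" }
    $greeting = greet('Buster');
}

say $greeting;                                          # Hello, Buster
say defined &main::greet ? 'in package' : 'not in package';   # not in package
```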

Handling older perls

In v5.18, that's all fine and good, but older versions don't understand those warnings categories and will stop your program.

% perl5.10.1 -e 'no warnings qw(smartmatch)'
Unknown warnings category 'smartmatch' at -e line 1
BEGIN failed--compilation aborted at -e line 1.

Instead of using warnings, you can use the non-core experimental module that handles that for you:

use experimental qw(smartmatch);

For versions without that warning category, nothing happens. For versions with that feature, it turns off the warning.
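If you don't want to depend on a non-core module, one alternative sketch uses the core if pragma to disable the category only on perls that know about it. The version check against $] is my assumption about where the category first appears, based on the v5.18 behavior described above:

```perl
use v5.10;

# Only turn off the category when this perl has it
no if $] >= 5.018, warnings => 'experimental::smartmatch';

my $greeting;
for ( 'Buster' ) {
    when ( 'Buster' ) { $greeting = 'Hello' }
}

say $greeting;
```

This is more typing than use experimental, and you have to remember the right version number for each category, which is the bookkeeping that the experimental module does for you.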

Summary

This table summarizes the new experimental warnings categories and the features they affect.

Category                      Features
experimental::smartmatch      given, when, ~~
experimental::lexical_topic   my $_
experimental::regex_sets      (?[ ])
experimental::lexical_subs    my sub NAME {}, our sub NAME {}

Things to remember

  • Some v5.10 features now warn under v5.18
  • Some new experimental features must be explicitly enabled
  • Even explicitly enabled features still warn
  • The experimental module is version safe


Perl 5.18 new features

Perl 5.18 is out and there are some major changes that you should know about before you upgrade. Most notably, some features from v5.10 are now marked experimental. If you use those features, you get warnings.

You can download the Perl source from CPAN. For Windows, Strawberry Perl 5.18 is available now.



The vertical tab is part of \s in Perl 5.18

Before v5.18, the vertical tab wasn’t part of the \s character class shortcut for ASCII whitespace. No one really knows why. It was curious trivia that I pointed out in Know your character classes under different semantics. Whitespace in ASCII, POSIX, and Unicode represented different sets. Perl whitespace was different from POSIX whitespace by only the exclusion of the vertical tab. Now that little oversight is fixed.

I had this program to mark which sets matched which characters. I required v5.10 because that’s the first appearance of the \h and \v shortcuts for horizontal and vertical whitespace.

use 5.010;

use charnames qw(:full);

print <<"LEGEND";
s   matches \\s, matches Perl whitespace
h   matches \\h, horizontal whitespace
v   matches \\v, vertical whitespace
p   matches [[:space:]], POSIX whitespace
all characters match Unicode whitespace, \\p{Space}

LEGEND

printf qq(%s %s %s %s  %-7s --> %s\n),
	qw( s h v p  Ordinal  Name );
print '-' x 50, "\n";

foreach my $ord ( 0 .. 0x10ffff ) {
	next unless chr($ord) =~ /\p{Space}/;
	my( $s, $h, $v, $posix ) =
		map { chr($ord) =~ m/$_/ ? 'x' : ' ' }
			( qr/\s/, qr/\h/, qr/\v/, qr/[[:space:]]/ );
	printf qq(%s %s %s %s  0x%04X  --> %s\n),
		$s, $h, $v, $posix,
		$ord, charnames::viacode($ord);
	}

Under v5.10, the top of the output showed that \s did not include the vertical tab, which the UCS names LINE TABULATION.

$ perl5.10.1 spaces
s   matches \s, matches Perl whitespace
h   matches \h, horizontal whitespace
v   matches \v, vertical whitespace
p   matches [[:space:]], POSIX whitespace
all characters match Unicode whitespace, \p{Space}

s h v p  Ordinal --> Name
--------------------------------------------------
x x   x  0x0009  --> CHARACTER TABULATION
x   x x  0x000A  --> LINE FEED
    x x  0x000B  --> LINE TABULATION
x   x x  0x000C  --> FORM FEED
x   x x  0x000D  --> CARRIAGE RETURN
x x   x  0x0020  --> SPACE

Run under v5.18, the output changes slightly to have another x in the third row (line 12).

$ perl5.18.0 spaces
s   matches \s, matches Perl whitespace
h   matches \h, horizontal whitespace
v   matches \v, vertical whitespace
p   matches [[:space:]], POSIX whitespace
all characters match Unicode whitespace, \p{Space}

s h v p  Ordinal --> Name
--------------------------------------------------
x x   x  0x0009  --> CHARACTER TABULATION
x   x x  0x000A  --> LINE FEED
x   x x  0x000B  --> LINE TABULATION
x   x x  0x000C  --> FORM FEED
x   x x  0x000D  --> CARRIAGE RETURN
x x   x  0x0020  --> SPACE

I don’t foresee this breaking anything since the vertical tab seems to be a rare character, although in ETL I liked using it as a separator since I figured no one else would be using it.
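If you only care about the vertical tab, a much shorter check is enough. This sketch reports what the current perl does and shows a character class that includes the vertical tab on old and new perls alike. The $portable name is made up for this example:

```perl
use strict;
use warnings;

my $vt = "\x0B";   # LINE TABULATION, the vertical tab

# The answer here depends on the perl that runs it
printf "perl %vd: \\s %s the vertical tab\n",
    $^V, $vt =~ /\s/ ? 'matches' : 'does not match';

# Adding the character explicitly gives the same answer everywhere
my $portable = qr/[\s\x0B]/;
print "portable class matches\n" if $vt =~ $portable;
```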


Effective Perler discounts during OSCON

I’ll be at OSCON on Tuesday, July 17, but you don’t have to find me to get up to 37% off Effective Perl Programming. That’s a slightly lower price than Amazon. To get that discount, you have to buy the book at Pearson’s booth in the exhibition hall. You’ll need to track me down on Tuesday afternoon or evening if you want me to sign your book.

If you can’t make it to OSCON, you can still get 35% off the cover price by ordering directly from the InformIT discount link or using the OSCON2012 discount code when you check out. Instead of navigating their site, you can go directly to our book.

If you’re not sure you want the book, you can look at a free sample chapter, which is also 35% off during OSCON.


Declare your Pod encoding

Pod::Simple 3.21 changed its behavior when it encountered a non-ASCII character in Pod without an encoding. Instead of handling it quietly, it now gives a warning. That’s not so bad, but Test::Pod uses Pod::Simple, and whenever it sees a warning, pod_ok fails, as it did in my Mac::Errors module:


#   Failed test 'POD test for blib/lib/Mac/Errors.pm'
#   at .../Test/Pod.pm line 182.
# blib/lib/Mac/Errors.pm (2776): Non-ASCII character seen before =encoding in 'donÍt'. Assuming ISO8859-1
# Looks like you failed 1 test of 2.
t/pod.t ...........
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/2 subtests

Unfortunately, the Pod tests are the sort that shouldn’t stop an installation, which is why many developers have a separate area for author tests (which I’ll cover in an upcoming Item). Outside of that, you have to fix the Pod.

There are two things here. First, I have a genuine error here. The module is auto generated from other source files and the “donÍt” is a mistake; it should be “don’t” (with a smart quote) or even better, “don't” (with a plain ASCII apostrophe). Test::Pod didn’t catch this before. So, that’s not bad.

More importantly, telling Perl that the source code is UTF-8 isn’t enough. When you use the utf8 pragma, the perl interpreter reads the source as UTF-8:

use utf8;

However, a Pod parser ignores all the code. It looks for Pod sections and never sees that pragma, nor does it care. You have to tell the pod which encoding you have if you want to use something outside of ASCII:

=encoding utf8

I hadn’t used that in Mac::Errors, or any of my other modules, although in some of them I had used genuine UTF-8 sequences. Now any person using Test::Pod with the latest Pod::Simple won’t be able to install those modules normally. That is, until I fix them.

I could use other encodings, such as ISO-8859-1, as long as I declare the right thing and save the file correctly.
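You can check a chunk of Pod yourself with something like this sketch, which uses Pod::Simple roughly the way I understand Test::Pod to use it. The pod_has_errata name and the sample Pod are made up for this example. What it reports for the undeclared version depends on which Pod::Simple you have installed, so I don't show expected output:

```perl
use strict;
use warnings;
use Pod::Simple;

# Returns true if Pod::Simple complained about the given Pod source
sub pod_has_errata {
    my( $pod ) = @_;
    my $checker = Pod::Simple->new;
    $checker->output_string( \my $ignored );   # throw away any output
    $checker->parse_string_document( $pod );
    return $checker->any_errata_seen ? 1 : 0;
}

# The bytes \xE2\x80\x99 are the UTF-8 encoding of a right single quote
my $body       = "=head1 NAME\n\nMy::Module - don\xE2\x80\x99t panic\n\n=cut\n";
my $undeclared = $body;
my $declared   = "=encoding utf8\n\n" . $body;

printf "%-10s %s\n", 'undeclared', pod_has_errata( $undeclared ) ? 'complains' : 'clean';
printf "%-10s %s\n", 'declared',   pod_has_errata( $declared )   ? 'complains' : 'clean';
```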

Things to remember

  • The utf8 pragma doesn’t affect the Pod
  • Pod::Simple assumes ASCII unless you tell it otherwise
  • Declare your Pod encoding with the =encoding directive


Hide namespaces from PAUSE

The Perl Authors Upload Server (PAUSE) is responsible for analyzing distributions on their way to CPAN. PAUSE indexes the distributions to discover the package names that it contains so it can add them to the data files that many of the CPAN clients use to figure out what to download to install the module that you request. It also compares the package names that it finds to a list of permissions it maintains.

The mldistwatch program is responsible for this bit. It tries two things to find the packages in the distribution. If it can, it reads the META.yml (or META.json) to get the data. Otherwise, it examines the files directly to look for package declarations. Sometimes these come up with the wrong answers.

First, there’s an easy fix to package statements in code. After ignoring text in Pod or after __END__ or __DATA__, PAUSE looks for package statements appearing on a single line:

# from PAUSE::pmfile::packages_per_pmfile()
           $pline =~ m{
                      (.*)
                      \bpackage\s+
                      ([\w\:\']+)
                      \s*
                      (?: $ | [\}\;] | ($version::STRICT) )
                    }x

If your complete package statement isn’t on a single line, then that won’t match it. Since Perl has insignificant whitespace, including vertical whitespace, you could do this:

package
    hide::this::package;

You might even leave yourself (or other developers) a note about the importance of that whitespace now:

package # hide from pause
    hide::this::package;

Indeed, if you grep CPAN, you’ll find hide from pause in many distributions.
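To see why the line break matters, here is a sketch that applies a simplified version of that pattern to one line at a time. I left out the $version::STRICT alternative to keep it self-contained, so this is not the real PAUSE code, and found_package is a name made up for this example:

```perl
use strict;
use warnings;

# Simplified from the PAUSE pattern above, without the version branch
my $package_re = qr{
    \bpackage\s+
    ([\w\:\']+)
    \s*
    (?: $ | [\}\;] )
}x;

# Returns the package name found on a single line, or undef
sub found_package {
    my( $line ) = @_;
    return $line =~ $package_re ? $1 : undef;
}

foreach my $line (
    'package Cats::Buster;',
    'package # hide from pause',
    '    hide::this::package;',
    ) {
    my $found = found_package( $line );
    printf "%-28s => %s\n", $line, defined $found ? "indexed as $found" : 'not indexed';
}
```

Only the first line reports a package. The split statement never has package, whitespace, and a name together on one line, so neither half matches.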

That’s the easy way, although it’s kludgey and relies on a special case in the PAUSE code. Other indexers might not honor it. There’s a better way for you to explicitly tell an indexer what namespaces you want to advertise. You add them to the provides section of META.yml:

provides:
  Cats::Buster:
    file: lib/Cats/Buster.pm
    version: 0.01

The format of that data comes from the META-spec. Module::Build will automatically create these entries in META.yml for you. The indexer can use these to know what’s in the distribution without directly examining module files.

If there are multiple package declarations, all of them show up in META.yml:

provides:
  Cats::Buster:
    file: lib/Test/Provides.pm
    version: 0.01
  Cats::Mimi:
    file: lib/Test/Provides.pm
    version: 0.02
  version:
    file: lib/Test/Provides.pm
    version: 0.01

Notice that version shows up in that list. You may have included it in your module to extend or override parts of that core module, but you don't want people who ask for the real version to end up installing your module. You might only declare that package in your module as a temporary workaround and don't intend it to be a permanent part of the work. They probably wouldn't be able to do that anyway since PAUSE would recognize that you included a package for which you do not have permissions and would not index it. A site such as CPAN Search might mark your otherwise good distribution as "UNAUTHORIZED". Module::Build doesn't know to exclude version, at least not by default.

To hide that package from indexers, you can specify it in no_index. In Build.PL, you can use meta_add to specify the parts of the META-spec not already supported by other arguments to new:

use Module::Build;

my $builder = Module::Build->new(
	...,
	meta_add => {
		no_index => {
			package   => [ qw( version Local ) ],
			directory => [ qw( t/inc inc ) ],
			file      => [ qw( t/lib/test.pm ) ],
			namespace => [ qw( Local ) ],
			},
		},
);

The directory and file keys tell the indexer to ignore those parts of the distribution. The package tells the indexer to ignore exactly those packages. The curious one is namespace, which tells the indexer to ignore namespaces under that namespace.

Likewise, you can do the same in Makefile.PL with a recent enough version:

use ExtUtils::MakeMaker 6.48;

WriteMakefile(
	...,
	META_ADD => {
		no_index => {
			package   => [ qw( version Local ) ],
			directory => [ qw( t/inc inc ) ],
			file      => [ qw( t/lib/test.pm ) ],
			namespace => [ qw( Local ) ],
			},
		},
	);

If the META file has no provides section, the indexer examines the module files to find package statements, but it does that without running the code.

But, what if provides and no_index have conflicting instructions? The META-spec doesn't give any guidance for indexers in those cases. PAUSE filters on no_index last. This means that PAUSE and other indexers might leave out files you specify in provides but then exclude in no_index.
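This toy model shows the effect of filtering on no_index last. It is not PAUSE code; the data, the Local::Helper package, and the is_hidden subroutine are all made up for this example, and it only handles the package and namespace keys:

```perl
use strict;
use warnings;

# What a hypothetical META file advertises
my %provides = (
    'Cats::Buster'  => { file => 'lib/Cats/Buster.pm',  version => '0.01' },
    'version'       => { file => 'lib/Cats/Buster.pm',  version => '0.01' },
    'Local::Helper' => { file => 'lib/Local/Helper.pm', version => '0.01' },
);

# What the same META file asks indexers to skip
my %no_index = (
    package   => [ qw( version ) ],
    namespace => [ qw( Local ) ],
);

# True if no_index excludes the package, by exact name or as
# something under one of the listed namespaces
sub is_hidden {
    my( $package ) = @_;
    return 1 if grep { $package eq $_ } @{ $no_index{package} };
    return 1 if grep { index( $package, $_ . '::' ) == 0 } @{ $no_index{namespace} };
    return 0;
}

# Start from provides, then apply no_index last
my @indexed = sort grep { ! is_hidden( $_ ) } keys %provides;
print "Indexed: @indexed\n";   # Indexed: Cats::Buster
```

Even though provides lists three packages, only one survives the filter.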

Things to remember

  • Spread the package statement over two or more lines to hide it from PAUSE
  • Use provides to advertise the namespaces a distribution comprises
  • Use no_index to limit what an indexer sees or reports
