Perl v5.36 new features

These items are in Perl New Features, a book from Perl School that you can buy on LeanPub or Amazon. Your support helps me to produce more content.


Know if something is a boolean

So far, Perl has not had special boolean values. Perl doesn’t even have a strictly true value; it has specified false values (0, '0', the empty string, and undef), and everything that is not false is true. Not only that, Perl uses strings, numbers, and undef interchangeably and stringifies everything on output, so it can’t tell how it should represent a value in formats that have particular ideas about booleans. Now there’s something that can help with that.

Through builtin, v5.36 adds true and false functions and a way for you to know that a value is a boolean. This allows you to represent these values correctly for other formats.

Before you get too excited: Perl may have these features, but your favorite libraries probably aren’t using them yet. Some of those have their own special ways to represent booleans, and they probably aren’t going to change since they have to support both the old and new ways.

A JSON example

Consider this JSON, which has true and false:

{ "updated": true, "paid": false }

Now look at what happens when you encode plain, non-special Perl values as JSON. Take the Perl true and false values from the results of some comparisons:

use v5.10;
use JSON;

my $true  = (1==1);
my $false = (0==1);

my $data = { "updated" => $true, "paid" => $false };
my $json = encode_json($data);

say $json;

The output shows that Perl created both JSON values as strings, and the false value is the empty string. Perl dumbly stringifies everything as best it can since it doesn’t know about fine-grained types:

{"updated":"1","paid":""}

This means that any external JSON parsers will not correctly find JSON’s true or false values. The string "1" is no more special than any other string:

$ echo '{"updated":"1","paid":""}' | jq '. | select(.updated == true)'

With JSON’s true value, the filter works:

$ echo '{"updated":true,"paid":false}' | jq '. | select(.updated == true)'
{
  "updated": true,
  "paid": false
}

Perl’s JSON module solves this by using special references so it knows that there is a boolean value. Instead of Perl’s squiggly true or false values, it uses references to either 1 or 0, which it then turns into special internal objects:

use v5.10;
use JSON;

my $true  = \1;
my $false = \0;

my $data = { "updated" => $true, "paid" => $false };
my $json = encode_json($data);

say $json;

Now the output is what JSON expects:

{"updated":true,"paid":false}
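The core JSON::PP module offers the same idea through named functions. This is a minimal sketch that assumes JSON::PP rather than the JSON front end used above; JSON::PP::true and JSON::PP::false return the special boolean objects, and canonical sorts the keys so the output order is repeatable:

```perl
use v5.10;
use strict;
use warnings;
use JSON::PP;

my $data = {
	updated => JSON::PP::true(),
	paid    => JSON::PP::false(),
	};

# canonical() sorts the hash keys so the output is the same on every run
my $json = JSON::PP->new->canonical->encode( $data );

say $json;   # {"paid":false,"updated":true}
```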

Going the other way, from JSON to a Perl structure, dumping the values shows how JSON uses special objects:

use v5.10;
use JSON;

my $json = qq({ "updated": true, "paid": false });
my $data = decode_json($json);

use Data::Dumper;
say Dumper($data);

The output shows that JSON uses a special class, JSON::PP::Boolean, where the object value is a reference to Perl’s false or non-false values:

$VAR1 = {
	'updated' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),
	'paid' => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' )
	};

This is one of the cures for people who want types. As long as the object can handle the behavior, you don’t care what it actually is.

true and false

Through the new builtin, Perl v5.36 adds the new functions true and false. These are “distinguished” boolean values because they are explicitly for boolean operations. You can use these anywhere you’d use any of Perl’s false or non-false values. For example, in a while conditional:

use v5.36;
use builtin qw(true false);
use experimental qw(builtin);

while( true ) {
	state $count = 0;
	say $count;
	last if $count++ > 3;
	}

Under the hood, the “distinguished” boolean values have an extra BOOL flag. In this program, the first two boolean values are the new special values and the last one is just a plain "1":

use v5.36;
use builtin qw(true false);
use experimental qw(builtin);

use Devel::Peek;

my $from_function = true;
Dump($from_function);

my $from_eq = 'a' eq 'b';
Dump($from_eq);

my $from_one = "1";
Dump($from_one);

Devel::Peek shows that the first two values have a BOOL marker next to the PV. Compare those to the last dump, which does not have it:

SV = PVNV(0x7fcd72809850) at 0x7fcd74008b00
  REFCNT = 1
  FLAGS = (IOK,NOK,POK,IsCOW,pIOK,pNOK,pPOK)
  IV = 1
  NV = 1
  PV = 0x106f29ac3 "1" [BOOL PL_Yes]
  CUR = 1
  LEN = 0
SV = PVNV(0x7fcd72809990) at 0x7fcd74008a88
  REFCNT = 1
  FLAGS = (IOK,NOK,POK,IsCOW,pIOK,pNOK,pPOK)
  IV = 0
  NV = 0
  PV = 0x106f29ac5 "" [BOOL PL_No]
  CUR = 0
  LEN = 0
SV = PV(0x7fcbeb00b050) at 0x7fcbeb86e408
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x600003bd9930 "1"\0
  CUR = 1
  LEN = 10
  COW_REFCNT = 1

Testing for a boolean

The previous section showed that Perl now tracks a difference between distinguished booleans and the “normal” truthy values. Along with the new true and false values, builtin also provides is_bool so you can tell the difference between these new values and “normal” values:

use v5.36;
use builtin qw(true false is_bool);
use experimental qw(builtin);

my $from_function = true;
say is_bool($from_function) ? 'boolean' : 'not boolean';

my $from_eq = 'a' eq 'b';
say is_bool($from_eq) ? 'boolean' : 'not boolean';

my $from_one = "1";
say is_bool($from_one) ? 'boolean' : 'not boolean';

is_bool returns true for the first two, but false for the last one:

boolean
boolean
not boolean
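That is enough to write a serializer that picks the right literal. Here is a deliberately simplistic sketch; to_json_scalar is a hypothetical helper of my own, not part of any module, and it quotes every non-boolean value as a string instead of trying to detect numbers:

```perl
use v5.36;
use builtin qw(true false is_bool);
use experimental qw(builtin);

# to_json_scalar is a made-up name for this illustration
sub to_json_scalar ( $v ) {
	return 'null' unless defined $v;
	return $v ? 'true' : 'false' if is_bool( $v );   # distinguished booleans only
	return '"' . ( $v =~ s/(["\\])/\\$1/gr ) . '"';  # everything else becomes a string
	}

say to_json_scalar( true );        # true
say to_json_scalar( 'a' eq 'b' );  # false
say to_json_scalar( "1" );         # "1"
say to_json_scalar( undef );       # null
```

A real encoder would also need to handle numbers, references, and the rest of JSON's string escapes.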

Maintaining boolean status

If Perl can track boolean status, how would a value lose it? Or maybe, gain it? Comparisons return boolean values. Negation (and double negation) return distinguished booleans. Mathematical and string operations lose the BOOL flag. Here’s a program to demonstrate various situations:

use v5.36;
use builtin qw(true false is_bool);
use experimental qw(builtin);

sub check ( $v ) { is_bool( $v ) ? 'boolean' : 'not boolean' }

my $format = "%13s is %s\n";
printf $format, "1",             check(    1 );
printf $format, "0",             check(    0 );
printf $format, "!1",            check(  ! 1 );
printf $format, "!0",            check(  ! 0 );
printf $format, "!!1",           check( !! 1 );
printf $format, "!!0",           check( !! 0 );
printf $format, "1 == 1",        check( 1 == 1 );
printf $format, "'a' eq 'b'",    check( 'a' eq 'b' );
printf $format, "true",          check( true );
printf $format, "false",         check( false );
printf $format, "0 + true",      check( 0 + true );
printf $format, "0 + false",     check( 0 + false );
printf $format, "'' . true",     check( '' . true );
printf $format, "'' . false",    check( '' . false );
printf $format, "true == true",  check( true == true );
printf $format, "true eq false", check( true eq false );

The output shows what has the BOOL flag and what doesn’t:

            1 is not boolean
            0 is not boolean
           !1 is boolean
           !0 is boolean
          !!1 is boolean
          !!0 is boolean
       1 == 1 is boolean
   'a' eq 'b' is boolean
         true is boolean
        false is boolean
     0 + true is not boolean
    0 + false is not boolean
    '' . true is not boolean
   '' . false is not boolean
 true == true is boolean
true eq false is boolean
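Since negation always returns a distinguished boolean, a double negation is a handy way to get the flag back after some other operation has dropped it. A small sketch, using only the functions shown above (the variable names are mine):

```perl
use v5.36;
use builtin qw(true is_bool);
use experimental qw(builtin);

my $count = 0 + true;    # arithmetic drops the BOOL flag
my $flag  = !! $count;   # double negation returns a distinguished boolean

say is_bool( $count ) ? 'boolean' : 'not boolean';   # not boolean
say is_bool( $flag )  ? 'boolean' : 'not boolean';   # boolean
```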

Slurp a file from the command line with -g

Perl v5.36 adds the -g switch as a shortcut for -0777, which undefines the input record separator so you can read an entire file as a single string. This is often called “slurping”, and is useful when you need to process text that spans several lines.

The input record separator

The input record separator is the character (or characters) that Perl’s line-input operator uses to determine when a line has ended. By default, that’s a newline (U+000A), but you can use any string you like by setting $/ ($INPUT_RECORD_SEPARATOR). Sometimes the form feed is a useful separator for multiline records:

$/ = "\f";
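Inside a program, it is usually safer to change $/ with local so the old value comes back at the end of the block. Here is a minimal sketch that reads form-feed-separated records from an in-memory filehandle; the sample text and variable names are my own:

```perl
use v5.10;
use strict;
use warnings;

my $text = "one\ntwo\n\fA\nB\n\f";
open my $fh, '<', \$text or die "Could not open in-memory file: $!";

my @summaries;
{
	local $/ = "\f";                 # records end at a form feed in this block only
	while( my $record = <$fh> ) {
		chomp $record;               # chomp removes the current value of $/
		my @lines = split /\n/, $record;
		push @summaries, scalar(@lines) . " lines: @lines";
		}
}

say for @summaries;
```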

On the command line, the -0 switch is a quick way to set the value for $/. Without a value, it uses the null byte, which is sometimes useful as a separator:

% perl -MO=Deparse -0 -e 1
BEGIN { $/ = "\000"; $\ = undef; }
'???';
-e syntax OK

A number in octal or hexadecimal sets $/ to some other single character:

% perl -MO=Deparse -0014 -e 1
BEGIN { $/ = "\f"; $\ = undef; }
'???';
-e syntax OK
% perl -MO=Deparse -0xC -e 1
BEGIN { $/ = "\f"; $\ = undef; }
'???';
-e syntax OK

Any octal number above 0377 (more than 255 decimal) sets $/ to undef:

% perl -MO=Deparse -0377 -e 1
BEGIN { $/ = "\377"; $\ = undef; }
'???';
-e syntax OK
% perl -MO=Deparse -0400 -e 1
BEGIN { $/ = undef; $\ = undef; }
'???';
-e syntax OK

Conventionally, though, the Perl documentation has used 777 as the value to get undef probably since it’s easier to remember:

% perl -MO=Deparse -0777 -e 1
BEGIN { $/ = undef; $\ = undef; }
'???';
-e syntax OK

The new -g is a short synonym for -0777, so it does the same thing:

% perl -MO=Deparse -e 1
'???';
-e syntax OK
% perl -MO=Deparse -g -e 1
BEGIN { $/ = undef; $\ = undef; }
'???';
-e syntax OK
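The -g switch only sets $/ from the command line. Inside a program, the usual equivalent is to localize $/ so it is undef for a single read. This sketch reads from an in-memory filehandle so it doesn’t need a file on disk; the sample data is made up:

```perl
use v5.10;
use strict;
use warnings;

my $data = "first\nsecond\nthird\n";
open my $fh, '<', \$data or die "Could not open in-memory file: $!";

# local $/ leaves the separator undefined, so one read returns everything
my $whole = do { local $/; <$fh> };

my $newlines = () = $whole =~ /\n/g;   # count the newlines in the slurped string

say length $whole;   # 19
say $newlines;       # 3
```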

Far more than you ever wanted to know

The -0 switch has a few other interesting behaviors. Since I’m already writing about this feature, I might as well keep going.

Single-character line ending

You can use an octal or hexadecimal number after the -0 to choose the single character that you want to use as the line ending. I’ve often used the form feed (U+000C) to separate multi-line records. The particular character doesn’t matter as long as it doesn’t appear in the data (so the null byte might be useful too):

% perl -le "print qq(one\ntwo\nthree\n\fA\nB\nC\n\f9\n10\n11\n)" > formfeed.txt

When you read formfeed.txt by lines with no change to the input record separator, you see the three records separated by “blank” lines, which are really the form feed:

% perl -ne 'print' formfeed.txt
one
two
three

A
B
C

9
10
11

You can see that more easily when you replace the invisible characters with their ordinal values, which you do in octal here:

% perl -pe 's/(\P{Print})/sprintf(q(%03o),ord($1)) . "\n"/eg' formfeed.txt
one012
two012
three012
014
A012
B012
C012
014
9012
10012
11012
012

When you use the octal value of the form feed for the number after the -0 switch and output lines surrounded by angle brackets, you get three lines (with the newlines and line-ending form feed intact):

% perl -014 -ne 'print qq(<$_>)' formfeed.txt
<one
two
three

><A
B
C

><9
10
11

>

You could have also specified this with three digits, -0014, or as hexadecimal with a leading x, like -0xC. The hexadecimal version is valuable when you need to specify a character past the largest single octet value you can get out of three octal digits, which is 0377.

There’s a catch though. If you want to set the input record separator to a wide character, you need to ensure that you read the input correctly. For the ☃ (U+2603 SNOWMAN) to be the separator, which takes up three octets in UTF-8, you need to read the input as UTF-8 too. The -C is one way to do that:

% perl -0x2603 -C -ne 'print qq(<$_>)' snowmen.txt

You aren’t able to specify multiple characters as a line separator since perl thinks the extra characters are a file for input:

% perl -MO=Deparse -0x0100x2603 -e
No Perl script found in input

Slurping an entire file

If you specify an octal value 400 or higher, which is more than 8 bits, Perl sets the input record separator to undef. With no defined value for $/, Perl slurps the entire input. But, this is different than setting the empty string (a defined value), which I write about in the next section.

You’ve probably seen -0777, perhaps the most common use of -0:

% perl -0777 -ne 'print qq(<$_>)' dog.txt
<Newfoundland
Golden Retreiver
Boxer
>

That file is actually read through the ARGV filehandle, which does some trickery to make it look like all the input is coming from one source. However, the line input operator can’t read across the command-line files; perl figures out when it has reached the end of one file, closes it, then opens the next file. So, each file appears to be its own line:

% perl -0777 -ne 'print qq(<$_>)' dog.txt cat.txt lizard.txt
<Newfoundland
Golden Retreiver
Boxer
><Tabby
Marmalade
Tiger
><Monitor
Iguana
Godzilla
>

If you want all the files to be one line, route them through standard input before they get to perl. This only looks like a useless use of cat:

% cat dog.txt cat.txt lizard.txt | perl -0777 -ne 'print qq(<$_>)'

Paragraph mode

“Paragraph mode” is a special case. The -00 sets the input record separator to the empty string. That’s different than the undefined value even though both are false:

% perl -MO=Deparse -00 -e 1
BEGIN { $/ = ""; $\ = undef; }
'???';
-e syntax OK

When the input record separator is the empty string, perl treats it as if it is multiple consecutive newlines. This has the same effect as if the input record separator were the pattern \n{2,}. Not only that, but it collapses the multiple newlines to exactly two newlines:

% perl -00 -ne 'print qq(<$_>)' paras.txt
<First line first para
Second line first para
Third line first para

><After first blank line
Second line after first blank line
Third line after first blank line

><After 2nd blank line
2nd line after 2nd blank line
3rd line after 2nd blank line
>
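The same mode is available inside a program by setting $/ to the empty string. Here is a minimal sketch with made-up sample text in an in-memory filehandle; the substitution only makes the newlines visible in the output:

```perl
use v5.14;
use strict;
use warnings;

# Four newlines after the first paragraph, two after the second
my $text = "a\nb\n\n\n\nc\nd\n\ne\n";
open my $fh, '<', \$text or die "Could not open in-memory file: $!";

# The empty string turns on paragraph mode for this one read
my @paragraphs = do { local $/ = ""; <$fh> };

say scalar @paragraphs;                   # 3
say s/\n/\\n/gr for @paragraphs;          # show each record with visible newlines
```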

Summary

Here’s a quick summary of the various incantations of the -0 switch:

Switch    Input Record Separator        Note
-0        \000                          null byte
-00       empty string, but “\n{2,}”    paragraph mode
-0014     8-bit character, in octal     form feed
-0xC      8-bit character, in hex       form feed
-0400     undef, above 8-bit            slurp
-0777     undef, idiomatic              slurp
-g        undef                         slurp, new in v5.36
-0x1FF    \777 character, include -C    actual \777
-0x2603   wide character, include -C    snowman

Automatically turn on warnings

Perl v5.36 automatically turns on warnings when you specify the minimum Perl version with use:

use v5.36;  # use warnings for free

Since this form has been turning on strictures since v5.12 (Implicitly turn on strictures with Perl 5.12), you no longer have to specify warnings or strict at the top of your program.

Iterate over multiple elements at the same time

This feature was promoted to a stable version in v5.40.

Perl v5.36 adds experimental support that allows a foreach (or for) loop to iterate over multiple values at the same time by specifying multiple control variables. This is incredibly cool:

use v5.36;
use experimental qw(for_list);

my @animals = qw( Buster Mimi Ginger Nikki );
foreach my( $s, $t ) ( @animals ) {
	say "$s ^^^ $t";
	}

The output shows two iterations of the loop, each of which grabbed two values from the list:

Buster ^^^ Mimi
Ginger ^^^ Nikki

Add another parameter; the list now doesn’t divide evenly between the parameters, so any parameter that can’t match with a list item gets undef, just like normal list assignment:

use v5.36;
use experimental qw(for_list);

my @animals = qw( Buster Mimi Ginger Nikki );

foreach my( $s, $t, $u ) ( @animals ) {
	say "$s ^^^ $t ^^^ $u";
	}

Since use v5.36 also turns on warnings, you get those “uninitialized” warnings for free when you use those undef values:

Buster ^^^ Mimi ^^^ Ginger
Nikki ^^^  ^^^
Use of uninitialized value ...
Use of uninitialized value ...

Another interesting use combines this with the new builtin::indexed function, which gets you the index and value at the same time:

use v5.36;
use experimental qw(for_list builtin);
use builtin qw(indexed);

my @animals = qw( Buster Mimi Ginger Nikki );
foreach my( $i, $value ) ( indexed(@animals) ) {
	say "$i: $value";
	}

That’s a bit nicer than going through the indices to access the value in an additional statement:

foreach my $i ( 0 .. $#animals ) {
	my $value = $animals[$i];
	say "$i: $value";
	}

No placeholders (yet)

So far, this new syntax doesn’t have a way to skip values. In a normal list assignment, you discard a value coming from the right hand list with a literal undef:

my( $s, undef, $t ) = @animals;

Try that in the for list and you get a syntax error:

foreach my( $s, undef, $u ) ( @animals ) {  # ERROR!
	say "$s ^^^ $u";
	}
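One workaround is to declare a variable for the slot and then ignore it. This is only a sketch with a made-up list; $unused is a name I chose for the throwaway variable and is nothing special to Perl:

```perl
use v5.36;
use experimental qw(for_list);

my @letters = qw( a b c d e f );
my @pairs;

# $unused is a hypothetical name for a slot that is never read
foreach my( $first, $unused, $third ) ( @letters ) {
	push @pairs, "$first ^^^ $third";
	}

say for @pairs;
```

With six elements and three control variables, the loop runs twice and skips the middle element each time.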

Hash keys and values

I’m tempted to use this for hashes, although each inside a while is still probably better since it doesn’t have to build the entire input list in one go:

use v5.36;
use experimental qw(for_list);

my %animals = (
	cats => [ qw( Buster Mimi Ginger ) ],
	dogs => [ qw( Nikki ) ],
	);

foreach my( $k, $v ) ( %animals ) {
	say "$k ^^^ @$v";
	}
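Hash order is not predictable, so the pairs can come out in any order. If you want a stable order, one option is a key/value hash slice over the sorted keys. This sketch reuses the same data; the @lines array is my own addition for collecting the output:

```perl
use v5.36;
use experimental qw(for_list);

my %animals = (
	cats => [ qw( Buster Mimi Ginger ) ],
	dogs => [ qw( Nikki ) ],
	);

my @lines;

# %animals{ LIST } is a key/value slice: it returns key, value, key, value, ...
foreach my( $k, $v ) ( %animals{ sort keys %animals } ) {
	push @lines, "$k ^^^ @$v";
	}

say for @lines;
```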

Since those hash values are array refs, it would be helpful if this feature could use the refaliasing and declared_refs features (Mix assignment and reference aliasing with declared_refs):

use v5.36;
use experimental qw(for_list);
use experimental qw(refaliasing declared_refs);

my %animals = (
	cats => [ qw( Buster Mimi Ginger ) ],
	dogs => [ qw( Nikki ) ],
	);

foreach my( $k, \@v ) ( %animals ) {
	say "$k ^^^ @v";
	}

Sadly, the parser doesn’t expect the reference operator inside that for list:

syntax error ... near ", \"

Doing

Prior to built-in multiple iteration, the best way to do the same thing was probably the List::MoreUtils module (not part of core). Its natatime function, which I wish was named n_at_a_time, returns an iterator that hands back the number of elements you specify each time you call it. Since the iterator returns a list instead of an array reference, it’s easy to use it with a while:

use List::MoreUtils qw(natatime);

my @x = ('a' .. 'g');
my $iterator = natatime 3, @x;

while( my @vals = $iterator->() ) {
	print "@vals\n";
	}

Another approach uses splice. The easiest thing might be to do it destructively since that requires no index fiddling:

my @x = 'a' .. 'g';
my @temp = @x;

while( my @vals = splice @temp, 0, 3, () ) {
	print "@vals\n";
	}

Here’s an example from the splice documentation that does the same thing:

use v5.10;  # for say

sub nary_print {
  my $n = shift;
  while (my @next_n = splice @_, 0, $n) {
	say join q{ -- }, @next_n;
  }
}

nary_print(3, qw(a b c d e f g h));
# prints:
#   a -- b -- c
#   d -- e -- f
#   g -- h

Playing with the array indices can get this done, but it comes with a lot of baggage. First, an array slice doesn’t return an empty list, so you can’t use that as a condition in the while as in the previous examples. Since it fills in the missing elements with undef, outputting the values possibly comes with warnings. Even if you want to accept those annoyances, you still have to manage the end of array condition ($#x) yourself:

my @x = 'a' .. 'g';

my $start = 0;
my $n     = 3;

while( $start <= $#x ) {
	no warnings qw(uninitialized);
	my @vals = @x[$start .. $start + $n - 1];
	print "@vals\n";
	$start += $n;
	}
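If you clamp the end of the range to the last index, the slice never runs past the array and there are no undef values to silence. This is a sketch of that variation, using min from the core List::Util module; @rows is my own addition for collecting the output:

```perl
use v5.10;
use strict;
use warnings;
use List::Util qw(min);

my @x = 'a' .. 'g';
my $n = 3;
my @rows;

for( my $start = 0; $start <= $#x; $start += $n ) {
	my $end = min( $start + $n - 1, $#x );      # never go past the last index
	push @rows, join ' ', @x[ $start .. $end ];
	}

say for @rows;
```

This still makes you do the index bookkeeping yourself, which is the baggage the built-in feature removes.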

So yeah, having a multiple iterator feature built into Perl is a huge win.

Summary

The experimental for_list feature lets you take multiple elements of the list in each iteration. This doesn't yet handle many of the list assignment features that would make this as useful as people will want it to be.

From the Perl documentation

  1. perlsyn