perl – Page 12 – The Effective Perler

Implicitly turn on strictures with Perl 5.12

Perl 5.12 can turn on strict for you automatically, stealing a feature from Modern::Perl that takes away one line of boilerplate in your Perl programs and modules. We talk about strict in Item 3: Enable strictures to promote better coding. Similar to what we show in Item 2: Enable new Perl features when you need them, to turn strictures on automatically, you have to use use with a version of Perl 5.11.0 or later.

Turn off Perl 5.12 deprecation warnings, if you dare!

Perl 5.12 deprecates several features, for various reasons. Some of the features were always stupid, some need to make way for future development, and some are just too ornery to maintain. All of these are listed in the perl5120delta documentation. The new thing, however, is that Perl 5.12 will warn you about these even if you don’t have warnings turned on. Consider this script full of Perl whoppers:

Match Unicode characters by property value

A Unicode character has properties; it knows things about itself. Perl v5.10 introduced a way to match a character that has certain properties that v5.10 supports. In some cases you can match a particular property value. Now v5.12 lets you match any Unicode property by its value. The newly supported ones include Numeric_Value and Age, for example:

\p{Numeric_Value: 1}
\p{Nv=7}
\p{Age: 3.0}
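
As a rough sketch of how a property-value match behaves, assuming Perl v5.12 or later, the following filters a short list of characters by Numeric_Value. The sample characters are illustrative choices and are not from the original post:

```perl
use v5.12;
use warnings;

# Sample characters (chosen for illustration):
#   '7'        DIGIT SEVEN
#   U+0667     ARABIC-INDIC DIGIT SEVEN
#   U+2166     ROMAN NUMERAL SEVEN
#   '1', 'x'   controls that should not match
my @candidates = ( '7', "\x{0667}", "\x{2166}", '1', 'x' );

# \p{Nv=7} is the short form of \p{Numeric_Value: 7}
my @sevens = grep { /\A\p{Nv=7}\z/ } @candidates;

say scalar(@sevens), ' of ', scalar(@candidates),
	' characters have a Numeric_Value of 7';
```

The match works on the character's numeric value, not on how it looks, which is why a Roman numeral and an Arabic-Indic digit both qualify.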


Watch out for side effects with `use VERSION`

Item 83: Limit your distributions to the right platforms mentioned that use might invoke side effects. We didn’t get into the details in that Item though. As of Perl 5.10, use imports some features that you might not want.

Merely specifying a Perl version prior to 5.10 does nothing other than check the version you specify against the interpreter version. If the interpreter version is equal to or greater than the version you specify, your program continues. If not, it dies:

use 5.008;  # needs perl5.008000 or later

This works with require too:

require 5.008;  # needs perl5.008000 or later

However, use is a compile-time function and require is a run-time function. By the time you hit that require, perl has already compiled your program up to that point or died trying as it ran into unknown features. Code may have already run, despite using an inappropriate version of perl. You want to impose your version restriction as soon as possible, so use is more appropriate since it happens earlier.

You might think that you can fix this with a BEGIN block which compiles and immediately runs the code so you get the ordering right. This gets the version check at compile time even though it’s a runtime statement:

BEGIN { require v5.10; }
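
To see why the BEGIN block changes the ordering, here is a minimal sketch that records when each piece of code runs. The @order variable is only there for illustration:

```perl
use strict;
use warnings;

our @order;   # a package variable, so BEGIN blocks can already use it

# an ordinary statement waits until run time
push @order, 'ordinary statement (run time)';

# a BEGIN block runs as soon as perl finishes compiling it,
# even though it appears later in the file
BEGIN { push @order, 'BEGIN block (compile time)' }

print "$_\n" for @order;
```

The BEGIN entry prints first although it comes second in the source, which is the same reason a require wrapped in BEGIN checks the version before the rest of the program runs.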

In early versions of v5.10, this still imported new features, but this bug has been fixed. See BEGIN {require 5.011} imports features.

You should use at least v5.10.1 because it fixes various issues with smart match. That version doesn’t automatically import the new features if you use require. Either of these specifies that version:

use v5.10.1;
BEGIN { require v5.10.1; }

use 5.010

With Perl 5.10, you get three side effects with use v5.10. Starting with that version, use-ing the version also pulls in the new features for that version. Ostensibly, that keeps programs designed for earlier versions from breaking as newer perls add keywords, but it also tries to enforce the current philosophy of good programming on you.

Perl 5.10 introduces say, state, and given-when, which you import implicitly when you say use v5.10.1:

use v5.10.1;
say 'I can use Switch!';  # imported say()

given ($ARGV[0]) {        # imported given()
	when( defined ) { some_sub() }
	};

sub some_sub {
	state $n = 0;         # imported state()
	say "$n: got a defined argument";
	}

If you want to insist on v5.10 without its new features, perhaps because your code uses some of the same keywords already, you can unimport the side effects immediately with the new feature pragma:

use v5.10.1;     # implicit imports
no feature;    # take it right back again

# your own version of say()
sub say {
	# something that you want to do
	}

If you only want some of the new features, you can unimport the ones that you don’t want:

use v5.10.1;
no feature qw(say);   # leaves state() and given()

sub say {
	# something that you want to do
	}

use 5.012

Perl 5.12 includes two more side effects for use VERSION. The unicode_strings feature treats all strings outside of bytes and locale scopes as Unicode strings. Additionally, use v5.12 automatically turns on strict:

use v5.12;
# now strictures are on

$foo = 1;   # compile-time error!
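
If you want to watch the implicit strictures at work without killing your program, one option is to compile the offending code with a string eval, which inherits the pragmas in effect around it. This is only a sketch, and the variable names are made up for illustration:

```perl
use v5.12;    # turns on strict as a side effect

# string eval compiles its code at run time under the surrounding pragmas,
# so the strict failure lands in $@ instead of stopping the program
my $compiled = eval q{ $undeclared = 1; 1 };
say $compiled ? 'compiled' : "strict refused: $@";

# relaxing strict inside the eval lets the same code through
my $relaxed = eval q{ no strict 'vars'; $undeclared = 1; 1 };
say $relaxed ? 'compiled without strict vars' : "still refused: $@";
```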

If, for some odd and dangerous reason, you don’t want strict on by default, you can turn it off yourself, even though unimporting it doesn’t give you the warning that you’ve left the paved roads, you’ve just violated your rental car contract, and there’s a chainsaw massacrer waiting for you:

use v5.12;
no feature;
no strict;

my $foo = 1;   

$fo0++;  # sure, go ahead and make that error

A workaround to restrict `perl` versions

You can restrict the version more tightly by checking the value of the $] variable, just like the various examples you saw in Item 83:

BEGIN {
	die "Unsupported version" 
		unless $] >= 5.010 and $] < 5.011
	}

This has the added benefit of restricting the upper acceptable perl version. It works on older Perls too.
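
The range test is easier to try out when you pull it into a small function and feed it some sample version numbers. The version_in_range helper below is hypothetical, not something from Item 83:

```perl
use strict;
use warnings;

# hypothetical helper: true when $version is at least $min and below $max
sub version_in_range {
	my( $version, $min, $max ) = @_;
	return( $version >= $min && $version < $max );
	}

# sample values in the same numeric format that $] uses
for my $candidate ( 5.008009, 5.010001, 5.012000 ) {
	printf "%.6f is %s\n", $candidate,
		version_in_range( $candidate, 5.010, 5.011 ) ? 'accepted' : 'rejected';
	}
```

In a real program you would pass $] as the first argument inside a BEGIN block and die when the function returns false.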

Things to remember

use VERSION imports new features since Perl 5.9.5.
BEGIN { require VERSION } imported new features in early releases (fixed in later versions of v5.10 and v5.12).
Use no feature or no strict to unimport unwanted features.
Restrict the perl version with $].

Use each() on an array in Perl 5.12.

Before Perl 5.12, each only took a hash argument. In list context, it returns a two-item list of a key-value pair you had not seen yet (unless you changed the hash in some way that re-ordered it):

Perl 5.12 new features

Perl 5.12.1 is out, which is the sign that it’s time for normal users to pay attention to it: that first point release should have sanded down all the rough edges. As usual, the complete list of major changes is in the perl5120delta documentation, and we’ll cover some more of the interesting features in The Effective Perler in the coming weeks. Our initial list of user-interesting features includes:

Know how Perl handles scientific notation in string to number conversions.

A recent question on Stack Overflow asked about the difference between the same floating-point numbers being stored in scientific notation and written out. Why does 0.76178 come out differently than 7.6178E-01? When Perl stores them, they can come out as slightly different numbers. This is related to the perlfaq answer to Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?, but a bit more involved. You’ll see how to skip the whole mess at the end, but be patient.

Manage your Perl modules with git

In Item 110: Compile and install your own perls, you saw how to install multiple versions of perl and to maintain each of the installations separately. Doing something with one version of Perl doesn’t affect any of the other versions.

You can take that a step further. Within each installation, you can use a source control system to manage your Perl modules. In this post you’ll use git, which has the advantage that you don’t need a server.

First, install your perl into its own directory:

% ./Configure -des -Dprefix=/usr/local/perls/perl-5.10.1
% make test
% make install

Second, before you do anything else with your newly installed perl, put your new directory into source control:

% cd /usr/local/perls/perl-5.10.1
% git init
% git add .
% git commit -a -m "Initial installation of Perl 5.10.1"

You’re not quite done there, though. You’re on the master branch:

% git branch
* master

You want to keep at least one pristine branch that is the initial state of your perl installation. You can always come back to it:

% git checkout -b pristine
Switched to a new branch 'pristine'

Leave that branch alone and switch back to master:

% git checkout master
Switched to branch 'master'

From here you can do many things, but you probably want to consider the master branch your “stable” branch. You don’t want to commit anything to that branch until you know it works. When you install new modules, use a different branch until you know you want to keep them:

% git checkout -b unstable
Switched to a new branch 'unstable'
% cpan LWP::Simple
% git add .
% git commit -a -m "* Installed LWP::Simple"

After using your newly installed modules for a while and deciding that they’re stable, merge your unstable branch with master. Once merged, switch back to the unstable branch to repeat the process:

% git checkout master
Switched to branch 'master'
% git merge unstable
% git checkout unstable

Anytime that you want to start working with a clean installation, you start at the pristine branch and make a new branch from there:

% git checkout pristine
% git checkout -b newbranch

If you aren’t tracking your perl in source control already, just tracking a master and unstable branch can give you an immediate benefit. However, you can take this idea a step further.

With just one perl installation, you can create multiple branches to try out different module installations. Instead of merging these branches, you keep them separate. When you want to test your application with a certain set of modules, you merely switch to that branch and run your tests. When you want to test against a different set, change branches again. That can be quite a bit simpler than managing multiple directories that you have to constantly add or remove from @INC.

Know what creates a scope

Scopes can be confusing. Perl 5 introduced lexical, or my, variables that are only visible in the scope in which you define them. To properly scope your variables, you need to know what can define a scope and what doesn’t.

You commonly see lexical variables for subroutine arguments, for instance:

sub foo {
    my( $self, @args ) = @_;
    ...;
    }

The variables $self and @args don’t exist outside of that subroutine (ignoring black magic with things such as PadWalker). Lexical variables have limited effect and no action at a distance, making them invaluable for robust programming. Not only that, but since the lexical variable names only matter in their scope, you don’t have to know about all of the variables that you have already defined to choose variable names in your scope.

Before Perl 5, all variables were package variables (so, global). Perl 5 couldn’t just ignore all of the existing Perl 4 programs, so it ended up supporting both the global package variables and lexical variables. That can make things confusing if you don’t understand the difference.

First, you need to know what makes a scope. Most people can give you at least one answer: a block creates a scope. Blocks show up in the syntax of many of Perl’s commonly used features:

# a subroutine definition block, perhaps anonymous
sub foo { ... }
my $foo = sub { ... };

# blocks for control structures
foreach ( @array ) { ... }
while( $condition ) { ... }
if( $condition ) { ... }

# blocks related to functions:
my $result = do { ... };
my @transformed = map { ... } @input;
my @filtered = grep { ... } @input;

# blocks in regular expressions
m/(?{...})/

Sometimes you can create the lexical variable outside of the block even though it’s scoped to the block. You can declare the lexical variable in the test for while or if (and cousins), or as the control variable you want to use with foreach:

foreach my $index ( 0 .. 5 ) {
	print "index: $index\n";
	}

while( my $line = <DATA> ) {
	print "line: $line";
	}

if( my $foo = 'abc' ) {
	print "foo is $foo\n";
	}

You don’t need a control structure or operator to use a block to define the scope. You can use a bare block to create a scope:

# bare blocks
{
my $cat = 'Buster';
...;
}
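
A short sketch makes the effect of the bare block visible. The outer $cat and its value are made up for the example:

```perl
use strict;
use warnings;

my $cat = 'Mimi';          # outer lexical

{
	my $cat = 'Buster';    # a different variable that hides the outer one
	print "inside the block: $cat\n";
}

# the inner $cat is gone, so the outer one shows through again
print "outside the block: $cat\n";
```

The inner declaration never touches the outer variable; once the block ends, the name refers to the original value again.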

Most Perlers could identify blocks as scope definers, but there’s another scope definer that many people miss. File this away for your job interview trivia: a file is a scope too. You can’t see lexical variables outside of the file in which you define them, even if you don’t explicitly create the scope with a block. It’s as if there is a virtual block around the entire file.

You can use the file scope to create private class variables. The methods you define in the same file can see the private variables, but code in other files, such as subclasses, can’t mess with them:

package Some::Class;

my $private = 0; # only visible in this file

sub some_method {
   ...; # can see $private
   }

If you want other parts of the program to get or set the value in this private variable despite its scope, you can provide accessor methods. This gives you a chance to head off any shenanigans before you allow someone to change the value:

package Some::Class;

my $private = 0;

sub get_private { $private }
sub set_private { $private = $_[1] }
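
Heading off shenanigans might look something like this sketch, in which the setter refuses anything that isn’t a non-negative integer. That validation rule is an assumption made for the example, not part of the original class:

```perl
use strict;
use warnings;

package Some::Class;

my $private = 0;   # file-scoped, so only code in this file sees it

sub get_private { $private }

sub set_private {
	my( $class, $value ) = @_;

	# assumed rule for this example: only non-negative integers
	die "Bad value for private\n"
		unless defined $value and $value =~ /\A[0-9]+\z/;

	$private = $value;
	}

package main;

Some::Class->set_private( 42 );
print "private is ", Some::Class->get_private, "\n";

# a bad value dies inside the setter and leaves the old value alone
my $accepted = eval { Some::Class->set_private( 'banana' ); 1 };
print "banana was ", ( $accepted ? 'accepted' : 'rejected' ), "\n";
```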

Some people extend the idea of private class variables too far because they think that a package creates a scope. It doesn’t. A package statement merely sets the default package for names that you don’t explicitly qualify. Since lexical variables aren’t connected to packages, they don’t care what the current package is. If you change the package, even inside another block, the lexical variable is still visible:

package Some::Class;

my $n = 'Can you see me?';

{
package main;
# $n still visible here
}

package Some::Class::Subclass;

# $n still visible
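
You can check this for yourself with a runnable version of the same idea. The peek subroutine is a hypothetical name added for the demonstration:

```perl
use strict;
use warnings;

package Some::Class;

my $n = 'Can you see me?';   # lexical, scoped to the file

package Some::Class::Subclass;

# a different package, but the same file scope, so $n is still visible
sub peek { return $n }

package main;

print Some::Class::Subclass::peek(), "\n";
```

Under strict this would fail to compile if the package statement had ended the scope of $n.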

There are some more tricks with scopes and what constitutes a scoped variable, but that’s a matter for a future Item.

Things to remember

Lexical variables are only visible in their scope.
A block defines a scope.
A file defines a scope.
A package does not define a scope.

Memory-map files instead of slurping them

The conventional wisdom for slurping a file into a Perl program is to actually load the whole file into memory. We showed some of these techniques in Item 53: Consider different ways of reading from a stream.

There are several idioms for doing it, from doing it yourself:

my $text = do { local( @ARGV, $/ ) = $file; <> };

or using an optimized module such as File::Slurp:

use File::Slurp qw(read_file);

my $text = read_file( $file );
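
The do-it-yourself idiom packs a lot into one line. Here is an unpacked sketch of the same steps, using a throwaway file from File::Temp so that it runs on its own. The file contents are made up for the example:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# make a small temporary file so the example is self-contained
my( $fh, $file ) = tempfile( UNLINK => 1 );
print {$fh} "line one\nline two\n";
close $fh;

my $text = do {
	local @ARGV = ( $file );   # <> reads from the files named in @ARGV
	local $/;                  # undef record separator: read it all at once
	<>;                        # scalar context, so one read gets everything
	};

print length( $text ), " characters read\n";
```

The one-line version does both local assignments at once: the list assignment gives $file to @ARGV and leaves $/ undefined.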

Given a large file, say, something that is 2 GB, you end up with a memory footprint that is at least the file size. This program to load a 2 GB file took 11 seconds to load the file on my Mac Pro. The memory footprint rose to 2.25 GB and stayed there even after $text went out of scope:

#!/usr/bin/perl
use strict;
use warnings;

print "I am $$\n";

use File::Slurp;

{
my $start = time;
my $text = read_file( $ARGV[0] );
my $loadtime = time - $start;
print "Loaded file in $loadtime seconds\n";

my $count = () = $text =~ /abc/g;

print "Found $count occurrences\n";
}

print "Press enter to continue...";

<STDIN>;
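
The counting line uses the empty-list assignment idiom: the match runs in list context, and assigning that list to a scalar gives the number of items. To count every occurrence, the match needs the /g flag. A small sketch, with a made-up sample string:

```perl
use strict;
use warnings;

my $text = 'abc abcabc xyz abc';

# without /g, a successful match with no captures returns just (1)
my $first_only = () = $text =~ /abc/;

# with /g, the match returns every occurrence, so the count is the total
my $all = () = $text =~ /abc/g;

print "without /g: $first_only, with /g: $all\n";
```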

The problem is in the concept that you have to somehow capture and retain control of the data to make use of it.

To solve this, you should avoid the painful part. That is, don’t load the file at all. That I/O is really slow! You can memory-map, or mmap, the file. The name comes from the system call that makes it possible.

Instead of loading the file, you use mmap to make a connection between your address space and the file on the disk. You don’t have to worry about how this happens, but basically you use part of a disk file as if it were actually in memory. The advantage is that you don’t pay the I/O overhead up front, so there is effectively no load time, and since you don’t have to make space to hold the whole file in memory, you don’t pay the memory footprint.

This program uses File::Map. It “loads” the file instantly, and its actual memory footprint was under 3 MB (three orders of magnitude less!):

#!/usr/bin/perl
use strict;
use warnings;

use File::Map qw(map_file);

print "I am $$\n";

{
my $start = time;
map_file my $map, $ARGV[0];
my $loadtime = time - $start;
print "Loaded file in $loadtime seconds\n";

my $count = () = $map =~ /abc/g;

print "Found $count occurrences\n";
}

<STDIN>;

The $map acts just like a normal Perl string, and you don’t have to worry about any of the mmap details. When the variable goes out of scope, the map is broken and your program doesn’t suffer from a large chunk of unused memory.

In Tim Bray’s Wide Finder contest to find the fastest way to process log files with “wider” rather than “faster” processors, the winning solution was a Perl implementation using mmap (although using the older Sys-Mmap). Perl had nothing special in that regard because most of the top solutions used mmap to avoid the I/O penalty.

The mmap is especially handy when you have to do this with several files at the same time (or even sequentially if Perl needs to find a chunk of contiguous memory). Since you don’t have the data in real memory, you can mmap as many files as you like and work with them simultaneously.

Also, since the data actually live on the disk, different programs running at the same time can share the data, including seeing the changes each program makes (although you have to work out the normal concurrency issues yourself). That is, mmap is a way to share memory.

The File::Map module can do much more too. It allows you to lock filehandles, and you can also synchronize access from threads in the same process.

If you don’t actually need the data in your program, don’t ever load it: mmap it instead.
