Turn off Perl 5.12 deprecation warnings, if you dare!

Perl 5.12 deprecates several features, for various reasons. Some of the features were always stupid, some need to make way for future development, and some are just too ornery to maintain. All of these are listed in the perl5120delta documentation. The new thing, however, is that Perl 5.12 will warn you about these even if you don’t have warnings turned on. Consider this script full of Perl whoppers:

use 5.012;

use Switch;             # use given-when
use UNIVERSAL qw(can);  # no more UNIVERSAL->import

$[ = 1;       # I like FORTRAN.
my $pi := 4;  # empty attribute, not a special assignment

OUTER: {
	goto INNER;  # can't jump into inner scope
	...;
	MIDDLE: {
		...;
		INNER: {
			hello();
			}
		}
	}

sub hello :locked { # no more locked attribute
	say 'Here I am!';
	}

When you run this program, you get some warnings:

% perl5.12.1 old_stuff.pl
Use of assignment to $[ is deprecated at old_stuff.pl line 6.
Use of := for an empty attribute list is deprecated at old_stuff.pl line 7.
Use of :locked is deprecated at old_stuff.pl line 20.
Use of "goto" to jump into a construct is deprecated at old_stuff.pl line 9.
Here I am!

If you enable warnings, you get even more deprecation warnings:

% perl5.12.1 -w old_stuff.pl
Switch will be removed from the Perl core distribution in the next major release. Please install it from CPAN. It is being used at old_stuff.pl, line 3.
UNIVERSAL->import is deprecated and will be removed in a future perl at old_stuff.pl line 4
Use of assignment to $[ is deprecated at old_stuff.pl line 6.
Use of := for an empty attribute list is deprecated at old_stuff.pl line 7.
Use of :locked is deprecated at old_stuff.pl line 20.
Use of "goto" to jump into a construct is deprecated at old_stuff.pl line 9.
Here I am!

If you still want to use these features, despite Perl doing everything it can to warn you not to, you can explicitly turn off the deprecation warning class:

use 5.012;
no warnings 'deprecated';

# same as before...

Now you just get the output that you wanted:

% perl5.12.1 -w old_stuff.pl
Here I am!

You shouldn’t turn off these warnings as a long-term strategy. If you’re migrating your ancient Perl to the latest version and want these warnings to temporarily disappear while you focus on some other things, we can let that pass. You might want to give yourself a reminder that you’ve turned off all of these important warnings by checking for warnings as we showed in Item 100: Use lexical warnings to selectively turn on or turn off complaints:

use 5.012;
no warnings 'deprecated';
temp_warning();

sub temp_warning {
	# needs to be one level lower than the warnings setting
	warn "Hey Evel Knievel! Deprecation warnings are disabled!" unless
		warnings::enabled( 'deprecated' );
	}

...;

It's important that you put your check inside a subroutine because warnings::enabled is specifically designed to look one level above where you call it, since it's expecting you to use it in a module to respect the state of the calling script.

Since the warnings pragma is lexically scoped, you might have to do this in several places (unless the modules respect the warnings settings of the caller!). Don't expect anyone to rush to make it any easier for you to disengage the safety devices, though!
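
As a small illustration of that lexical scoping, here is a sketch that uses the uninitialized category instead of deprecated, only because the constructs that trigger deprecation warnings vary from one perl to the next. The variable names and the __WARN__ handler are made up for the demonstration:

```perl
use strict;
use warnings;

# collect warnings instead of printing them so they can be counted
my @caught;
local $SIG{__WARN__} = sub { push @caught, $_[0] };

my $undef;

{
	no warnings 'uninitialized';       # only affects this block
	my $quiet = "value: " . $undef;    # no warning here
}

my $noisy = "value: " . $undef;        # warns, because it is outside the block

printf "Caught %d warning(s)\n", scalar @caught;
```

Only the concatenation outside the block should show up in @caught. The same applies to no warnings 'deprecated': it silences the enclosing block or file and nothing else.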

Locate bugs with source control bisection

As you work in Perl you store each step in source control. When you finish a little bit of work, you commit your work. Ideally, every commit deals with one thing so you’re only introducing one logical change in each revision.

Somewhere along the process, you might discover that something is not working correctly. You think that it used to work but you’re not sure where things went pear-shaped, perhaps because the bug seemingly deals with something that you weren’t working on.

You might think that your test suite should catch those, but a test suite doesn’t protect you from bugs. Your tests might not have checked for the problem, or your tests might have been wrong, or all sorts of other things that you don’t expect. Your tests are only as good as you make them, and often the programmer creates his own tests and acts as his own quality control (which isn’t the best of arrangements).

Since you’re keeping your work in source control (we’ll give you the benefit of the doubt), you can easily, at least in process, figure out where the problem appears by bisecting your source tree until you find the revision that introduces the problem:

  1. check out a revision where things don’t work
  2. verify broken behavior with a test script
  3. check out a revision where you think things work
  4. verify working behavior with a test script
  5. check out a revision halfway between the working and broken versions
  6. run the test script to check if that revision works or breaks
  7. repeat this bisection until you find the revision that breaks

With each iteration, you narrow the window of revisions where you might have introduced the problem.

This is simple in process, but it’s tedious in practice. With any source control system, you can manually check out a revision, run your check script, and see what happens. You can keep doing that until you find the problem. You can even automate it.
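
If you did automate it yourself, the core of the job is a binary search. This is only a sketch under some assumptions: the revision names are invented, the list runs from oldest to newest, the first entry is known to be good and the last is known to be bad, and the $is_good callback stands in for checking out a revision and running your test script:

```perl
use strict;
use warnings;

# Return the first bad revision from a list ordered oldest to newest.
# Assumes the first revision is good and the last one is bad.
sub first_bad {
	my( $revisions, $is_good ) = @_;

	my( $good, $bad ) = ( 0, $#$revisions );

	while( $bad - $good > 1 ) {
		my $mid = int( ( $good + $bad ) / 2 );
		if( $is_good->( $revisions->[$mid] ) ) { $good = $mid }
		else                                   { $bad  = $mid }
	}

	return $revisions->[$bad];
}

# hypothetical history: things break at r6
my @revisions = map { "r$_" } 1 .. 8;
my $is_good   = sub { ( $_[0] =~ /(\d+)/ )[0] < 6 };

print "First bad revision: ", first_bad( \@revisions, $is_good ), "\n";
```

Each pass through the loop halves the window, so eight revisions need only three checks.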

Some modern source control systems, however, provide a bisection feature for you so you don’t have to automate it yourself. For this Item, consider git, which is popular in the Perl community (and the perl source is in a git repository). Read perl5-porters for a couple of days and you’re likely to read about some of the developers using git bisect to find a bug in perl.

As a demonstration, you can use the Buster::Bean git repository. In that module, which simulates my cat, there’s a complaino subroutine that should return a string that has “meow” in it, but somewhere along the line it broke. The t/export.t test checks for this, and you notice in the latest revision that the t/export.t test doesn’t pass.

# t/export.t
like( complaino(), qr/meow/i, 'complaino returns something like a meow' );

The current version of complaino is broken, but you don’t remember when it broke:

# lib/Bean.pm
sub complaino {
	return 'MEOOOOOW!' # this version is broken
	}

Once you unpack the distribution, change into the Buster-Bean directory if you want to follow along.

To start your hunt, you tell git that you are starting a bisection:

% git bisect start

Next you have to set the initial window. As with many things in git, you can specify the revision in various ways. In this case, you can use the start of the SHA-1 digest for each commit. You set the good and bad bounds of the window:

% git bisect good 4be027af
% git bisect bad 14883968

Once you’ve set the window for your bisection, you run the bisection by specifying a test script to run for each commit that git will check:

% git bisect run ./test-script.sh

The trick is to write a test script that tells git if the revision works or not. If your test script exits with 0, you’re telling git that the revision works. If your test script exits with 1-127, that commit fails (although exiting with 125 tells git to skip that revision).
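
Those exit codes can be captured in a small helper. This is a sketch, not part of the Buster::Bean distribution: the subroutine name is invented, and the two flags stand for whether the build step and the test step succeeded:

```perl
use strict;
use warnings;

# Hypothetical helper: turn the results of the build and test steps
# into the exit code that git bisect run expects
sub bisect_exit_code {
	my( $build_ok, $tests_ok ) = @_;

	return 125 unless $build_ok;   # can't judge this revision, so skip it
	return $tests_ok ? 0 : 1;      # 0 marks it good, 1 marks it bad
}

printf "build failed:  %d\n", bisect_exit_code( 0, 0 );
printf "tests failed:  %d\n", bisect_exit_code( 1, 0 );
printf "tests passed:  %d\n", bisect_exit_code( 1, 1 );
```

In a real test script you might set each flag from something like system(...) == 0 and hand the result to exit.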

How you write your test script depends on what you want to check. In the case of complaino, you want to find where the t/export.t test starts to fail. You might think that you can just run your test script:

% git bisect run perl -Iblib/lib t/export.t

However, that might not put the right versions of the modules in blib. In this case, you need to run Makefile.PL each time and then make the source (or use ./Build) to put everything in the right place. With the right files in place, you run the test program, which provides the final exit code that tells git what happened:

#!/bin/sh

perl Makefile.PL
make
perl -Iblib/lib t/export.t

Now you’re ready to run the bisection:

% git bisect run ./meow_test.sh

The output shows git’s progress through the revision history. The first revision it tests is in the middle of the window you specified. In this case, that one passes, so it’s now the good bound of the window. The next revision is 93d1747, where t/export.t fails. That narrows the window on the bad side. The rest of the bisection tries revisions that all pass, so 93d1747 must be the revision that introduces the breakage. git reports that 93d1747... is the first bad commit:

running ./meow_test.sh
Writing Makefile for Buster::Bean
Skip blib/lib/Buster/Bean.pm (unchanged)
Manifying blib/man3/Buster::Bean.3
1..4
ok 1 - use Buster::Bean;
ok 2 - Buster::Bean->can('complaino')
ok 3 - complaino subroutine is defined
ok 4 - complaino returns something like a meow

Bisecting: 3 revisions left to test after this
[93d1747e4681ea21536a66aba25bb21e1cddda05] Make uppercase
running ./meow_test.sh
Writing Makefile for Buster::Bean
cp lib/Bean.pm blib/lib/Buster/Bean.pm
Manifying blib/man3/Buster::Bean.3
1..4
ok 1 - use Buster::Bean;
ok 2 - Buster::Bean->can('complaino')
ok 3 - complaino subroutine is defined
not ok 4 - complaino returns something like a meow
#   Failed test 'complaino returns something like a meow'
#   at t/export.t line 10.
#                   'MEOOOOOW!'
#     doesn't match '(?i-xsm:meow)'
# Looks like you failed 1 test of 4.

Bisecting: 1 revisions left to test after this
[980be46fa2ea3ee32372aedd62949a663c729058] * Use fewer exclamation points
running ./meow_test.sh
Writing Makefile for Buster::Bean
cp lib/Bean.pm blib/lib/Buster/Bean.pm
Manifying blib/man3/Buster::Bean.3
1..4
ok 1 - use Buster::Bean;
ok 2 - Buster::Bean->can('complaino')
ok 3 - complaino subroutine is defined
ok 4 - complaino returns something like a meow

Bisecting: 0 revisions left to test after this
[b5ab15611ca25c6925998463c9cb7b079fe87c8b] * Make it lowercase
running ./meow_test.sh
Writing Makefile for Buster::Bean
cp lib/Bean.pm blib/lib/Buster/Bean.pm
Manifying blib/man3/Buster::Bean.3
1..4
ok 1 - use Buster::Bean;
ok 2 - Buster::Bean->can('complaino')
ok 3 - complaino subroutine is defined
ok 4 - complaino returns something like a meow

93d1747e4681ea21536a66aba25bb21e1cddda05 is first bad commit
commit 93d1747e4681ea21536a66aba25bb21e1cddda05
Author: brian d foy 
Date:   Mon Jul 19 20:43:07 2010 -0500

    Make uppercase

:040000 040000 8360932159317f0685b98f2b2dba4753c53e6240 0e7a2e5d54de7db9f1b790ed4da0a704612ba130 M  lib
bisect run success

Graphically, the bisection starts at the top of the revision list and goes toward the bottom for ①, then heads back toward the top for ②, alternating directions as it closes in on the bad revision.

When you are done with your bisection, you need to tell git to return to the head of the source tree:

% git bisect reset

Other source control systems have bisection features that do basically the same thing, although their details might be different. If your source control system doesn’t have this feature, you can automate it (or switch to a system that does have it). Once it's set up, you should be able to locate bugs much more quickly.

Perl Authors Night at Powell’s Technical Books

During OSCON, Joshua and I are taking part in the Perl Authors Night at Powell’s Technical Books on Tuesday, July 20 at 7 pm. Bring your copy of Effective Perl Programming for us to sign.

Other authors confirmed so far include:

  • chromatic (Modern Perl, Perl Testing: A Developer’s Notebook, Perl Hacks, Extreme Programming Pocket Guide)

  • brian d foy (Effective Perl Programming, Learning Perl, Intermediate Perl, Mastering Perl)

  • Joshua McAdams (Effective Perl Programming)

  • Curtis “Ovid” Poe (Perl Hacks)

  • Randal Schwartz (Programming perl (1st edition), Learning Perl, Intermediate Perl, Perls of Wisdom)

  • Peter Scott (Perl Medic, Perl Debugged, Perl Fundamentals (DVD))

Make sure that you’re going to Powell’s Technical Books at 33 Northwest Park Avenue in Portland, and not one of their other Portland stores. You can take the MAX Green line (for free) from the Convention Center to NW 5th St and NW Couch St, then walk 4 blocks west to the store.

Keep your programmatic configuration DRY

A common mantra among programmers today is to keep your code DRY. This little acronym stands for “Don’t Repeat Yourself” and serves as a reminder that when you see a repetitive pattern in your code or are tempted to copy/paste some statements, you should think twice and consider extracting the common logic into a chunk of code that can be reused.

For many programmers, this practice begins to break down when “configuration” code is involved. When I talk about configuration code here, I’m not talking about the XML, YAML, INI, and other non-moving bits of your project. I’m talking about the Perl code in your program that simply serves as data to feed some active portion of your code. For example:

my @anchors = (
  {
    text => 'Effective Perl',
    href => 'http://www.effectiveperlprogramming.com'
  },
  {
    text => 'Perl',
    href => 'http://www.perl.org'
  },
);

for my $anchor (@anchors) {
  create_anchor_tag($anchor);
}

Here I have avoided repeating myself by consolidating all of the common logic for anchor processing into create_anchor_tag. My code loops over the configuration data structure and simply feeds a little bit of configuration at a time into that subroutine.

This type of configuration works most of the time, especially when code is first written. Unfortunately, it breaks down quickly as the configuration becomes more complex over time. Say for instance that we now have more anchors and need to be more specific about the details of each anchor.

my @anchors = (
  {
    text  => 'Effective Perl',
    href  => 'http://www.effectiveperlprogramming.com',
    class => 'standard',
    style => '',
    lang  => 'en',
    dir   => 'ltr'
  },
  {
    text  => 'Perl',
    href  => 'http://www.perl.org',
    class => 'standard',
    style => '',
    lang  => 'en',
    dir   => 'ltr'
  },
  {
    text  => 'Perl Foundation',
    href  => 'http://www.perlfoundation.org',
    class => 'standard',
    style => '',
    lang  => 'en',
    dir   => 'ltr'
  },
  {
    text  => 'Perl6 России',
    href  => 'http://www.perl6.ru',
    class => 'standard',
    style => '',
    lang  => 'ru',
    dir   => 'ltr'
  },
  # ... and so on
);

You can see that beyond simply repeating the hash keys over and over, I am also repeating quite a few of the hash values. For instance, the values for class, lang, and dir are fairly consistent. When you see this pattern occurring in your code, it is time to do a little refactoring. Luckily, refactoring repetitiveness out of code is often very easy.

A common strategy for refactoring is to simply inline a map statement into the configuration setup. In the example below, I have inserted a map that handles the portions of the configuration that change the least. It provides default values that represent the most commonly used configuration values and also allows for overrides by overlaying the anchor-specific hash at the end of the map.

my @anchors = map { {
    class => 'standard',
    style => '',
    lang  => 'en',
    dir   => 'ltr',
    %{$_}
  } } (
  {
    text => 'Effective Perl',
    href => 'http://www.effectiveperlprogramming.com'
  },
  {
    text => 'Perl',
    href => 'http://www.perl.org'
  },
  {
    text => 'Perl Foundation',
    href => 'http://www.perlfoundation.org'
  },
  {
    text => 'Perl6 России',
    href => 'http://www.perl6.ru',
    lang => 'ru'
  },
  # ... and so on
);

This simple tweak to the configuration setup significantly reduces the amount of redundancy in the code. Programmers who maintain this code can quickly see what the default values are for all of our anchor tags and can also more easily see the tags with customizations. When all of the options were specified for every anchor, it was easy to miss the meaningful configuration that made the anchors unique. There was way too much noise with the repeated code.
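
One possible variation, sketched here with a hypothetical %defaults hash and shortened sample data, names the defaults once and uses a unary plus to tell perl that the inner braces are an anonymous hash and not a block:

```perl
use strict;
use warnings;

# the values that most anchors share (hypothetical name)
my %defaults = ( class => 'standard', style => '', lang => 'en', dir => 'ltr' );

# later keys win, so anything in the anchor's own hash overrides a default
my @anchors = map { +{ %defaults, %$_ } } (
  { text => 'Perl',       href => 'http://www.perl.org' },
  { text => 'Perl6 (ru)', href => 'http://www.perl6.ru', lang => 'ru' },
);

print "$_->{text}: lang=$_->{lang}, class=$_->{class}\n" for @anchors;
```

The first anchor keeps the default lang of en and the second overrides it with ru, while both pick up the default class.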

We can take our DRY’ness a step further now by removing the repeated hash keys in the configuration. Every anchor will have a different bit of text and a different link, so those keys will be present for every single configuration. Why declare them every time? Well, one reason is that it serves as good documentation. However, when you only have a couple of required options, positional notation is fine.

In this example, I’m storing each anchor’s configuration in an anonymous array. I’m assuming that the first two elements of the array are always the text and link for the anchor. Everything else is a customization.

my @anchors = map { {
    text  => shift @{$_},
    href  => shift @{$_},
    class => 'standard',
    style => '',
    lang  => 'en',
    dir   => 'ltr',
    @{$_}
  } } (
    [ 'Effective Perl' => 'http://www.effectiveperlprogramming.com' ],
    [ 'Perl' => 'http://www.perl.org' ],
    [ 'Perl Foundation' => 'http://www.perlfoundation.org' ],
    [ 'Perl6 России' => 'http://www.perl6.ru', lang => 'ru' ],
    # ... and so on
);

Now the configuration is very succinct and easy to read compared to its wordy predecessor. There is very little noise and no repeated code. Each configured anchor is as customizable as when we started. The common elements are only mentioned once and the differences are all that we see.
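
One thing to watch with the positional version is that shift modifies the array it reads. That is harmless for a literal list, but it would empty out a named configuration that you wanted to use again. Here is a sketch of a non-destructive variation that copies the elements instead. The @config name and the shortened data are assumptions for the example:

```perl
use strict;
use warnings;

my @config = (
  [ 'Perl'       => 'http://www.perl.org' ],
  [ 'Perl6 (ru)' => 'http://www.perl6.ru', lang => 'ru' ],
);

my @anchors = map {
  # copy the elements so @config is left untouched
  my( $text, $href, %overrides ) = @$_;
  +{
    text  => $text,
    href  => $href,
    class => 'standard',
    style => '',
    lang  => 'en',
    dir   => 'ltr',
    %overrides,
  };
} @config;

printf "%s: lang=%s\n", $_->{text}, $_->{lang} for @anchors;
printf "Second config row still has %d elements\n", scalar @{ $config[1] };
```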

Set custom DBI error handlers

The DBI module lets you handle errors yourself if you don’t like its built-in behavior. DBI lets you handle the errors at either the database or the statement handle level by specifying attributes:

my $dbh = DBI->connect( ..., ..., \%attr );

my $sth = $dbh->prepare( ..., \%attr );

There are several attributes that affect error handling, each of which you can use with either a connection or a statement handle:

Attribute           Type      Default
PrintWarn           Boolean   On
PrintError          Boolean   On
RaiseError          Boolean   Off
HandleError         Code Ref  Off
ShowErrorStatement  Boolean   Off

These attributes are inherited by anything derived from the handle where you set them.

The PrintWarn and PrintError attributes do just what they say. They are on by default, and they don’t stop your program. In this example, you prepare a statement that expects one bind parameter, but when you execute it, you give two parameters instead:

use DBI;

my $dbh = DBI->connect( 'dbi:SQLite:dbname=test.db', '', '', {} );

my $sth = $dbh->prepare( 'SELECT * FROM Cats WHERE id = ?' );
$sth->execute( 1, 2 );

while( my @row = $sth->fetchrow_array ) {
	print "row: @row\n";
	}

print "Got to the end\n";

Since PrintError is true by default, DBI prints the error, but it allows the program to continue even though there was an error:

DBD::SQLite::st execute failed: called with 2 bind variables when 1 are needed at dbi-test.pl line 12.
Got to the end

If you set the ShowErrorStatement attribute, you get a better error message because DBI appends the SQL statement that you tried to execute. You can set this on either the database handle or the statement handle, but if you don’t know which statement is causing the problem, it’s easier to set it as part of the database handle:

# The rest of the program is the same
my $dbh = DBI->connect( 'dbi:SQLite:dbname=test.db', '', '', {
	ShowErrorStatement => 1,
	} );

The error message shows the SQL statement, but the program still continues:

DBD::SQLite::st execute failed: called with 2 bind variables when 1 are needed [for Statement "SELECT * FROM Cats WHERE id = ?"] at dbi-test.pl line 12.
Got to the end

The RaiseError attribute turns errors into fatal errors that you can trap with eval { ... } or Try::Tiny (Item 103: Handle Exceptions Properly) (or not trap if you want your program to die):

# The rest of the program is the same
my $dbh = DBI->connect( 'dbi:SQLite:dbname=test.db', '', '', {
	RaiseError         => 1,
	ShowErrorStatement => 1,
	} );

use Try::Tiny;

try {
	my $sth = $dbh->prepare( ... );
	$sth->execute( ... );
	}
catch {
	...
	};

The output shows that the program stops (there’s no “Got to the end”), but you see duplicated error messages; the one from PrintError that is just a warning, and the one from RaiseError that kills the program:

DBD::SQLite::st execute failed: called with 2 bind variables when 1 are needed [for Statement "SELECT * FROM Cats WHERE id = ?"] at dbi-test.pl line 14.
DBD::SQLite::st execute failed: called with 2 bind variables when 1 are needed [for Statement "SELECT * FROM Cats WHERE id = ?"] at dbi-test.pl line 14.

Turning off PrintError can fix the duplication:

# The rest of the program is the same
my $dbh = DBI->connect( 'dbi:SQLite:dbname=test.db', '', '', {
	PrintError         => 0,
	RaiseError         => 1,
	ShowErrorStatement => 1,
	} );

Simply raising the exception might be good enough for some applications, but sometimes you want more control of the errors. In those cases, you can handle the errors yourself by providing a code reference to HandleError. In this case, you can just catch the error and print it:

my $dbh = DBI->connect( 'dbi:SQLite:dbname=test.db', '', '', {
	ShowErrorStatement => 1,
	HandleError        => \&dbi_error_handler,
	} );

sub dbi_error_handler {
	my( $message, $handle, $first_value ) = @_;

	print "Caught: $message\n";

	return 1;
	}

DBI passes your HandleError three arguments: the error string it would have used with PrintError, the handle that generated the error, and the first return value from the failing method (which is typically nothing useful since there’s an error of some sort).
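
Two details from the DBI documentation are worth a sketch. The message in $_[0] is an alias, so your handler can change what gets reported, and a false return value tells DBI to carry on with its normal RaiseError and PrintError checks. The handler name and the tag here are invented, and the example calls the handler directly with a stand-in message so you can try it without a database:

```perl
use strict;
use warnings;

# Hypothetical handler: tag the message, then let DBI continue
sub tagging_handler {
	# $_[0] is an alias to the caller's message, so assigning to it
	# changes what RaiseError or PrintError would report afterward
	$_[0] = "[db] $_[0]";

	return 0;   # false: the error is not considered handled
}

# stand-in for the message DBI would pass along with the handle
my $message = 'execute failed';
my $handled = tagging_handler( $message, undef, undef );

print "Message is now: $message\n";
print "Handled? ", ( $handled ? 'yes' : 'no' ), "\n";
```

With a real handle you would install it with HandleError => \&tagging_handler, just as with dbi_error_handler.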

The error message shows that you caught the error:

Caught: DBD::SQLite::st execute failed: called with 2 bind variables when 1 are needed [for Statement "SELECT * FROM Cats WHERE id = ?"]
Got to the end

If you want a stack trace, you can use Carp (and curiously, the argument alignment works out!).

use Carp;

my $dbh = DBI->connect( 'dbi:SQLite:dbname=test.db', '', '', {
	ShowErrorStatement => 1,
	HandleError        => \&Carp::confess,
	} );

HandleError is how Exception::Class::DBI inserts its error handler:

my $dbh = DBI->connect( $dsn, $user, $pass, {
	PrintError  => 0,
	RaiseError  => 0,
	HandleError => Exception::Class::DBI->handler,
	});

The DBIx::Log4perl module also uses HandleError, although it hides the details from you:

my $dbh = DBIx::Log4perl->connect('dbi:Oracle:XE', 'user', 'password');

There are some things that you might want to do when handling the error yourself, depending on what you want to accomplish:

  • rollback if you are in the middle of a transaction
  • disconnect from the database if you are going to quit
  • reconnect to the database if you lost the connection
  • print a stack trace

No matter what you want to do, however, it’s HandleError that lets you do it.

Things to remember

  • The RaiseError attribute turns DBI errors into fatal errors
  • You can handle errors yourself by giving HandleError a code reference
  • Setting the ShowErrorStatement attribute adds the offending SQL statement to the error message

YAPC::NA 2010 Presentation Slides

At YAPC::NA 2010, brian gave a class on Effective Perl and I gave three short presentations related to topics that we discuss in Effective Perl Programming, 2nd Edition. Links to my presentations (with notes) in PDF format (gzipped) can be found below:

Effective Perl Programming is in PDF and ePub

I just found out that Effective Perl Programming is available in digital formats through eBooks.com. They have a PDF version and an ePub version, each for $US31.99, which, sadly, is more than the hard-copy price of $US26.39 on Amazon.com and the Kindle price of $US17.59.

They specifically list that these formats target Adobe Digital Editions, so I don’t know if they’ve checked them on any other readers or have used special features.

InformIT also has an “eBook” version, which I think is just a PDF, for $US28.79, but also has a “Book + eBook Bundle” for $US47.19. They might offer ePub in the future.

If anyone buys either the PDF or ePub versions from eBooks or InformIT, let us know what you think of them, how they look, and which reader you use.

Watch out for side effects with `use VERSION`

To specify that you want at least a particular version of Perl, you give that version to the use built-in:

use VERSION;

We covered this in Item 83: Limit your distributions to the right platforms, and we mentioned that it might invoke side effects. We didn’t get into the details in that Item though. As of Perl 5.10, this introduces side effects that you might not want.

Merely specifying a Perl version prior to 5.10 (well, actually before 5.9.5, the development track that led to 5.10) does nothing other than check the version you specify against the interpreter version. If the interpreter version is equal to or greater than the version you specify, your program continues. If not, it dies:

use 5.008;  # needs perl5.008000 or later

This works with require too:

require 5.008;  # needs perl5.008000 or later

However, use is a compile-time function and require is a run-time function. By the time you hit that require, perl has already compiled your program up to that point or died trying as it ran into unknown features. Code may have already run, despite using an inappropriate version of perl. You want to impose your version restriction as soon as possible.

You might think that you can fix this with a BEGIN block which compiles and immediately runs the code so you get the ordering right:

BEGIN { require 5.010; }

However, this also pulls in the new features for that version, at least in Perls 5.10 and 5.12, although that might be a bug. Even if it is, it still is what it is and you need to watch out for it.

use 5.010

With Perl 5.10, you get three side effects with use 5.010. Starting with that version, use-ing the version also pulls in the new features for that version. Ostensibly, that keeps programs designed for earlier versions from breaking as newer perls add keywords, but it also tries to enforce the current philosophy of good programming on you.

Perl 5.10 introduces say, state, and given-when, which you import implicitly when you say use 5.010:

use 5.010;
say 'I can use Switch!';  # imported say()

given ($ARGV[0]) {        # imported given()
	when( defined ) { some_sub() }
	};

sub some_sub {
	state $n = 0;         # imported state()
	say "$n: got a defined argument";
	}

If you want to insist on Perl 5.10 but not use its new features, perhaps because that’s the installation with the necessary modules, you can unimport the side effects immediately with the new feature pragma:

use 5.010;     # implicit imports
no feature;    # take it right back again

# your own version of say()
sub say {
	# something that you want to do
	}

If you only want some of the new features, you can unimport the ones that you don’t want:

use 5.010;
no feature qw(say);   # leaves state() and given()

sub say {
	# something that you want to do
	}
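
To see that the unimport really frees up the name, here is a small sketch that fills in the body of that say with something arbitrary. What the subroutine returns is invented for the example:

```perl
use 5.010;
no feature qw(say);   # say is an ordinary name again
use strict;
use warnings;

# your own say(), which returns a string instead of printing
sub say {
	return "custom say: @_";
}

my $result = say( 'hello' );
print "$result\n";
```

With the say feature switched off, perl calls your subroutine and not the built-in.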

use 5.012

Perl 5.12 includes two more side effects for use VERSION. The unicode_strings feature treats all strings outside of bytes and locale scopes as Unicode strings. Additionally, use 5.012 automatically turns on strict:

use 5.012;
# now strictures are on

$foo = 1;   # compile-time error!

Again, you can’t get around this by using require because it still turns on the new features and enables strict:

BEGIN { require 5.012 }

$foo = 1;  # still a compile-time error

If, for some odd and dangerous reason you don’t want strict on by default, you can turn it off yourself, even though unimporting it doesn’t give you the warning that you’ve left the paved roads, you’ve just violated your rental car contract, and there’s a chainsaw massacrer waiting for you:

use 5.012;
no feature;
no strict;

my $foo = 1;   

$fo0++;  # sure, go ahead and make that error

A workaround to restrict perl versions

There’s a way around all of this, and it’s to not restrict the version with use if that’s the only thing that you want to do. Instead, you can check the value of the $] variable, just like the various examples you saw in Item 83:

BEGIN {
	die "Unsupported version"
		unless $] >= 5.010 and $] < 5.011
	}

This has the added benefit of restricting the upper acceptable perl version.
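
The $] variable holds the version as a decimal number, such as 5.010001 for Perl 5.10.1, so ordinary numeric comparisons work. As a sketch, the range check can go into a small subroutine so you can try it against version numbers other than the one you are running. The subroutine name is made up for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $version is at least $min and below $max
sub version_in_range {
	my( $version, $min, $max ) = @_;
	return( $version >= $min && $version < $max );
}

printf "This perl is %s\n", $];

for my $version ( 5.008009, 5.010001, 5.012003 ) {
	printf "%s is %s\n", $version,
		version_in_range( $version, 5.010, 5.011 ) ? 'supported' : 'unsupported';
}
```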

Things to remember

  • use VERSION imports new features since Perl 5.9.5.
  • BEGIN { require VERSION } still imports new features
  • Use no feature or no strict to unimport unwanted features.
  • Restrict the perl version with $].

The handful of basic Perl concepts

I’ve now given the second Effective Perl Programming class, this time a two-day master class at YAPC::NA 2010 in Columbus. The common comment during the class seemed to be “You just blew my mind again”. I’m also giving this talk in a one-day format at YAPC::EU in Pisa on Aug 7.

Most of the class goes back to the basics and re-learning the handful of underlying rules in Perl. These are the things that we don’t get bogged down with in Learning Perl, where we want to get as much Perl in front of people as soon as possible so they can start writing Perl programs (no matter how ugly they might look). Effective Perl Programming, however, takes that to another level as Perlers come back to think about the subtle things that they ignored while they were trying to get the big picture. Learning any language is a spiralling effort to pick up the big concepts and then come back around to refine them, over and over.

I’ve started to make a list of the handful of things that Perlers need to understand, and none of it has to do with tricky syntax. Many people never pick up these concepts, so they come up with complicated models to explain why things happen. Instead, the Perler just needs to understand:

  • Data and variables are different things (e.g. lists and arrays)
  • Operators matter more than data for deciding what happens
  • Context is important (strings and numbers; scalar, list, and void)
  • Run-time versus compile time, and when things happen despite their location in the code

Respect the global state of the flip flop operator

Perl’s flip-flop operator, .., (otherwise known as the range operator in scalar context) is a simple way to choose a window on some data. It returns false until its lefthand side is true. Once the lefthand side is true, the flip-flop operator returns true until its righthand side is true. Once the righthand side is true, the flip-flop operator returns false. That is, the lefthand side turns it on and the righthand side turns it off.

Start with a simple file that has START and END markers:

# input.txt
Ignore this
Ignore this too
START
Show this
And this
Also this
END
Don't show this
Or this

You need to extract the lines between those two markers:

# flip-flop
while( <> ) {
	print if /START/ .. /END/;
	}

The output shows just the stuff between those markers:

% perl flip-flop input.txt
START
Show this
And this
Also this
END

What if you make the file a bit more complicated so there is an extra matching window? Once the flip-flop operator goes back to false, it can turn to true once its lefthand side matches again. Here’s a file with two windows:

# input2.txt
Ignore this
Ignore this too
START
Show this
And this
Also this
END
Don't show this
Or this
START
Show this again
And this again
Also this again
END
But ignore this

Now you get both windows of output:

% perl flip-flop input2.txt
START
Show this
And this
Also this
END
START
Show this again
And this again
Also this again
END

That’s fine, but it gets a bit more complicated when you use the same flip-flop more than once without knowing its state. Modify the flip-flop program so it goes through each file separately instead of combining all the files into the ARGV filehandle:

foreach my $file ( @ARGV )
	{
	open my $fh, '<', $file or die "Could not find $file\n";
	while( <$fh> ) {
		print if /START/ .. /END/;  # each line already ends with a newline
		}
	}

To watch it work (or not work, coming later), split the input2.txt file into two separate files, input2a.txt and input2b.txt, each of which has its own window you want to extract:

# input2a.txt
Ignore this
Ignore this too
START
Show this
And this
Also this
END
Don't show this

# input2b.txt
Or this
START
Show this again
And this again
Also this again
END
But ignore this

The output isn’t surprising and it looks the same as it did with the previous program:

% perl flip-flop input2a.txt input2b.txt
START
Show this
And this
Also this
END
START
Show this again
And this again
Also this again
END

However, it’s at this point that some people get confused. The flip-flop operator doesn’t care about which file you are looking at, what happened in the last file, and so on. To see it “break”, change input2a.txt so it doesn’t have the END marker:

# input2a.txt
Ignore this
Ignore this too
START
Show this
And this
Also this
Don't show this

Since input2a.txt doesn’t complete the window as you intended, the flip-flop, maintaining its state, is still true when it starts the second file:

% perl flip-flop input2a.txt input2b.txt
START
Show this
And this
Also this
Don't show this
# input2b.txt
Or this
START
Show this again
And this again
Also this again
END

The flip-flop maintains its global state. It doesn’t care about starting new loops, new iterations, or anything else. You might think that you could hide it in a subroutine, but it’s not even safe there. Every flip-flop operator that perl compiles has its own state, and perl compiles a subroutine only once:

foreach my $file ( @ARGV )
	{
	open my $fh, '<', $file or die "Could not find $file\n";
	extract( $fh );
	}

sub extract {
	my( $fh ) = shift;

	while( <$fh> ) {
		print if /START/ .. /END/; # this is the same .. on every call
		}
	}

The output doesn’t change! The flip-flop doesn’t care that it’s in a subroutine. It’s the same flip-flop it was before.
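
If you don’t want to create the input files, you can reproduce the leak with in-memory filehandles. This is a sketch, not the program from above: the two strings are shortened, hypothetical stand-ins for input2a.txt (without its END marker) and input2b.txt, and the subroutine collects lines in @shown instead of printing them directly:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for input2a.txt (no END) and input2b.txt
my $first  = "Ignore this\nSTART\nShow this\nDon't show this\n";
my $second = "Or this\nSTART\nShow this again\nEND\nBut ignore this\n";

my @shown;
foreach my $text ( $first, $second ) {
	open my $fh, '<', \$text or die "Could not open in-memory file: $!\n";
	extract( $fh );
	}
print @shown;

sub extract {
	my( $fh ) = @_;

	while( <$fh> ) {
		push @shown, $_ if /START/ .. /END/;  # one operator, one state, every call
		}
	}
```

The “Or this” line from the second string shows up in the output because the flip-flop was still on when the first call returned.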

So, if every flip-flop operator that perl compiles has its own state, and you want a flip-flop operator with a new state, you just need to compile a new flip-flop for each iteration. That’s simple enough, kinda. This program won’t work because the subroutine reference is the same each time. When perl compiles it, it knows that the anonymous subroutine is going to be the same each time so perl reuses it:

foreach my $file ( @ARGV )
	{
	open my $fh, '<', $file or die "Could not find $file\n";
	make_extractor()->($fh);
	}

sub make_extractor {
	sub { # only compiled once
		my( $fh ) = shift;

		while( <$fh> ) {
			print if /START/ .. /END/;
			}
		};
	}

You can verify this by dumping the return value of make_extractor:

# dump-subs.pl
use Devel::Peek;

my @subs = map { make_extractor() } 1 .. 3;

print Dump( $_ ) foreach @subs;

sub make_extractor {
	sub { # only compiled once
		my( $fh ) = shift;

		while( <$fh> ) {
			print if /START/ .. /END/;
			}
		};
	}

You get the same subroutine each time, which means you get the same flip-flop each time:

% perl dump-subs.pl
SV = RV(0x80f66c) at 0x80f660
  REFCNT = 2
  FLAGS = (ROK)
  RV = 0x81a4f0
  SV = PVCV(0x80e4b8) at ...
SV = RV(0x80f6fc) at 0x80f6f0
  REFCNT = 2
  FLAGS = (ROK)
  RV = 0x81a4f0
  SV = PVCV(0x80e4b8) ...
SV = RV(0x8030bc) at 0x8030b0
  REFCNT = 2
  FLAGS = (ROK)
  RV = 0x81a4f0
  SV = PVCV(0x80e4b8) ...
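
If you’d rather not read Devel::Peek output, a rougher check is to compare reference addresses with refaddr from Scalar::Util. This sketch uses two hypothetical factories, make_plain and make_closure, that aren’t part of the programs above. One returns an anonymous subroutine that captures nothing, and the other returns one that captures a lexical variable. That perl shares the first kind is an implementation detail, so treat the counts as what you should expect to see rather than a promise:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical factories for illustration
sub make_plain   { return sub { 'no captures' } }
sub make_closure { my $n = shift; return sub { $n } }

# Count how many distinct subroutines are behind a list of code references
sub count_unique {
	my %seen;
	$seen{ refaddr( $_ ) }++ foreach @_;
	return scalar keys %seen;
	}

# Keep every reference alive so perl can't recycle an address
my @plain    = map { make_plain() }      1 .. 3;
my @closures = map { make_closure( $_ ) } 1 .. 3;

my $plain_count   = count_unique( @plain );
my $closure_count = count_unique( @closures );

print "plain: $plain_count, closures: $closure_count\n";
```

You should see one subroutine behind the three plain references and three separate subroutines behind the closures.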

You have to make each subroutine different somehow. The trick is to use a closure, which is a subroutine that references a lexical variable that has gone out of scope. In this case, you can enlist state to keep track of how many flip-flop operators you make, and since each new anonymous subroutine needs to capture the value of $count, perl can’t reuse previous definitions. You force it to make a new subroutine:

# flip-flop
use 5.010;

foreach my $file ( @ARGV )
	{
	open my $fh, '<', $file or die "Could not find $file\n";
	make_extractor()->($fh);
	}

sub make_extractor {
	state $count = 0;
	$count++;

	sub {
		my( $fh ) = shift;

		while( <$fh> ) {
			print "$count: $_" if /START/ .. /END/;
			}
		};
	}

Now each file gets its own flip-flop. The count prefix shows where the first file ends (it is still missing its END marker) and where the second file begins:

% perl flip-flop input2a.txt input2b.txt
1: START
1: Show this
1: And this
1: Also this
1: Don't show this
2: START
2: Show this again
2: And this again
2: Also this again
2: END
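
If the closure feels like too much machinery, a different technique is to skip the operator and track the window yourself in a lexical variable, which is new on every call. This is a sketch of that alternative, not the approach this item recommends. The extract_window name and the in-memory strings are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines between the START and END markers,
# tracking the window in a lexical that starts fresh on every call
sub extract_window {
	my( $fh ) = @_;
	my $in_window = 0;
	my @kept;

	while( my $line = <$fh> ) {
		$in_window = 1 if $line =~ /START/;  # lefthand side turns it on
		push @kept, $line if $in_window;
		$in_window = 0 if $line =~ /END/;    # righthand side turns it off
		}

	return @kept;
	}

# Hypothetical stand-ins for input2a.txt (no END) and input2b.txt
my $first  = "Ignore this\nSTART\nShow this\nDon't show this\n";
my $second = "Or this\nSTART\nShow this again\nEND\nBut ignore this\n";

my @results;
foreach my $text ( $first, $second ) {
	open my $fh, '<', \$text or die "Could not open in-memory file: $!\n";
	push @results, [ extract_window( $fh ) ];
	}

foreach my $i ( 0 .. $#results ) {
	my $n = $i + 1;
	print "$n: $_" foreach @{ $results[$i] };
	}
```

You give up the terseness of .., but the state is in plain sight and it can’t leak from one file to the next.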

For more information about flip-flops, see perlop’s entry for Range Operators.

Things to remember

  • Every flip-flop maintains a global state
  • Flip-flops are not scoped
  • Create a new flip-flop by wrapping it in a closure
