Don’t make Perl do more work than it needs to

My choice of algorithms and data organization can lead to orders-of-magnitude performance differences between functionally equivalent implementations of my programs. Choosing the right way to do something can save me an enormous amount of processing.

I wrote some code to loop over lines in a file and modify a couple of elements in each line. My code looks innocent enough, but the actual field modifications were a little more complex; I’ve replaced them with lc and uc to keep the example simple:

while (<>) {
  chomp;
  my @x = split "\t";
  $x[3] = lc $x[3];
  $x[5] = uc $x[5];
  print join("\t", @x), "\n";
}

I have a loop that splits each line of a file on the tab character, then modifies the fourth and sixth fields.
Look closely at the loop: I’m making Perl do a lot more work than it needs to.

First, I call chomp on each line, but I add the newline right back when I print. There is no reason to do that. If I leave the newline in place, I get the same result:

while (<>) {
  my @x = split "\t";
  $x[3] = lc $x[3];
  $x[5] = uc $x[5];
  print join("\t", @x);
}

Chomping when I don’t need to is a minor issue, though, and it’s unlikely to destroy the performance of any program on its own.

Looking a little closer, I see a much bigger inefficiency if I know the format of the data. Assume each line of data contains many fields, where “many” means some number greater than six in this example. Since I’m only acting on fields four and six, why do I split the entire line just to put it back together?

With a couple of extra arguments, I can tell split to limit the number of fields it creates. Since I don’t need to modify anything past the sixth field, I can limit my results to seven items:

while (<>) {
  my @x = split "\t", $_, 7;
  $x[3] = lc $x[3];
  $x[5] = uc $x[5];
  print join("\t", @x);
}
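The limit works because split leaves everything past the last requested field, tab characters and all, in the final element, so joining the pieces back together still reproduces the entire record. Here’s a small demonstration (my own illustration, not part of the original loop):

my $line   = join "\t", 'a' .. 'j';    # ten tab-separated fields
my @pieces = split "\t", $line, 7;     # ask for at most seven pieces

print scalar(@pieces), "\n";   # 7
print "$pieces[6]\n";          # the unsplit tail: "g\th\ti\tj"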

If each line contains only seven, or even ten, elements, I won’t see much, if any, improvement. However, if each line contains dozens or hundreds of fields, my potential speed-up is huge.

There is even more I can do to milk performance out of my loop if I control the data format. If I move the columns that I need to modify to the front of each row, I don’t need to split into so many fields:

while (<>) {
  my @x = split "\t", $_, 3;
  $x[0] = lc $x[0];
  $x[1] = uc $x[1];
  print join("\t", @x);
}

Measure the improvement

Just to be sure that these changes are really making my code faster, I do a little benchmarking to get a feel for the relative performance differences:

use warnings;
use strict;
use Benchmark qw(timethese);

my @data;

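# build roughly ten thousand records, each with roughly a hundred
# tab-separated single-character fields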
for (0..10_000) {
  $data[$_] = join "\t", map { chr(65 + (int rand(52))) } (0..100);
}

timethese(500, {
  'standard' => sub {
    for (@data) {
      chomp;
      my @x = split "\t";
      $x[3] = lc $x[3];
      $x[5] = uc $x[5];
      $_ = join("\t", @x) . "\n";
    }
  },
  'no_chomp' => sub {
    for (@data) {
      my @x = split "\t";
      $x[3] = lc $x[3];
      $x[5] = uc $x[5];
      $_ = join("\t", @x);
    }
  },
  'smaller' => sub {
    for (@data) {
      my @x = split "\t", $_, 7;
      $x[3] = lc $x[3];
      $x[5] = uc $x[5];
      $_ = join("\t", @x);
    }
  },
  'smallest' => sub {
    for (@data) {
      my @x = split "\t", $_, 3;
      $x[0] = lc $x[0];
      $x[1] = uc $x[1];
      $_ = join("\t", @x);
    }
  },
});

In this benchmark I experimented on ten thousand records, each with a hundred fields. The benchmarks measure:

  • the initial, or “standard” case.
  • the case where I just removed a chomp, “no_chomp”.
  • the case where I limit split, “smaller”.
  • the case where I reorder the inbound data, “smallest”.

The [reordered] results tell me quite a bit:

Benchmark: timing 500 iterations of no_chomp, smaller, smallest, standard…
standard: 451 wallclock secs (449.66 usr + 0.34 sys = 450.00 CPU) @ 1.11/s (n=500)
no_chomp: 451 wallclock secs (446.18 usr + 0.41 sys = 446.59 CPU) @ 1.12/s (n=500)
smaller: 39 wallclock secs (39.15 usr + 0.03 sys = 39.18 CPU) @ 12.76/s (n=500)
smallest: 19 wallclock secs (18.98 usr + 0.01 sys = 18.99 CPU) @ 26.33/s (n=500)

Removing chomp had an almost unnoticeable effect. However, I reduced my processing time roughly tenfold by limiting split. I made my code even faster by reordering the inbound data.

Just to see what effect the number of fields really has, I reduced the size of the data so that each record had ten fields. The results were obviously less impressive, though the differences are still visible:

Benchmark: timing 500 iterations of no_chomp, smaller, smallest, standard…
standard: 58 wallclock secs (57.50 usr + 0.05 sys = 57.55 CPU) @ 8.69/s (n=500)
no_chomp: 55 wallclock secs (55.13 usr + 0.04 sys = 55.17 CPU) @ 9.06/s (n=500)
smaller: 38 wallclock secs (37.67 usr + 0.03 sys = 37.70 CPU) @ 13.26/s (n=500)
smallest: 18 wallclock secs (18.46 usr + 0.01 sys = 18.47 CPU) @ 27.07/s (n=500)

So what does this tell me? Well, as far as a specific optimization goes, limiting split is probably a good idea if I’m trying to optimize my processing and really don’t need every field.

However, there is a larger moral to this story: when performance is a concern, don’t make your code do work that it doesn’t have to. The most important factor is my choice of algorithms and related data structures for solving my problem; this can make or break my performance. After that, knowing specific optimizations, such as limiting the fields split creates, helps me get just a little more performance out of my code.

Keep your programmatic configuration DRY

A common mantra among programmers today is to keep your code DRY. This little acronym stands for “Don’t Repeat Yourself” and serves as a reminder that when you see a repetitive pattern in your code or are tempted to copy/paste some statements, you should think twice and consider extracting the common logic into a chunk of code that can be reused.

For many programmers, this practice begins to break down when “configuration” code is involved. When I talk about configuration code here, I’m not talking about the XML, YAML, INI, etc. bits of your project. I’m talking about the Perl code in your program that simply serves as data to feed some active portion of your code.

Continue reading “Keep your programmatic configuration DRY”
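As a quick illustration of the idea (my own sketch, not the example from that item), a list of field names can drive a loop that generates a family of near-identical accessor methods, rather than copying and pasting each one by hand:

package Widget;
use strict;
use warnings;

# the "configuration": plain data describing what the code should build
my @fields = qw(name color size);

# the "active" code: one loop turns that data into methods
for my $field (@fields) {
  no strict 'refs';
  *{ __PACKAGE__ . "::$field" } = sub {
    my $self = shift;
    $self->{$field} = shift if @_;
    return $self->{$field};
  };
}

Adding another accessor is then a one-word change to @fields instead of another copied-and-pasted subroutine.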

Detect regular expression match variables in your code

[UPDATE: this is not a problem in v5.18 and later.]

In Item 33: “Watch out for match variables”, you found out that the match variables $`, $&, and $' come with a performance hit. With all of the module code that you might use, you might be using those variables even though you didn’t write them yourself. Continue reading “Detect regular expression match variables in your code”
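As a rough sketch of the idea (my own, not necessarily the item’s approach), once a program has loaded its modules, %INC records every source file that was pulled in, so a naive scan can at least point at the suspects. It will also flag mentions inside strings, comments, and POD, so treat the output as a starting point:

use strict;
use warnings;

# load whatever your program normally loads, then scan the files in %INC
for my $file ( sort values %INC ) {
  open my $fh, '<', $file or next;
  while ( my $line = <$fh> ) {
    # the \$ escapes keep this scanner from mentioning the match variables itself
    print "$file:$.: $line" if $line =~ /\$&|\$`|\$'/;
  }
}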

Know when and when not to write networking code.

Even though Perl has support for low-level socket programming, that doesn’t mean that you have to program at the low level. For common protocols such as FTP, HTTP, POP3, or SMTP, you can use modules that come with Perl to handle the transport details. libnet, the distribution that comprises the basic protocols, has come with Perl since 5.8. Many other protocols, such as SSH, have implementations on CPAN. You can also install higher-level libraries from CPAN, such as LWP and WWW::Mechanize. Continue reading “Know when and when not to write networking code.”
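For instance, Net::FTP from libnet handles the protocol chatter for a simple download; the host, directory, and file names here are placeholders:

use strict;
use warnings;
use Net::FTP;

my $ftp = Net::FTP->new( 'ftp.example.com', Debug => 0 )
  or die "Cannot connect: $@";

$ftp->login( 'anonymous', 'anonymous@example.com' )
  or die 'Cannot login: ', $ftp->message;

$ftp->cwd('/pub')         or die 'Cannot change directory: ', $ftp->message;
$ftp->binary;
$ftp->get('some_file.gz') or die 'Cannot get file: ', $ftp->message;
$ftp->quit;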

Use Carp::REPL as an interactive Perl shell.

Wouldn’t it be great if you could stop your program right before it died so you could see what’s causing the problem? You could start the Perl debugger and step your way to the problem, or set up some breakpoints, but that’s often too much work. The Carp::REPL module lets you drop into a debugger just at the point you need it. Continue reading “Use Carp::REPL as an interactive Perl shell.”
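As a quick sketch (based on its documented synopsis; the script below is a contrived example of my own), you load the module from the command line so that an uncaught die drops you into a REPL with the program’s state still in place:

# run an existing script under Carp::REPL, no source changes needed:
#
#   perl -MCarp::REPL buggy.pl

# buggy.pl
use strict;
use warnings;

my %user = ( name => 'gilligan' );
greet( $user{nickname} );    # oops: wrong key, so greet() gets undef

sub greet {
  my ($name) = @_;
  die "no name given!\n" unless defined $name;   # the REPL starts here
  print "Hello, $name\n";
}

When the die fires, Carp::REPL starts an interactive shell where you can inspect variables and evaluate Perl expressions before the program exits.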

Use DBI_TRACE to follow DBI’s work.

There is a lot going on when you use DBI to interact with a database. Tack on a few layers of abstraction from Object Relational Modelers (ORMs) such as DBIx::Class and you can end up with a tricky maze of subroutine calls to dig through when you need to track down issues. DBI comes with a built-in tracing feature to make it easier, though. Continue reading “Use DBI_TRACE to follow DBI’s work.”
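The quickest way to turn tracing on is from the environment, with no code changes; the number is the trace level, and an optional log file follows an equals sign. The same control is available from inside the program through the trace method (the DSN below is a placeholder and assumes DBD::SQLite is installed):

# from the shell: trace level 2, written to dbitrace.log
#   DBI_TRACE=2=dbitrace.log perl my_script.pl

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:SQLite:dbname=test.db', '', '',
  { RaiseError => 1 } );

$dbh->trace(1);                      # trace every DBI call on this handle
# $dbh->trace( 2, 'dbitrace.log' );  # or send a more verbose trace to a file

my $rows = $dbh->selectall_arrayref('SELECT 1');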

Interact with the system when Perl isn’t enough

Usually, you want to do as much of your work inside your Perl program as you can. CPAN has a wealth of modules that accomplish many tasks, including those that you normally do very easily from the command line. In the cases where you can’t find a module, you might not be able to improve on the command-line situation for complicated tasks. Perl is, after all, a glue language, so you don’t always have to do everything in Perl. When you decide to use an external program, there are many Perl features waiting to help, although you have to choose the right one for the job. Continue reading “Interact with the system when Perl isn’t enough”
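For instance, there are several built-in ways to reach an external program, each suited to a different job; gzip and ls here merely stand in for whatever tool you actually need:

use strict;
use warnings;

# system: run a command, share the terminal, and check the exit status.
# The list form avoids the shell, so filenames with spaces are safe.
system( 'gzip', 'huge_file.log' ) == 0
  or die "gzip failed: exit status $?";

# backticks (qx//): capture the command's output all at once
my @listing = `ls -l`;

# piped open: read the output a line at a time without slurping it all
open my $pipe, '-|', 'ls', '-l' or die "Can't run ls: $!";
while ( my $line = <$pipe> ) {
  print $line;
}
close $pipe or warn "ls exited with status $?";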