Avoid perl housekeeping for hot loop optimization

David Golden gave a talk at The Perl Conference 2017 where he showed Real World Optimization for the MongoDB Perl driver. He spoke about many big performance gains and you can watch the talk for that, but at the end he talked about various micro-optimizations.

Small gains in “hot loops” (code that executes many, many times) can add up to significant savings. David was able to cut off 20% of the runtime with some of these micro-optimizations. All of these are his ideas but they are the very thing the Effective Perl programmer is curious about.

This is also in line with the history of programming. If you want more speed, at some point you have to think about how things actually happen. Perl, being a dynamic language, does quite a bit of work for you. These optimizations are about delaying or removing that work for hot loops, not everywhere in your program. This is the sort of stuff you think about if you have done all of the other, big optimizations and you still need to squeak out a small sliver of performance.

For what it’s worth, next in my YouTube playlist was Frew Schmidt’s Scaling, Reliability, and Performance at ZipRecruiter from the same conference. His advice was that if something takes a lot of time, you shouldn’t optimize it. Instead, you should just stop doing that thing. You’ll have to watch that talk to see what he means by that.

Avoid Statements

After perl executes a statement, it gets a chance to do some housekeeping of its own, such as freeing temporary values. In a “hot loop” where you want your code to run as fast as possible (and let perl do its housekeeping later), you can get rid of the statement boundaries. Here are three statement boundaries and three places where perl can wander off:

my $x = 5;
my $y = 6;
my $z = 7;

Instead, combine them into one statement:

my( $x, $y, $z ) = (5,6,7);

Although this example is simple, you could rearrange it in other ways that aren’t special to assignments. Put “statements” in parentheses to compartmentalize them and separate those with the comma operator:

(my $x = 5), (my $y = 6), (my $z = 7);

This is just about delaying perl’s potential housekeeping so that the code that runs repeatedly runs faster. perl will have other opportunities to catch up.
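If you want to see the statement boundaries rather than take them on faith, here is a minimal sketch that uses the core B::Concise module to count the nextstate ops (the op perl runs at the start of each statement) in two subs. The sub names separate, joined, and count_nextstate are mine, made up for illustration, and the exact op listing varies between perl versions, so treat the numbers as something to inspect rather than a guarantee:

```perl
use strict;
use warnings;
use B::Concise ();

# Three statements, so three statement boundaries.
sub separate {
    my $x = 5;
    my $y = 6;
    my $z = 7;
}

# The same assignments joined with the comma operator.
sub joined {
    (my $x = 5), (my $y = 6), (my $z = 7);
}

# Render a sub's ops in execution order and count the nextstate ops.
# (count_nextstate is a hypothetical helper, not part of B::Concise.)
sub count_nextstate {
    my ($code) = @_;
    my $listing = '';
    B::Concise::walk_output(\$listing);          # capture output in a string
    B::Concise::compile('-exec', $code)->();     # walk the sub's optree
    my $count = () = $listing =~ /<;>\s+nextstate\(/g;
    return $count;
}

my %boundaries = (
    separate => count_nextstate(\&separate),
    joined   => count_nextstate(\&joined),
);

printf "%-8s %d nextstate ops\n", $_, $boundaries{$_}
    for sort keys %boundaries;
```

The joined version should report fewer nextstate ops than the separate one. That only tells you perl has fewer stops to make, not how much time you save, so you still need to measure the real hot loop.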

Avoid Scopes

Perl does a lot of work to set up a scope. Each scope gets its own lexical variables, so perl needs to set those up. Other work can happen on scope entry and exit too, such as restoring localized values, and perl has to allow for that. Sometimes we don’t need any of it.

Here’s a conditional. You want that print to do its work under that condition. That block isn’t there to limit scope of variables or do any entry or exit work. It’s there because that’s the Perl syntax for an if (because Larry didn’t want the single statement C if).

if( $condition ) {
	print "That was true\n";
	warn "That was also true\n";
	}

In that case, you can avoid the scope with the postfix form. Now there’s no additional block and there’s no scope to set up and tear down:

( (print "That was true\n"), (warn "That was also true\n") ) if $condition;
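Whether this pays off depends on your perl and on what’s in the block, so measure it. Here is a rough sketch using the core Benchmark module. The subs with_block and with_postfix and the two counters are stand-ins I made up so the benchmark doesn’t spend its time printing; they are not from David’s talk:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $condition = 1;
my ( $first, $second ) = ( 0, 0 );   # hypothetical counters standing in for print and warn

# The block form: two statements inside an if block.
sub with_block {
    if ( $condition ) {
        $first++;
        $second++;
    }
}

# The postfix form: one statement, no block.
sub with_postfix {
    ( ($first++), ($second++) ) if $condition;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese( -1, {
    block   => \&with_block,
    postfix => \&with_postfix,
} );
```

The sub call overhead is the same on both sides, so it can swamp the difference here. In a real hot loop you would benchmark the loop itself with each form inlined.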

Avoid Variables

I like intermediate variables and use them liberally, but I do that knowing that I’m wasting a little bit of time. Often it’s time that I don’t care about because I want to grab the intermediate result for debugging (that’s the time I care about):

my $middle = do_this_thing();
# say "middle was [$middle]";
some_other_call( $middle );

For perl to do that, it needs to set up and tear down that lexical variable. That’s not insignificant. Instead, pass the result directly:

some_other_call( do_this_thing() );
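To see what the intermediate variable costs in terms of work perl has to do, you can dump the ops for both versions with the core B::Concise module. This is only a sketch: do_this_thing and some_other_call are trivial stand-ins I wrote so the example runs, exec_listing is a hypothetical helper, and the listing you get depends on your perl version:

```perl
use strict;
use warnings;
use B::Concise ();

# Hypothetical stand-ins for the two calls in the text.
sub do_this_thing   { return 42 }
sub some_other_call { return $_[0] + 1 }

# With the intermediate variable.
sub with_variable {
    my $middle = do_this_thing();
    some_other_call( $middle );
}

# Passing the result directly.
sub direct {
    some_other_call( do_this_thing() );
}

# Return the ops perl will run for a sub, one per line, in execution order.
sub exec_listing {
    my ($code) = @_;
    my $listing = '';
    B::Concise::walk_output(\$listing);
    B::Concise::compile('-exec', $code)->();
    return $listing;
}

my %subs = ( with_variable => \&with_variable, direct => \&direct );
my %op_count;

for my $name ( sort keys %subs ) {
    my $listing = exec_listing( $subs{$name} );
    # Each op line starts with a sequence label followed by a class marker like <;>.
    $op_count{$name} = () = $listing =~ /^\w+\s+<.>\s/mg;
    print "== $name ($op_count{$name} ops) ==\n$listing\n";
}
```

In the with_variable listing, look for the extra nextstate and the ops that store into and read back from the pad variable. Those are the ones the direct version doesn’t have to run.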

If I had to debug this, I could rearrange it to see what’s going on (using a branch in git, of course). When I was satisfied with that I could throw away the branch.