Don’t use auto-dereferencing with each or keys

[Update: Perl v5.24 removes this experimental feature, for the reasons I list, among others.]

Perl 5.14 added an auto-dereferencing feature to the hash and array operators, and I wrote about it in Use array references with the array operators. I’ve never particularly liked that feature, but I don’t have to like everything. Before that, Perl 5.12 expanded the job of each, keys, and values to also work on arrays.

chromatic has explicated a problem with each, which is both an array and hash operator. He details it in Inadvertent Inconsistencies: each in Perl 5.12 and Inadvertent Inconsistencies: each versus Autoderef. In short, if you use each with a reference, Perl doesn’t know until it actually executes the each whether it’s going to use its array or hash behavior (and in some cases, it blows up with either). However, as the programmer, I probably know which behavior I want:

while( my( $index, $value ) = each $ref ) { # I want array behavior
    my $elem = $other_array->[$index]; 
    } 

while( my( $key, $value ) = each $ref ) { ... } # I want hash behavior

The problem isn’t when it blows up, which is easy to catch (it blows up). The problem is when it quietly works: if you get the wrong sort of reference, you’ll get nonsensical indices or keys. If you have an array reference, the first return value is a number. If you have a hash reference, it’s a string. If you get strings but treat them as array indices, you’ll likely always get array index 0, unless the key looks like a number. You might even get a surprising index: if the key is 123Buster, you’ll get array index 123 due to Perl’s numification. Going the other way, using an array reference when you expected a hash, you’ll look up hash keys that are whole numbers.
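
The numification is easy to see without auto-dereferencing at all. This is a minimal sketch, assuming a hypothetical helper named element_for that does what the loop body does when it expects array behavior; the sample data is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: use whatever "key" we were handed as an array index,
# the way the loop body does when it expected array behavior.
sub element_for {
    my( $array_ref, $key ) = @_;
    no warnings 'numeric';   # silence the warning that would otherwise hint at the bug
    return $array_ref->[$key];
    }

my @letters = ( 'a' .. 'e' );

print element_for( \@letters, 'Buster' ),  "\n";   # non-numeric string numifies to index 0
print element_for( \@letters, '3Buster' ), "\n";   # leading digits win, so index 3
```

With warnings left on, Perl would at least complain about the non-numeric index, but the lookup still succeeds and returns the wrong element.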

Effective programmers reduce ambiguity in their code, but this new feature increases it. The fix is easy: dereference the reference yourself. If you have the wrong reference type, you’ll find out right away:

while( my( $index, $value ) = each @$ref ) { # I want array behavior
    my $elem = $other_array->[$index];
    }

while( my( $key, $value ) = each %$ref ) { ... } # I want hash behavior
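
To see the "find out right away" part in action, here is a small sketch. The subroutine name pairs_from_array_ref and the sample data are assumptions for illustration, and each on an array needs Perl 5.12 or later:

```perl
use v5.12;
use strict;
use warnings;

# Hypothetical helper: walk an array reference with an explicit dereference.
sub pairs_from_array_ref {
    my( $ref ) = @_;
    my @pairs;
    while( my( $index, $value ) = each @$ref ) {   # explicit: I want array behavior
        push @pairs, "$index=$value";
        }
    return @pairs;
    }

my @ok = pairs_from_array_ref( [ qw(a b c) ] );
print "@ok\n";   # 0=a 1=b 2=c

# Hand it a hash reference and the dereference dies before any iteration
my @bad = eval { pairs_from_array_ref( { a => 1 } ) };
print "Died: $@" if $@;   # message starts with "Not an ARRAY reference"
```

The failure happens at the dereference, not several statements later when a bogus index is used.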

If you really wanted to keep the auto-dereferencing feature, you could check the reference type before you use it, but what’s the point of saving a character with the auto-dereferencing if you have to wrap the whole thing in a guard condition?

if( ref $ref eq ref [] ) {
    while( my( $index, $value ) = each $ref ) { ... } # auto-dereferencing, now guarded
    }
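
If code really does need to accept either kind of reference, one option is to make the guard do the dispatching and dereference explicitly in each branch. This is only a sketch under that assumption; describe is a hypothetical helper name, and the output format is made up:

```perl
use v5.12;
use strict;
use warnings;

# Hypothetical helper: return labeled values for either kind of reference,
# choosing the behavior explicitly instead of leaving it to run time.
sub describe {
    my( $ref ) = @_;
    my $type = ref $ref;
    my @out;

    if( $type eq 'ARRAY' ) {
        while( my( $index, $value ) = each @$ref ) {
            push @out, sprintf '[%d]=%s', $index, $value;
            }
        }
    elsif( $type eq 'HASH' ) {
        foreach my $key ( sort keys %$ref ) {
            push @out, sprintf '{%s}=%s', $key, $ref->{$key};
            }
        }
    else {
        die "Expected an array or hash reference, got '$type'\n";
        }

    return @out;
    }

say for describe( [ qw(x y) ] );         # [0]=x then [1]=y
say for describe( { a => 1, b => 2 } );  # {a}=1 then {b}=2
```

Each branch states which behavior it wants, so the index or key is always used with the matching dereference.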

Now keys has the same problem. You can use that either with an array or a hash, but at some point you’re probably going to have to know what sort of reference you have so you can use the key to dereference it. At that point, you effectively declare what sort of reference it should have been. If you have the wrong sort of reference, your script dies:

my $ref = [ ... ];
foreach my $key ( keys $ref ) { 
    my $elem = $ref->{$key}; # Big error! $ref is an array reference
    }

This problem is the unintended consequence of letting the other array and hash operators take a scalar variable as an argument and letting the parser automatically add the bits to dereference. David Golden wanted more magic syntax and the patch wasn’t so tough. To get the nicer syntax in some cases you end up dealing with more special cases. I noted this at the time David proposed it, but his enthusiasm for the interesting parts of the problem steamrolled over the bad parts.

4 thoughts on “Don’t use auto-dereferencing with each or keys”

  1. I don’t understand why this is a bad thing. Isn’t it basically the same kind of polymorphism we’d see as desirable in object-oriented code? Why not pass the same code a whole bunch of array-or-hash-references and allow it to work?

    Yes, the auto-dereferencing doesn’t work on blessed references, and chromatic implies that it’s because it’s possible to overload the same object with both hash and array behaviors and that would be ambiguous. This seems to me an occurrence that is exceptionally rare, and even if it happened, it would be reasonable for it to default to hash behavior. That doesn’t sound like the kind of thing that should stop us from auto-dereferencing.

  2. The polymorphism might be fine if the same method called on different sorts of objects returned the same sort of value. However, as I explained, if you think you have an array and expect an index return value but get a hash key, when you use that key as the array index you think you have, things break.

  3. Thank you for the good posting.

    Why only ‘each’? Are ‘keys’ and ‘values’ okay with this problem? (Particularly ‘keys’. I guess ‘values’ would have no problem because it returns the same results for array and hash)
