Understand autovivification

Perl will autovivify complex data structures when you use them as if they already exist. This feature saves you a lot of annoying work defining structures that you intend to use. However, this also means that Perl might create data structures that you don’t intend to use in code that isn’t just assigning values.

We briefly mentioned autovivification in Item 58, Understand references and reference syntax, but only on the way to talking about complex data structures. The Perl documentation itself barely mentions the term; you’ll find it once in perlref and once in perlreftut, but several times in perlfunc. Now you’ll read about it in depth, all in one place.

Autovivification only works on undefined values. If you have a scalar without a value and use it like it’s an array reference, Perl makes it an array reference:

use Data::Dumper;

my $array;

$array->[3] = 'Buster';  # autovivification
print Dumper( $array );

Since arrays are ordered, Perl needs to create the prior elements. It doesn’t assign any values to those elements, but they are still there. After those empty elements, Perl creates the one that you used and assigns ‘Buster’:

$VAR1 = [
          undef,
          undef,
          undef,
          'Buster'
        ];

This can get you into trouble, though. What if you didn’t know the index ahead of time and accidentally started with a large number? Perl would create an array of that size just to fill in the last element:

use Data::Dumper;

my $array;
my $index = 10_000;

$array->[$index] = 'Buster';  # autovivification
print Dumper( $array );
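
Rather than dumping ten thousand lines, you can count what Perl created. This is only a minimal sketch; the names $length and $defined are made up for the illustration:

```perl
use strict;
use warnings;

my $array;
my $index = 10_000;

$array->[$index] = 'Buster';  # autovivification

# the array now has $index + 1 slots, but only one holds a value
my $length  = scalar @$array;
my $defined = grep { defined } @$array;

print "length: $length, defined: $defined\n";  # length: 10001, defined: 1
```

A single assignment produced 10,001 elements, all but one of them empty.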

Consider what you’d have to do if Perl didn’t autovivify:

# initialize the array
my $array = [];

# extend the array if necessary
$#$array = $index if $#$array < $index;

# assign the value
$array->[$index] = 'Buster';

Perl doesn’t always need to fill in a value, though. You can also autovivify by trying to read part of a data structure that doesn’t exist:

use 5.010;
use Data::Dumper;

my $array;
my $index = 3;

my $scalar = $array->[$index];
print Dumper( $array );
say '$array is an array' if ref $array eq ref [];

Perl doesn’t fill in any values, but to dereference $array to check for element 3, Perl creates the array:

$VAR1 = [];
$array is an array
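
If you only want to look and would rather leave $array alone, one option is to check that it already holds an array reference before you dereference it. A sketch of that guard, not the only way to do it:

```perl
use strict;
use warnings;

my $array;
my $index = 3;

# only dereference when $array already is an array reference
my $scalar = ref $array eq 'ARRAY' ? $array->[$index] : undef;

print defined( $array ) ? "autovivified\n" : "still undefined\n";  # still undefined
```

Since the dereference never runs, Perl has no reason to create the array.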

You can autovivify hashes, too. Since hashes are unordered, Perl doesn’t need to fill in any other indices to maintain the order:

use Data::Dumper;

my $hash;

$hash->{Buster} = 'Bean';
print Dumper( $hash );

Your hash only contains the element that you explicitly created:

$VAR1 = {
          'Buster' => 'Bean'
        };

You can get a hash when you try to delete a key from an undefined value, just as you got an array when you fetched an element from one:

use 5.010;
use Data::Dumper;

my $hash;

delete $hash->{'Buster'};
print Dumper( $hash );
say '$hash is a hash' if ref $hash eq ref {};

To delete the Buster key, $hash needs to be a hash reference first. Perl doesn’t need to create the key, so you end up with an empty hash reference:

$VAR1 = {};
$hash is a hash

Using any undefined scalar variable with an array or hash operator also autovivifies:

use Data::Dumper;

my( $hash1, $hash2, $array1, $array2 );

my @keys   = keys %$hash1;
my @values = values %$hash2;

pop @$array1;
shift @$array2;

print Dumper( $hash1, $hash2, $array1, $array2 );

All of the scalar values are now references even though you didn’t assign any values:

$VAR1 = {};
$VAR2 = {};
$VAR3 = [];
$VAR4 = [];
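
If you want these operators to leave an undefined variable alone, a common idiom is to dereference a throwaway empty structure instead. A sketch, assuming you are happy to treat an undefined value as empty:

```perl
use strict;
use warnings;

my( $hash, $array );

# fall back to a temporary empty structure when the variable is undefined
my @keys  = keys %{ $hash  || {} };
my $count = @{ $array || [] };

my $untouched = !defined( $hash ) && !defined( $array );
print $untouched ? "both still undefined\n" : "something was autovivified\n";
```

Perl dereferences the temporary anonymous hash or array, so there is nothing in $hash or $array for it to fill in.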

Perl autovivifies filehandles too, which we showed without mentioning autovivification in Item 52, Always use the three-argument open:

open my $fh, '<', $filename or die ...;

Perl does this in the built-ins that create a handle for you, including open, opendir, and sysopen.
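
You can watch that happen by looking at the variable before and after the open. This sketch reads from an in-memory string (the $text variable is only there to keep the example self-contained):

```perl
use strict;
use warnings;

my $text = "Buster\nMimi\n";

my $fh;
my $before = defined( $fh ) ? 'defined' : 'undefined';

# open on a reference to a scalar reads from that string
open $fh, '<', \$text or die "Could not open in-memory file: $!";
my $after = ref $fh;

print "before: $before, after: $after\n";  # before: undefined, after: GLOB
close $fh;
```

The undefined scalar becomes a reference to an anonymous glob, which is what Perl uses as the filehandle.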

Complex data structures

Autovivification works for complex data structures too. Consider this gnarly, deeply-nested data structure:

use Data::Dumper;

my $data;

$data->{Buster}[3]{Bean}[1][2]{count}++;
print Dumper( $data );

Perl creates it just like you asked for it:

$VAR1 = {
          'Buster' => [
                        undef,
                        undef,
                        undef,
                        {
                          'Bean' => [
                                      undef,
                                      [
                                        undef,
                                        undef,
                                        {
                                          'count' => 1
                                        }
                                      ]
                                    ]
                        }
                      ]
        };

This still follows the same rules for autovivification. Perl autovivifies a level only when the value at that level is undefined. When you start with an undefined scalar, that's easy. As before, consider what you would have to do if you had to create and initialize each level yourself:

use Data::Dumper;

my( $i, $j, $k ) = ( 3, 1, 2 );  # the indices from the previous example

my $data = {};
$data->{Buster} = [];
$#{ $data->{Buster} } = $i if $#{ $data->{Buster} } < $i;
$data->{Buster}[$i] = {};
$data->{Buster}[$i]{Bean} = [];
$#{ $data->{Buster}[$i]{Bean} } = $j 
	if $#{ $data->{Buster}[$i]{Bean} } < $j;
$data->{Buster}[$i]{Bean}[$j] = [];
$#{ $data->{Buster}[$i]{Bean}[$j] } = $k 
	if $#{ $data->{Buster}[$i]{Bean}[$j] } < $k;
$data->{Buster}[$i]{Bean}[$j][$k] = {};
$data->{Buster}[$i]{Bean}[$j][$k]{count}++;

You could also just type it out to match the Data::Dumper output, but that's even more annoying. Consider a more realistic example. Suppose you have a text file that tracks the number of bytes transferred from one host to another (in an idea that shows up in one of my other books, Intermediate Perl):

Buster Mimi 15
Buster Roscoe 18
Roscoe Mimi 19
Roscoe Roscoe 1
Mimi Buster 6
Mimi Buster 10

If I wanted to collate this data, I'd read a line, split that line, and just use the keys as if the hash structure already exists:

my %count;

while( <> ) {
	my( $source, $destination, $bytes ) = split;
	$count{ $source }{ $destination } += $bytes;
	}

That program makes it really easy. If I had to do each step on my own, I might not be a Perl user.
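
Here is a self-contained sketch of the same idea that you can run without a data file. The @lines array stands in for the file, and the report loop at the end is only one way you might display the totals:

```perl
use strict;
use warnings;

# the sample data from above, one record per element
my @lines = (
    'Buster Mimi 15',
    'Buster Roscoe 18',
    'Roscoe Mimi 19',
    'Roscoe Roscoe 1',
    'Mimi Buster 6',
    'Mimi Buster 10',
);

my %count;

foreach my $line ( @lines ) {
    my( $source, $destination, $bytes ) = split ' ', $line;
    # the inner hash for $source springs into existence on first use
    $count{ $source }{ $destination } += $bytes;
}

foreach my $source ( sort keys %count ) {
    foreach my $destination ( sort keys %{ $count{ $source } } ) {
        printf "%s => %s: %d\n",
            $source, $destination, $count{ $source }{ $destination };
    }
}
```

The two transfers from Mimi to Buster add up to 16 bytes, and at no point did the code have to create an inner hash by hand.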

Some gotchas

Perl won't replace a value that is already defined. This almost looks like it works because Perl doesn't complain:

use Data::Dumper;

my $data = { Buster => 'Bean' };

$data->{Buster}[3]{Bean}[1][2]{count}++;

print Dumper( $data );

Even though you might have expected the big structure there, you don't get it:

$VAR1 = {
          'Buster' => 'Bean'
        };
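
So where did the increment go? Without strict refs, Perl treats the string 'Bean' as a symbolic reference, that is, as the name of a package variable. This sketch makes that visible; the our declaration is only there so the example can inspect the package array:

```perl
use strict;
use warnings;
no strict 'refs';   # deliberately allow symbolic references for this demonstration

our @Bean;          # the package array that the string 'Bean' ends up naming

my $data = { Buster => 'Bean' };

$data->{Buster}[3]{Bean}[1][2]{count}++;

print "data:  $data->{Buster}\n";               # data:  Bean
print "\@Bean: $Bean[3]{Bean}[1][2]{count}\n";  # @Bean: 1
```

The structure you expected in $data ended up in @Bean instead, which is almost never what you want.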

However, if you use strict (which you should; see Item 3, Enable strictures to promote better coding, or, with a recent Perl, Implicitly turn on strictures with Perl 5.12), Perl tells you why it didn't work:

use strict;

use Data::Dumper;

my $data = { Buster => 'Bean' };

$data->{Buster}[3]{Bean}[1][2]{count}++;

print Dumper( $data );

Now Perl doesn't let you get past that troublesome line, which never does anything useful to $data:

Can't use string ("Bean") as an ARRAY ref while "strict refs" in use at auto.pl line 5.

The problem comes up in some unexpected ways. What if you wanted to check for a value inside a data structure?

use 5.014;

use Data::Dumper;

my $data;

if( exists $data->{Cats}{Buster}{count} ) { 
	say 'Buster exists!';
	}
elsif( $data->{Cats}{Mimi}{count} == 1 ) {
	say 'Mimi exists!';
	}

print Dumper( $data );

Even if you don't pass the condition, Perl still has to create the data structure to test the condition. The only output comes from Data::Dumper:

$VAR1 = {
          'Cats' => {
                      'Mimi' => {},
                      'Buster' => {}
                    }
        };

Notice, though, that Perl only creates the data structure up to the point that it needs to evaluate the test. It doesn't need to create the third level, with count, because the absence of that key already tells Perl that the condition fails.
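
One way to test a deep key without creating anything is to walk the structure one level at a time and stop at the first missing piece. In this sketch, deep_exists is a hypothetical helper written for the illustration, not a Perl built-in:

```perl
use strict;
use warnings;

# deep_exists is a hypothetical helper, not a built-in
sub deep_exists {
    my( $node, @keys ) = @_;

    foreach my $key ( @keys ) {
        # stop at the first level that is missing or isn't a hash reference
        return 0 unless ref $node eq 'HASH' && exists $node->{$key};
        $node = $node->{$key};
    }

    return 1;
}

my $data;
my $found = deep_exists( $data, qw(Cats Buster count) );

print $found ? "Buster exists!\n" : "Buster is missing\n";
print defined( $data ) ? "\$data was autovivified\n" : "\$data is still undefined\n";
```

The helper only dereferences a level after it has confirmed that the level is a hash reference, so $data stays undefined. The autovivification pragma on CPAN (no autovivification;) is another option, not shown here because it isn't part of core Perl.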

Things to remember

  • Perl autovivifies when you use an undefined value like it's a reference
  • Testing a nested key with exists autovivifies the undefined levels above it
  • Perl can autovivify when you store a value or fetch a value