Use Data::Printer to debug data structures

You can use several different Perl modules to inspect data structures. Many of these modules, however, are really two tools in one. Besides showing a data structure as a string, they also serialize the data as Perl code so you can reconstruct the data structure. That second job often makes things hard for you. If you don’t need the serialization job, don’t use a module that insists on it.

The Data::Dumper module is popular because it comes with Perl. Here’s a program that we’ll use for the rest of the Item, save for changes to the module dumping the structure:

use Data::Dumper qw(Dumper);
use DateTime;
use HTTP::Request;

my $request = HTTP::Request->new(
		GET => 'http://www.perl.org',
		);

$request->header( 'X-Perl' => '5.12.2' );
$request->header( 'Cat'    => 'Buster' );

my $data = {
	hash => {
		cat  => 'Buster',
		dog  => 'Addy',
		bird => 'Poppy',
		},
	array => [ qw( a b c ) ],
	datetime => DateTime->now,
	request  => $request,
	};

print Dumper( $data );

The output is a Perl data structure, suitable for eval. That makes it a bit verbose and ugly:

$VAR1 = {
      'array' => [
             'a',
             'b',
             'c'
           ],
      'hash' => {
            'cat' => 'Buster',
            'dog' => 'Addy',
            'bird' => 'Poppy'
          },
      'request' => bless( {
                '_content' => '',
                '_uri' => bless( do{\(my $o = 'http://www.perl.org')}, 'URI::http' ),
                '_headers' => bless( {
                             'cat' => 'Buster',
                             'x-perl' => '5.12.2'
                           }, 'HTTP::Headers' ),
                '_method' => 'GET'
                }, 'HTTP::Request' ),
      'datetime' => bless( {
                 'local_rd_secs' => 68540,
                 'local_rd_days' => 734452,
                 'rd_nanosecs' => 0,
                 'locale' => bless( {
                            'default_time_format_length' => 'medium',
                            'native_territory' => 'United States',
                            'native_language' => 'English',
                            'native_complete_name' => 'English United States',
                            'en_language' => 'English',
                            'id' => 'en_US',
                            'default_date_format_length' => 'medium',
                            'en_complete_name' => 'English United States',
                            'en_territory' => 'United States'
                          }, 'DateTime::Locale::en_US' ),
                 'local_c' => {
                        'hour' => 19,
                        'second' => 20,
                        'month' => 11,
                        'quarter' => 4,
                        'day_of_year' => 315,
                        'day_of_quarter' => 42,
                        'minute' => 2,
                        'day' => 11,
                        'day_of_week' => 5,
                        'year' => 2011
                        },
                 'utc_rd_secs' => 68540,
                 'formatter' => undef,
                 'tz' => bless( {
                          'name' => 'UTC'
                        }, 'DateTime::TimeZone::UTC' ),
                 'utc_year' => 2012,
                 'utc_rd_days' => 734452,
                 'offset_modifier' => 0
                 }, 'DateTime' )
    };
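
Because that output is Perl source, you can hand it back to eval to rebuild the structure. Here’s a minimal sketch of that round trip. It assumes a small stand-in structure rather than the full $data from the program above, and it uses only the core Data::Dumper module:

```perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);

# A small stand-in structure, just for illustration
my $data = {
    hash  => { cat => 'Buster', dog => 'Addy' },
    array => [ qw( a b c ) ],
};

# Dumper returns Perl source that assigns the structure to $VAR1
my $perl_source = Dumper( $data );

# Evaluating that source fills the lexical $VAR1 with a new copy
my $copy = do {
    my $VAR1;
    eval $perl_source;
    die "Could not eval the dump: $@" if $@;
    $VAR1;
};

print "cat is $copy->{hash}{cat}\n";
print "The copy is a different reference\n" if $copy != $data;
```

That round trip is the serialization job. Only eval a dump that you trust, since the string can contain any Perl code.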

The Data::Dump module also serializes, but its output is cleaner than Data::Dumper’s. In void context, its pp automatically prints for you:

use Data::Dump qw(pp);

...; # same $data thing as before

pp( $data );

The output looks a lot like the Data::Dumper output because it has to be a Perl data structure:

{
  array  => ["a", "b", "c"],
  datetime => bless({
        formatter     => undef,
        local_c     => {
                   day => 11,
                   day_of_quarter => 42,
                   day_of_week => 5,
                   day_of_year => 315,
                   hour => 19,
                   minute => 18,
                   month => 11,
                   quarter => 4,
                   second => 33,
                   year => 2011,
                   },
        local_rd_days   => 734452,
        local_rd_secs   => 69513,
        locale      => bless({
                   default_date_format_length => "medium",
                   default_time_format_length => "medium",
                   en_complete_name => "English United States",
                   en_language => "English",
                   en_territory => "United States",
                   id => "en_US",
                   native_complete_name => "English United States",
                   native_language => "English",
                   native_territory => "United States",
                   }, "DateTime::Locale::en_US"),
        offset_modifier => 0,
        rd_nanosecs   => 0,
        tz        => bless({ name => "UTC" }, "DateTime::TimeZone::UTC"),
        utc_rd_days   => 734452,
        utc_rd_secs   => 69513,
        utc_year    => 2012,
        }, "DateTime"),
  hash   => { bird => "Poppy", cat => "Buster", dog => "Addy" },
  request  => bless({
        _content => "",
        _headers => bless({ "cat" => "Buster", "x-perl" => "5.12.2" }, "HTTP::Headers"),
        _method  => "GET",
        _uri   => bless(do{\(my $o = "http://www.perl.org")}, "URI::http"),
        }, "HTTP::Request"),
}

Steven Haryanto added a filter feature to an existing interface, which you can see in Use Data::Dump filters for nicer pretty-printing. You can get better control of the parts you’d want to distill, such as that DateTime:

{
  array => # some items hidden
  [2011-02-03, "d", "...", "n"],
  datetime => 2011-02-03,
  hash => { bird => "Poppy", cat => "Buster", dog => "Addy" },
}

If you forget about the serialization, though, you can do much better. Often, you want to inspect a data structure to see what’s on the inside without saving it for future use. If that’s the case, you don’t need to see the data structure as Perl code, and the pretty printer can organize the data much better and provide more information. The Data::Printer module doesn’t care at all about serialization. In void context, its p automatically prints:

use Data::Printer;

...; # same $data thing as before

p( $data );

The output is just as verbose, but it’s also much denser. When it prints an object, it shows you the methods in the class:

\ {
    array      [
        [0] "a",
        [1] "b",
        [2] "c"
    ],
    datetime   DateTime  {
        public methods (134) : add, add_duration, am_or_pm, bootstrap, ce_year, christian_era, clone, compare, compare_ignore_floating, date, datetime, day, day_abbr, day_name, day_of_month, day_of_month_0, day_of_quarter, day_of_quarter_0, day_of_week, day_of_week_0, day_of_year, day_of_year_0, day_0, DefaultLanguage, DefaultLocale, delta_days, delta_md, delta_ms, dmy, doq, doq_0, dow, dow_0, doy, doy_0, duration_class, epoch, era, era_abbr, era_name, format_cldr, formatter, fractional_second, from_day_of_year, from_epoch, from_object, hires_epoch, hms, hour, hour_1, hour_12, hour_12_0, INFINITY, is_dst, is_finite, is_infinite, is_leap_year, iso8601, jd, language, last_day_of_month, leap_seconds, local_day_of_week, local_rd_as_seconds, local_rd_values, locale, MAX_NANOSECONDS, mday, mday_0, mdy, microsecond, millisecond, min, minute, mjd, mon, mon_0, month, month_abbr, month_name, month_0, NAN, nanosecond, NEG_INFINITY, new, now, offset, quarter, quarter_abbr, quarter_name, quarter_0, sec, second, SECONDS_PER_DAY, secular_era, set, set_day, set_formatter, set_hour, set_locale, set_minute, set_month, set_nanosecond, set_second, set_time_zone, set_year, STORABLE_freeze, STORABLE_thaw, strftime, subtract, subtract_datetime, subtract_datetime_absolute, subtract_duration, time, time_zone, time_zone_long_name, time_zone_short_name, today, truncate, utc_rd_as_seconds, utc_rd_values, utc_year, wday, wday_0, week, week_number, week_of_month, week_year, weekday_of_month, year, year_with_christian_era, year_with_era, year_with_secular_era, ymd
        private methods (38) : _accumulated_leap_seconds, _add_overload, _adjust_for_positive_difference, _calc_local_components, _calc_local_rd, _calc_utc_components, _calc_utc_rd, _cldr_pattern, _compare, _compare_overload, _day_has_leap_second, _day_length, _era_index, _format_nanosecs, _handle_offset_modifier, _is_leap_year, _month_length, _new, _new_from_self, _normalize_leap_seconds, _normalize_nanoseconds, _normalize_seconds, _normalize_tai_seconds, _offset_for_local_datetime, _rd2ymd, _seconds_as_components, _space_padded_string, _string_compare_overload, _string_equals_overload, _string_not_equals_overload, _stringify, _subtract_overload, _time_as_seconds, _utc_hms, _utc_ymd, _weeks_in_year, _ymd2rd, _zero_padded_number
        internals: {
            formatter         undef,
            local_c           {
                day              11,
                day_of_quarter   42,
                day_of_week      5,
                day_of_year      315,
                hour             19,
                minute           41,
                month            11,
                quarter          4,
                second           42,
                year             2011
            },
            local_rd_days     734452,
            local_rd_secs     70902,
            locale            DateTime::Locale::en_US,
            offset_modifier   0,
            rd_nanosecs       0,
            tz                DateTime::TimeZone::UTC,
            utc_rd_days       734452,
            utc_rd_secs       70902,
            utc_year          2012
        }
    },
    hash       {
        bird   "Poppy",
        cat    "Buster",
        dog    "Addy"
    },
    request    HTTP::Request  {
        Parents       HTTP::Message
        Linear @ISA   HTTP::Request, HTTP::Message
        public methods (10) : accept_decodable, as_string, clone, dump, method, new, parse, uri, uri_canonical, url
        private methods (0)
        internals: {
            _content   "",
            _headers   HTTP::Headers,
            _method    "GET",
            _uri       URI::http
        }
    }
}
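
If you’re curious how a program can discover a class’s methods at all, here’s a rough sketch that looks in a package’s symbol table. This isn’t Data::Printer’s code. Local::Cat and methods_in are made-up names for the illustration, and the sketch ignores inherited methods:

```perl
use strict;
use warnings;

{
    package Local::Cat;   # hypothetical class for this sketch
    sub new   { my( $class, $name ) = @_; bless { name => $name }, $class }
    sub name  { $_[0]{name} }
    sub _purr { 'purr' }
}

# Return the names of the subroutines defined directly in a package
sub methods_in {
    my( $class ) = @_;
    no strict 'refs';   # symbolic references into the symbol table
    my @names = grep { defined &{"${class}::$_"} } keys %{"${class}::"};
    return sort @names;
}

my @methods = methods_in( 'Local::Cat' );
my @public  = grep { ! /\A_/ } @methods;   # no leading underscore
my @private = grep {   /\A_/ } @methods;   # leading underscore

print "public methods (",  scalar @public,  ") : @public\n";
print "private methods (", scalar @private, ") : @private\n";
```

Splitting the names on a leading underscore follows the usual Perl convention for private methods, which matches the grouping in the output above.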

You probably don’t want to see all that internal gunk from DateTime or HTTP::Request, so you can set filters for them to print them however you like:

use Data::Printer {
    filters => {
       'DateTime'      => sub { "DateTime => $_[0]" },
       'HTTP::Request' => sub { "URL => " . $_[0]->uri }, 
    },
};

...; # same $data thing as before

p( $data );

Now you can see what you need much more easily:

\ {
    array      [
        [0] "a",
        [1] "b",
        [2] "c"
    ],
    datetime   DateTime => 2011-11-11T19:51:06,
    hash       {
        bird   "Poppy",
        cat    "Buster",
        dog    "Addy"
    },
    request    URL => http://www.perl.org
}

So far, you’ve set the options in the import list, but you can also pass options each time that you dump something:

p( $data, { index => 0 } );

Now you don’t have array indices:

\ {
    array      [
        "a",
        "b",
        "c"
    ],
    datetime   DateTime => 2011-11-11T20:25:22,
    hash       {
        bird   "Poppy",
        cat    "Buster",
        dog    "Addy"
    },
    request    URL => http://www.perl.org
}

You can make colorized output too by setting another property:

use Data::Printer {
	colored => 1,
	filters => {
		'DateTime'      => sub { "DateTime => $_[0]" },
		'HTTP::Request' => sub { "URL => " . $_[0]->uri }, 
	},
};

...; # same $data thing as before

p( $data );

And you can change the colors if you don’t like the default set. You have to choose a valid Term::ANSIColor color:

use Data::Printer {
	colored => 1,
	color => {
		array       => 'yellow',
		string      => 'cyan',
		hash        => 'green',
	},
	filters => {
		'DateTime'      => sub { "DateTime => $_[0]" },
		'HTTP::Request' => sub { "URL => " . $_[0]->uri }, 
	},
};

...; # same $data thing as before

p( $data );

This might be more pleasing to you:

[Figure: the Data::Printer output in the custom colors]
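
If you aren’t sure whether a color name is valid, the core Term::ANSIColor module can tell you with its colorvalid function. This is a small sketch, and the bogus name buster is only there to show a failure:

```perl
use strict;
use warnings;
use Term::ANSIColor qw(colorvalid);

# 'buster' is a deliberately bogus color name
foreach my $name ( qw( yellow cyan green buster ) ) {
    my $status = colorvalid( $name ) ? 'valid' : 'not valid';
    print "$name is $status\n";
}
```

Checking the names first is optional. It just lets you catch a typo before you put the name in the color hash.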

Lastly, one of the most annoying “features” of a pretty printer is the constant reference passing. Since Perl flattens a subroutine’s arguments into a single list, you have to pass each hash or array as a reference to keep it intact:

use Data::Dumper;
print Dumper( \%hash );

use Data::Dump qw(pp);
pp( \%hash );
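
To see the flattening for yourself, here’s a minimal sketch in plain Perl. The count_args subroutine is a made-up helper that only reports how many arguments it received:

```perl
use strict;
use warnings;

my %hash = ( cat => 'Buster', dog => 'Addy' );

# count_args is a made-up helper that reports how many arguments it got
sub count_args { return scalar @_ }

print "Flattened: ", count_args( %hash ),  "\n";   # 4, two keys and two values
print "Reference: ", count_args( \%hash ), "\n";   # 1, a single hash reference
```

When you pass \%hash, the subroutine gets the whole hash as one argument.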

You can do that with Data::Printer too:

use Data::Printer;
p( \%hash );

Data::Printer uses prototypes to make that easier for you. Dumper and pp each dump a list of structures, but Data::Printer’s p dumps exactly one structure. As such, it can use prototypes to recognize a whole hash or array as the first argument:

use Data::Printer;
p( %hash );
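
To see how a prototype can do that, here’s a minimal sketch with a made-up subroutine, kind_of, that isn’t part of Data::Printer. Its prototype, (\[$@%]), tells perl to pass a reference to the first argument, whether that’s a scalar, an array, or a hash:

```perl
use strict;
use warnings;

# The prototype has to be in place before perl compiles the calls
sub kind_of (\[$@%]) {
    my( $ref ) = @_;    # perl already took the reference for us
    return ref $ref;
}

my %hash   = ( cat => 'Buster', dog => 'Addy' );
my @array  = qw( a b c );
my $scalar = 'Poppy';

print kind_of( %hash ),   "\n";   # HASH
print kind_of( @array ),  "\n";   # ARRAY
print kind_of( $scalar ), "\n";   # SCALAR
```

The trade-off is that a prototype like this one accepts a single variable, so the subroutine can’t take a list of things to dump.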

This feature has a few oddities, but Breno G. de Oliveira, the module’s author, explains them in the documentation.

Things to remember

  • Serialization and inspection are different tasks
  • Most Perl pretty printers try to serialize
  • The Data::Printer module inspects data without trying to serialize it