Use Data::Dump filters for nicer pretty-printing

Data::Dumper, a module that comes in the Standard Library, is one of the great tools known to Perlers. You give it a big data structure and it pretty-prints it for you. If you are one of those people who still believe that the best debugger in the world is print and need to get data structures into a single string with decent formatting, something like Data::Dumper is your best friend. When you get really complex data structures involving complicated objects, though, dumping the entire structure might be too much information.

There are other pretty-printers too. Data::Dump is an alternative to Data::Dumper. Its pp subroutine is similar to calling Data::Dumper’s Dumper. The pp is actually another name for Data::Dump::dump, but since there is a Perl built-in named dump, you probably want to avoid that overloaded name.

pp has the added advantage that it recognizes void context (Item 12. Understand context and how it affects operations) and in that case fills in the print STDERR for you:

use strict;
use warnings;

use Data::Dump qw(pp);
use DateTime;
use HTTP::Request;

my $request = HTTP::Request->new(
		GET => 'http://www.perl.org',
		);

$request->header( 'X-Perl' => '5.12.2' );
$request->header( 'Cat'    => 'Buster' );

my $data = {
	hash => {
		cat  => 'Buster',
		dog  => 'Addy',
		bird => 'Poppy',
		},
	array => [ qw( a b c ) ],
	datetime => DateTime->now,
	request  => $request,
	};

pp( $data ); # special void context mode

If you don’t want to send it to STDERR, or even print it, just don’t use pp in void context:

print $logfile pp( $data ); # send to another filehandle
my $string = pp( $data );   # save in a string

Since you had a couple of complex objects in $data, the output is a bit verbose, especially for the DateTime object:

{
  array    => ["a", "b", "c"],
  datetime => bless({
                formatter       => undef,
                local_c         => {
                                     day => 3,
                                     day_of_quarter => 34,
                                     day_of_week => 4,
                                     day_of_year => 34,
                                     hour => 2,
                                     minute => 35,
                                     month => 2,
                                     quarter => 1,
                                     second => 6,
                                     year => 2011,
                                   },
                local_rd_days   => 734171,
                local_rd_secs   => 9306,
                locale          => bless({
                                     default_date_format_length => "medium",
                                     default_time_format_length => "medium",
                                     en_complete_name => "English United States",
                                     en_language => "English",
                                     en_territory => "United States",
                                     id => "en_US",
                                     native_complete_name => "English United States",
                                     native_language => "English",
                                     native_territory => "United States",
                                   }, "DateTime::Locale::en_US"),
                offset_modifier => 0,
                rd_nanosecs     => 0,
                tz              => bless({ name => "UTC" }, "DateTime::TimeZone::UTC"),
                utc_rd_days     => 734171,
                utc_rd_secs     => 9306,
                utc_year        => 2012,
              }, "DateTime"),
  hash     => { bird => "Poppy", cat => "Buster", dog => "Addy" },
  request  => bless({
                _content => "",
                _headers => bless({ "cat" => "Buster", "x-perl" => "5.12.2" }, "HTTP::Headers"),
                _method  => "GET",
                _uri     => bless(do{\(my $o = "http://www.perl.org")}, "URI::http"),
              }, "HTTP::Request"),
}

Last May, Steven Haryanto suggested some patches to Data::Dump to make this tractable. Instead of dumping the entire structure for DateTime objects, Steven provided a way to let you specify how you wanted to dump particular sorts of objects. Gisle Aas then added filter support to Data::Dump 1.16, providing a way for you to take over the pretty-printing for certain types of objects. The dumpf subroutine takes a coderef to act as a filter.

Your coderef can do anything it likes, but in this example, you have a filter subroutine, get_filter, that receives the context object, $ctx, from Data::Dump. That context object tells you just about everything that you need to know about the thing that Data::Dump is about to dump. If you return a hash reference, your version of the dump wins. If you return undef, Data::Dump handles it itself as it normally would (or tries another filter):

use strict;
use warnings;

use Data::Dump qw(dumpf);
use DateTime;
use HTTP::Request;

BEGIN {
my %filters = (
	'DateTime'      => sub { $_[0]->ymd },
	'HTTP::Request' => sub { $_[0]->uri },
	);

sub get_filter {
	my( $ctx ) = @_;
	
	if( $ctx->is_blessed and exists $filters{ $ctx->class } ) {
		my $string = $filters{ $ctx->class }->( $ctx->object_ref );
		return {
			'dump' => $string,
			};
		}
	
	return;
	}
}

my $request = HTTP::Request->new(
		GET => 'http://www.perl.org',
		);

$request->header( 'X-Perl' => '5.12.2' );
$request->header( 'Cat'    => 'Buster' );


my $data = {
	hash => {
		cat  => 'Buster',
		dog  => 'Addy',
		bird => 'Poppy',
		},
	array => [ qw( a b c ) ],
	datetime => DateTime->now,
	request  => $request,
	};

dumpf( $data, \&get_filter ); # special void context mode

The output is much easier to read now that you have specialized handlers for the DateTime and HTTP::Request objects:

{
  array    => ["a", "b", "c"],
  datetime => 2011-02-03,
  hash     => { bird => "Poppy", cat => "Buster", dog => "Addy" },
  request  => http://www.perl.org,
}

Note that you stored the actual filters outside of the get_filter subroutine. With another closure, you can add a way to alter the %filters hash during the run of the program (Item 49. Create closures to lock in data), but you don’t need to worry about that for this example.
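
If you do want that flexibility, here is a minimal sketch of the closure approach. The add_filter subroutine and the Local::Point class are hypothetical names invented for this illustration; they aren't part of Data::Dump. Both subroutines share the private %filters hash, so you can register a handler after the program has started:

```perl
use strict;
use warnings;

use Data::Dump qw(dumpf);

BEGIN {
my %filters;   # private to the two subroutines below

# hypothetical helper: register a handler for a class at run time
sub add_filter {
	my( $class, $code ) = @_;
	$filters{ $class } = $code;
	}

sub get_filter {
	my( $ctx ) = @_;

	if( $ctx->is_blessed and exists $filters{ $ctx->class } ) {
		return { 'dump' => $filters{ $ctx->class }->( $ctx->object_ref ) };
		}

	return;
	}
}

# a stand-in class so the sketch doesn't need DateTime
my $point = bless { x => 1, y => 2 }, 'Local::Point';

# no handler yet, so Data::Dump dumps the object as usual
my $before = dumpf( $point, \&get_filter );

add_filter( 'Local::Point' => sub { "Point($_[0]{x}, $_[0]{y})" } );

# now the handler takes over
my $after = dumpf( $point, \&get_filter );

print "$before\n$after\n";
```

Since you called dumpf in non-void context here, it returns the string instead of printing it, which makes it easy to compare the two results.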

You can affect other sorts of references too. If you have a really long array, maybe you only want to show a couple of the elements, like the first two and last one, so you don’t have to see the entire list. You can ask the context object if it’s trying to dump an array reference and dispatch the dumper accordingly, in this case to remove some elements from the display. In your filter to handle ARRAY, you make a copy of the data behind the reference (Item 17. Know common shorthand and syntax quirks) so you can change it without changing the original data. Instead of just returning a hash with the dumped string, you can also add a comment to the output to note that you removed some of the elements:

use strict;
use warnings;

use Data::Dump qw(dumpf);

use DateTime;
use List::Util qw(shuffle);

BEGIN {
my %filters = (
	'DateTime'      => sub { $_[0]->ymd },
	'ARRAY'         => 
		sub {
			if( @{ $_[0] } < 4 ) { $_[0] }  # leave short arrays alone
			else {
				require Storable;
				my $clone = Storable::dclone( $_[0] );
				my $last = pop @$clone;
				splice @$clone, 2;
				push @$clone, '...', $last;
				$clone;
				}
			},
	);

sub get_filter {
	my( $ctx, $object ) = @_;
	
	if( $ctx->is_blessed and exists $filters{ $ctx->class } ) {
		my $string = $filters{ $ctx->class }->( $ctx->object_ref );
		return {
			'dump' => $string,
			};
		}
	elsif( $ctx->is_array ) {
		my $object = $filters{ 'ARRAY' }->( $ctx->object_ref );
		return {
			'object' => $object,
			'comment' => 'some items hidden',
			};
		}
	
	return;
	}

}

my $data = {
	hash => {
		cat  => 'Buster',
		dog  => 'Addy',
		bird => 'Poppy',
		},
	array => [ shuffle 'a' .. 'z' ],
	datetime => DateTime->now,
	};

dumpf( $data, \&get_filter );

Now your array won’t dominate the output:

{
  array => # some items hidden
  ["y", "x", "...", "w"],
  datetime => 2011-02-03,
  hash => { bird => "Poppy", cat => "Buster", dog => "Addy" },
}

There’s another way that you can do this, though. That example works because you got to start from scratch and you could choose to use the dumpf subroutine. What if someone was already using dump and you want to add a handler to it? You can use Data::Dump::Filtered to set a global filter that affects every call to dump. Now you can isolate the filter part to just the BEGIN block:

BEGIN {
use Data::Dump::Filtered qw(add_dump_filter);

my %filters = (
	'DateTime'      => sub { $_[0]->ymd },
	'HTTP::Request' => sub { $_[0]->uri },
	);

sub get_filter {
	my( $ctx, $object ) = @_;
	
	if( $ctx->is_blessed and exists $filters{ $ctx->class } ) {
		my $string = $filters{ $ctx->class }->( $ctx->object_ref );
		return {
			'dump' => $string,
			};
		}
	
	return;
	}

add_dump_filter( \&get_filter );
}

This global filter makes it a bit easier to segregate all the parts that handle the filtering. In your array, you have a DateTime object and a shuffled list of the letters from a to z, just to make the output more interesting. There’s another interesting twist here, though. If you want to dump the array yourself, you have to take responsibility for dumping everything that it contains. The handler for ARRAY creates and returns a new object, $clone, that’s different from the one it got as an argument. Inside get_filter, you get that object and then re-dump it with the pp($object) call:

use strict;
use warnings;

use Data::Dump qw(pp);

use DateTime;
use List::Util qw(shuffle);

my $data = {
	hash => {
		cat  => 'Buster',
		dog  => 'Addy',
		bird => 'Poppy',
		},
	array => [ DateTime->now, shuffle 'a' .. 'z' ],
	datetime => DateTime->now,
	};

pp( $data );


## put this part out of the way
BEGIN {
use Data::Dump::Filtered qw(add_dump_filter);

my %filters = (
	'DateTime'      => sub { $_[0]->ymd },
	'ARRAY'         => 
		sub {
				require Storable;
				my $clone = Storable::dclone( $_[0] );
				my $last = pop @$clone;
				splice @$clone, 2;
				push @$clone, '...', $last;
				$clone;
			},
	);

sub get_filter {
	my( $ctx, $object ) = @_;
	
	if( $ctx->is_blessed and exists $filters{ $ctx->class } ) {
		my $string = $filters{ $ctx->class }->( $ctx->object_ref );
		return {
			'dump' => $string,
			};
		}
	elsif( $ctx->is_array ) {
		return if @{ $ctx->object_ref } <= 4;
		my $object = $filters{ 'ARRAY' }->( $ctx->object_ref );
		return {
			'dump'    => pp( $object ),
			'comment' => 'some items hidden',
			};
		}
	
	return;
	}

add_dump_filter( \&get_filter );
}

Now your long array doesn’t look so bad, although it doesn’t show you all of the values:

{
  array => # some items hidden
  [2011-02-03, "d", "...", "n"],
  datetime => 2011-02-03,
  hash => { bird => "Poppy", cat => "Buster", dog => "Addy" },
}

The Data::Dump module has many other interesting features. One useful feature is the ddx subroutine. It’s pp, but with a leading # at the beginning of every line, and even better, it prepends the file and line number to its string:

# test.pl:51: {
#   array    => ["a", "b", "c"],
#   datetime => 2011-02-03,
#   hash     => { bird => "Poppy", cat => "Buster", dog => "Addy" },
#   request  => http://www.perl.org,
# }

However, where pp sends its output to STDERR in void context, ddx prints to the currently selected filehandle, which is STDOUT by default. Unlike pp, ddx always prints, no matter the context, so if you want its output to go somewhere else, select a different filehandle before you call it. If you want the dump as a string, use pp instead.
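
One way to capture or redirect the ddx output is to select another filehandle before the call. This is a minimal sketch that assumes ddx prints to the currently selected filehandle, as the Data::Dump documentation describes for dd and ddx. The in-memory filehandle and the $buffer variable are stand-ins for whatever log file you would really use:

```perl
use strict;
use warnings;

use Data::Dump qw(ddx);

my $data = { cat => 'Buster' };

# an in-memory filehandle stands in for a log file in this sketch
open my $fh, '>', \my $buffer
	or die "Could not open in-memory filehandle: $!";

my $old = select $fh;   # make $fh the default output handle
ddx( $data );           # ddx prints to the selected handle
select $old;            # restore the previous default handle
close $fh;

print $buffer;          # the commented dump, with file and line number
```

Restoring the old filehandle right after the call keeps the rest of the program printing where it expects to.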

Things to remember

  • Data::Dump and other modules can pretty print data structures, but may be verbose.
  • You can handle particular objects or references yourself to make your output easier for you to read.
  • You have to also dump the elements in the references or objects that you want to control.