Use formats to create paginated, plaintext reports

Perl’s format feature allows you to easily create line-oriented text reports with pagination, and if that’s what you want, Perl is for you. This item is just an introduction. You can find the full details in perlform, and in future items.

You may have never heard of formats, but, believe it or not, they were an exciting feature 20 years ago. Learning Perl, First Edition had an entire chapter on them, but that chapter disappeared in Learning Perl, Third Edition, cutting formats out of this millennium. Formats might not be the new black, but they do have some class in an old-school sorta way.

Formats provide some attractive features:

  • Automatic pagination and page numbering
  • Nice columnar output formatting
  • Easily lining up decimal points in numbers

You could handle all of this yourself, but why would you want to if formats can already do it for you? You can get close with pack or printf, but you have to do a lot of extra work that formats already do.

There is a modular version of formats in Perl6::Form. Indeed, one of the first reactions to Perl 6 was “You better not take away my formats!” to which Damian replied in Exegesis 7 “We’re not taking them away, just moving them to a module. Oh, and I’m making them even cooler!”. Or something like that. It was several years ago, you have to realize.

That doesn’t mean that Perl 5 formats are useless today. If you want to output plaintext lists, as in actually put ink on paper, perhaps so you can take an inventory of all of the cats in your house, then formats are quite useful. In this Item, you’ll create output that looks like:

Household Pet Inventory, Page  1

 ID   NAME          DIET            OUNCES
--------------------------------------------
 12   Buster        Tuna            1.00
  1   Mimi          Pounces         0.30
 37   Ginger        Fancy Feast     0.75
 19   Ellie         White Castle    9.99

First, you have to define the formats themselves. These are somewhat like named subroutine definitions in that they can appear anywhere in the file and are defined at compile time. Every format definition is at least three lines. In the first line, you use the format keyword, an identifier (which is conventionally all uppercase since formats have no sigil of their own), and an =. The middle lines, which you’ll see in a moment, are the form lines. Finally, you end the definition with a literal full stop, ., that has to be on the line by itself and at the beginning of the line:

format STDOUT =
... form lines go here ...
.

The form lines have two parts: the picture lines and the variables holding the values for the pictures. The pictures are quite sophisticated. Like a printf template, each picture knows what sort of value it expects, but it also fixes the width and alignment of the field. You’ll only see some of the simple pictures in this Item. Most picture fields start with a @, followed by other characters that denote what sort of picture it is:

Picture    Meaning
@<<<<<<    Left-justified text
@>>>>>>    Right-justified text
@||||||    Centered text
@##.##     A number, aligned on the decimal point
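To see what each picture in the table does in isolation, here is a minimal sketch that uses formline, the builtin that formats rely on internally, and reads the result back from the accumulator variable $^A (both are described in perlform). The fill_picture helper, the square brackets that mark the field edges, and the sample values are assumptions made for illustration:

```perl
use strict;
use warnings;

# Fill a single picture and return the resulting text.
# formline appends to the accumulator $^A, so clear it before and after.
# fill_picture is a hypothetical helper, not part of Perl.
sub fill_picture {
    my( $picture, @values ) = @_;
    $^A = '';
    formline( $picture, @values );
    my $filled = $^A;
    $^A = '';
    return $filled;
}

# The brackets are literal text in the picture; they only show the field width.
my $left   = fill_picture( '[@<<<<<<]', 'cat' );
my $right  = fill_picture( '[@>>>>>>]', 'cat' );
my $center = fill_picture( '[@||||||]', 'cat' );
my $number = fill_picture( '[@##.##]',  3.14159 );

print "$_\n" for $left, $right, $center, $number;
```

Each text field here is seven characters wide (the @ counts as one position), so the three-letter value is padded on the right, on the left, or on both sides. The numeric field rounds to two decimal places.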

After each picture line, you specify variables whose values will fill in the pictures in the previous form line. You specify the variables in the same order as the picture fields, and you separate the variables with commas. You can put the variables anywhere you like in the line (as long as they're in the right order), so you can line them up with their pictures. A simple format, then, looks like this:

format STDOUT =
@##   @<<<<<<<<<    @<<<<<<<<<<<    @#.##
$id,  $name,        $food,          $amount
.

As designed, the format name matches the bareword filehandle you'll use it with (but more on that later). You name the format STDOUT because you want to use it with the standard output file handle.

When you are ready to output some data, you use write. This design shows the age of formats since write doesn't take arguments to fill in the pictures. It uses the variables that are in scope. The argument it does take, however, is the filehandle you want to write to:

our( $id, $name, $food, $amount ) = qw( 12 Buster Tuna 1.0 );
write();

If you don't specify a filehandle, write uses the current default filehandle. Putting that all together, you have:

use strict;
use warnings;

our( $id, $name, $food, $amount ) = qw( 12 Buster Tuna 1.0 );
write();

format STDOUT =
@##   @<<<<<<<<<    @<<<<<<<<<<<    @#.##
$id,  $name,        $food,          $amount
.

When you run this starter program, you get a single line of output:

 12   Buster        Tuna             1.00

That's fine, but you probably want to know what those columns are. There's a special format for that, too. If you define a second format whose name is the first format's name with _TOP appended, Perl uses it as a header whenever it starts a new page. You define this top-of-page format just like the previous format:

#!perl
use strict;
use warnings;

our( $id, $name, $food, $amount ) = qw( 12 Buster Tuna 1.0 );
write();

format STDOUT =
@##   @<<<<<<<<<    @<<<<<<<<<<<    @#.##
$id,  $name,        $food,          $amount
.

format STDOUT_TOP =
 ID   NAME          DIET            OUNCES
--------------------------------------------
.

Now your report comes out with column headers:

 ID   NAME          DIET            OUNCES
--------------------------------------------
 12   Buster        Tuna             1.00

Perl's formats are smart enough to know which page of output they are on, and they store that in the $% variable. You can use that to fill in a picture in the top-of-page format:

format STDOUT_TOP =
Household Pet Inventory, Page @#
                              $%
                              
 ID   NAME          DIET            OUNCES
--------------------------------------------
.

Now your report has more information at the top, which will appear every time the format starts a new page:

Household Pet Inventory, Page  1

 ID   NAME          DIET            OUNCES
--------------------------------------------
 12   Buster        Tuna             1.00

When does a format start a new page, though? Perl knows how many lines are in a page. That's in the $= variable, which you can set yourself if you don't want the default 60 lines. Perl also keeps track of the number of lines it has output on the current page (the count of lines left is in $-). When it runs out of lines on the page, it outputs a page break to end the page. This is a form feed, so no matter how much physical space you have left on the sheet of paper, your printer should spit it out and move on to the next sheet of paper. Perl then outputs the top-of-page information and continues where it left off. The \f in this output represents the page break:

Household Pet Inventory, Page  1

 ID   NAME          DIET            OUNCES
--------------------------------------------
 12   Buster        Tuna            1.00
  1   Mimi          Pounces         0.30
....many other lines....

\fHousehold Pet Inventory, Page  2

 ID   NAME          DIET            OUNCES
--------------------------------------------
 78   Roscoe        Turkey babyfood 0.15
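You can watch a page break happen without printing 60 lines by shrinking $=. The following is a sketch under some assumptions: the PETS and PETS_TOP format names and the sample records are made up for illustration, and the output goes to an in-memory filehandle so the program can inspect it. The select block that attaches the formats to the filehandle is explained in the next section.

```perl
use strict;
use warnings;

our( $id, $name );

# Top-of-page format: one header line with the page number from $%
format PETS_TOP =
Pets, Page @#
           $%
.

# Body format: one line per record
format PETS =
@##  @<<<<<<<<<
$id, $name
.

# Write into a string instead of a real file (an assumption for this demo)
open my $fh, '>', \my $report
    or die "Could not open in-memory filehandle: $!";

{
my $old_default = select( $fh );
$^ = 'PETS_TOP';
$~ = 'PETS';
$= = 4;    # lines per page: one header line plus three records
select( $old_default );
}

foreach my $record ( [ 12, 'Buster' ], [ 1, 'Mimi' ], [ 37, 'Ginger' ], [ 19, 'Ellie' ] ) {
    local( $id, $name ) = @$record;
    write( $fh );
}
close $fh;

# Show the form feed as a visible \f, as in the sample output above
( my $visible = $report ) =~ s/\f/\\f/g;
print $visible;
```

With four lines per page, the header and the first three records fill page 1, so the fourth record lands on page 2, preceded by the form feed and a fresh header.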

Using formats with modern Perl

Formats are a bit antiquated. They've been a feature since Perl only had bareword filehandles, so they look a bit clunky in the world of filehandles stored in variables. By default, formats look for a format name that matches the filehandle you want to write to. That is, if you want to send your output to STDOUT, you name your formats STDOUT and STDOUT_TOP. However, plenty of experienced Perlers will complain when they see you using bareword filehandles.

It's easy to use formats with filehandles you store in scalar variables, though. You just have to tell Perl which format names it should use. As with many things in Perl, you can change special variables to do this. The per-filehandle variables $~ and $^ hold the names of the format and top-of-page format respectively. As with any of the per-filehandle special variables, each filehandle has its own versions of these, and you can only set them for the currently selected (default) filehandle. Thus, you use select to change the default filehandle temporarily:

open my $fh, '>', $filename or die ...;
{
my $old_default = select( $fh );
$^ = 'CAT_INVENTORY_TOP';
$~ = 'CAT_INVENTORY';
select( $old_default );
}

You write to a filehandle by specifying it as an argument:

write( $fh );
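If the select dance feels awkward, IO::Handle provides format_name and format_top_name methods that set the same per-filehandle values for you. This is a sketch rather than a recommendation of one approach over the other. The in-memory filehandle and the single sample record are assumptions for illustration, and the format names reuse the ones from the select example:

```perl
use strict;
use warnings;
use IO::Handle;    # provides format_name and format_top_name

our( $id, $name, $food, $amount );

format CAT_INVENTORY_TOP =
 ID   NAME          DIET            OUNCES
--------------------------------------------
.

format CAT_INVENTORY =
@##   @<<<<<<<<<    @<<<<<<<<<<<    @#.##
$id,  $name,        $food,          $amount
.

# Write into a string instead of a real file (an assumption for this demo)
open my $fh, '>', \my $report
    or die "Could not open in-memory filehandle: $!";

# These calls set $^ and $~ for $fh without changing the default filehandle
$fh->format_top_name( 'CAT_INVENTORY_TOP' );
$fh->format_name( 'CAT_INVENTORY' );

( $id, $name, $food, $amount ) = ( 12, 'Buster', 'Tuna', 1.0 );
write( $fh );
close $fh;

print $report;
```

The methods select the filehandle, set the variable, and restore the previous default filehandle internally, so the effect is the same as the explicit block above.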

Formats are also a bit crufty because you don't pass arguments to write to fill in the pictures. Perl relies on variables with the specified names being in scope. You can use lexical variables, but they have to be in the same scope as the format definition, and they have to be in scope when you call write. It's impractical to do that with lexicals, so the most agile way involves localized package variables:

foreach my $record ( @cats ) {
	local( $id, $name, $food, $amount ) = @$record;
	write( $fh );
	}
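Here is a fuller sketch of that loop, showing that local only changes the package variables for the duration of each iteration. The placeholder starting values, the two sample records, and the in-memory filehandle are assumptions for illustration:

```perl
use strict;
use warnings;

# Package variables with placeholder values, declared before the format
our( $id, $name, $food, $amount ) = ( 0, 'nobody', 'nothing', 0 );

format CAT_INVENTORY =
@##   @<<<<<<<<<    @<<<<<<<<<<<    @#.##
$id,  $name,        $food,          $amount
.

my @cats = (
    [ 12, 'Buster', 'Tuna',    1.0 ],
    [  1, 'Mimi',   'Pounces', 0.3 ],
);

# Write into a string instead of a real file (an assumption for this demo)
open my $fh, '>', \my $report
    or die "Could not open in-memory filehandle: $!";

{
my $old_default = select( $fh );
$~ = 'CAT_INVENTORY';
select( $old_default );
}

foreach my $record ( @cats ) {
    # local gives the package variables temporary values for this iteration
    local( $id, $name, $food, $amount ) = @$record;
    write( $fh );
}
close $fh;

print $report;
print "After the loop, \$name is '$name' again\n";
```

Because the format refers to the package variables, write sees whatever values local has installed at the moment you call it, and the old values come back once the block ends.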

That somewhat mitigates the cruftiness of the format design, and that's about the best that you can do. If formats provide the features you need, however, then it's not that annoying to deal with their eccentricities.

Things to remember

  • Use formats to create paginated text reports
  • Set the format names yourself when you use a filehandle in a variable
  • Use localized package variables to set data for the format
