Use the > and < pack modifiers to specify the architecture

Byte-order modifiers are one of the Perl 5.10 features farther along in perl5100delta, after the really big features. To the pack formats that otherwise use the native byte order, you can append a < or a > to specify that the format is little-endian or big-endian, respectively. This allows you to handle endianness in the formats that don’t already have specific versions for each architecture, as well as apply endianness to groups.

Before you think about the < and > modifiers, consider the formats that already specify the endianness. The n and N formats specify an unsigned short or long in “network order”, which is big-endian. The v and V formats specify the same things, but in “VAX order”, which is little-endian.
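As a quick illustration of those four formats, here is a minimal sketch that packs the same numbers each way and prints the resulting bytes as hex. The values 0x1234 and 0x12345678 and the %bytes_for hash are arbitrary choices for this example:

```perl
use 5.010;

# Pack arbitrary sample values with each fixed-endian format, then
# turn the packed bytes back into a hex string so you can see the order.
my %bytes_for = (
	n => unpack( 'H*', pack 'n', 0x1234 ),
	v => unpack( 'H*', pack 'v', 0x1234 ),
	N => unpack( 'H*', pack 'N', 0x12345678 ),
	V => unpack( 'H*', pack 'V', 0x12345678 ),
	);

foreach my $format ( qw(n v N V) ) {
	say sprintf "%s packs to %s", $format, $bytes_for{$format};
	}
```

The n and N lines show the bytes in the order you would write them (1234 and 12345678), while v and V show them reversed (3412 and 78563412). Because these formats fix the byte order, the result should not depend on the machine you run it on.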

Here’s a test program that takes some bytes, which you specify in a string using the hex representation of each character (just like pack would). Once you have the string, you use both N and V to unpack it and find out which one matches your system. The L format always uses the local architecture:

use 5.010;

my $string = "\xAA\xBB\xCC\xDD";

foreach my $format ( qw(N V) ) {
	my $number = unpack $format, $string;
	say sprintf "%s is 0x%X", $format, $number;
	say "Your native format is $format" if $number == unpack 'L', $string;
	}

The output shows that the little-endian order switches the bytes around, and that this program ran on a little-endian machine (in this case, a MacBook Air, which uses Intel processors):

N is 0xAABBCCDD
V is 0xDDCCBBAA
Your native format is V

For those formats, you need to know which order your data use, either by knowing the architecture that produced them or by getting the producer of the data to tell you the format. For instance, UTF-16 text files can start with a byte order mark, U+FEFF, stored as a short integer (two bytes). If the file is big-endian, its first two bytes are 0xFE and 0xFF. Read that short on a big-endian machine and you get 0xFEFF. Read it natively on a little-endian machine and you get 0xFFFE because it switches the bytes around as you saw before.
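The byte order mark check can be sketched with the n format, which always reads big-endian no matter which machine runs it. The bom_byte_order subroutine is a hypothetical helper written for this example, not part of Perl, and it assumes the data really are UTF-16:

```perl
use 5.010;

# Hypothetical helper: guess the byte order of UTF-16 data by reading
# its first two bytes as a big-endian short.
sub bom_byte_order {
	my( $data ) = @_;
	return unless length( $data ) >= 2;

	my $mark = unpack 'n', $data;
	return 'big-endian'    if $mark == 0xFEFF;  # bytes arrived as FE FF
	return 'little-endian' if $mark == 0xFFFE;  # bytes arrived as FF FE
	return;                                     # no byte order mark
	}

say bom_byte_order( "\xFE\xFF\x00\x41" ) // 'unknown';
say bom_byte_order( "\xFF\xFE\x41\x00" ) // 'unknown';
```

Since n pins the byte order, the same comparison works on either kind of machine; the first call reports big-endian and the second reports little-endian.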

The other pack formats use the native byte order, so you haven’t had a way to tell Perl which order to use when it interprets the bytes. These formats have always used the native architecture (meaning they get it wrong for data that come from the other architecture):

Format	Description
s, S	signed and unsigned shorts (two bytes)
i, I	signed and unsigned integers (at least four bytes)
l, L	signed and unsigned longs (four bytes)
q, Q	signed and unsigned quads (if you have a 64-bit perl)
j, J	signed and unsigned Perl internal integers
f	single-precision floating-point value
d	double-precision floating-point value
F	Perl internal floating-point value
D	long-double-precision floating-point value
p, P	pointers to a null-terminated string and a structure
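To see the native dependence for yourself, this minimal sketch packs a short with the native S format and compares the bytes against what the fixed-order v and n formats produce. The value 0x0102 is arbitrary, and the $native and $order names are just for this example:

```perl
use 5.010;

# S uses whatever byte order this machine uses
my $native = pack 'S', 0x0102;

# v is always little-endian and n is always big-endian, so one of
# them should match the native bytes on common hardware
my $order = $native eq pack( 'v', 0x0102 ) ? 'little-endian'
          : $native eq pack( 'n', 0x0102 ) ? 'big-endian'
          :                                  'neither';

say "Native shorts are $order (bytes: ", unpack( 'H*', $native ), ")";
```

The answer depends on where you run it, which is exactly the problem with these formats when the data come from somewhere else.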

Perl 5.10 lets you specify the architecture these formats should use. You can use big-endian values even if you are using a little-endian machine. Suppose you have π encoded as a single-precision floating-point value in big-endian even though you have a little-endian machine. This program unpacks those bytes with the native format and with each modifier:

use 5.010;

my $pi_string = "\x40\x49\x0F\xDA"; # 3.14159250259399 in big-endian

foreach my $format ( qw(f f< f>) ) {
	my $number = unpack $format, $pi_string;
	say sprintf "%s is %f", $format, $number;
	}

The f and f< give the non-π results. The f assumes the native, little-endian format while the f< makes it explicit. The f> specifies big-endian format despite the native architecture, and it gets the right value (with normal floating-point rounding error):

f is -10082865224089600.000000
f< is -10082865224089600.000000
f> is 3.141593
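The modifiers work in the other direction too. As a small sketch, assuming your perl stores single-precision floats in the usual four-byte IEEE 754 layout, you can unpack the same bytes with f> and then pack the number again in each order:

```perl
use 5.010;

my $pi_string = "\x40\x49\x0F\xDA";   # the same big-endian bytes as before

my $number = unpack 'f>', $pi_string;

# Packing with f> should reproduce the original bytes; packing with f<
# should produce the same bytes in the opposite order.
my $big    = pack 'f>', $number;
my $little = pack 'f<', $number;

say 'f> gives ', unpack( 'H*', $big );
say 'f< gives ', unpack( 'H*', $little );
say 'f< is f> reversed' if $little eq scalar reverse $big;
```

Under that assumption the f> line shows 40490fda, the f< line shows da0f4940, and the last line confirms that the two differ only in byte order.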

You can also apply these modifiers to groups so that they affect all of the modifiable formats in that group. This example tries combinations of unsigned shorts in either format:

use 5.010;

my $string = "\xAA\xBB\xCC\xDD";

foreach my $format ( qw| SS S<S> S>S< (SS)> (SS)< | ) {
	my( $first, $second ) = unpack $format, $string;
	say sprintf "%5s is 0x%X 0x%X", $format, $first, $second;
	}

The output shows you how the S format changes based on which architecture you tell unpack to use:

   SS is 0xBBAA 0xDDCC
 S<S> is 0xBBAA 0xCCDD
 S>S< is 0xAABB 0xDDCC
(SS)> is 0xAABB 0xCCDD
(SS)< is 0xBBAA 0xDDCC

You still have to know which architecture your data are in, but at least you can tell Perl which format you want.
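A group modifier is handy when a whole record shares one byte order. Here is a minimal sketch that reads a made-up eight-byte big-endian record; the layout (a 16-bit type, 16-bit flags, and a 32-bit length) and the variable names are hypothetical, chosen only for illustration:

```perl
use 5.010;

# A made-up big-endian record: type (2 bytes), flags (2 bytes),
# length (4 bytes). This layout is hypothetical.
my $record = "\x00\x01\x80\x00\x00\x00\x01\x00";

# The > after the group applies to every format inside it
my( $type, $flags, $length ) = unpack '(S S L)>', $record;

say sprintf "type=%d flags=0x%04X length=%d", $type, $flags, $length;
```

This prints type=1 flags=0x8000 length=256 on either kind of machine. For these particular formats you could get the same result with n n N, but the group modifier also works for formats such as q or f that have no fixed-order counterpart.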

Things to remember

Most pack formats rely on the native architecture
Perl 5.10 introduces the < and > modifiers so you can specify the architecture
The < specifies little-endian because the little side touches the specifier
The > specifies big-endian because the big side touches the specifier