Specify any character by its octal ordinal value.

Perl 5.14 gives you some new ways to represent characters so you can avoid some annoying and ambiguous interpolations. Not only that, the new syntax unifies the different ordinal representations so you can specify characters using the same syntax even if you want to use different bases. This feature was added in Perl 5.13.3, in the development branch leading to the next stable version.

Before you begin with the new syntax, remember the pre-Perl 5.14 ways that you can specify characters by their ordinal values. You can specify these in either octal or hexadecimal:

use 5.010;
say "P\x65\x{72}\154"; # Perl

Notice how the octal sequence differs from the hexadecimal sequences. Nothing explicitly marks it as an octal number; it is just the digits 0 to 7 after a backslash. Additionally, you can specify the hexadecimal value either with or without braces around the value.
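As a quick check of that claim, here is a minimal sketch that builds the same string and prints each character's ordinal in both bases. The variable names are only illustrative:

```perl
use strict;
use warnings;
use 5.010;

# Three spellings of ordinal values in one double-quoted string:
# hex without braces, hex with braces, and bare octal.
my $mixed = "P\x65\x{72}\154";

say $mixed eq 'Perl' ? 'same string as "Perl"' : 'different string';

# Show each character with its ordinal in hexadecimal and octal.
for my $char ( split //, $mixed ) {
    my $ordinal = ord $char;
    printf "%s  hex %02x  octal %03o\n", $char, $ordinal, $ordinal;
}
```

The last line of output should show the l with hex 6c and octal 154, matching the \154 in the string.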

Now, consider some issues with the \digits syntax. Where does the number end? Since you denote a literal octal number by starting it with a 0, you might want to put a 0 in front of your ordinal character value. However, changing the \154 to \0154 doesn’t do what you might expect:

use 5.010;
say "P\x65\x{72}\0154"; # WTF???

The output is odd. How did that 4 get there?

4er

The octal ordinal value only extends to three digits, so \0154 is really the character represented by \015 followed by the literal character 4, and there’s no warning to tell you that Perl treated it as two characters. The output looks odd because \015 is a carriage return. If you’ve never used a proper typewriter, you might not realize that a carriage return is just that: you push the carriage back to the beginning of the line without advancing the paper.

If you start typing after just a carriage return, you overstrike what’s already there. On a computer terminal, instead of seeing one character overlaid on another, you just see the latest character you printed in that position. It’s easier to see what’s going on if you look at all of the output. A hexdump is a good way to do that:

$ perl -lwE 'say "P\x65\162\0154"' | hexdump -C
00000000  50 65 72 0d 34 0a                                 |Per.4.|
00000006

From that you can suss out that the P is still there.
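If you don't have hexdump handy, you can get roughly the same view from inside Perl. This is only a sketch with made-up variable names; it counts the characters and prints each ordinal in hex:

```perl
use strict;
use warnings;
use 5.010;

my $string = "P\x65\162\0154";

# \015 (carriage return) and the literal 4 are separate characters.
my @hex = map { sprintf '%02x', ord $_ } split //, $string;

say 'length: ', length $string;
say "bytes:  @hex";
```

The length comes out as five, not four, and the 0d and 34 show up as separate entries just as they do in the hexdump.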

It’s a bit more tricky though. You don’t have to use three digits for an octal sequence. You can use fewer than three as long as the following character is not an octal digit. In this case, the \12 is still a newline:

$ perl -lwE 'say "\12ab"'

ab

You can even do it with one digit. Here \7 is the bell character, so there is nothing to see in the output; your terminal beeps instead:

$ perl -E 'say "\7"'
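Since the bell is invisible, it can help to print the ordinals instead of the characters. The sketch below does that for the two short sequences above; the @cases and %octal_for names are just for illustration:

```perl
use strict;
use warnings;
use 5.010;

# Single-quoted labels keep the backslash literal; the double-quoted
# strings are what Perl actually builds.
my @cases = (
    [ '\12ab', "\12ab" ],
    [ '\7',    "\7"    ],
);

my %octal_for;
for my $case (@cases) {
    my ( $label, $string ) = @$case;
    $octal_for{$label} = join ' ', map { sprintf '%03o', ord $_ } split //, $string;
    say "$label => $octal_for{$label}";
}
```

The first case shows three characters (the newline, then a and b) and the second shows a single character with ordinal 7.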

So what’s the big problem? You have to remember that there’s another place in Perl, one that is also a double-quoted context, where you can use a backslash followed by some digits. In that one case, you might not mean it as an ordinal representation of a character:

my $string = 'abcabc';

$string =~ s/(...)\1/$1/;

The pattern portion of the match or substitution operator, like the regular expression quoting operator, uses the \digits sequence as a backreference. Instead of matching a vertical tab, \13 matches whatever the 13th capture buffer matched (if you have one).
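To see the two meanings side by side, here is a small sketch that uses \1 once in a pattern and once in an ordinary string. The $collapsed and $char names are only illustrative:

```perl
use strict;
use warnings;
use 5.010;

# In a pattern, \1 refers back to the first capture.
my $string = 'abcabc';
( my $collapsed = $string ) =~ s/(...)\1/$1/;
say $collapsed;    # abc

# In an ordinary double-quoted string, \1 is the character with ordinal 1.
my $char = "\1";
say ord $char;     # 1
```

The same two keystrokes give you a repeated abc in one place and a control character in the other.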

To get around this, Perl 5.14 introduces the same syntax for octal ordinal values as you already can use for hexadecimal values. You can specify the octal value with a \o followed by the value in braces. That is always the character, never the backreference:

use 5.013003;

my $string = '...';

$string =~ s/\o{13}/\f/;
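Here is a slightly fuller sketch of the same idea with a string that actually contains a vertical tab, plus a pattern where a capture group sits right before the \o{1}. It assumes a stable Perl 5.14 or later rather than the 5.13.3 development release, and the variable names are made up for the example:

```perl
use strict;
use warnings;
use 5.014;

my $string = "one\013two";
( my $changed = $string ) =~ s/\o{13}/\f/;
say $changed eq "one\ftwo" ? 'vertical tab replaced' : 'no change';

# \o{1} is the character with ordinal 1, even right after a capture group.
my $matches_control = "a\x{1}" =~ /(a)\o{1}/ ? 1 : 0;
my $matches_repeat  = "aa"     =~ /(a)\o{1}/ ? 1 : 0;
say "control character: $matches_control, repeated letter: $matches_repeat";
```

The second pattern matches the control character but not the repeated letter, which is the opposite of what a \1 backreference would do there.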

Problem solved, right? Not so fast; you haven’t seen the other problem this can solve. Previously, you could only use up to three octal digits to specify an ordinal value. In the Age of Unicode, you need more digits than that if you want to specify any character in the Unicode Character Set in octal (even if it is a bit daft to do that since hex is much easier). With the braces, you can use as many digits as you like to specify the Unicode code point:

use 5.013003;

binmode STDOUT, ':utf8';

say "It's starting to sound a lot like \o{23003}";

You couldn’t specify that ordinal value in octal previously. Now you can have all of the octal snowmen that you like, as long as you start using Perl 5.14 when it’s available.
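If you want to convince yourself that the octal and hexadecimal spellings name the same character, a sketch like this one compares them and converts between the two with sprintf and oct. It assumes Perl 5.14 or later, and the variable names are only for illustration:

```perl
use strict;
use warnings;
use 5.014;

my $from_octal = "\o{23003}";
my $from_hex   = "\x{2603}";
say $from_octal eq $from_hex ? 'same character' : 'different characters';

# Converting between the two notations for any code point.
my $as_octal = sprintf '%o', 0x2603;    # '23003'
my $as_hex   = sprintf '%X', oct '23003';
say "\\o{$as_octal} is \\x{$as_hex}";
```

Nothing here prints the snowman itself, so you don't need the binmode line to run it.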

Things to remember

  • In a general double-quoted context, \digits specifies the ordinal value of a character in octal, using 1 to 3 digits.
  • In a regular expression, which is also a double-quoted context, \digits is a backreference.
  • Perl 5.14 introduces \o{digits}, which is analogous to \x{digits}, to specify an ordinal value of arbitrary size.
  • When you start using Perl 5.14, specify all ordinal character values with the braces syntax.