Posted by brian d foy on January 16, 2011
Now that Perl is Unicode-aware (and it’s been that way for a long time, even if you haven’t been), you might have to be more careful in your regular expressions. Some of the character classes are much more inclusive than the ASCIIphile might imagine. In ASCIIland, character and byte semantics are the same thing. [...]
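As a rough illustration of how inclusive a character class can be, here is a minimal sketch comparing \d with an explicit [0-9] range. The helper names is_digits_unicode and is_digits_ascii are made up for this example, and the Arabic-Indic digits are just one convenient non-ASCII test case.

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this sketch.
# \d follows Unicode rules on character strings; [0-9] is ASCII only.
sub is_digits_unicode { return $_[0] =~ /\A\d+\z/    ? 1 : 0 }
sub is_digits_ascii   { return $_[0] =~ /\A[0-9]+\z/ ? 1 : 0 }

my $ascii  = "42";
my $arabic = "\x{664}\x{662}";    # ARABIC-INDIC DIGIT FOUR, DIGIT TWO

printf "ASCII digits:  \\d=%d [0-9]=%d\n",
    is_digits_unicode($ascii), is_digits_ascii($ascii);
printf "Arabic digits: \\d=%d [0-9]=%d\n",
    is_digits_unicode($arabic), is_digits_ascii($arabic);
```

If you only ever want the ten ASCII digits, spelling out [0-9] is the most portable way to say so.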
Posted by brian d foy on January 9, 2011
Perl 5.12 introduced an experimental regex character class to stand in for every character except one, the newline. The \N character class is everything but the newline. In prior versions of Perl, this is the same thing as the . metacharacter. That is, it’s the same as long as someone doesn’t add the /s [...]
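A small sketch of the difference, assuming Perl 5.12 or later. The sample string and variable names are arbitrary; the point is only that /s changes what . matches but leaves \N alone.

```perl
use 5.012;
use warnings;

my $text = "first\nsecond";

# Without /s, . stops at the newline.
my ($dot) = $text =~ /\A(.+)/;

# With /s, . also matches the newline and runs to the end of the string.
my ($dot_s) = $text =~ /\A(.+)/s;

# \N never matches a newline, whether or not /s is present.
my ($not_newline_s) = $text =~ /\A(\N+)/s;

say length $dot;              # 5
say length $dot_s;            # 12
say length $not_newline_s;    # 5
```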
Posted by brian d foy on October 11, 2010
Perl 5.14 gives you some new ways to represent characters so you can avoid some annoying and ambiguous interpolations. Not only that, the new syntax unifies the different ordinal representations so you can specify characters using the same syntax even if you want to use different bases. This feature was added in Perl 5.13.3, in [...]
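The excerpt is cut off before it names the syntax, so the following is only a sketch under the assumption that it means the braced \o{...} octal escape, which sits alongside the existing \x{...} hex escape. It assumes Perl 5.14 or later on an ASCII platform.

```perl
use 5.014;
use warnings;

# The same character, written with braced octal and braced hex escapes.
my $from_octal = "\o{101}";    # octal 101 is decimal 65
my $from_hex   = "\x{41}";     # hex 41 is decimal 65

# The braces mark where the number ends, so a digit that follows
# cannot be mistaken for part of the escape.
my $octal_then_digit = "\o{101}1";

say $from_octal;          # A
say $from_hex;            # A
say $octal_then_digit;    # A1
```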
Posted by josh on February 18, 2010
At Frozen Perl I did a quick presentation about Unicode and Perl. I had to do some work on the slides before releasing them publicly, but here they are… Be sure to look at the author notes if you want more detailed information.
Posted by brian d foy on February 14, 2010
In the Effective Perl class I gave at Frozen Perl last week, I got a question I didn’t have the quick answer to. What happens to the strings when Encode’s decode function only partially decodes the string? The default behavior for decode always decodes the entire string, although it uses the substitution character (0xFFFD, which may [...]
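Here is a minimal sketch of the two behaviors. The excerpt is truncated before it says which check mode the full post looks at, so using Encode::FB_QUIET to get a partial decode is an assumption made for illustration. The input bytes are invented: 0xFF can never appear in valid UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $raw = "abc\xFFdef";    # made-up input with one invalid byte

# Default check: the whole string is decoded, and the bad byte is
# replaced by the substitution character U+FFFD.
my $lenient_src = $raw;
my $lenient     = decode('UTF-8', $lenient_src);

# FB_QUIET: decode stops at the first error and returns what it has
# so far. The source variable is overwritten with the unprocessed rest.
my $rest    = $raw;
my $decoded = decode('UTF-8', $rest, Encode::FB_QUIET);

printf "default:  %d characters\n", length $lenient;
printf "FB_QUIET: %d decoded, %d bytes left over\n",
    length $decoded, length $rest;
```

Because FB_QUIET modifies its argument in place, the sketch decodes a copy each time rather than the original $raw.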