Know the difference between character strings and UTF-8 strings

Normally, you shouldn’t have to care about a string’s encoding. Indeed, the abstract string has no encoding. It exists as an idea without a representation and it’s not until you want to put it on disk, send it down a pipe, or otherwise force it to exist as electrical pulses, magnetic pole orientation, and so on that you need to think of it in concrete terms. All stored data, even ASCII, has an encoding. Until you force it to have a bit pattern to live in the tangible world, you shouldn’t have to worry about anything like an encoding. Continue reading “Know the difference between character strings and UTF-8 strings”

Use a Task distribution to specify groups of modules

A Task distribution is like a normal Perl distribution in structure, but it doesn’t actually provide any code. It lists as pre-requisites all of the modules or distributions that you want to install so you can use a conventiional CPAN tool to install all of the dependencies. A Task is slightly different from the older way, a Bundle, but for most people and uses, a Task might be a better way. Continue reading “Use a Task distribution to specify groups of modules”

Group tests by their task with Test::More’s subtest()

In the earlier Item, Understand the Test Anywhere Protocal (TAP), you saw the very basics of that simple, line-oriented test report. You ran a single test and it output a single line to denote the status of the test, and possibly some diagnostic information. The TAP, however, didn’t organize any of the tests for you. Continue reading “Group tests by their task with Test::More’s subtest()”

Modify XML data with XML::Twig

If you need to deal with XML, first, we’re very sorry. Maybe you did something wrong if a previous life, such as munging XML with regular expressions. If you do better in this life, perhaps you won’t have to deal with XML in the next one. That right thing might be using XML::Twig, a powerful package for walking an XML tree, each part of which is a twig. For the rest of this Item, I’ll just call the module Twig. Continue reading “Modify XML data with XML::Twig”

Fix Test::Builder’s Unicode issue

The perl interpreter is getting much better with its Unicode support, but that doesn’t mean everything just works because most of the code you probably are about is in modules, which might not have kept up. Some of this becomes apparent when you give another module some Unicode strings for it to output. Continue reading “Fix Test::Builder’s Unicode issue”

Be careful with Unicode character ranges

Unicode character ranges have the same gotchas as the ASCII character ranges, although they become more apparent and more important. You’re probably used to creating a range for all the letters, like the character classes [A-Z] or [a-z], the range 'a' .. 'z', or the range in a transliteration, and not having a problem. If you look at the ASCII sequence, you see that there is an unbroken series of letters in those ranges. Continue reading “Be careful with Unicode character ranges”

Set default regular expression modifiers

Are you tired of adding the same modifiers to all of your regular expressions? For instance, if you might always add the /u modifier to turn on Unicode semantics on all of your patterns, including qr//, m//, and s///. Instead of remembering to do that to every pattern, the re that ships with Perl 5.14 now lets you do that for all patterns in the current lexical scope. You can also turn off a modifier for the rest of the scope. Continue reading “Set default regular expression modifiers”

Treat Unicode strings as grapheme clusters

If you need to work with Unicode strings, you probably don’t want to use Perl’s built-in string manipulation functions. This might seem a strange thing to say about a lnaguage whose main feature is string processing, but it’s a consequence of Perl’s ease in string processing.

Consider what a string is. Think of that for a moment. Write out your definition if you need to. Now, what is a string in Perl? Does it match your definition? Continue reading “Treat Unicode strings as grapheme clusters”

Set the line number and filename of string evals

Errors from a string eval can be tricky to track down since perl doesn’t tell you where the eval was. It treats each of the string evals as a separate, virtual file because it doesn’t remember where the string argument came from. Since perl compiles that during the run phase (see Know the phases of a Perl program’s execution), the information the compiler dragged along for filenames and line numbers is so longer around. Continue reading “Set the line number and filename of string evals”