Category Archives: regular expressions

Use lookarounds to split to avoid special cases

There are some regular expression tricks that can help you deal with balanced delimiters in a string. The split command takes a pattern, removes the parts of a string that match that pattern, and give you a list of the parts of the string between those separators. Said another way, split works when the parts […]

Use lookarounds to eliminate special cases in split

The split built-in takes a string and turns it into a list, discarding the separators that you specify as a pattern. This is easy when the separator is simple, but seems hard if the separator gets more tricky. For a simple example, you can split an entry from /etc/password (although getpw* functions will do that […]

Set default regular expression modifiers

Are you tired of adding the same modifiers to all of your regular expressions? For instance, if you might always add the /u modifier to turn on Unicode semantics on all of your patterns, including qr//, m//, and s///. Instead of remembering to do that to every pattern, the re that ships with Perl 5.14 […]

The \R generic line ending

Perl v5.10 adds a regular expression shortcut \R that matches anything the Unicode specification thinks is a line ending. It looks similar to a character class shortcut but it’s not. It can match the sequence of carriage-return line-feed, but character classes don’t match sequence.

Find dates with Regexp::Common

[This is a mid-week bonus item] Suppose you want to find some dates inside a big string. The problem with dates is that there are some many ways to write them, and even if you can come up with a pattern to get the structure right, can you handle the different locales and languages that […]

Use Regexp::Common to find locale-specific dates

[This is a mid-week bonus item, and it’s a bit of a departure from much of what you have already seen on this blog. This is just some code that I had to write this week and I thought you’d like to see it.] I had to find some dates inside a big string, and […]

Know your character classes under different semantics

Now the Perl is Unicode aware (and, it’s been that way for a long time even if you haven’t been), you might have to be more careful in your regular expressions. Some of the character classes are much more inclusive than the ASCIIphile might imagine. In ASCIIland, character and byte semantics are the same thing. […]

Use the \N regex character class to get “not a newline”

Perl 5.12 introduced an experimental regex character class to stand in for every character except one, the newline. The \N character class is everything but the newline. In prior versions of Perl, this is the same thing as the . meta character. That is, it’s the same as long as someone doesn’t add the /s […]

Use a smart match to match several patterns at once

The smart match operator (Item 23. Make work easier with smart matching) reduces many common comparisons to a few keystrokes, keeping with Perl’s goal of making the common things easy. You can use the smart match operator to make even less common tasks, such as matching many regular expressions at the same time, just as […]

Let perl create your regex stringification

Perl 5.14 changes how regular expression objects stringify. This might not seem like a big deal at first, but it exposes a certain sort of bug that you may have never considered. It even broke several modules on CPAN. If you previously tested for hard-coded stringifications of patterns, Perl 5.14 is probably going to break […]