<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Effective Perler</title>
	<atom:link href="http://www.effectiveperlprogramming.com/feed" rel="self" type="application/rss+xml" />
	<link>http://www.effectiveperlprogramming.com</link>
	<description>Effective Perl Programming - write better, more idiomatic Perl</description>
	<lastBuildDate>Wed, 18 Apr 2012 22:03:06 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Don&#8217;t use auto-dereferencing with each or keys</title>
		<link>http://www.effectiveperlprogramming.com/blog/1539</link>
		<comments>http://www.effectiveperlprogramming.com/blog/1539#comments</comments>
		<pubDate>Sat, 31 Mar 2012 09:31:18 +0000</pubDate>
		<dc:creator>brian d foy</dc:creator>
				<category><![CDATA[5.12]]></category>
		<category><![CDATA[5.14]]></category>
		<category><![CDATA[Idiomatic Perl]]></category>

		<guid isPermaLink="false">http://www.effectiveperlprogramming.com/?p=1539</guid>
		<description><![CDATA[Perl 5.14 added auto-dereferencing features to the hash and array operators, and I wrote about those in Use array references with the array operators. I&#8217;ve never particularly liked that feature, but I don&#8217;t have to like everything. Additionally, Perl 5.12 expanded the job of keys and values to also work on arrays. chromatic has [...]]]></description>
			<content:encoded><![CDATA[<p>Perl 5.14 added auto-dereferencing features to the hash and array operators, and I wrote about those in <a href="http://www.effectiveperlprogramming.com/blog/756">Use array references with the array operators</a>. I&#8217;ve never particularly liked that feature, but I don&#8217;t have to like everything. Additionally, Perl 5.12 expanded the job of <a href="http://perldoc.perl.org/functions/keys.html">keys</a> and <a href="http://perldoc.perl.org/functions/values.html">values</a> to also work on arrays.</p>
<p>chromatic has explicated a problem with <a href="http://perldoc.perl.org/functions/each.html">each</a>, which is both an array and hash operator. He details it in <a href="http://www.modernperlbooks.com/mt/2012/03/inadvertent-inconsistencies-each-in-perl-512.html">Inadvertent Inconsistencies: each in Perl 5.12</a> and <a href="http://www.modernperlbooks.com/mt/2012/03/inadvertent-inconsistencies-each-versus-autoderef.html">Inadvertent Inconsistencies: each versus Autoderef</a>. In short, if you use it with a reference, Perl doesn&#8217;t know until it actually executes the <a href="http://perldoc.perl.org/functions/each.html">each</a> whether it&#8217;s going to use its array or hash behavior (and in some cases, blow up with either). However, as the programmer, I probably know which behavior I want:</p>
<pre class="brush:perl">
while( my( $index, $value ) = each $ref ) { my $elem = $other_array->[$index]; } # I want array behavior
while( my( $key, $value ) = each $ref ) { ... } # I want hash behavior
</pre>
<p>The problem isn&#8217;t when it blows up, which is easy to catch (it blows up). The problem is when it quietly keeps going: if you get the wrong sort of reference, you&#8217;ll get nonsensical indices or keys. If you have an array reference, you&#8217;ll get numbers as the first return value. If you have a hash reference, you&#8217;ll get strings. If you get strings but treat them as array indices, you&#8217;ll almost always get array index 0, unless the key starts with a number. You might even get an odd index: if the key is <code>123Buster</code>, you&#8217;ll get array index <code>123</code> due to Perl&#8217;s numification. Going the other way, using an array reference when you expected a hash, the indices become the keys you look up, so you&#8217;ll only find hash elements whose keys happen to be whole numbers.</p>
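<p>A minimal sketch of that numification, assuming a made-up array and made-up keys (none of these names come from the examples above), shows how quietly a string key turns into an index:</p>
<pre class="brush:perl">
use strict;
use warnings;

# Hypothetical data, only for illustration
my @array = qw( zero one two );
$array[123] = 'one-two-three';

# Hash-style keys used as array indices by mistake
foreach my $key ( qw( Buster 123Buster 2 ) ) {
	no warnings 'numeric';  # silence the numeric warnings so only the results show
	my $elem = $array[$key];
	print "$key => $elem\n";
	}
</pre>
<p>The three keys select elements 0, 123, and 2. The program never dies; it hands back the wrong data.</p>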
<p>Effective programs reduce ambiguity in their code, but this new feature increases it. It&#8217;s easy to fix; you dereference them yourself. If you have the wrong reference type, you&#8217;ll find out right away:</p>
<pre class="brush:perl">
while( my( $index, $value ) = each @$ref ) { my $elem = $other_array->[$index]; } # I want array behavior
while( my( $key, $value ) = each %$ref ) { ... } # I want hash behavior
</pre>
<p>If you really wanted to keep the auto-dereferencing feature, you could check the reference type before you use it, but what&#8217;s the point of saving a character with the auto-dereferencing if you have to wrap the whole thing in a guard condition?</p>
<pre class="brush:perl">
if( ref $ref eq ref [] ) {
    while( my( $index, $value ) = each $ref ) { ... }
    }
</pre>
<p>Now <a href="http://perldoc.perl.org/functions/keys.html">keys</a> has the same problem. You can use that either with an array or a hash, but at some point you&#8217;re probably going to have to know what sort of reference you have so you can use the key to dereference it. At that point, you effectively declare what sort of reference it should have been. If you have the wrong sort of reference, your script dies:</p>
<pre class="brush:perl">
my $ref = [ ... ];
foreach my $key ( keys $ref ) {
    my $elem = $ref->{$key}; # Big error!
    }
</pre>
<p>This problem is the unintended consequence of letting the other array and hash operators take a scalar variable as an argument and letting the parser automatically add the bits to dereference. David Golden <a href="http://www.dagolden.com/index.php/986/if-perl-were-smarter-about-references/">wanted more magic syntax</a> and the patch wasn&#8217;t so tough. To get the nicer syntax in some cases you end up dealing with more special cases. I noted this at the time David proposed it, but his enthusiasm for the interesting parts of the problem steamrolled over the bad parts.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.effectiveperlprogramming.com/blog/1539/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Look up Unicode properties with an inversion map</title>
		<link>http://www.effectiveperlprogramming.com/blog/1517</link>
		<comments>http://www.effectiveperlprogramming.com/blog/1517#comments</comments>
		<pubDate>Mon, 12 Mar 2012 22:52:25 +0000</pubDate>
		<dc:creator>brian d foy</dc:creator>
				<category><![CDATA[5.16]]></category>
		<category><![CDATA[Unicode]]></category>

		<guid isPermaLink="false">http://www.effectiveperlprogramming.com/?p=1517</guid>
		<description><![CDATA[Perl comes with extracts of the Unicode character data, but it hasn&#8217;t been easy to look up all of the information Perl knows about a character. Perl v5.15.7 adds a way to create an inversion map based on the property that you want to access. The Unicode::UCD module gives you access to some of the [...]]]></description>
			<content:encoded><![CDATA[<p>Perl comes with extracts of the Unicode character data, but it hasn&#8217;t been easy to look up all of the information Perl knows about a character. Perl v5.15.7 adds a way to create an inversion map based on the property that you want to access.</p>
<p>The <a href="https://www.metacpan.org/module/Unicode::UCD">Unicode::UCD</a> module gives you access to some of the information about a character:</p>
<pre class="brush:perl">
use Unicode::UCD 'charinfo';
use charnames qw(:full);
use Data::Dumper;

my $charinfo   = charinfo(
	ord( "\N{SMILING CAT FACE WITH OPEN MOUTH}" )
	);
print Dumper( $charinfo );
</pre>
<p>The output has many of the properties, but not all of them:</p>
<pre class="brush:plain">
$VAR1 = {
		  'digit' => '',
		  'bidi' => 'ON',
		  'category' => 'So',
		  'code' => '1F63A',
		  'script' => 'Common',
		  'combining' => 0,
		  'upper' => '',
		  'name' => 'SMILING CAT FACE WITH OPEN MOUTH',
		  'unicode10' => '',
		  'decomposition' => '',
		  'comment' => '',
		  'mirrored' => 'N',
		  'lower' => '',
		  'numeric' => '',
		  'decimal' => '',
		  'title' => '',
		  'block' => 'Emoticons'
		};
</pre>
<p>This doesn&#8217;t include the Age of the character, that is, when the character was added to Unicode. This might seem like a silly thing to know, but it came in handy typesetting <a href="http://www.programmingperl.org">Programming Perl</a>. We had problems with some characters but we couldn&#8217;t see a pattern until we looked at the age of all the problem characters. Any character added after Unicode 4.0 didn&#8217;t typeset correctly. It took some annoying work to get the age by scanning through each age until that property matched:</p>
<pre class="brush:perl">
#!/Users/brian/bin/perls/perl5.15.7

use v5.10;
use utf8;

use List::Util qw(first);

my @chars =  ( 'a', '→', '⣽', "\N{SMILING CAT FACE WITH OPEN MOUTH}" );

my @ages = qw( 1.1 2.0 2.1 3.0 3.1 3.2 4.0 4.1 5.0 5.1 5.2 6.0 );

foreach my $char ( @chars ) {
	my $age = first { $char =~ /\p{Age=$_}/ } @ages;
	say "$char Age: $age";
	}
</pre>
<p>It works, but it&#8217;s an unsatisfying kludge:</p>
<pre class="brush:plain">
a Age: 1.1
→ Age: 1.1
⣽ Age: 3.0
😺 Age: 6.0
</pre>
<p>Now, <a href="https://www.metacpan.org/module/Unicode::UCD">Unicode::UCD</a> has a <code>prop_invmap</code> to create an index based on a property you choose and a <code>_search_invlist</code> to return the offset in the map:</p>
<pre class="brush:perl">
#!/Users/brian/bin/perls/perl5.15.7

use 5.15.7;
use utf8;

use charnames qw(:full);
use Unicode::UCD;

my @chars = ( 'a', '→', '⣽', "\N{SMILING CAT FACE WITH OPEN MOUTH}" );

foreach my $char ( @chars ) {
	my $age = age_of_char( $char );
	say "$char Age: $age";
	}

sub age_of_char {
	my( $char ) = @_;
	# create the inverted list, once
	# can only initialize as scalar
	state $inv = _make_age_inverted_list();

	my $i = Unicode::UCD::_search_invlist($inv->[0], ord $char);
	return $inv->[1][$i];
	}

# create the inverted list, once
sub _make_age_inverted_list {
	state( $list, $map, $format, $default, $init );
	unless( $init++ ) {
		($list, $map, $format, $default) = Unicode::UCD::prop_invmap("Age");
		$format eq "s" || die "wrong format $format";
		}
	return [ $list, $map ];
	}
</pre>
<p>That looks like a lot of work, but most of it happens once to set up the inversion map.</p>
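<p>The idea behind the lookup can be sketched with a toy inversion list. This is not how <code>Unicode::UCD</code> implements it; the <code>@starts</code> and <code>@values</code> arrays and the <code>toy_search</code> name are made up for illustration. Each element of <code>@starts</code> is the first code point of a range, and the element of <code>@values</code> at the same offset is the property value for that whole range:</p>
<pre class="brush:perl">
use strict;
use warnings;

# Hypothetical inversion map for a tiny "case" property over ASCII
my @starts = ( 0x00,    0x41,    0x5B,    0x61,    0x7B    );
my @values = ( 'other', 'upper', 'other', 'lower', 'other' );

# Binary search for the last range that starts at or before $code.
# Assumes $code is not smaller than the first element of the list.
sub toy_search {
	my( $starts, $code ) = @_;
	my( $low, $high ) = ( 0, $#$starts );

	while( $high > $low ) {
		my $mid = int( ( $low + $high + 1 ) / 2 );
		if( $code >= $starts->[$mid] ) { $low  = $mid     }
		else                           { $high = $mid - 1 }
		}

	return $low;
	}

foreach my $char ( qw( a Q 7 ) ) {
	my $i = toy_search( \@starts, ord $char );
	print "$char is $values[$i]\n";
	}
</pre>
<p>The pair of lists that <code>prop_invmap</code> returns for <code>Age</code> should have the same shape, only with many more ranges, which is why one search per character is enough.</p>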
]]></content:encoded>
			<wfw:commentRss>http://www.effectiveperlprogramming.com/blog/1517/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Fold cases properly</title>
		<link>http://www.effectiveperlprogramming.com/blog/1507</link>
		<comments>http://www.effectiveperlprogramming.com/blog/1507#comments</comments>
		<pubDate>Wed, 29 Feb 2012 22:04:51 +0000</pubDate>
		<dc:creator>brian d foy</dc:creator>
				<category><![CDATA[5.16]]></category>
		<category><![CDATA[Unicode]]></category>

		<guid isPermaLink="false">http://www.effectiveperlprogramming.com/?p=1507</guid>
		<description><![CDATA[You might think that you know how to compare strings regardless of case, and you&#8217;re probably wrong. After you read this Item, you&#8217;ll be able to do it correctly and without doing any more work than you were doing before. Perl handles all the details for you. If you grew up in the ASCII world, [...]]]></description>
			<content:encoded><![CDATA[<p>You might think that you know how to compare strings regardless of case, and you&#8217;re probably wrong. After you read this Item, you&#8217;ll be able to do it correctly and without doing any more work than you were doing before. Perl handles all the details for you.</p>
<p>If you grew up in the ASCII world, case insensitivity is a difference of literally one bit, so changing case is setting or unsetting a bit in the octet that represents that character.</p>
<p>If you&#8217;ve read the Perl FAQ, you may have seen this quip:</p>
<blockquote><p>
&#8220;Perl&#8221; is the name of the language. Only the &#8220;P&#8221; is capitalized. The name of the interpreter (the program which runs the Perl script) is &#8220;perl&#8221; with a lowercase &#8220;p&#8221;.
</p></blockquote>
<p>When Larry Wall was asked what the difference between &#8220;Perl&#8221; and &#8220;perl&#8221; is, he said &#8220;One bit&#8221;. It&#8217;s literally a difference of flipping one bit in the ASCII representation. That&#8217;s as complicated as ASCII case folding gets.</p>
<p>The capital letter <i>P</i> has the ordinal value 0b1010000. The small letter <i>p</i>, which shows up later in the ASCII sequence, has the ordinal value 0b1110000. This makes it extremely easy to write routines to change between upper and lower cases:</p>
<pre class="brush:perl">
use v5.10;

say "  U L";
say "-----";

foreach my $char ( qw(p P a b c A B C) ) {
	my $lower = chr( ord($char) | 0b0100000 );
	my $upper = chr( ord($char) &#038; 0b1011111 );

	say "$char $upper $lower";
	}
</pre>
<p>The output shows what you&#8217;d expect for the upper and lower cases:</p>
<pre class="brush:plain">
  U L
-----
p P p
P P p
a A a
b B b
c C c
A A a
B B b
C C c
</pre>
<p>Since bit flipping is easy to do, it&#8217;s very easy for even primitive computers to quickly change case (assuming that you&#8217;re not so primitive as to not have two cases). But, this only works if you restrict the input to the ASCII letters. If you want to handle non-letters, you have to do a bit more work to ensure that you don&#8217;t shift them into other characters:</p>
<pre class="brush:perl">
use v5.10;

say "  U L";
say "-----";

foreach my $char ( qw(p P a b c A B C # !) ) {
	my $upper = uppercase( $char );
	my $lower = lowercase( $char );

	say "$char $upper $lower";
	}

sub lowercase {
	my $char = shift;
	my $ord  = ord $char;

	# only the ASCII capital letters A to Z change
	return $char unless $ord >= 0x41 and $ord <= 0x5A;
	return chr( $ord ^ 0b100000 );
	}

sub uppercase {
	my $char = shift;
	my $ord  = ord $char;

	# only the ASCII small letters a to z change
	return $char unless $ord >= 0x61 and $ord <= 0x7A;
	return chr( $ord ^ 0b100000 );
	}
</pre>
<p>Now the non-letters stay the same character:</p>
<pre class="brush:plain">
  U L
-----
p P p
P P p
a A a
b B b
c C c
A A a
B B b
C C c
# # #
! ! !
</pre>
<p>This almost works for Latin-* encodings too. When you move out of the ASCII sequence into Unicode, you don't have this luxury, and it's not merely a representational issue. </p>
<p>If you were infected with ASCII early, you've grown up thinking that you can go back and forth between upper and lower cases and always get the same result. Outside of ASCII, that's not necessarily true. Consider the word "Reichwaldstraße", a common street name in Germany. The "straße" has the special character ß (U+00DF ʟᴀᴛɪɴ ꜱᴍᴀʟʟ ʟᴇᴛᴛᴇʀ ꜱʜᴀʀᴘ ꜱ), which is a ligature of a long s, the fancy <i>ſ</i> (U+017F ʟᴀᴛɪɴ ꜱᴍᴀʟʟ ʟᴇᴛᴛᴇʀ ʟᴏɴɢ ꜱ) that you may have seen in historical documents, and the familiar short <i>s</i>. Put them together, <i>ſs</i>, and move them close enough and you can see how you would end up with <i>ß</i> once you connect the hanging portion of the long <i>s</i> with the top of the short <i>s</i>. The UCS has an uppercase version (U+1E9E ʟᴀᴛɪɴ ᴄᴀᴘɪᴛᴀʟ ʟᴇᴛᴛᴇʀ ꜱʜᴀʀᴘ ꜱ), although no one uses it aside from saying that no one uses it. U+1E9E lowercases to U+00DF, but U+00DF has no single character uppercase version; it uppercases to the two characters <i>SS</i>. The lowercase of <i>SS</i>, however, is <i>ss</i>:</p>
<pre class="brush:perl">
use utf8;

my $string = "Reichwaldstraße";

my $upper = uc( $string );
my $lower = lc( $upper  );

print <<"HERE";
Started with: $string
Upper:        $upper
Lower:        $lower
HERE
</pre>
<p>The output shows that you don't get back to the original:</p>
<pre class="brush:plain">
Started with: Reichwaldstraße
Upper:        REICHWALDSTRASSE
Lower:        reichwaldstrasse
</pre>
<p>There's another <i>s</i> that causes problems: the Greek sigma, which comes in two lowercase forms. One appears in the middle of words and the other appears at the end, as in <i>όσος</i>, where <i>σ</i> and <i>ς</i> represent the same thing, just in different forms mandated by their position:</p>
<pre class="brush:perl">
use utf8;

my $char = "όσος";

my $upper = uc( $char );
my $lower = lc( $upper );

print <<"HERE";
Started with: $char
Upper:        $upper
Lower:        $lower
HERE
</pre>
<p>Again, the lowercase version at the end is different than what you started with:</p>
<pre class="brush:plain">
Started with: όσος
Upper:        ΌΣΟΣ
Lower:        όσοσ
</pre>
<p>This means that you can't merely use <a href="http://perldoc.perl.org/functions/lc.html">lc</a> to normalize text for case insensitive comparison. These won't compare correctly:</p>
<pre class="brush:perl">
lc( "Reichwaldstraße" ) eq lc( "REICHWALDSTRASSE" );  # Nope!
lc( 'όσος' ) eq lc( 'ΌΣΟΣ' );                         # Nope!
</pre>
<p>You might object that these are different strings and that they shouldn't be the same, but where did these strings start? Perhaps that REICHWALDSTRASSE was not originally all uppercase, but changed by some stupid filters between you and the original information (and with a name like mine, I know about stupid casing filters). That's part of the ASCII infection.</p>
<p>So, <a href="http://perldoc.perl.org/functions/lc.html">lc</a> is the wrong way. Sadly, we do this incorrectly in <a href="http://www.learning-perl.com">Learning Perl</a>, when we show this comparison subroutine for <a href="http://perldoc.perl.org/functions/sort.html">sort</a>:</p>
<pre class="brush:perl">
sub case_insensitive { "\L$a" cmp "\L$b" }
</pre>
<p>The Unicode specification solves this with its <i>case folding</i> rules. In short, it folds characters with different case forms into a common form. There's not a rule for this; they do it by exhaustion, specifying the common form for each fold. The common form is defined in the Unicode Character Database, which the Perl developers have digested into the files you find in the <i>unicore/</i> directory in your Perl library. Here are a few lines from <i>unicore/CaseFolding.txt</i>:</p>
<pre class="brush:plain">
0050; C; 0070; # LATIN CAPITAL LETTER P
0051; C; 0071; # LATIN CAPITAL LETTER Q
0052; C; 0072; # LATIN CAPITAL LETTER R
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
03A3; C; 03C3; # GREEK CAPITAL LETTER SIGMA
03C2; C; 03C3; # GREEK SMALL LETTER FINAL SIGMA
FB00; F; 0066 0066; # LATIN SMALL LIGATURE FF
FB01; F; 0066 0069; # LATIN SMALL LIGATURE FI
FB02; F; 0066 006C; # LATIN SMALL LIGATURE FL
FB03; F; 0066 0066 0069; # LATIN SMALL LIGATURE FFI
FB04; F; 0066 0066 006C; # LATIN SMALL LIGATURE FFL
</pre>
<p>The first column is the code number of the original character, the second is the type of folding (explained in the data file and coming up later), and the third column holds the characters that form the common, folded ("equivalent") version. Essentially, it's a big hash. Notice that some of the folded versions are multiple characters. You're not going to get that with bit fiddling.</p>
<p>Case folding takes each character in the first column and turns it into the characters in the third column, then takes the result and does it again until there is nothing left to replace. Characters that don't have an entry in this file fold into themselves. You case fold to compare strings, not to normalize strings for storage or other uses. Case folding makes case insensitive comparisons very fast, but it also loses information that you can't recover. You can read the exact rules in Section 5.18, "Case mapping", of the <a href="http://unicode.org/standard/standard.html">Unicode Standard</a>.</p>
<p>To see how that works, try that with <i>Reichwaldstraße</i> and <i>όσος</i>. Most characters stay the same or simply lowercase, and the ones in bold use a special mapping from <i>unicore/CaseFolding.txt</i>:</p>
<ul>
<li>Reichwaldstraße → <b>r</b>eichwaldstra<b>ss</b>e
<li>REICHWALDSTRASSE → reichwaldstrasse
<li>όσος → όσο<b>σ</b>
<li>ΌΣΟΣ → όσοσ
</ul>
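<p>Since the folding data is essentially a big hash, a toy version is easy to sketch. The <code>%fold</code> table here is hand-copied from a few of the <i>unicore/CaseFolding.txt</i> lines shown earlier, and <code>toy_fold</code> is a made-up name. It makes a single pass over an incomplete table, so treat it as an illustration of the lookup and not as a replacement for the real thing:</p>
<pre class="brush:perl">
use strict;
use warnings;
use utf8;

# Hypothetical, incomplete fold table for illustration only
my %fold = (
	"\x{0050}" => "\x{0070}",          # P to p
	"\x{0052}" => "\x{0072}",          # R to r
	"\x{00DF}" => "\x{0073}\x{0073}",  # sharp s to ss
	"\x{03A3}" => "\x{03C3}",          # capital sigma to small sigma
	"\x{03C2}" => "\x{03C3}",          # final sigma to small sigma
	);

# Replace each character that has an entry; everything else folds to itself
sub toy_fold {
	my( $string ) = @_;
	return join '',
		map { exists $fold{$_} ? $fold{$_} : $_ }
		split //, $string;
	}

binmode STDOUT, ':encoding(UTF-8)';
print toy_fold( "Reichwaldstraße" ), "\n";
print toy_fold( "όσος" ), "\n";
</pre>
<p>The sharp s entry makes the first string one character longer, which is the part that bit flipping can never do.</p>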
<p>To implement these operations, Perl v5.16 adds the <code>fc</code> built-in function. Instead of <a href="http://perldoc.perl.org/functions/lc.html">lc</a>, use that:</p>
<pre class="brush:perl">
use utf8;
use v5.16;  # or v5.15.8 on a development perl; enables the fc feature
fc( "Reichwaldstraße" ) eq fc( "REICHWALDSTRASSE" );  # Yep!
fc( 'όσος' ) eq fc( 'ΌΣΟΣ' );                         # Yep!
</pre>
<p>If you don't have v5.16, you can use the <code>fc</code> from the <a href="https://www.metacpan.org/module/Unicode::CaseFold">Unicode::CaseFold</a> module on CPAN.</p>
<p>If you wanted to do this inside a double-quoted string, you can use the <code>\F</code> case shift operator (but be aware of the things we noted in <a href="http://www.effectiveperlprogramming.com/blog/1496">Understand the order of operations in double quoted contexts</a>). Our <a href="http://www.learning-perl.com">Learning Perl</a> example could change to:</p>
<pre class="brush:perl">
sub case_insensitive { "\F$a" cmp "\F$b" }
</pre>
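<p>As a usage sketch, assuming Perl v5.16 or later and a made-up list of names, the same comparison written with the <code>fc</code> function sorts mixed-case strings no matter which casing filter got to them first:</p>
<pre class="brush:perl">
use v5.16;  # enables the fc feature
use utf8;

# Hypothetical street names in assorted cases
my @streets = ( 'STRASSE B', 'straße a', 'Strasse C' );

my @sorted = sort { fc($a) cmp fc($b) } @streets;

binmode STDOUT, ':encoding(UTF-8)';
say for @sorted;
</pre>
<p>All three names fold to strings that start with <i>strasse</i>, so the trailing letters decide the order. With <a href="http://perldoc.perl.org/functions/lc.html">lc</a>, the <i>ß</i> would survive and push that name to the end of the list.</p>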
<h2>More complicated folds</h2>
<p>Looking back at the extract of <i>unicore/CaseFolding.txt</i>, you might remember that I skipped over the second column, the mapping status. Those letters stand for different folding rules:</p>
<ul>
<li>C: common case folding
<li>F: full case folding (strings may grow in length)
<li>S: simple case folding (map to single characters)
<li>T: special case for uppercase I and dotted uppercase I
</ul>
<p>The "T" status stands in for folds that the general rules can't handle, mostly some characters from Turkish and similar languages.</p>
<p>So far, Perl's <code>fc</code> only handles the "F" status for full case folding. It doesn't handle the special folding you'll find in <i>unicore/SpecialCasing.txt</i> that has the oddball situations, such as multiple source characters folding onto other multiple characters. If you want to handle those, you're on your own, although the <a href="https://www.metacpan.org/module/Unicode::Casing">Unicode::Casing</a> module on CPAN might help.</p>
<p>Many of the folding rules depend on the source language, so you'll probably want to pay special attention if you are using that language or completely ignore them if you are not.</p>
<p>Besides that, the Universal Character Set gives people much more of a chance to mess up. Suppose that you want to write "β-carotene", that thing you get from carrots. That first character is β (U+03B2 ɢʀᴇᴇᴋ ꜱᴍᴀʟʟ ʟᴇᴛᴛᴇʀ ʙᴇᴛᴀ). Some people might think it looks like ß (U+00DF ʟᴀᴛɪɴ ꜱᴍᴀʟʟ ʟᴇᴛᴛᴇʀ ꜱʜᴀʀᴘ ꜱ), and that's good enough for them. No amount of case folding is going to let you know that someone used an incorrect character. But, this is also one of the benefits of Unicode: characters know what they are.</p>
<h2>Another correct way</h2>
<p>There's another correct way to check strings regardless of case. You can use the <code>/i</code> flag on the match operator. The Unicode-aware Perl regex engine handles the rest:</p>
<pre class="brush:perl">
use utf8;
use v5.16;  # or v5.15.8 on a development perl; enables the fc feature

use Set::CrossProduct;

my $string = "Reichwaldstraße";

my $upper = uc( $string );
my $lower = lc( $upper  );

my $sets = Set::CrossProduct->new(
	[
	[ $string, $upper, $lower ],
	[ $string, $upper, $lower ],
	]
	);

foreach my $tuple ( $sets->combinations ) {
	my( $l, $r ) = @$tuple;
	next if $l eq $r;

	say "lc($r) eq lc($l)  → ", lc($r) eq lc($l) ? "matched" : "failed";
	say "fc($r) eq fc($l)  → ", fc($r) eq fc($l) ? "matched" : "failed";
	say "$r =~ m/$l/i      → ", $r =~ m/$l/i ? "matched" : "failed";

	say;
	}
</pre>
<p>In the output, you can see that <a href="http://perldoc.perl.org/functions/lc.html">lc</a> sometimes fails, but that <code>fc</code> and <code>m//i</code> always work:</p>
<pre class="brush:plain">
lc(REICHWALDSTRASSE) eq lc(Reichwaldstraße)  → failed
fc(REICHWALDSTRASSE) eq fc(Reichwaldstraße)  → matched
REICHWALDSTRASSE =~ m/Reichwaldstraße/i      → matched

lc(reichwaldstrasse) eq lc(Reichwaldstraße)  → failed
fc(reichwaldstrasse) eq fc(Reichwaldstraße)  → matched
reichwaldstrasse =~ m/Reichwaldstraße/i      → matched

lc(Reichwaldstraße) eq lc(REICHWALDSTRASSE)  → failed
fc(Reichwaldstraße) eq fc(REICHWALDSTRASSE)  → matched
Reichwaldstraße =~ m/REICHWALDSTRASSE/i      → matched

lc(reichwaldstrasse) eq lc(REICHWALDSTRASSE)  → matched
fc(reichwaldstrasse) eq fc(REICHWALDSTRASSE)  → matched
reichwaldstrasse =~ m/REICHWALDSTRASSE/i      → matched

lc(Reichwaldstraße) eq lc(reichwaldstrasse)  → failed
fc(Reichwaldstraße) eq fc(reichwaldstrasse)  → matched
Reichwaldstraße =~ m/reichwaldstrasse/i      → matched

lc(REICHWALDSTRASSE) eq lc(reichwaldstrasse)  → matched
fc(REICHWALDSTRASSE) eq fc(reichwaldstrasse)  → matched
REICHWALDSTRASSE =~ m/reichwaldstrasse/i      → matched
</pre>
<p>The match operator isn't useful for <a href="http://perldoc.perl.org/functions/sort.html">sort</a> though, since you can only tell if the strings are the same. </p>
<h2>Things to remember</h2>
<ul>
<li>Case-folding is more complicated than merely lowercasing.
<li>The <code>fc</code> function does proper case folding according to the Unicode standard.
<li>The <code>\F</code> case fold operator does full case folding in double-quoted contexts.
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.effectiveperlprogramming.com/blog/1507/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Use __SUB__ to get a reference to the current subroutine</title>
		<link>http://www.effectiveperlprogramming.com/blog/1503</link>
		<comments>http://www.effectiveperlprogramming.com/blog/1503#comments</comments>
		<pubDate>Sun, 19 Feb 2012 13:53:47 +0000</pubDate>
		<dc:creator>brian d foy</dc:creator>
				<category><![CDATA[5.16]]></category>
		<category><![CDATA[subroutines]]></category>

		<guid isPermaLink="false">http://www.effectiveperlprogramming.com/?p=1503</guid>
		<description><![CDATA[What if you want to write a recursive subroutine but you don&#8217;t know the name of the current subroutine? Since Perl is a dynamic language and code references are first class objects, you might not know the name of the code reference, if it even has a name. Perl 5.16 introduces __SUB__ as a special [...]]]></description>
			<content:encoded><![CDATA[<p>What if you want to write a recursive subroutine but you don&#8217;t know the name of the current subroutine? Since Perl is a dynamic language and code references are first-class objects, you might not know the name of the code reference, if it even has a name. Perl 5.16 introduces <code>__SUB__</code> as a special sequence to return a reference to the current subroutine. You could almost do the same thing without the new feature, but each of the alternatives has drawbacks you might want to avoid.</p>
<p>Although <code>__SUB__</code> looks like <code>__FILE__</code>, <code>__LINE__</code>, and <code>__PACKAGE__</code>, each of which is a compile-time directive, <code>__SUB__</code> does its work at run time, so you can use it with subroutines you define later.</p>
<p>First, consider how you&#8217;d try to do this without the <code>__SUB__</code> feature. You could declare a variable to hold a subroutine reference then in a later statement define the subroutine. Since you&#8217;ve already declared the variable, you can use it in the definition. Perl won&#8217;t de-reference it until you actually run the subroutine, so it doesn&#8217;t matter that it&#8217;s not a reference yet:</p>
<pre class="brush:perl">
use v5.10;

my $sub;

$sub = sub {
	state $count = 10;
	say $count;
	return if --$count < 0;
	$sub->();
	};

$sub->();
</pre>
<p>Your output is a countdown:</p>
<pre class="brush:plain">
10
9
8
7
6
5
4
3
2
1
0
</pre>
<p>To do that, there are two requirements: the code reference must be stored in a variable, and you must have already declared that variable. That&#8217;s not always convenient. Not only that, your anonymous subroutine contains a reference to itself, so you&#8217;d either have to play games with weak references or just let the reference live forever. Neither of those is attractive.</p>
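<p>If you did want to play the weak-reference game, one way is to hand the caller a strong reference and let the closure see only a weakened copy. This is a minimal sketch, not part of the original example: it assumes <code>weaken</code> from the core Scalar::Util module, and the names <code>$weak</code>, <code>$strong</code>, and <code>$countdown</code> are made up for illustration. It also passes the count as an argument instead of keeping it in a <code>state</code> variable:</p>
<pre class="brush:perl">
use v5.10;
use Scalar::Util qw(weaken);

my $countdown = do {
	my $weak;    # the closure captures this variable

	# both variables refer to the same anonymous subroutine
	my $strong = $weak = sub {
		my $n = shift;
		say $n;
		return unless $n > 0;
		$weak->( $n - 1 );
		};

	# the closure's own copy no longer keeps the subroutine alive
	weaken( $weak );

	$strong;     # the caller holds the only strong reference
	};

$countdown->(3);
</pre>
<p>When <code>$countdown</code> goes out of scope, nothing else holds the subroutine, so Perl can free it. That&#8217;s a lot of bookkeeping for a simple recursive call.</p>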
<p>Rafa&#235;l Garcia-Suarez solved these problems by creating <a href="https://www.metacpan.org/module/Sub::Current">Sub::Current</a> to give you a <code>ROUTINE</code> function that returns a reference to the current subroutine, even if it is a named subroutine:</p>
<pre class="brush:perl">
use v5.10;
use Sub::Current;

sub countdown {
	state $count = 10;
	say $count;
	return if --$count < 0;
	ROUTINE->();
	};

countdown();
</pre>
<p>You might want to define these code references in a single statement, even if you don&#8217;t need to. This is useful for inline subroutines where you want to define the code reference in the parameter list:</p>
<pre class="brush:perl">
use v5.10;
use Sub::Current;

sub run { $_[0]->() };

run( sub {
		state $count = 10;
		say $count;
		return if --$count < 0;
		ROUTINE->();
		}
	);
</pre>
<p>You may want to define the subroutine in one statement as a return value:</p>
<pre class="brush:perl">
use v5.10;
use Sub::Current;

sub factory {
	my $start = shift;
	sub {
		state $count = $start;
		say $count;
		return if --$count < 0;
		ROUTINE->();
		}
	};

factory(4)->();
</pre>
<p>Using this module has the disadvantage of a CPAN dependency, although a very light one because it&#8217;s self-contained. There&#8217;s another module, <a href="https://metacpan.org/module/Devel::Caller">Devel::Caller</a>, from Richard Clamp that can get a code reference from any level in the call stack, including the current level:</p>
<pre class="brush:perl">
use v5.10;
use Devel::Caller qw(caller_cv);

sub factory {
	my $start = shift;
	sub {
		state $count = $start;
		say $count;
		return if --$count < 0;
		caller_cv(0)->();
		}
	};

factory(7)->();
</pre>
<p>Perl 5.16 lets you do the same thing without the CPAN module:</p>
<pre class="brush:perl">
use v5.15.6;  # until v5.16 is released

sub factory {
	my $start = shift;
	sub {
		state $count = $start;
		say $count;
		return if --$count < 0;
		__SUB__->();
		}
	};

factory(7)->();
</pre>
<p>As with many new features added since Perl v5.10, you can enable <code>__SUB__</code> with a <code>use <i>VERSION</i></code> statement, as you see in the previous example, or with the <code>feature</code> pragma and the <code>current_sub</code> import:</p>
<pre class="brush:perl">
use feature qw(say state current_sub);

sub factory {
	my $start = shift;
	sub {
		state $count = $start;
		say $count;
		return if --$count < 0;
		__SUB__->();
		}
	};

factory(7)->();
</pre>
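<p>The countdown examples keep their position in a <code>state</code> variable, but <code>__SUB__</code> works just as well when the subroutine passes its progress along as an argument. Here&#8217;s a small sketch of a recursive anonymous factorial. The <code>$factorial</code> name is only for illustration, and the code needs Perl v5.16 or later:</p>
<pre class="brush:perl">
use feature qw(say current_sub);

my $factorial = sub {
	my $n = shift;
	return 1 unless $n > 1;

	# recurse without knowing any name for this subroutine
	return $n * __SUB__->( $n - 1 );
	};

say $factorial->(5);   # 120
</pre>
<p>The anonymous subroutine never mentions <code>$factorial</code>, so it doesn&#8217;t hold a reference to itself through a closed-over variable.</p>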
<h2>Things to remember</h2>
<ul>
<li>Perl v5.16 provides the <code>__SUB__</code> directive to return a reference to the currently running subroutine
<li>Import this new feature by requiring the Perl version or through the <code>feature</code> pragma
<li>Prior to Perl v5.16, you can do the same thing with <a href="https://www.metacpan.org/module/Sub::Current">Sub::Current</a>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.effectiveperlprogramming.com/blog/1503/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Understand the order of operations in double quoted contexts</title>
		<link>http://www.effectiveperlprogramming.com/blog/1496</link>
		<comments>http://www.effectiveperlprogramming.com/blog/1496#comments</comments>
		<pubDate>Tue, 31 Jan 2012 05:14:04 +0000</pubDate>
		<dc:creator>brian d foy</dc:creator>
				<category><![CDATA[The Basics of Perl]]></category>

		<guid isPermaLink="false">http://www.effectiveperlprogramming.com/?p=1496</guid>
		<description><![CDATA[Perl&#8217;s powerful string manipulation tools include case-shifting operators that change the parts of a double-quoted string. There are many other things that happen in a double-quoted string too, so you need to know where these operators fit in with each other. A double-quoted string has three features: Variable interpolation Escaped and logical characters Case shift [...]]]></description>
			<content:encoded><![CDATA[<p>Perl&#8217;s powerful string manipulation tools include case-shifting operators that change the parts of a double-quoted string. There are many other things that happen in a double-quoted string too, so you need to know where these operators fit in with each other.</p>
<p>A double-quoted string has three features:</p>
<ul>
<li>Variable interpolation
<li>Escaped and logical characters
<li>Case shift operators
</ul>
<p>You might have missed the order in which Perl handles these because the documentation doesn&#8217;t emphasize it. There is a single sentence in <a href="http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators">perlop</a>, and it appears only in relation to the regular expression operators and <code>\Q</code>:</p>
<blockquote><p>
For double-quoted strings, the quoting from \Q is applied after interpolation and escapes are processed.
</p></blockquote>
<p>If you don&#8217;t pay attention to the order of these operations, you&#8217;ll get results that you might not expect. The problem is that the order of operations isn&#8217;t the same in all double-quoted contexts.</p>
<p>In strings, the order of operations is the same as listed earlier:</p>
<ul>
<li>Variable interpolation
<li>Escaped and logical characters
<li>Case shift operators
</ul>
<h2>Variable interpolation</h2>
<p>You already know about variable interpolation. This is one of Perl&#8217;s greatest features, and the one I miss the most when I have to use a different language:</p>
<pre class="brush:perl">
my $cat = 'Roscoe';
my $string = "Buster $cat Mimi";
</pre>
<p>In a double-quoted context, Perl substitutes the value of <code>$cat</code>. You end up with <code>Buster Roscoe Mimi</code>.</p>
<h2>Case-shift operators</h2>
<p>The case-shift operators change parts of a double-quoted string. Although we call them &#8220;case shift&#8221;, not all of them change the case.</p>
<div align="center">
<table>
<tr>
<th>Operator</th>
<th>Effect</th>
<th>Function equivalent</th>
</tr>
<tr>
<td>\U</td>
<td>Uppercase everything following</td>
<td>uc</td>
</tr>
<tr>
<td>\u</td>
<td>Uppercase the next character</td>
<td>ucfirst</td>
</tr>
<tr>
<td>\L</td>
<td>Lowercase everything following</td>
<td>lc</td>
</tr>
<tr>
<td>\l</td>
<td>Lowercase the next character</td>
<td>lcfirst</td>
</tr>
<tr>
<td>\F</td>
<td>(v5.16) Fold the case of everything following</td>
<td>fc</td>
</tr>
<tr>
<td>\Q</td>
<td>Quote metacharacters</td>
<td>quotemeta</td>
</tr>
<tr>
<td>\E</td>
<td>Stop whatever you were doing</td>
<td></td>
</tr>
</table>
</div>
<p>The <code>\F</code> and <code>fc</code> are new for the yet unreleased Perl v5.16. Those will show up in a different Item. Notice there&#8217;s no <code>\f</code> for a <code>fcfirst</code>. That double-quoted sequence already means &#8220;form feed&#8221;, the instruction to printers to stop the current page and start a new page.</p>
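<p>You can check the &#8220;Function equivalent&#8221; column yourself by comparing each operator with its function. This is a quick sketch with a made-up value in <code>$cat</code> and a hypothetical <code>%same</code> hash to hold the results. It leaves out <code>\F</code> since that needs v5.16:</p>
<pre class="brush:perl">
use v5.10;

my $cat = 'Buster*Cat';

# each value is true if the operator and the function agree
my %same = (
	'\U' => "\U$cat" eq uc( $cat ),
	'\u' => "\u$cat" eq ucfirst( $cat ),
	'\L' => "\L$cat" eq lc( $cat ),
	'\l' => "\l$cat" eq lcfirst( $cat ),
	'\Q' => "\Q$cat" eq quotemeta( $cat ),
	);

foreach my $operator ( sort keys %same ) {
	say "$operator: ", $same{$operator} ? 'same' : 'different';
	}
</pre>
<p>Each line of output should report <code>same</code>.</p>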
<p>Look at some examples using these in a double-quoted string:</p>
<pre class="brush:plain">
% perl -e 'print "\ubuster\n"'
Buster
% perl -e 'print "\LBUSTER\n"'
buster
% perl -e 'print "\Ubuster\n"'
BUSTER
% perl -e 'print "\Ubus\Eter\n"'
BUSter
% perl -e 'print "\LBUST\EER\n"'
bustER
% perl -e 'print "\QP*rl\n"'
P\*rl\
</pre>
<p>That last one is a bit odd. It looks like it ends with a <code>\</code>. It doesn&#8217;t really end like that because there&#8217;s a newline that <code>\Q</code> quoted:</p>
<pre class="brush:plain">
% perl -e 'print "\QP*rl\n"' | hexdump -C
00000000  50 5c 2a 72 6c 5c 0a                 |P\*rl\.|
00000007
</pre>
<p>Perl handled the &#8220;\n&#8221; before it handled the <code>\Q</code>. The metacharacter quoter treats the newline as a special character, so it escapes it, and the output ends with a backslash followed by a real newline.</p>
<p>Now, combine these with variable interpolation. Perl handles the variables first then does the case shifting:</p>
<pre class="brush:perl">
use 5.14.1;

my $cat = 'Buster';

say "Roscoe $cat Mimi";
say "Roscoe \U$cat Mimi";
say "Roscoe \U$cat\E Mimi";
</pre>
<p>The results are probably not surprising. The first line is just interpolation, the second line uppercases everything from <code>\U</code> to the end, and the third line uppercases only the parts between the <code>\U</code> and the <code>\E</code>:</p>
<pre class="brush:plain">
Roscoe Buster Mimi
Roscoe BUSTER MIMI
Roscoe BUSTER Mimi
</pre>
<p>If the case shift happens after interpolation, you might think that you could interpolate a case shift:</p>
<pre class="brush:perl">
use 5.14.1;

my $cat = '\UBuster'; # no case shift in a single quote!

say "Roscoe $cat Mimi";
</pre>
<p>That doesn&#8217;t work though. The intended case shift operator shows up as literal characters because Perl doesn&#8217;t do double processing:</p>
<pre class="brush:plain">
Roscoe \UBuster Mimi
</pre>
<p>A <code>\U</code> inside the string doesn&#8217;t bother the escaped characters because Perl has already processed those: </p>
<pre class="brush:perl">
use 5.14.1;

my $cat = 'Buster';

say "Roscoe \U$cat\a\n Mimi";
</pre>
<p>The &#8220;\n&#8221; is still a newline and the &#8220;\a&#8221; is still the bell, and everything after the <code>\U</code> is uppercased (if it has an uppercase equivalent).</p>
<p>That seems simple enough. It&#8217;s variable interpolation followed by character escapes followed by case shifting. But this is Perl, so it can&#8217;t be that easy.</p>
<h2>Regular expression double quoting</h2>
<p>The regular expression operators (<code>qr</code>, <code>m//</code>, and <code>s///</code>) handle the double quote operations differently. From <a href="http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators">perlop</a>:</p>
<blockquote><p>
For the pattern of regex operators (qr//, m// and s///), the quoting from \Q is applied after interpolation is processed, but before escapes are processed.
</p></blockquote>
<p>Now the order of operations is:</p>
<ul>
<li>Variable interpolation
<li>Case shift operators
<li>Escaped and logical characters
</ul>
<p>You can see this when you print the stringified forms of the patterns:</p>
<pre class="brush:plain">
% perl -le 'print qr/\Q\n/'
(?-xism:\\n)
% perl -le 'print qr/\U\n/'
(?-xism:\N)
</pre>
<p>You probably expect all of these to match, but not all of them do:</p>
<pre class="brush:plain">
% perl -le 'print "\n" =~ qr/\n/ ? "Yes" : "No"'
Yes
% perl -le 'print "\n" =~ qr/\Q\n/ ? "Yes" : "No"'
No
% perl -le 'print "\n" =~ qr/\U\n/ ? "Yes" : "No"'
No
% perl -le 'print "\n" =~ qr/\l\n/ ? "Yes" : "No"'
Yes
% perl -le 'print "\n" =~ qr/\L\n/ ? "Yes" : "No"'
Yes
</pre>
<p>The last two cases are curious. The <code>\l</code> and <code>\L</code> leave the <i>n</i> as a lowercase <i>n</i>, so in the last step, the <code>\n</code> is still a newline. Those two tests still match.</p>
<p>This means that you can construct a string and a pattern with the same sequence of characters, but they might not match:</p>
<pre class="brush:plain">
% perl -le 'print "\Q\n" =~ qr/\Q\n/ ? "Yes" : "No"'
No
% perl -le 'print "\U\n" =~ qr/\U\n/ ? "Yes" : "No"'
No
% perl -le 'print "\L\n" =~ qr/\L\n/ ? "Yes" : "No"'
Yes
</pre>
<p>It&#8217;s even worse. What does the <code>\N</code> mean? It depends on the Perl version:</p>
<pre class="brush:plain">
% perl5.10.1 -le 'print "\n" =~ qr/\N/ ? "Yes" : "No"'
Missing braces on \N{} in regex; marked by <-- HERE in m/\N <-- HERE / at -e line 1.
% perl5.12.1 -le 'print "\n" =~ qr/\N/ ? "Yes" : "No"'
No
% perl5.14.1 -le 'print "\n" =~ qr/\N/ ? "Yes" : "No"'
No
</pre>
<p>Perl v5.12 added <code>\N</code> to mean "not a newline". It works like <code>.</code>, but the <code>/s</code> switch doesn't change what it matches. Perl v5.10 doesn't know about that, which is why it thinks you have an incomplete <code>\N{CHARNAME}</code>. Put a <code>\L</code> in front of it and the pattern matches a newline, because the case shift turns <code>\N</code> into <code>\n</code> before Perl handles the escapes:</p>
<pre class="brush:plain">
% perl5.10.1 -le 'print "\n" =~ qr/\L\N/ ? "Yes" : "No"'
Yes
% perl5.14.1 -le 'print "\n" =~ qr/\L\N/ ? "Yes" : "No"'
Yes
% perl5.8.9 -le 'print "\n" =~ qr/\L\N/ ? "Yes" : "No"'
Missing braces on \N{} at -e line 1, near "\L"
Execution of -e aborted due to compilation errors.
</pre>
<p>With the <code>\N{CHARNAME}</code> syntax, you can match characters by their name in the Universal Character Set. Here you match an uppercase <i>A</i>:</p>
<pre class="brush:plain">
% perl -Mcharnames=:full -le 'print "A" =~ qr/\N{LATIN CAPITAL LETTER A}/ ? "Yes" : "No"'
Yes
</pre>
<p>If you put a <code>\L</code> in front of that, you might think it would match the lowercase version of the named letter. There's no such luck because the <code>\L</code> changes the pattern before Perl handles the <code>\N{CHARNAME}</code>:</p>
<pre class="brush:plain">
% perl -Mcharnames=:full -le 'print "A" =~ qr/\L\N{LATIN CAPITAL LETTER A}/ ? "Yes" : "No"'
No
% perl -Mcharnames=:full -le 'print "a" =~ qr/\L\N{LATIN CAPITAL LETTER A}/ ? "Yes" : "No"'
No
</pre>
<p>The <code>\N</code> turns into <code>\n</code>, and the braces now look like a quantifier with a non-number in it:</p>
<pre class="brush:plain">
% perl -Mcharnames=:full -le 'print qr/\L\N{LATIN CAPITAL LETTER A}/'
(?-xism:\n{u+41})
</pre>
<p>You might think that this would match anything since that should probably turn into <code>\n{0}</code>, just like the values in an array index turn into integers. The "Quantifiers" section of <a href="http://perldoc.perl.org/perlre.html">perlre</a> doesn't say what should happen, but if it's not a number, the braces become literals. Here's a simple demonstration that those braces are literals:</p>
<pre class="brush:plain">
% perl -le 'print "\n{a}" =~ qr/\n{a}/ ? "Yes" : "No"'
Yes
</pre>
<p>Here's the pattern you created before, and that you want to match now:</p>
<pre class="brush:plain">
% perl -Mcharnames=:full -le 'print qr/\L\N{LATIN CAPITAL LETTER A}/'
(?^u:\n{u+41})
</pre>
<p>It doesn't match a lowercase <i>a</i>:</p>
<pre class="brush:plain">
% perl -Mcharnames=:full -le 'print "a" =~ qr/\L\N{LATIN CAPITAL LETTER A}/ ? "Yes" : "No"'
No
</pre>
<p>It doesn't match a newline either. The pattern is <code>\n{u+41}</code>, and that's not a quantifier. There are some characters after the <code>\n</code>, so the target string doesn't have enough characters to match:</p>
<pre class="brush:plain">
% perl -Mcharnames=:full -le 'print "\n" =~ qr/\L\N{LATIN CAPITAL LETTER A}/ ? "Yes" : "No"'
No
</pre>
<p>Using the regular expression text doesn't work either, which you might miss on the first pass:</p>
<pre class="brush:plain">
% perl -Mcharnames=:full -le 'print "\n{u+41}" =~ qr/\L\N{LATIN CAPITAL LETTER A}/ ? "Yes" : "No"'
No
</pre>
<p>Of course! That <code>+</code> is a quantifier, so it isn't a literal character that should show up in the string. So this works:</p>
<pre class="brush:plain">
% perl -Mcharnames=:full -le 'print "\n{u41}" =~ qr/\L\N{LATIN CAPITAL LETTER A}/ ? "Yes" : "No"'
Yes
</pre>
<p>This works too because you can have one or more of <i>u</i>:</p>
<pre class="brush:plain">
% perl -Mcharnames=:full -le 'print "\n{uuuuu41}" =~ qr/\L\N{LATIN CAPITAL LETTER A}/ ? "Yes" : "No"'
Yes
</pre>
<p>If you don't want the <code>\L</code> to extend into the character name sequence, you can use the <code>\E</code> to limit its effect:</p>
<pre class="brush:plain">
% perl -Mcharnames=:full -le 'print "bar" =~ qr/\LB\E\N{LATIN SMALL LETTER A}r/ ? "Yes" : "No"'
Yes
</pre>
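<p>Because variable interpolation comes first in a pattern too, <code>\Q</code> applies to the value of an interpolated variable. That's the usual way to match a variable's contents as literal text. Here's a brief sketch with a made-up value in <code>$literal</code>; the <code>$quoted</code> and <code>$unquoted</code> names are only for illustration:</p>
<pre class="brush:perl">
use v5.10;

my $literal = 'a.b*c';

my $quoted   = qr/\Q$literal\E/;  # metacharacters are escaped
my $unquoted = qr/$literal/;      # . and * keep their special meaning

foreach my $string ( 'a.b*c', 'aXbbc' ) {
	say "$string: ",
		( $string =~ $quoted   ? 'quoted matches'   : 'quoted fails' ), ', ',
		( $string =~ $unquoted ? 'unquoted matches' : 'unquoted fails' );
	}
</pre>
<p>The quoted pattern matches only the first string. The unquoted pattern matches only the second: the <code>.</code> matches the <i>X</i> and the <code>b*</code> matches <i>bb</i>, but nothing in that pattern can match the literal <code>*</code> in the first string.</p>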
<h2>Things to remember</h2>
<ul>
<li>The double quote string constructor handles variable interpolation, special characters, and case shift operators in that order.
<li>The regular expression operators handle variable interpolation, case shift operators, and special characters in that order.
<li>Double-quoted interpolation in a match operator happens before regular expression compilation.
<li>The min-max quantifier is only a quantifier if you give it numbers. Otherwise, it's literal characters.
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.effectiveperlprogramming.com/blog/1496/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

