<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Effective Perler &#187; chapters</title>
	<atom:link href="http://www.effectiveperlprogramming.com/blog/category/book/chapters/feed" rel="self" type="application/rss+xml" />
	<link>http://www.effectiveperlprogramming.com</link>
	<description>Effective Perl Programming - write better, more idiomatic Perl</description>
	<lastBuildDate>Sat, 28 Jan 2012 02:19:01 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Define grammars in regular expressions</title>
		<link>http://www.effectiveperlprogramming.com/blog/1479</link>
		<comments>http://www.effectiveperlprogramming.com/blog/1479#comments</comments>
		<pubDate>Sun, 18 Dec 2011 22:30:20 +0000</pubDate>
		<dc:creator>brian d foy</dc:creator>
				<category><![CDATA[5.10]]></category>
		<category><![CDATA[item]]></category>
		<category><![CDATA[regular expressions]]></category>

		<guid isPermaLink="false">http://www.effectiveperlprogramming.com/?p=1479</guid>
		<description><![CDATA[[ This is the 100th Item we've shared with you in the two years this blog has been around. We deserve a holiday and we're taking it, so read us next year! Happy Holidays.] Perl 5.10 added rudimentary grammar support in its regular expressions. You could define many subpatterns directly in your pattern, use them [...]]]></description>
			<content:encoded><![CDATA[<p><i>[ This is the 100th Item we've shared with you in the two years this blog has been around. We deserve a holiday and we're taking it, so read us next year! Happy Holidays.]</i></p>
<p>Perl 5.10 added rudimentary grammar support in its regular expressions. You could define many subpatterns directly in your pattern, use them to define larger subpatterns, and, finally, when you have everything in place, let Perl do the work.</p>
<p>There are other ways, some more powerful, to do the same thing. This Item is not about those, but you can read about <a href="https://www.metacpan.org/module/Regexp::Grammars">Regexp::Grammars</a> and <a href="https://www.metacpan.org/module/Parse::RecDescent">Parse::RecDescent</a> on your own. You&#8217;re also not going to get a recommendation about which one you should use for your task, since we don&#8217;t know your situation.</p>
<p>To understand this new syntax, you have to study it from the ground up. It&#8217;s not simple, and the terse documentation in <a href="http://perldoc.perl.org/perlre.html">perlre</a> doesn&#8217;t do much to help.</p>
<h2>Referencing a subpattern</h2>
<p>The first part you need is the ability to call a named part of the pattern (to label a subpattern, see <span class="item">Item 31. Use named captures to label matches</span>). To re-match a labeled subpattern, you use: </p>
<pre class="brush:plain">
(?&#038;NAME)
</pre>
<p>You can use that syntax to rerun a subpattern later:</p>
<pre class="brush:perl">
use v5.10;

my $pattern = qr/
	(?&lt;cat>Buster|Mimi)
	\s+
	(?&#038;cat)
	/x;

foreach ( 'Buster Mimi', 'Mimi Buster', 'Buster', 'Buster Buster' ) {
	say "$_ ", m/$pattern/p ? "matched" : 'nope!';
	}
</pre>
<p>The labeled subpattern has an alternation where either cat name can match. When you reference it again, you re-run the alternation and you can match either cat name again:</p>
<pre class="brush:plain">
Buster Mimi matched
Mimi Buster matched
Buster nope!
Buster Buster matched
</pre>
<p>This is not the same thing as matching the same text that a labeled capture group already matched. For that, you use a named backreference, <code>\k&lt;NAME></code>:</p>
<pre class="brush:perl">
\k&lt;NAME>
</pre>
<p>This pattern is a different beast. Whichever cat name matches first also has to match second:</p>
<pre class="brush:perl">
use v5.10;

my $pattern = qr/
	(?&lt;cat>Buster|Mimi)
	\s+
	\k&lt;cat>
	/x;

foreach ( 'Buster Mimi', 'Mimi Buster', 'Buster', 'Buster Buster' ) {
	say "$_ ", m/$pattern/p ? "matched" : 'nope!';
	}
</pre>
<p>Now only one of the strings matches because only one string repeats a cat&#8217;s name:</p>
<pre class="brush:plain">
Buster Mimi nope!
Mimi Buster nope!
Buster nope!
Buster Buster matched
</pre>
<p>Although you won&#8217;t see it here, the <code>(?&#038;NAME)</code> syntax is the trick to matching a recursive pattern since the reference can appear inside the pattern it references.</p>
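<p>As a small taste of that recursion, here&#8217;s a minimal sketch that goes a step beyond the cat examples. The <code>group</code> label and the test strings are made up for illustration. The labeled subpattern calls itself to match nested parentheses:</p>
<pre class="brush:perl">
use v5.10;

my $balanced = qr/
	\A
	(?&lt;group>
		\(
		(?:
			[^()]++        # a run of characters that are not parentheses
			|
			(?&#038;group)      # or a nested group, by calling the same subpattern
		)*
		\)
	)
	\z
	/x;

foreach ( '()', '(a(b)c)', '(a(b)c', 'a)b(' ) {
	say "$_ ", m/$balanced/ ? 'matched' : 'nope!';
	}
</pre>
<p>Only the strings with properly nested parentheses match:</p>
<pre class="brush:plain">
() matched
(a(b)c) matched
(a(b)c nope!
a)b( nope!
</pre>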
<h2>Conditional match</h2>
<p>The second building block you need starts with a conditional submatch:</p>
<pre class="brush:plain">
(?(condition)yes-pattern|no-pattern)
</pre>
<p>That <i>condition</i> can be many things, most of which won&#8217;t appear in this Item. Although you see the <code>|</code> character, this isn&#8217;t an alternation. It&#8217;s like an alternation because the <code>|</code> separates distinct subpatterns, but unlike an alternation because Perl only ever tries one of the subpatterns, and you get at most two of them.</p>
<p>The simplest condition is just an ordinal number, which is true only if that capture group matched. Here&#8217;s a pattern that has two capture groups:</p>
<pre class="brush:perl">
use v5.10;

my $pattern = qr/
	(?:           # parens for grouping
		(B)     # $1
		|         # alternation
		(M)     # $2
	)
	(?(1)uster|imi) # conditional match
	/x;

foreach ( qw(Mimi Buster Muster Bimi Roscoe) ) {
	say "$_ ", m/$pattern/p ? "matched ${^MATCH}" : 'nope!';
	}
</pre>
<p>In this pattern, if the <code>(B)</code> matches, the conditional uses its yes-pattern, <code>uster</code>. Otherwise, it uses <code>imi</code>. However, the only thing that can match besides a <code>(B)</code> is the other part of the alternation, the <code>(M)</code>. The output shows that only <code>Mimi</code> or <code>Buster</code> matches:</p>
<pre class="brush:plain">
Mimi matched Mimi
Buster matched Buster
Muster nope!
Bimi nope!
Roscoe nope!
</pre>
<p>You get the same results if you use <code>(2)</code> as the condition and re-arrange the order of the patterns:</p>
<pre class="brush:perl">
my $pattern = qr/
	(?:(B)|(M))
	(?(2)imi|uster)
	/x;
</pre>
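<p>A condition on a capture group is also handy when a later part of the pattern should appear only if an earlier, optional part did. This is a small sketch with made-up test strings. It leaves out the no-pattern, so the closing parenthesis is required only when the opening one matched:</p>
<pre class="brush:perl">
use v5.10;

my $pattern = qr/
	\A
	( \( )?         # $1, an optional opening parenthesis
	\d+
	(?(1) \) )      # require the closing one only if $1 matched
	\z
	/x;

foreach ( '42', '(42)', '(42', '42)' ) {
	say "$_ ", m/$pattern/ ? 'matched' : 'nope!';
	}
</pre>
<p>The number matches with both parentheses or with neither, but not with just one:</p>
<pre class="brush:plain">
42 matched
(42) matched
(42 nope!
42) nope!
</pre>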
<h2>Putting it together</h2>
<p>The condition can also be the literal <code>(DEFINE)</code>. In that case, Perl only allows a yes-branch. And, as its condition implies, it merely <i>defines</i> the patterns and does not execute them. </p>
<p>This means that you can create and label the subpatterns that you need, but not actually assert that any of them match the string. The definitions are just there. This pattern defines and labels three subpatterns then uses none of them:</p>
<pre class="brush:perl">
use v5.10;

my $pattern = qr/
	(?(DEFINE)
		(?&lt;cat>Buster)
		(?&lt;dog>Addie)
		(?&lt;bird>Poppy)
	)
	Mimi
	/x;

foreach ( 'Buster Mimi', 'Roscoe', 'Buster', 'Mimi' ) {
	say "$_ ", m/$pattern/ ? "matched" : 'nope!';
	}
</pre>
<p>It&#8217;s as if the <code>DEFINE</code> bit is not even there:</p>
<pre class="brush:plain">
Buster Mimi matched
Roscoe nope!
Buster nope!
Mimi matched
</pre>
<p>Outside the <code>(DEFINE)</code>, you can reference any of the subpatterns that you created:</p>
<pre class="brush:perl">
use v5.10;

my $pattern = qr/
	(?(DEFINE)
		(?&lt;cat>Buster)
		(?&lt;dog>Addie)
		(?&lt;bird>Poppy)
	)
	(?&#038;cat)
	/x;

foreach ( 'Buster Mimi', 'Roscoe', 'Buster', 'Mimi' ) {
	say "$_ ", m/$pattern/ ? "matched" : 'nope!';
	}
</pre>
<p>Now <code>Buster</code> matches because you reference that defined  subpattern:</p>
<pre class="brush:plain">
Buster Mimi matched
Roscoe nope!
Buster matched
Mimi nope!
</pre>
<p>Now it&#8217;s time for the grammar. Inside the <code>(DEFINE)</code>, you can reference subpatterns you haven&#8217;t defined yet, and your subpatterns can get arbitrarily complex:</p>
<pre class="brush:perl">
use v5.10;

my $pattern = qr/
	(?(DEFINE)
		(?&lt;male> Buster | Roscoe )
		(?&lt;female> Mimi | Juliet )
		(?&lt;cat> (?&#038;male) | (?&#038;female) )
		(?&lt;dog>Addie)
		(?&lt;bird>Poppy)
	)
	(?&#038;cat)
	/x;

foreach ( 'Addie', 'Roscoe', 'Buster', 'Mimi' ) {
	say "$_ ", m/$pattern/ ? "matched" : 'nope!';
	}
</pre>
<p>Even though the cat names are in two different subpatterns, the <code>cat</code> subpattern unifies them so all the cat names match:</p>
<pre class="brush:plain">
Addie nope!
Roscoe matched
Buster matched
Mimi matched
</pre>
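<p>There&#8217;s one wrinkle to keep in mind, going by the <code>(DEFINE)</code> notes in perlre: a capture made inside a subpattern that you call with <code>(?&#038;NAME)</code> isn&#8217;t available after that call returns. If you want the text that matched, wrap the call in its own capture. In this sketch, the <code>pet</code> label is invented for the example:</p>
<pre class="brush:perl">
use v5.10;

my $pattern = qr/
	(?(DEFINE)
		(?&lt;male> Buster | Roscoe )
		(?&lt;female> Mimi | Juliet )
		(?&lt;cat> (?&#038;male) | (?&#038;female) )
	)
	(?&lt;pet> (?&#038;cat) )    # capture whatever the cat subpattern matched
	/x;

if( 'Roscoe' =~ $pattern ) {
	say "pet: $+{pet}";
	say 'cat: ', $+{cat} // 'not set';
	}
</pre>
<p>The outer <code>pet</code> capture has the name, while the <code>cat</code> label from inside the <code>(DEFINE)</code> has nothing:</p>
<pre class="brush:plain">
pet: Roscoe
cat: not set
</pre>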
<p>You should now be able to understand this regular expression from Tom Christiansen (appearing on <a href="http://stackoverflow.com/a/4843579/8817">Stack Overflow</a>). You might have to pick it apart, but you know how all the parts fit together to match the Internet Message Format defined in <a href="http://tools.ietf.org/html/rfc5322">RFC 5322</a>:</p>
<pre class="brush:perl">
$rfc5322 = qr{

   (?(DEFINE)

     (?&lt;address>         (?&#038;mailbox) | (?&#038;group))
     (?&lt;mailbox>         (?&#038;name_addr) | (?&#038;addr_spec))
     (?&lt;name_addr>       (?&#038;display_name)? (?&#038;angle_addr))
     (?&lt;angle_addr>      (?&#038;CFWS)? &lt; (?&#038;addr_spec) > (?&#038;CFWS)?)
     (?&lt;group>           (?&#038;display_name) : (?:(?&#038;mailbox_list) | (?&#038;CFWS))? ; (?&#038;CFWS)?)
     (?&lt;display_name>    (?&#038;phrase))
     (?&lt;mailbox_list>    (?&#038;mailbox) (?: , (?&#038;mailbox))*)

     (?&lt;addr_spec>       (?&#038;local_part) \@ (?&#038;domain))
     (?&lt;local_part>      (?&#038;dot_atom) | (?&#038;quoted_string))
     (?&lt;domain>          (?&#038;dot_atom) | (?&#038;domain_literal))
     (?&lt;domain_literal>  (?&#038;CFWS)? \[ (?: (?&#038;FWS)? (?&#038;dcontent))* (?&#038;FWS)?
                                   \] (?&#038;CFWS)?)
     (?&lt;dcontent>        (?&#038;dtext) | (?&#038;quoted_pair))
     (?&lt;dtext>           (?&#038;NO_WS_CTL) | [\x21-\x5a\x5e-\x7e])

     (?&lt;atext>           (?&#038;ALPHA) | (?&#038;DIGIT) | [!#\$%&#038;'*+-/=?^_`{|}~])
     (?&lt;atom>            (?&#038;CFWS)? (?&#038;atext)+ (?&#038;CFWS)?)
     (?&lt;dot_atom>        (?&#038;CFWS)? (?&#038;dot_atom_text) (?&#038;CFWS)?)
     (?&lt;dot_atom_text>   (?&#038;atext)+ (?: \. (?&#038;atext)+)*)

     (?&lt;text>            [\x01-\x09\x0b\x0c\x0e-\x7f])
     (?&lt;quoted_pair>     \\ (?&#038;text))

     (?&lt;qtext>           (?&#038;NO_WS_CTL) | [\x21\x23-\x5b\x5d-\x7e])
     (?&lt;qcontent>        (?&#038;qtext) | (?&#038;quoted_pair))
     (?&lt;quoted_string>   (?&#038;CFWS)? (?&#038;DQUOTE) (?:(?&#038;FWS)? (?&#038;qcontent))*
                          (?&#038;FWS)? (?&#038;DQUOTE) (?&#038;CFWS)?)

     (?&lt;word>            (?&#038;atom) | (?&#038;quoted_string))
     (?&lt;phrase>          (?&#038;word)+)

     # Folding white space
     (?&lt;FWS>             (?: (?&#038;WSP)* (?&#038;CRLF))? (?&#038;WSP)+)
     (?&lt;ctext>           (?&#038;NO_WS_CTL) | [\x21-\x27\x2a-\x5b\x5d-\x7e])
     (?&lt;ccontent>        (?&#038;ctext) | (?&#038;quoted_pair) | (?&#038;comment))
     (?&lt;comment>         \( (?: (?&#038;FWS)? (?&#038;ccontent))* (?&#038;FWS)? \) )
     (?&lt;CFWS>            (?: (?&#038;FWS)? (?&#038;comment))*
                         (?: (?:(?&#038;FWS)? (?&#038;comment)) | (?&#038;FWS)))

     # No whitespace control
     (?&lt;NO_WS_CTL>       [\x01-\x08\x0b\x0c\x0e-\x1f\x7f])

     (?&lt;ALPHA>           [A-Za-z])
     (?&lt;DIGIT>           [0-9])
     (?&lt;CRLF>            \x0d \x0a)
     (?&lt;DQUOTE>          ")
     (?&lt;WSP>             [\x20\x09])
   )

   (?&#038;address)

}x;
</pre>
<p>If that&#8217;s not clever enough for you, try <a href="http://stackoverflow.com/a/4286326/8817">Tom&#8217;s use of <code>(DEFINE)</code> to properly parse HTML</a>.</p>
<h2>Things to remember</h2>
<ul>
<li>You can reference a named subpattern with <code>(?&#038;NAME)</code>
<li>You can choose a subpattern with a condition <code>(?(condition)yes-pattern|no-pattern)</code>
<li>You can define and label subpatterns for later use with <code>(DEFINE)</code>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.effectiveperlprogramming.com/blog/1479/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Create your own dualvars</title>
		<link>http://www.effectiveperlprogramming.com/blog/1470</link>
		<comments>http://www.effectiveperlprogramming.com/blog/1470#comments</comments>
		<pubDate>Sun, 11 Dec 2011 13:36:37 +0000</pubDate>
		<dc:creator>brian d foy</dc:creator>
				<category><![CDATA[item]]></category>
		<category><![CDATA[miscellany]]></category>

		<guid isPermaLink="false">http://www.effectiveperlprogramming.com/?p=1470</guid>
		<description><![CDATA[Perl&#8217;s basic data type is the scalar, which takes its name from the mathematical term for &#8220;single item&#8221;. However, the scalar is really two things. You probably know that a scalar can be either a number or a string, or a number that looks the same as its string, or a string that can be [...]]]></description>
			<content:encoded><![CDATA[<p>Perl&#8217;s basic data type is the scalar, which takes its name from the mathematical term for &#8220;single item&#8221;. However, the scalar is really two things. You probably know that a scalar can be either a number or a string, or a number that looks the same as its string, or a string that can be a number. What you probably don&#8217;t know is that a scalar can be two separate and unrelated values at the same time, making it a <i>dualvar</i>.</p>
<p>You&#8217;re already using a dualvar without knowing it. The <code>$!</code> variable, which holds the value of the last system error, is most often used in its string form:</p>
<pre class="brush:perl">
open my $fh, '>', $filename or die "Error: $!";
</pre>
<p>If something goes wrong with that <code>open</code>, you&#8217;ll get an error such as these:</p>
<pre class="brush:plain">
No such file or directory
Permission denied
</pre>
<p>Both of those error messages have numbers associated with them. You can output the numeric value by using <code>$!</code> in a numeric context:</p>
<pre class="brush:perl">
open my $fh, '>', $filename or die $! + 0;
</pre>
<p>Now the errors are numbers, which correspond to the <code>errno</code> value for the system call:</p>
<pre class="brush:plain">
2
13
</pre>
<p>These numbers have symbolic names, such as <code>ENOENT</code> and <code>EACCES</code>. Those names are the keys in <code>%!</code>, and the value for a key is true only if that was the last system error.</p>
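<p>Here&#8217;s a short sketch that shows all three views of the same error. The keys of <code>%!</code> are the symbolic names that <a href="https://www.metacpan.org/module/Errno">Errno</a> knows about. To keep the example self-contained, it sets <code>$!</code> by hand instead of failing a real system call, which is a choice made only for this illustration:</p>
<pre class="brush:perl">
use v5.10;
use Errno qw(ENOENT);

$! = ENOENT;    # pretend a system call just failed with this error

say 'Number: ', $! + 0;
say "String: $!";
say 'ENOENT is ', $!{ENOENT} ? 'set' : 'not set';
say 'EACCES is ', $!{EACCES} ? 'set' : 'not set';
</pre>
<p>The first two lines show the numeric and string sides of <code>$!</code>, whose exact values and wording depend on your platform and locale. Only the <code>ENOENT</code> key should report that it is set.</p>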
<p>Perl, on its own, doesn&#8217;t give you a way to create this sort of variable yourself. When you assign a new value to a scalar, whether string or number, Perl invalidates the previous values it had. When you use a number as a string, Perl converts it to a string, and it does the same in the other direction.</p>
<p>You can watch this with <a href="https://www.metacpan.org/module/Devel::Peek">Devel::Peek</a>. Here&#8217;s a program that sets a string value:</p>
<pre class="brush:perl">
use Devel::Peek;

my $value = 'abc';

Dump( $value );
</pre>
<p>In the scalar record, the <code>POK</code> flag is set, indicating the variable has a string form, and the <code>PV</code> slot has a value (see <a href="http://perldoc.perl.org/perlguts.html">perlguts</a> for more details):</p>
<pre class="brush:plain">
SV = PV(0x100801070) at 0x100827810
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK)
  PV = 0x100202870 "abc"\0
  CUR = 3
  LEN = 16
</pre>
<p>If you set a numeric value, the flags are different:</p>
<pre class="brush:perl">
use Devel::Peek;

my $value = 137;

Dump( $value );
</pre>
<p>Now there&#8217;s an <code>IOK</code> flag, and one of the numeric slots, in this case the <code>IV</code>, is set:</p>
<pre class="brush:plain">
SV = IV(0x100827800) at 0x100827810
  REFCNT = 1
  FLAGS = (PADMY,IOK,pIOK)
  IV = 137
</pre>
<p>However, if you take the numeric value and use it in a string context, Perl also creates a string version. Now the scalar has both the <code>IOK</code> and <code>POK</code> flags, and both the <code>IV</code> and <code>PV</code> slots have values:</p>
<pre class="brush:plain">
137
SV = PVIV(0x100809208) at 0x100827840
  REFCNT = 1
  FLAGS = (PADMY,IOK,POK,pIOK,pPOK)
  IV = 137
  PV = 0x100202870 "137"\0
  CUR = 3
  LEN = 16
</pre>
<p>If you change the variable, some of those flags disappear, even though the values don&#8217;t necessarily disappear:</p>
<pre class="brush:perl">
use v5.10;
use Devel::Peek;

my $value = 137;

$value = 'Buster';

Dump( $value );
</pre>
<p>Now the scalar&#8217;s value should be just <code>Buster</code>, but the <code>IV</code> slot still has the old <code>137</code>. However, the <code>IOK</code> flag is gone:</p>
<pre class="brush:plain">
SV = PVIV(0x100809208) at 0x100827840
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK)
  IV = 137
  PV = 0x100202870 "Buster"\0
  CUR = 6
  LEN = 16
</pre>
<p>When you set the new value, Perl hadn&#8217;t used it in a numeric context yet, so there was no need to go through the work of translating the string value to a number. Instead, it just unset the flag that denotes it&#8217;s okay to use the value as a number. You have to use it as a number to get a numeric value again:</p>
<pre class="brush:perl">
use v5.10;
use Devel::Peek;

my $value = 137;

$value = 'Buster';

say $value + 0;

Dump( $value );
</pre>
<p>Now Perl converts it to a number and sets the numeric flags again:</p>
<pre class="brush:plain">
0
SV = PVNV(0x100801e30) at 0x100827840
  REFCNT = 1
  FLAGS = (PADMY,POK,pIOK,pNOK,pPOK)
  IV = 0
  NV = 0
  PV = 0x100202870 "Buster"\0
  CUR = 6
  LEN = 16
</pre>
<p>That&#8217;s how it works if you go through the Perl interface, but if you play with a scalar through the XS interface, you can set whatever flags and values you like. That&#8217;s exactly what <a href="https://www.metacpan.org/module/Scalar::Util">Scalar::Util</a>&#8217;s <code>dualvar</code> does for you:</p>
<pre class="brush:perl">
use v5.10;
use Devel::Peek;
use Scalar::Util qw(dualvar);

my $value = dualvar 137, 'Buster';

Dump( $value );

say "$value";
say $value + 0;
</pre>
<p>Now you have a scalar which has unrelated numeric and string values, <i>and</i> the flags for both values are set:</p>
<pre class="brush:plain">
SV = PVNV(0x100802010) at 0x1008277c8
  REFCNT = 1
  FLAGS = (PADMY,IOK,POK,pIOK,pPOK)
  IV = 137
  NV = 0
  PV = 0x100202870 "Buster"\0
  CUR = 6
  LEN = 16
Buster
137
</pre>
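<p>As one possible use, and only as a sketch with made-up level names and numbers, you can make values that sort by their numeric side but print as their string side:</p>
<pre class="brush:perl">
use v5.10;
use Scalar::Util qw(dualvar);

# each level compares as its number but prints as its name
my @levels = map { dualvar( $_->[0], $_->[1] ) }
	( [ 3, 'error' ], [ 1, 'debug' ], [ 2, 'warning' ] );

say join ' ', sort { $a &lt;=> $b } @levels;
</pre>
<p>The numeric comparison puts the levels in order of severity, and <code>join</code> uses their string values:</p>
<pre class="brush:plain">
debug warning error
</pre>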
<h2>Things to remember</h2>
<ul>
<li>Scalars can have both numeric and string values at the same time
<li>Those two values can be unrelated
<li>You can create your own dualvar with <a href="https://www.metacpan.org/module/Scalar::Util">Scalar::Util</a>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.effectiveperlprogramming.com/blog/1470/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Make disposable web servers for testing</title>
		<link>http://www.effectiveperlprogramming.com/blog/1463</link>
		<comments>http://www.effectiveperlprogramming.com/blog/1463#comments</comments>
		<pubDate>Sun, 04 Dec 2011 07:42:50 +0000</pubDate>
		<dc:creator>brian d foy</dc:creator>
				<category><![CDATA[item]]></category>
		<category><![CDATA[testing]]></category>

		<guid isPermaLink="false">http://www.effectiveperlprogramming.com/?p=1463</guid>
		<description><![CDATA[If your project depends on an interaction with a web server, especially a remote one, you have some challenges with testing that portion. Even if you can get it working for you, when you distribute your code, someone else might not be able to reach your server for testing. Instead of relying on an external [...]]]></description>
			<content:encoded><![CDATA[<p>If your project depends on an interaction with a web server, especially a remote one, you have some challenges with testing that portion. Even if you can get it working for you, when you distribute your code, someone else might not be able to reach your server for testing. Instead of relying on an external server, you can use a local server that you write especially for your test suite.</p>
<p>This problem has a couple tricky parts. To run a test server from your own test suite, your server needs to bind to a port that&#8217;s not already in use. When it has a port, it needs to communicate that to your test script.</p>
<p>Another problem, which you won&#8217;t consider for this Item, involves the configurability of your program so you can change the hostname and port during the test. This Item assumes you&#8217;ve taken care of that bit.</p>
<p>The <a href="https://www.metacpan.org/module/Test::Fake::HTTPD">Test::Fake::HTTPD</a> module can create a web server directly from your test script. This example creates a web server that returns the same JSON response for every request:</p>
<pre class="brush:perl">
use Test::More;
use Test::Fake::HTTPD;

use Mojo::UserAgent;

my $httpd = run_http_server {
	my $request = shift;

	return [
		200,
		[ 'Content-Type' => 'application/json' ],
		[ '{ "cat": "Buster" }' ]
		];
	};

ok( defined $httpd, 'Got a web server' );

diag( sprintf "You can connect to your server at %s.\n", $httpd->host_port );

my $response = Mojo::UserAgent->new->get(
	$httpd->endpoint
	)->res;

diag( $response->to_string );
is( $response->json->{cat}, 'Buster', 'Cat is Buster' );

done_testing();
</pre>
<p>The test output shows a message telling you the host and port, as well as the response:</p>
<pre class="brush:plain">
ok 1 - Got a web server
# You can connect to your server at 127.0.0.1:50602.
# HTTP/1.1 200 OK
# Content-Type: application/json
# Date: Wed, 07 Dec 2011 08:44:40 GMT
# Content-Length: 19
# Server: libwww-perl-daemon/6.00
#
# { "cat": "Buster" }
ok 2 - Cat is Buster
1..2
</pre>
<p>When the test script ends (or the web server variable goes out of scope), the server shuts down, so there&#8217;s nothing for you to clean up. It&#8217;s a disposable web server.</p>
<p>In your web server, you can do anything that you like. It doesn&#8217;t have to implement everything, or even close to everything, that the production server does. It just has to return responses that you can use in your tests. That means that you get to control not only the success, but also the failures.</p>
<p>That <code>Server</code> line gives you a hint about what created the <code>$request</code> object—it&#8217;s <a href="https://www.metacpan.org/module/LWP">LWP</a> behind the scenes, so it&#8217;s <a href="https://www.metacpan.org/module/HTTP::Request">HTTP::Request</a>. You can get the requested path with the <code>uri</code> method and decide what to do:</p>
<pre class="brush:perl">
use strict;
use warnings;

use Test::More;
use Test::Fake::HTTPD;

use Mojo::UserAgent;
use URI;

my $httpd = run_http_server {
	my $request = shift;

	my $uri = $request->uri;

	return do {
		if( $uri->path eq '/' ) {
			[
				200,
				[ 'Content-Type' => 'text/plain' ],
				[ "Ask about our cats!" ],
			]
			}
		elsif( $uri->path eq '/cats' ) {
			[
				200,
				[ 'Content-Type' => 'application/json' ],
				[ '{ "cat": "Buster" }' ],
			]
			}
		elsif( $uri->path eq '/dogs' ) {
			[
				408,
				[ 'Content-Type' => 'text/plain' ],
				[ "We don't walk dogs" ],
			]
			}
		else {
			[
				404,
				[ 'Content-Type' => 'text/plain' ],
				[ "Not Found" ],
			]
			}
		}
	};

ok( defined $httpd, 'Got a web server' );
diag( sprintf "You can connect to your server at %s.\n", $httpd->host_port );

my $uri = URI->new( $httpd->endpoint );
isa_ok( $uri, 'URI' );

subtest '/' => sub {
	plan tests => 3;

	my $this = $uri->clone;
	isa_ok( $this, 'URI' );
	$this->path( '/' );

	my $response = Mojo::UserAgent->new->get( $this )->res;

	is( $response->headers->content_type,
		'text/plain', 'Top level is plain text' );
	like( $response->body , qr/cats/, 'Top level has cats' );
	};

subtest '/cats' => sub {
	plan tests => 2;

	my $this = $uri->clone;
	isa_ok( $this, 'URI' );
	$this->path( '/cats' );

	my $response = Mojo::UserAgent->new->get( $this )->res;

	is( $response->headers->content_type,
		'application/json', '/cats returns JSON' );
	};

subtest '/not_there' => sub {
	plan tests => 2;

	my $this = $uri->clone;
	isa_ok( $this, 'URI' );
	$this->path( '/not_there' );

	my $response = Mojo::UserAgent->new->get( $this )->res;

	is( $response->code,
		'404', '/not_there returns 404' );
	};

done_testing();
</pre>
<p>With subtests organized around accesses to paths, the TAP isn&#8217;t so hard to read (although you probably won&#8217;t have to look at it  yourself):</p>
<pre class="brush:plain">
ok 1 - Got a web server
# You can connect to your server at 127.0.0.1:50498.
ok 2 - The object isa URI
    1..3
    ok 1 - The object isa URI
    ok 2 - Top level is plain text
    ok 3 - Top level has cats
ok 3 - /
    1..2
    ok 1 - The object isa URI
    ok 2 - /cats returns JSON
ok 4 - /cats
    1..2
    ok 1 - The object isa URI
    ok 2 - /not_there returns 404
ok 5 - /not_there
1..5
</pre>
<p>If you need to use the same test web server in more than one test script, you can move it into its own file and load that from each script:</p>
<pre class="brush:perl">
use strict;
use warnings;

use Test::More;

use Mojo::UserAgent;
use URI;

require 'server.pl';
my $httpd = get_http_server();

ok( defined $httpd, 'Got a web server' );
diag( sprintf "You can connect to your server at %s.\n", $httpd->host_port );

my $uri = URI->new( $httpd->endpoint );
isa_ok( $uri, 'URI' );

...
</pre>
<p>The <i>server.pl</i> file wraps the call to <code>run_http_server</code>. But, if you are going to do that, you might as well skip the convenience method and set up your own object. You can change the timeout value, for instance:</p>
<pre class="brush:perl">
use Test::Fake::HTTPD;

sub get_http_server {
	my $httpd = Test::Fake::HTTPD->new(
		timeout => 30,
		);

	$httpd->run( sub {
		my $request = shift;
		...;
		} );

	$httpd;
	}

1;    # require needs the file to return a true value
</pre>
<p>Different test scripts each get their own test web server. If you&#8217;re running several test scripts in parallel, you&#8217;ll start several servers at the same time, which might not be that kind to your system or to the other people using it. If you don&#8217;t like that, you could set up a single server at the start of your test run, share it with all tests, and shut down everything at the end, although you won&#8217;t see that in this Item.</p>
<h2>Things to remember</h2>
<ul>
<li>Test web interactions locally
<li>Use <a href="https://www.metacpan.org/module/Test::Fake::HTTPD">Test::Fake::HTTPD</a> to create cheap, disposable web servers
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.effectiveperlprogramming.com/blog/1463/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Know split&#8217;s special cases</title>
		<link>http://www.effectiveperlprogramming.com/blog/1416</link>
		<comments>http://www.effectiveperlprogramming.com/blog/1416#comments</comments>
		<pubDate>Sun, 27 Nov 2011 00:09:00 +0000</pubDate>
		<dc:creator>brian d foy</dc:creator>
				<category><![CDATA[Idiomatic Perl]]></category>
		<category><![CDATA[item]]></category>

		<guid isPermaLink="false">http://www.effectiveperlprogramming.com/?p=1416</guid>
		<description><![CDATA[Perl&#8217;s split has some special cases and some perhaps surprising cases. The empty pattern, zero width match, the special argument ' ', and the /^/ act differently than you might expect from the general rule. The empty pattern, // The empty pattern is a special case that&#8217;s designed to give you a list of characters. [...]]]></description>
			<content:encoded><![CDATA[<p>Perl&#8217;s <a href="http://perldoc.perl.org/functions/split.html">split</a> has some special cases and some perhaps surprising cases. The empty pattern, a zero-width match, the special argument <code>' '</code>, and the pattern <code>/^/</code> all act differently than you might expect from the general rule.</p>
<h2>The empty pattern, //</h2>
<p>The empty pattern is a special case that&#8217;s designed to give you a list of characters. This pattern specifically has nothing in it and is different than a pattern that matches an empty string (that&#8217;s next). For this, <a href="http://perldoc.perl.org/functions/split.html">split</a> returns a list of characters:</p>
<pre class="brush:perl">
use utf8;

use Data::Printer;
my @characters = split //, 'Büster';

p( @characters );
</pre>
<p>The output shows a list of characters:</p>
<pre class="brush:plain">
[
    [0] "B",
    [1] "ü",
    [2] "s",
    [3] "t",
    [4] "e",
    [5] "r"
]
</pre>
<p>This is specifically characters, not grapheme clusters. Depending on the normalization of your source code or input, you can get different results:</p>
<pre class="brush:perl">
use utf8;

use Data::Printer;
use Unicode::Normalize qw(NFD);

my @characters = split //, NFD( 'Büster' );

p( @characters );
</pre>
<p>Now the grapheme cluster <i>ü</i> is actually two characters, the <i>u</i> (U+0075 ʟᴀᴛɪɴ sᴍᴀʟʟ ʟᴇᴛᴛᴇʀ ᴜ) and the <i>¨</i> (U+0308 ᴄᴏᴍʙɪɴɪɴɢ ᴅɪᴀᴇʀᴇsɪs), instead of the single <i>ü</i> (U+00FC ʟᴀᴛɪɴ sᴍᴀʟʟ ʟᴇᴛᴛᴇʀ ᴜ ᴡɪᴛʜ ᴅɪᴀᴇʀᴇsɪs):</p>
<pre class="brush:plain">
[
    [0] "B",
    [1] "u",
    [2] "̈",
    [3] "s",
    [4] "t",
    [5] "e",
    [6] "r"
]
</pre>
<p>You can review grapheme clusters in <a href="http://www.effectiveperlprogramming.com/blog/1286">Treat Unicode strings as grapheme clusters</a>.</p>
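<p>If you want grapheme clusters instead of characters, one option is to skip <a href="http://perldoc.perl.org/functions/split.html">split</a> and match with <code>\X</code> in list context. This is only a sketch of that idea; the variable names are ours:</p>
<pre class="brush:perl">
use v5.10;
use utf8;
use Unicode::Normalize qw(NFD);

my $decomposed = NFD( 'Büster' );

my @characters = split //, $decomposed;   # one element per code point
my @clusters   = $decomposed =~ /(\X)/g;  # one element per grapheme cluster

say scalar @characters;   # 7
say scalar @clusters;     # 6
</pre>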
<h2>Matching the empty string</h2>
<p>Successfully matching no characters isn&#8217;t really a special case, but people are sometimes surprised about it because it seems special. Your <a href="http://perldoc.perl.org/functions/split.html">split</a> pattern might match an empty string. This is different from the empty pattern because you actually have a pattern, even if it might match zero characters. This is also distinct from a pattern that doesn&#8217;t match:</p>
<pre class="brush:perl">
use v5.10;

local $_ = 'Mimi';  # lexical "my $_" was removed in Perl 5.24

say "Matched empty pattern" if //;
say "Matched optional whitespace" if /\s*/;
say "Matched zero width assertion" if /(?=\w+)/;
say "How did Buster match?" if /Buster/;
</pre>
<p>The first three of these patterns match successfully but match zero characters, while the fourth fails:</p>
<pre class="brush:plain">
Matched empty pattern
Matched optional whitespace
Matched zero width assertion
</pre>
<p>It&#8217;s easy to construct a pattern that will match zero characters even though it matches successfully. The <code>?</code> (zero or one) and <code>*</code> (zero or more) quantifiers do that quite nicely. Zero width assertions, such as the boundaries and lookarounds, do that too. If the pattern can match zero characters successfully, Perl splits at every position where it does. For a pattern such as <code>/\s*/</code>, that&#8217;s between every pair of characters:</p>
<pre class="brush:perl">
use Data::Printer;
my @characters = split /\s*/, 'Buster';

p( @characters );
</pre>
<pre class="brush:plain">
[
    [0] "B",
    [1] "u",
    [2] "s",
    [3] "t",
    [4] "e",
    [5] "r"
]
</pre>
<p>The pattern doesn&#8217;t have to match zero characters for all separators.</p>
<pre class="brush:perl">
use Data::Printer;
my @characters = split /\s*/, 'Buster and Mimi';

p( @characters );
</pre>
<p>Notice that there are no spaces in <code>@characters</code>, since <a href="http://perldoc.perl.org/functions/split.html">split</a> matched those as separator characters:</p>
<pre class="brush:plain">
[
    [0]  "B",
    [1]  "u",
    [2]  "s",
    [3]  "t",
    [4]  "e",
    [5]  "r",
    [6]  "a",
    [7]  "n",
    [8]  "d",
    [9]  "M",
    [10] "i",
    [11] "m",
    [12] "i"
]
</pre>
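<p>A zero width match only splits where the pattern actually matches, so a lookahead can cut a string at positions you choose without consuming any characters. Here&#8217;s a small sketch with a made-up string that has no separators at all:</p>
<pre class="brush:perl">
use v5.10;

# The lookahead matches the empty string just before each capital letter
my @names = split /(?=[A-Z])/, 'BusterMimiRoscoe';

say join ', ', @names;   # Buster, Mimi, Roscoe
</pre>
<p>The zero width match at the very beginning of the string doesn&#8217;t produce an empty leading field.</p>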
<h2>The single space, ' '</h2>
<p>The single space in quotes, single or double, is a special case. It splits on whitespace, but unlike the pattern that is a single space, the one in quotes discards empty leading fields:</p>
<pre class="brush:perl">
use Data::Printer;
my @characters = split ' ', '  Buster and Mimi';

p( @characters );
</pre>
<p>You get just the non-whitespace with no empty fields:</p>
<pre class="brush:plain">
[
    [0] "Buster",
    [1] "and",
    [2] "Mimi"
]
</pre>
<p>This behavior comes from <i>awk</i>:</p>
<pre class="brush:plain">
#!/usr/bin/awk -f
BEGIN {
    string="  Buster Mimi Roscoe";
    search=" ";
    n=split(string,array," ");
    print("[");
    for (i=1;i<=n;i++) {
        printf("    [%d] \"%s\"\n",i,array[i]);
    }
    print("]");
    exit;
}
</pre>
<p>You end up with almost the same output, although the indices are one greater:</p>
<pre class="brush:plain">
[
    [1] "Buster"
    [2] "Mimi"
    [3] "Roscoe"
]
</pre>
<p>Back in Perl, if you try that with the normal match operator delimiters, you get a different result:</p>
<pre class="brush:perl">
use Data::Printer;
my @characters = split / /, '  Buster and Mimi';

p( @characters );
</pre>
<p>This time you keep the empty leading fields:</p>
<pre class="brush:plain">
[
    [0] "",
    [1] "",
    [2] "Buster",
    [3] "and",
    [4] "Mimi"
]
</pre>
<p>If you include the <code>m</code> in front of the quotes though, you lose the special magic:</p>
<pre class="brush:perl">
use Data::Printer;
my @characters = split m' ', '  Buster and Mimi';

p( @characters );
</pre>
<p>The empty leading fields are back:</p>
<pre class="brush:plain">
[
    [0] "",
    [1] "",
    [2] "Buster",
    [3] "and",
    [4] "Mimi"
]
</pre>
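<p>The nearest ordinary pattern to the special <code>' '</code> is <code>/\s+/</code>, which also treats a run of whitespace as one separator. The difference shows up with leading whitespace, as in this short comparison (the variable names are only for illustration):</p>
<pre class="brush:perl">
use v5.10;

my $string = '  Buster and Mimi';

my @awk_style = split ' ',   $string;  # leading whitespace is skipped
my @regex     = split /\s+/, $string;  # the empty leading field stays

say scalar @awk_style;   # 3
say scalar @regex;       # 4
</pre>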
<h2>Splitting lines</h2>
<p>The special pattern of just the beginning-of-line anchor, even without the <code>/m</code> flag, breaks a multi-line string into lines:</p>
<pre class="brush:perl">
use Data::Printer;

my $string = <<'HERE';
Line one
Line two
Line three
HERE

my @lines = split /^/, $string;

p( @lines );
</pre>
<p>Even without the <code>/m</code> you get separate lines:</p>
<pre class="brush:plain">
[
    [0] "Line one
",
    [1] "Line two
",
    [2] "Line three
"
]
</pre>
<p>This only works if the pattern is exactly <code>/^/</code>. If you put anything else in the pattern, you don't get the special behavior, even if it's a zero width match:</p>
<pre class="brush:perl">
...; # same as before

my @lines = split /^(?=Line)/, $string;  # Oops

p( @lines );
</pre>
<p>Now there's only one field:</p>
<pre class="brush:plain">
[
    [0] "Line one
Line two
Line three
"
]
</pre>
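<p>If you need more than the bare anchor in the pattern, you can ask for the multi-line behavior yourself by adding the <code>/m</code> flag. This sketch uses the same three lines, written as a double-quoted string:</p>
<pre class="brush:perl">
use v5.10;

my $string = "Line one\nLine two\nLine three\n";

# With an explicit /m, ^ can match after each embedded newline
my @lines = split /^(?=Line)/m, $string;

say scalar @lines;   # 3
</pre>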
<h2>Things to remember</h2>
<ul>
<li>The empty pattern <code>//</code> splits on characters, but not grapheme clusters
<li>A zero width successful match splits on characters too
<li>The single space in quotes splits on whitespace and discards leading empty fields
<li>The <code>^</code> anchor by itself splits into lines, even without the <code>/m</code>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.effectiveperlprogramming.com/blog/1416/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Make grep-like syntax</title>
		<link>http://www.effectiveperlprogramming.com/blog/1408</link>
		<comments>http://www.effectiveperlprogramming.com/blog/1408#comments</comments>
		<pubDate>Sun, 30 Oct 2011 07:03:34 +0000</pubDate>
		<dc:creator>brian d foy</dc:creator>
				<category><![CDATA[item]]></category>
		<category><![CDATA[subroutines]]></category>

		<guid isPermaLink="false">http://www.effectiveperlprogramming.com/?p=1408</guid>
		<description><![CDATA[To create grep- or map-like syntax, you need to use Perl&#8217;s prototypes, despite whatever we told you in Understand why you probably don’t need prototypes. Perl needs the special hints that prototypes provide to parse a block as an argument to a subroutine. First, remember the forms of grep. There&#8217;s a single expression version and a [...]]]></description>
			<content:encoded><![CDATA[<p>To create <a href="http://perldoc.perl.org/functions/grep.html">grep</a>- or <a href="http://perldoc.perl.org/functions/map.html">map</a>-like syntax, you need to use Perl&#8217;s prototypes, despite whatever we told you in <a href="http://www.effectiveperlprogramming.com/blog/1406">Understand why you probably don’t need prototypes</a>. Perl needs the special hints that prototypes provide to parse a block as an argument to a subroutine.</p>
<p>First, remember the forms of <a href="http://perldoc.perl.org/functions/grep.html">grep</a>. There&#8217;s a single expression version and a block version:</p>
<pre class="brush:perl">
grep EXPR, @input       # with a comma
grep { ... } @input     # no comma
</pre>
<p>That block, the <code>{...}</code>, is an inline subroutine where the current element shows up in <code>$_</code>:</p>
<pre class="brush:perl">
my @odds = grep { $_ % 2 } @input;
</pre>
<p>For either form, there&#8217;s a scalar and list return value, depending on context (<span class="item">Item 12. Understand context and how it affects operations</span>):</p>
<pre class="brush:perl">
my @array = grep ...;
my $count = grep ...;
</pre>
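<p>As a concrete illustration of those two return values, here&#8217;s the same block in both contexts with some sample input:</p>
<pre class="brush:perl">
use v5.10;

my @input = 1 .. 10;

my @odds  = grep { $_ % 2 } @input;   # list context: the matching elements
my $count = grep { $_ % 2 } @input;   # scalar context: how many matched

say "odds = @odds";     # odds = 1 3 5 7 9
say "count = $count";   # count = 5
</pre>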
<p>You can make your own subroutines that work just like <a href="http://perldoc.perl.org/functions/grep.html">grep</a>. The prototype character <code>&#038;</code> tells <i>perl</i> to expect a subroutine reference. However, you can use the bare block form only when the <code>&#038;</code> is the first argument. To try it, define a subroutine that takes a single argument, a code reference:</p>
<pre class="brush:perl">
sub run_it (&#038;) {
	my $sub = shift;
	$sub->();
	}
</pre>
<p>You can call that subroutine in several ways. You can use a bare block, the <a href="http://perldoc.perl.org/functions/sub.html">sub</a> keyword with a block, or a reference to a named subroutine:</p>
<pre class="brush:perl">
use v5.10;

sub named { say "I have a name" }

my $result = run_it { say "Hello!" };
   $result = run_it sub { say "I have a keyword!" };
   $result = run_it \&named;
</pre>
<p>However, <i>perl</i> is not smart enough to recognize other forms. It won&#8217;t like a scalar variable that <i>might</i> have a code reference later, and it can&#8217;t take a bareword that is the name of a defined subroutine (like <a href="http://perldoc.perl.org/functions/sort.html">sort</a> will). These are compile-time errors:</p>
<pre class="brush:perl">
use v5.10;

sub named { say "I have a name" }
my $code_ref = \&named;

my $result = run_it $code_ref;
   $result = run_it named;
   $result = run_it &named;
</pre>
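<p>If you already have the code reference in a scalar variable, there are a couple of ways around that. This sketch shows two that we&#8217;d expect to compile: wrapping the variable in a block, or calling the subroutine with a leading <code>&</code>, which turns off prototype checking for that call. The variable names are ours:</p>
<pre class="brush:perl">
use v5.10;

sub run_it (&) {
	my $sub = shift;
	$sub->();
	}

sub named { return "I have a name" }
my $code_ref = \&named;

# Wrap the variable in a block so the prototype sees a block
my $wrapped = run_it { $code_ref->() };

# Or call with a leading & to skip the prototype for this call
my $bypassed = &run_it( $code_ref );

say $wrapped;
say $bypassed;
</pre>
<p>The second form gives up the prototype&#8217;s checks for that call, so the block form is probably the safer habit.</p>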
<h2>Handling grep&#8217;s second argument</h2>
<p>Now you can take a code reference as an argument. The next part of the grep syntax is the input. You could use the <code>@</code> character to denote a list of arguments (not an array argument), and that appears to work:</p>
<pre class="brush:perl">
use v5.10;
use warnings;

sub do_with_array (&#038;@) {
	my( $sub, @args ) = @_;
	my @output;

	foreach my $elem ( @args ) {
		local $_ = $elem;
		push @output, $sub->();
		}
	return @output;
}

sub other_cats { qw(Ellie Ginger) }

my @cats = qw(Buster Mimi Roscoe);

@result = do_with_array { say $_ } @cats;
@result = do_with_array { say $_ } qw(Buster Mimi Roscoe);
@result = do_with_array { say $_ } 1 .. 10;
@result = do_with_array { say $_ } other_cats();
</pre>
<p>Like <a href="http://perldoc.perl.org/functions/grep.html">grep</a>, which aliases <code>$_</code> to the original data, you can alias <code>$_</code> to the elements of <code>@_</code>. That means your subroutine argument can change the original data (<span class="item">Item 114. Know when arrays are modified in a loop</span>):</p>
<pre class="brush:perl">
use v5.10;
use warnings;

sub do_with_array (&#038;@) {
   my $sub = shift;
   my @output;
   local $_;

   foreach ( @_ ) {
	   push @output, $sub->();
   }
   return @output;
}

my @original   = qw(1 2 3);
my @new        = do_with_array { $_ += 2 } @original;
say "new = @new";             # 3 4 5
say "original = @original";  # 3 4 5
</pre>
<p>You might try the <code>\@</code> prototype, but that limits you in other ways. Now <i>perl</i> expects a named array as an argument. You cannot use an array reference, range, literal list, or the return values from a subroutine call. It&#8217;s a named array or an error. That&#8217;s no good.</p>
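<p>To see what that restriction looks like, here&#8217;s a minimal sketch with a hypothetical <code>do_with_named</code> that uses the <code>\@</code> prototype. The subroutine receives a reference to the named array:</p>
<pre class="brush:perl">
use v5.10;
use strict;
use warnings;

# do_with_named is a made-up name for this illustration
sub do_with_named (&\@) {
	my( $sub, $array ) = @_;   # $array is a reference to the named array
	my @output;

	foreach ( @$array ) {
		push @output, $sub->();
		}

	return @output;
}

my @cats = qw(Buster Mimi Roscoe);

my @lengths = do_with_named { length } @cats;
say "@lengths";   # 6 4 6

# A literal list here would be a compile-time error:
# do_with_named { length } qw(Ellie Ginger);
</pre>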
<p>Likewise, you might use the <code>+</code> prototype introduced in Perl 5.14. This allows you to use an array or an array reference argument. Perl doesn&#8217;t complain if you use a range or a subroutine call, but it also doesn&#8217;t do the right thing:</p>
<pre class="brush:perl">
use v5.10;
use warnings;

sub do_with_array (&#038;+) {
	my( $sub, $array ) = @_;
	my @output;

	foreach my $elem ( @$array ) {
		local $_ = $elem;
		push @output, $sub->();
		}
	return @output;
}

sub other_cats { qw(Ellie Ginger) }

my @cats = qw(Buster Mimi Roscoe);

@result = do_with_array { say $_ } @cats;
@result = do_with_array { say $_ } [ 'a' .. 'g' ];
@result = do_with_array { say $_ } 1 .. 10;
@result = do_with_array { say $_ } other_cats();
</pre>
<p>The named array and the array reference work just fine. Perl does something weird with the range, and the subroutine call appears to not happen at all:</p>
<pre class="brush:plain">
Buster
Mimi
Roscoe
a
b
c
d
e
f
g
Use of uninitialized value $. in range (or flip) at run_it.pl line 21.
</pre>
<p>Not only that, but the <code>+</code> prototype character also allows named hashes and hash references:</p>
<pre class="brush:perl">
use v5.10;
use warnings;

sub do_with_array (&#038;+) {
	my( $sub, $array ) = @_;
	my @output;

	foreach my $elem ( @$array ) {
		local $_ = $elem;
		push @output, $sub->();
		}
	return @output;
}

my %cats = qw(Buster Mimi Roscoe Ellie);

@result = do_with_array { say $_ } %cats;
@result = do_with_array { say $_ } { 'a' => 'b' };
</pre>
<p>You don&#8217;t get an error until runtime when you try the array dereference:</p>
<pre class="brush:plain">
Not an ARRAY reference at run_it.pl line 8.
</pre>
<p>You can check these things at runtime, though. This reminds you, as we said in <a href="http://www.effectiveperlprogramming.com/blog/1406">Understand why you probably don&#8217;t need prototypes</a>, that prototypes probably don&#8217;t do what you think. You end up doing a lot of the work that most people think prototypes do for you:</p>
<pre class="brush:perl">
use v5.10;
use warnings;
use Carp;

sub do_with_array (&#038;+) {
	my( $sub, $array ) = @_;
	croak "do_with_array takes an array argument"
		unless ref $array eq ref [];

	my @output;

	foreach my $elem ( @$array ) {
		local $_ = $elem;
		push @output, $sub->();
		}

	return @output;
}

my @cats = qw(Buster Mimi Roscoe Ellie);
my %cats = map { $_, 1 } @cats;

@result = do_with_array { say $_ } @cats;
@result = do_with_array { say $_ } %cats;
</pre>
<h2>Handling context</h2>
<p>We explained context in <span>Item 12. Understand context and how it affects operations</span>, and if you want to emulate <a href="http://perldoc.perl.org/functions/grep.html">grep</a> you have to handle each context. In list context <a href="http://perldoc.perl.org/functions/grep.html">grep</a> returns a list, in scalar context it returns a count, and in void context it potentially does nothing. You really only need to handle the void case. In this case, you&#8217;ll simply return without doing anything:</p>
<pre class="brush:perl">
use v5.10;
use warnings;
use Carp;

sub do_with_array (&#038;+) {
	return unless defined wantarray;
	my( $sub, $array ) = @_;
	croak "do_with_array takes an array argument"
		unless ref $array eq ref [];

	my @output;

	foreach my $elem ( @$array ) {
		local $_ = $elem;
		push @output, $sub->();
		}

	return @output;
}
</pre>
<p>The list and scalar contexts come from returning an array. When you return a named array, you get the same results as assigning an array. In list context you get the list elements, and in scalar context you get the count. If you want to return something different, such as a list instead of a named array, you have to do more work:</p>
<pre class="brush:perl">
	return wantarray ? qw( a b c ) : 3;
</pre>
<p>So, once again, prototypes half solve the problem, but leave you with more work to do.</p>
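<p>Putting the pieces together, here&#8217;s a minimal sketch of a grep work-alike. The name <code>my_grep</code> is hypothetical, and it uses the <code>@</code> prototype so it accepts any list:</p>
<pre class="brush:perl">
use v5.10;
use strict;
use warnings;

# my_grep is a made-up name for this illustration
sub my_grep (&@) {
	my $sub = shift;
	return unless defined wantarray;   # void context, nothing to do

	my @output;
	foreach ( @_ ) {                   # $_ aliases each argument
		push @output, $_ if $sub->();
		}

	return @output;   # the elements in list context, the count in scalar
}

my @numbers = 1 .. 10;

my @odds  = my_grep { $_ % 2 } @numbers;
my $count = my_grep { $_ % 2 } @numbers;

say "odds = @odds";     # odds = 1 3 5 7 9
say "count = $count";   # count = 5
</pre>
<p>Returning the named array <code>@output</code> is what gives you the count in scalar context without any extra work.</p>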
<h2>Things to remember</h2>
<ul>
<li>Use the <code>&#038;</code> prototype character to specify a code reference argument
<li>If the code reference argument is the first argument, you can leave off the <a href="http://perldoc.perl.org/functions/sub.html">sub</a> keyword
<li>The reference has to be a block of code, an anonymous subroutine, or a reference to a named subroutine, and specifically not a scalar variable
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.effectiveperlprogramming.com/blog/1408/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

