<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Effective Perler &#187; administrative note</title>
	<atom:link href="http://www.effectiveperlprogramming.com/blog/category/administrative-note/feed" rel="self" type="application/rss+xml" />
	<link>http://www.effectiveperlprogramming.com</link>
	<description>Effective Perl Programming - write better, more idiomatic Perl</description>
	<lastBuildDate>Sat, 28 Jan 2012 02:19:01 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>The Effective Perler in 2012 and beyond</title>
		<link>http://www.effectiveperlprogramming.com/blog/1476</link>
		<comments>http://www.effectiveperlprogramming.com/blog/1476#comments</comments>
		<pubDate>Sun, 01 Jan 2012 11:11:40 +0000</pubDate>
		<dc:creator>brian d foy</dc:creator>
				<category><![CDATA[administrative note]]></category>

		<guid isPermaLink="false">http://www.effectiveperlprogramming.com/?p=1476</guid>
		<description><![CDATA[Two years ago, Josh McAdams and I started The Effective Perler as an extension of the second edition of Effective Perl Programming. Since then, we have added roughly one meaty Item a week. Last month, we published our 100th Item. With the 120 Items in the book, that&#8217;s a lot of items. I [...]]]></description>
			<content:encoded><![CDATA[<p>Two years ago, Josh McAdams and I started <a href="http://www.effectiveperlprogramming.com"><i>The Effective Perler</i></a> as an extension of the second edition of <i>Effective Perl Programming</i>. Since then, we have added roughly one meaty Item a week. Last month, we published our 100th Item. With the 120 Items in the book, that&#8217;s a lot of items. </p>
<p>I have a new plan for 2012 onward. It&#8217;s much harder to find topics now and it takes much longer to research and write them. I&#8217;ve exhausted all of the advice I have and all of the easy topics. When I think I have a good idea, I now know to search everything else I&#8217;ve already written. More than a couple of times I thought I had the next week&#8217;s idea, but it was already in the book or on the website.</p>
<p>The search for content has another problem: I don&#8217;t want to add Items for anything that&#8217;s already been written—not just by me, but by anyone. I don&#8217;t want to repeat content unless I have a different take on it and I can illuminate something new.</p>
<p>I&#8217;m not going to do weekly big Items anymore. I&#8217;ll try one a month, I think. I&#8217;ll see how it goes. </p>
<p>There&#8217;s still room for shorter content, such as the short demonstrations of new features, and ideas that don&#8217;t have 1,000 words in them. There are also many interesting, although esoteric, features that I would probably never recommend for production code.</p>
<p>This doesn&#8217;t mean that I&#8217;m going to write less, though. If I do less for <i>The Effective Perler</i>, I can do more somewhere else. There&#8217;s another edition of <i>Intermediate Perl</i> that needs attention, the <a href="http://www.learning-perl.com">Learning Perl</a> website, and a few other things.</p>
<p>Many people have asked for Items about specific modules, but that&#8217;s not really the idea of <i>The Effective Perler</i>. We want to teach people about core Perl and thinking in Perl. Modules are essentially all the same—they give you an interface and you do what the interface tells you to do. For the most part, they are just subroutines or method calls. There&#8217;s not much interesting there. You already know how to do that. The much more interesting advice is about researching modules, but we put that in the book already.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.effectiveperlprogramming.com/blog/1476/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Chinese translation of Effective Perl Programming</title>
		<link>http://www.effectiveperlprogramming.com/blog/1357</link>
		<comments>http://www.effectiveperlprogramming.com/blog/1357#comments</comments>
		<pubDate>Tue, 27 Sep 2011 16:59:30 +0000</pubDate>
		<dc:creator>brian d foy</dc:creator>
				<category><![CDATA[publishing]]></category>
		<category><![CDATA[translations]]></category>

		<guid isPermaLink="false">http://www.effectiveperlprogramming.com/?p=1357</guid>
		<description><![CDATA[I mentioned a long time ago that a Chinese translation of Effective Perl Programming was in the works, and apparently it&#8217;s done. Someone sent me a copy of the Chinese version of the book. I can&#8217;t tell you who did it (if it&#8217;s you, let me know) and I don&#8217;t know where you can buy [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.effectiveperlprogramming.com/wp-content/uploads/chinese-cover.jpg" align="right"> I mentioned a long time ago that <a href="http://www.effectiveperlprogramming.com/blog/318">a Chinese translation of <i>Effective Perl Programming</i> was in the works</a>, and apparently it&#8217;s done. Someone sent me a copy of the Chinese version of the book. I can&#8217;t tell you who did it (if it&#8217;s you, let me know) and I don&#8217;t know where you can buy it (if you know, let me know). Also, I don&#8217;t know what I want to do with the copy that I have. I don&#8217;t read Chinese, so I can&#8217;t really read the book to see how well it translates, and I don&#8217;t want to keep the book as a trophy. Does someone else want the book? Is there a Chinese Perl event that would like to give it away as a prize? I&#8217;ll get Josh and me to sign it and send it along.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.effectiveperlprogramming.com/blog/1357/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Some special Unicode shell aliases to normalize strings</title>
		<link>http://www.effectiveperlprogramming.com/blog/1232</link>
		<comments>http://www.effectiveperlprogramming.com/blog/1232#comments</comments>
		<pubDate>Wed, 20 Jul 2011 07:59:57 +0000</pubDate>
		<dc:creator>brian d foy</dc:creator>
				<category><![CDATA[Unicode]]></category>
		<category><![CDATA[midweek bonus item]]></category>

		<guid isPermaLink="false">http://www.effectiveperlprogramming.com/?p=1232</guid>
		<description><![CDATA[If you are playing with Unicode, you&#8217;re probably going to want to convert to the various normalization forms. There are some programs to do this in the Unicode::Tussle distribution, but you can also create some one-liners to do this as well (Item 120. Use Perl one-liners to create mini programs). If you want to read [...]]]></description>
			<content:encoded><![CDATA[<p>If you are playing with Unicode, you&#8217;re probably going to want to convert to the various normalization forms. There are some programs to do this in the <a href="http://search.cpan.org/dist/Unicode-Tussle">Unicode::Tussle</a> distribution, but you can also create some one-liners to do this as well (<span class="item">Item 120. Use Perl one-liners to create mini programs</span>).</p>
<p>If you want to read and write lines, you can use the <code>-n</code> switch to wrap a <code>while</code> loop around your tiny program. In this case, those tiny programs just call a normalization function from <code>Unicode::Normalize</code>. Here are the <i>bash</i> aliases:</p>
<pre class="brush:plain">
alias nfc="perl5.14.1 -MUnicode::Normalize -CS -ne 'print NFC(\$_)'"
alias nfd="perl5.14.1 -MUnicode::Normalize -CS -ne 'print NFD(\$_)'"
alias nfkd="perl5.14.1 -MUnicode::Normalize -CS -ne 'print NFKD(\$_)'"
</pre>
<p>You can run these as if they were programs with those names. Here <code>nfkd</code> reads standard input and converts the ligature characters <i>ﬁ</i> (U+FB01) and <i>ﬂ</i> (U+FB02) to their compatibility, two-character forms <i>fi</i> and <i>fl</i>:</p>
<pre class="brush:plain">
$ nfkd
Let's ﬁnd that ﬂying squirrel!
Let's find that flying squirrel!
</pre>
<p>If you want to normalize command-line arguments as strings instead of reading files, it takes a couple of small changes. You can add the <code>A</code> flag to the <code>-C</code> switch to interpret the command-line arguments as UTF-8 (unless you want to decode them yourself), and switch from <code>-ne</code> to <code>-E</code> so you can use <code>say</code> to add the newline in the output:</p>
<pre class="brush:plain">
alias nfc="perl5.14.1 -MUnicode::Normalize -CSA -E 'say NFC( qq(@ARGV) )'"
alias nfd="perl5.14.1 -MUnicode::Normalize -CSA -E 'say NFD( qq(@ARGV) )'"
alias nfkd="perl5.14.1 -MUnicode::Normalize -CSA -E 'say NFKD( qq(@ARGV) )'"
</pre>
<p>The output decomposes the ligatures just as before:</p>
<pre class="brush:plain">
$ nfkd "Let's ﬁnd that ﬂying squirrel."
Let's find that flying squirrel.
</pre>
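<p>To see what each normalization form actually does, it can help to look at code points instead of glyphs. The following is a minimal sketch, not part of the aliases above: the <code>code_points</code> helper and the sample string are made up for illustration, and only the four functions from <code>Unicode::Normalize</code> come from the module.</p>
<pre class="brush:perl">
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD NFKC NFKD);

# Hypothetical helper: show a string as code points so that
# differences you can't see in the glyphs stand out
sub code_points {
	my( $string ) = @_;
	return join ' ', map { sprintf( 'U+%04X', ord $_ ) } split //, $string;
	}

# Illustrative input: precomposed e-acute (U+00E9), then the fi ligature (U+FB01)
my $string = "\x{E9}\x{FB01}";

print 'NFC:  ', code_points( NFC($string)  ), "\n";  # U+00E9 U+FB01
print 'NFD:  ', code_points( NFD($string)  ), "\n";  # U+0065 U+0301 U+FB01
print 'NFKC: ', code_points( NFKC($string) ), "\n";  # U+00E9 U+0066 U+0069
print 'NFKD: ', code_points( NFKD($string) ), "\n";  # U+0065 U+0301 U+0066 U+0069
</pre>
<p>Only the compatibility forms, NFKC and NFKD, split the ligature into two letters. The canonical forms leave it alone and differ only in whether the accent is composed or decomposed.</p>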
<p>You can read more about these program features in <span class="item">Item 73. Tell Perl which encoding to use</span> and <span class="item">Item 77. Work with graphemes instead of characters</span>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.effectiveperlprogramming.com/blog/1232/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Find dates with Regexp::Common</title>
		<link>http://www.effectiveperlprogramming.com/blog/1002</link>
		<comments>http://www.effectiveperlprogramming.com/blog/1002#comments</comments>
		<pubDate>Wed, 16 Feb 2011 20:20:08 +0000</pubDate>
		<dc:creator>brian d foy</dc:creator>
				<category><![CDATA[midweek bonus item]]></category>
		<category><![CDATA[regular expressions]]></category>

		<guid isPermaLink="false">http://www.effectiveperlprogramming.com/?p=1002</guid>
		<description><![CDATA[[This is a mid-week bonus item] Suppose you want to find some dates inside a big string. The problem with dates is that there are so many ways to write them, and even if you can come up with a pattern to get the structure right, can you handle the different locales and languages that [...]]]></description>
			<content:encoded><![CDATA[<p>[<i>This is a mid-week bonus item</i>]</p>
<p>Suppose you want to find some dates inside a big string. The problem with dates is that there are so many ways to write them, and even if you can come up with a pattern to get the structure right, can you handle the different locales and languages that use different words to refer to the same day or month?</p>
<p>In <span class="item">Item 42. Don&#8217;t reinvent the regex</span>, you saw the <a class="external cpan" href="http://search.cpan.org/dist/Regexp-Common">Regexp::Common</a> module. It creates the regular expressions that many people often get wrong because they miss some subtle part of the pattern.</p>
<p><a class="external cpan" href="http://search.cpan.org/dist/Regexp-Common-time">Regexp::Common::time</a>&#8217;s date handling is quite amazing, though. It&#8217;s a plugin, so you need to install it separately. Instead of specifying a regular expression, you use the <code>-pat</code> option to specify the <i>structure</i> of the date, using a string much like that for <code>strftime</code>, although with some regular expression bits added. From that semi-pattern, it constructs a much more complicated pattern that does the right thing. Since the module gives you a regex object, you can print it to see the pattern:</p>
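<p>In this example, the semi-pattern describes the two date formats that <code class="binary">ls -l</code> uses, which are shown in the comments:</p>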
<pre class="brush:perl">
use strict;
use warnings;
use Regexp::Common qw(time);

# The semi-pattern covers both date formats from ls -l:
#   May  3  2010
#   Jan 17 18:21
my $date_re = $RE{time}{strftime}{
	-pat => '%b\s+%_d\s+(?:%Y|%_H:%M)'
	};

print "Pattern is------\n$date_re\n-------\n";
</pre>
<p>This pattern reflects the national representation for the en_US locale:</p>
<pre class="brush:plain">
Pattern is------
(?=[SAFOJNMD])(?&gt;Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+(?:0[1-9]|[12]\d|3[01]|(?&lt;!\d)[1-9])\s+(?:\d{4}|(?:(?=\d)(?:[01]\d|2[0123]|(?&lt;!\d)\d)):(?:[0-5]\d))
-------
</pre>
<p>Change your locale, in this case to tr_TR for Turkish, and you get a different pattern with the same structure, although I don&#8217;t know if the Turks write their dates like this. The non-ASCII letters in <i>Şub</i> and <i>Ağu</i> appear as backslash-escaped bytes in this output:</p>
<pre class="brush:plain">
Pattern is------
(?=[AOTNKEHM\Å])(?>Oca|\Å\ub|Mar|Nis|May|Haz|Tem|A\Ä\u|Eyl|Eki|Kas|Ara)\s+(?:0[1-9]|[12]\d|3[01]|(?&lt;!\d)[1-9])\s+(?:\d{4}|(?:(?=\d)(?:[01]\d|2[0123]|(?&lt;!\d)\d)):(?:[0-5]\d))
-------
</pre>
<p>You can now use this pattern to match dates in text. Here&#8217;s a program that takes in a line and puts <code class="string">^</code> characters under the parts it thinks are dates:</p>
<pre class="brush:perl">
use strict;
use warnings;
use Regexp::Common qw(time);

# May  3  2010
# Jan 17 18:21
my $date_re = $RE{time}{strftime}{
	-pat => '%b\s+%_d\s+(?:%Y|%_H:%M)'
	};

# Read lines from standard input or from files named on the command line
while( defined( my $line = &lt;> ) ) {
	next unless $line =~ /$date_re/;

	# Offsets of the start and end of the whole match
	my $start = $-[0];
	my $stop  = $+[0];

	my $underline = ( ' ' x $start ) . ( '^' x ($stop - $start) );

	print $line;
	print $underline, "\n\n";
	}
</pre>
<p>You can test this by piping some output into this program. Here&#8217;s an extract of output from the Unix <code class="binary">ls</code> command. Notice that the first date has a time instead of a year, but you still find it:</p>
<pre class="brush:plain">
$ ls -l /usr/local/perls/perl-5.10.1/lib/site_perl/5.10.1 | perl date_finder.pl
drwxr-xr-x   4 brian  wheel    136 Dec  9 01:58 Acme
                                   ^^^^^^^^^^^^

-r--r--r--   1 brian  wheel  32517 Jul  6  2007 AppConfig.pm
                                   ^^^^^^^^^^^^

-r--r--r--   1 brian  wheel  54725 Jul 19  2007 Expect.pm
                                   ^^^^^^^^^^^^

-r--r--r--   1 brian  wheel  43735 Jul 19  2007 Expect.pod
                                   ^^^^^^^^^^^^

drwxr-xr-x   3 brian  wheel    102 May 16  2010 ExtUtils
                                   ^^^^^^^^^^^^

drwxr-xr-x   3 brian  wheel    102 Jun 17  2010 local
                                   ^^^^^^^^^^^^

-r--r--r--   1 brian  wheel   9137 Jun 15  2009 lwpcook.pod
                                   ^^^^^^^^^^^^

-r--r--r--   1 brian  wheel  25447 Jun 15  2009 lwptut.pod
                                   ^^^^^^^^^^^^

drwxr-xr-x   4 brian  wheel    136 May 28  2010 namespace
                                   ^^^^^^^^^^^^

-r--r--r--   1 brian  wheel   1931 Sep 22  2009 oose.pm
                                   ^^^^^^^^^^^^
</pre>
<p>Notice that this would be hard to do with <code class="builtin">split</code> if you run into filenames that have spaces. You can&#8217;t depend on fixed column widths because the file sizes can move things around. It turns out to be pretty annoying.</p>
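<p>Here is a rough sketch of that difference. To keep it runnable without the plugin, it swaps in a simplified, hand-written pattern that only knows English month abbreviations, so it is a stand-in for the generated pattern and not a replacement. The <code>date_and_name</code> subroutine and the sample line are hypothetical:</p>
<pre class="brush:perl">
use strict;
use warnings;

# Simplified stand-in for the generated pattern (English months only)
my $date_re = qr/(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2}\s+(?:\d{4}|\d{1,2}:\d{2})/;

# Hypothetical helper: return the date and everything that follows it
sub date_and_name {
	my( $line ) = @_;
	return unless $line =~ $date_re;

	# Save the offsets before another match can overwrite them
	my( $start, $stop ) = ( $-[0], $+[0] );

	my $date = substr $line, $start, $stop - $start;
	my $name = substr $line, $stop;
	$name =~ s/\A\s+|\s+\z//g;

	return ( $date, $name );
	}

# Made-up line with spaces in the filename
my $line = '-rw-r--r--  1 brian  staff  433 Jun 22  2010 My Big File.txt';

# split on whitespace chops the filename into pieces
my @fields = split ' ', $line;
print "Last field from split: $fields[-1]\n";   # File.txt

my( $date, $name ) = date_and_name( $line );
print "Date: $date\n";                          # Jun 22  2010
print "Name: $name\n";                          # My Big File.txt
</pre>
<p>Because the date is the anchor, the filename is simply whatever comes after the end of the match, no matter how many spaces it contains.</p>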
]]></content:encoded>
			<wfw:commentRss>http://www.effectiveperlprogramming.com/blog/1002/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Use Regexp::Common to find locale-specific dates</title>
		<link>http://www.effectiveperlprogramming.com/blog/1003</link>
		<comments>http://www.effectiveperlprogramming.com/blog/1003#comments</comments>
		<pubDate>Thu, 20 Jan 2011 07:07:39 +0000</pubDate>
		<dc:creator>brian d foy</dc:creator>
				<category><![CDATA[midweek bonus item]]></category>
		<category><![CDATA[regular expressions]]></category>

		<guid isPermaLink="false">http://www.effectiveperlprogramming.com/?p=1003</guid>
		<description><![CDATA[[This is a mid-week bonus item, and it's a bit of a departure from much of what you have already seen on this blog. This is just some code that I had to write this week and I thought you'd like to see it.] I had to find some dates inside a big string, and [...]]]></description>
			<content:encoded><![CDATA[<p>[<i>This is a mid-week bonus item, and it's a bit of a departure from much of what you have already seen on this blog. This is just some code that I had to write this week and I thought you'd like to see it.</i>]</p>
<p>I had to find some dates inside a big string, and the problem with dates is that there are so many ways to write them. Even if I get the format right, some of the machines might use another locale. My string comes from an <code class="binary">ls</code> I run as a remote command, which might show the date in one of two formats. For files changed in the last six months, it replaces the year with the time:</p>
<pre class="brush:plain">
$ ls -l
total 7400
-rw-r--r--@  1 brian  staff      433 Jun 22  2010 Makefile
-rw-r--r--@  1 brian  staff   107721 Jan 19 09:08 appa.xml
-rw-rw-r--@  1 brian  staff    76873 Jan 19 00:18 appb.xml
-rw-rw-r--   1 brian  staff     1802 Jan 14 21:17 book.xml
-rw-rw-r--   1 brian  staff  2457812 Jul 21  2010 book.xml.pdf
-rw-rw-r--   1 brian  staff     4360 Jul 21  2010 bookinfo.xml
-rw-r--r--@  1 brian  staff    25626 Jan 19 09:07 ch00.xml
</pre>
<p>Here&#8217;s the program I wrote to figure out which parts of that string are the dates, using <a class="external cpan" href="http://search.cpan.org/dist/Regexp-Common">Regexp::Common</a> (<span class="item">Item 42. Don’t reinvent the regex</span>):</p>
<pre class="brush:perl">
use strict;
use warnings;
use Regexp::Common qw(time);

my @lines = `ls -l`;

# May  3  2010
# Jan 17 18:21
my $date_re = $RE{time}{strftime}{
	-pat => '%b\s+%_d\s+(?:%Y|%_H:%M)'
	};

foreach my $line ( @lines ) {
	next unless $line =~ /$date_re/;

	# Offsets of the start and end of the whole match
	my $start = $-[0];
	my $stop  = $+[0];

	my $underline = ( ' ' x $start ) . ( '^' x ($stop - $start) );

	print $line;
	print $underline, "\n\n";
	}
</pre>
<p>That regex is more sophisticated than it looks. I didn&#8217;t have to do anything to deal with month names and abbreviations because the module figures them out for me based on the locale of the machine on which I run the command. The regex changes depending on the language that I decide to use:</p>
<pre class="brush:plain">
$ LC_ALL=tr_TR perl -MRegexp::Common=time -le 'print $RE{time}{strftime}{-pat=>"%b"}'
(?:(?=[AOTNKEHM\Å])(?>Oca|\Å\ub|Mar|Nis|May|Haz|Tem|A\Ä\u|Eyl|Eki|Kas|Ara))
</pre>
<pre class="brush:plain">
$ LC_ALL=en_US.UTF-8 perl -MRegexp::Common=time -le 'print $RE{time}{strftime}{-pat=>"%b"}'
(?:(?=[SAFOJNMD])(?>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))
</pre>
<pre class="brush:plain">
$ LC_ALL=es_ES.UTF-8 perl -MRegexp::Common=time -le 'print $RE{time}{strftime}{-pat=>"%b"}'
(?:(?=[enadmsjfo])(?>ene|feb|mar|abr|may|jun|jul|ago|sep|oct|nov|dic))
</pre>
<p>Part of my code demonstrated that it found the date part of the string by underlining what it thought the date portion was. That&#8217;s what all the fooling around with the <code class="special variable">@-</code> and <code class="special variable">@+</code> special variables is for. They hold the string offsets for the start and end of each capture buffer. The numbers in index 0 apply to <code class="special variable">$&#038;</code>, those in index 1 apply to <code class="special variable">$1</code>, and so on:</p>
<pre class="brush:plain">
-rw-r--r--@  1 brian  staff      433 Jun 22  2010 Makefile
                                     ^^^^^^^^^^^^

-rw-r--r--@  1 brian  staff   107721 Jan 19 09:08 appa.xml
                                     ^^^^^^^^^^^^

-rw-rw-r--@  1 brian  staff    76873 Jan 19 00:18 appb.xml
                                     ^^^^^^^^^^^^

-rw-rw-r--   1 brian  staff     1802 Jan 14 21:17 book.xml
                                     ^^^^^^^^^^^^

-rw-rw-r--   1 brian  staff  2457812 Jul 21  2010 book.xml.pdf
                                     ^^^^^^^^^^^^

-rw-rw-r--   1 brian  staff     4360 Jul 21  2010 bookinfo.xml
                                     ^^^^^^^^^^^^

-rw-r--r--@  1 brian  staff    25626 Jan 19 09:07 ch00.xml
                                     ^^^^^^^^^^^^
</pre>
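<p>If the offsets are new to you, this small sketch prints them for every capture group in a match. The <code>match_offsets</code> subroutine, the sample string, and the pattern are invented for the demonstration; only <code>@-</code> and <code>@+</code> are the real special variables:</p>
<pre class="brush:perl">
use strict;
use warnings;

# Hypothetical helper: return [ start, end, text ] for the whole match
# (index 0) and for each capture group (index 1 and up)
sub match_offsets {
	my( $string, $regex ) = @_;
	return unless $string =~ $regex;

	# Copy the arrays right away; the next successful match resets them
	my @starts = @-;
	my @ends   = @+;

	return map {
		[ $starts[$_], $ends[$_], substr( $string, $starts[$_], $ends[$_] - $starts[$_] ) ]
		} 0 .. $#starts;
	}

my $line   = 'size 433 Jun 22 2010 Makefile';
my @groups = match_offsets( $line, qr/([A-Z][a-z]{2}) (\d+) (\d{4})/ );

for my $i ( 0 .. $#groups ) {
	printf "group %d: start %2d, end %2d, text [%s]\n", $i, @{ $groups[$i] };
	}
</pre>
<p>Group 0 covers <code class="string">Jun 22 2010</code> from offset 9 to offset 20, and groups 1 through 3 pick out the month, day, and year inside that range. The end offset is one past the last matched character, which is why subtracting the start from the end gives the length.</p>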
<p>This code also has to work on systems with very ancient versions of <code class="binary">ls</code>. There are some switches that could have made this code much easier, especially if I could make the date column the epoch time so that it&#8217;s not a combination of whitespace-separated fields itself.</p>
<ul>
<li>The <code>-T</code> switch on Mac OS X and FreeBSD displays all dates in the same format, even for the recently changed ones.</li>
<li>Linux versions might have the <code>--time-style</code> switch.</li>
<li>FreeBSD has the <code>-D</code> switch to specify the date format.</li>
</ul>
<p>I&#8217;d much rather use <code class="binary">perl</code>, but the equivalent is much uglier even though I can choose my field separator. Perl is ultra-portable and available in most places, but I have to do more work on a one-liner:</p>
<pre class="brush:plain">
$ perl -le 'for(glob(q|*|)){print join qq|\t|, stat(), $_}'
</pre>
<p>However, this causes headaches later when I need to run this as a remote command and I still have to process the results to turn the data into human-readable output. The <code class="binary">ls -l</code> is much nicer without requiring more work than I&#8217;d do normally.</p>
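<p>To give an idea of that extra processing, here is a minimal sketch of turning the modification time from <code class="builtin">stat</code> into something a person can read. The <code>format_mtime</code> name, the timestamp format, and the choice of UTC are all assumptions for the example, not what I ended up running:</p>
<pre class="brush:perl">
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format an epoch time as a sortable UTC timestamp
sub format_mtime {
	my( $epoch ) = @_;
	return strftime( '%Y-%m-%d %H:%M:%S', gmtime $epoch );
	}

for my $file ( glob '*' ) {
	# Element 9 of the list from stat is the last-modified time
	my $mtime = ( stat $file )[9];
	next unless defined $mtime;

	print join( "\t", format_mtime( $mtime ), $file ), "\n";
	}
</pre>
<p>The tab separator keeps filenames with spaces intact, and the fixed-width timestamp sorts correctly as a plain string.</p>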
<p>And, as a bonus to this bonus, I discovered that <a class="external" href="http://search.cpan.org/dist/Date-Parse">Date::Parse</a> is smart enough to deal with a date like <code class="string">Dec 31 12:34</code>. It realizes that it was last December, not the one from the current year. I can feed both formats into that module and still have the dates sort correctly.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.effectiveperlprogramming.com/blog/1003/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

