Count the number of times a character occurs in a string

This Item isn’t really about counting characters in a string, but we thought we’d expand on an Item in the original Effective Perl blog that Joseph set up to support the first edition of Effective Perl Programming. He had an Item titled “Counting the Number of Times a Character Occurs in a String”. We won’t reproduce it here, so you should read his version too.

An interesting exercise for a TMTOWTDI language such as Perl is to figure out all the ways that you can complete a task. Since Perl as a language thinks about information and data instead of storage and architecture, there are many ways that you can organize a problem to get to the same solution. It’s a bit like asking a New Yorker how to get from the Upper West Side to Katz’s Deli (“Send a salami to your boy in the Army”): you’ll get several answers. Put the New York Perl Mongers on the task and watch the debate. It’s almost as fun as watching (but not participating in) a vi versus emacs argument.

Perl has a few ways to do it (many languages do too, but no one cares) because there are actually several different problems you could be solving. A good answer takes into account the level of problem that you are trying to solve. The next time that you’re having a TMTOWTDI contest, don’t just come up with different techniques; rate them according to their generality, too.

Counting characters is a very specific version of counting patterns in a string:

  1. Count overlapping pattern matches
  2. Count non-overlapping pattern matches
  3. Count non-overlapping fixed substring matches
  4. Count characters
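To see why the first three levels are genuinely different problems, consider counting `aa` in the made-up string `aaaa`. This is a minimal sketch, assuming Perl 5.10 or later for `say`; the lookahead trick for overlapping matches and the `index` loop for fixed substrings are common idioms, not techniques taken from Joseph’s original Item:

```perl
use strict;
use warnings;
use feature 'say';

my $string = 'aaaa';

# (1) Overlapping matches: a zero-width lookahead matches at every
# position where "aa" starts, so successive matches can share characters.
my $overlapping = () = $string =~ /(?=aa)/g;

# (2) Non-overlapping matches: each match consumes its characters,
# so the next attempt starts after the end of the previous match.
my $non_overlapping = () = $string =~ /aa/g;

# (3) Non-overlapping fixed substrings: index() looks for literal text
# without the regex engine; jump past each hit so hits can't overlap.
my $fixed = 0;
my $pos   = 0;
while ( ( $pos = index( $string, 'aa', $pos ) ) != -1 ) {
    $fixed++;
    $pos += length 'aa';
}

say "overlapping: $overlapping";            # overlapping: 3
say "non-overlapping: $non_overlapping";    # non-overlapping: 2
say "fixed substring: $fixed";              # fixed substring: 2
```

The same input gives different answers depending on which question you are really asking, which is why the level of generality matters before you pick a technique.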

The more general you get, the more work you have to do, and you might use different techniques. Suppose, for instance, you use the match operator to count the total number of occurrences of the letters c, a, and t:

my $count = () = $string =~ /c|a|t/g;

You’re really coding (2) to count non-overlapping pattern matches, even if that pattern is just an alternation that looks for single characters. By changing the pattern, but not the code, you can also solve more general problems:

my $count = () = $string =~ /Buster|Mimi/g;
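Here is that idiom run against concrete, made-up strings. The assignment to the empty list `()` puts the match in list context, and a list assignment in scalar context evaluates to the number of elements on its right-hand side. For single characters, a character class such as `[cat]` does the same job as the alternation:

```perl
use strict;
use warnings;
use feature 'say';

my $string = 'concatenate';

# Alternation of single characters: counts every c, a, or t.
my $alt_count = () = $string =~ /c|a|t/g;

# A character class matches the same set of characters.
my $class_count = () = $string =~ /[cat]/g;

# Same code, more general pattern: count whole names instead.
my $pets      = 'Buster chased Mimi, then Mimi chased Buster';
my $pet_count = () = $pets =~ /Buster|Mimi/g;

say $alt_count;      # 6
say $class_count;    # 6
say $pet_count;      # 4
```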

You might also use split to break up the string and then store the counts in a hash:

use List::Util qw(sum);
my %counts;
foreach my $char ( split //, $string ) { $counts{$char}++ }
my $count = sum( map { $_ // 0 } @counts{ qw(c a t) } );  # a missing character counts as 0

This can be handy when you need to check a particular string several times. You count everything once, cache the results, and merely look up the answer when you need it. Note, however, that this solution can’t solve the more general problems, because the technique assumes the specific case in (4), counting individual characters. Maybe that matters to you, maybe it doesn’t.
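A sketch of the count-once, look-up-many approach, assuming you only ever need single-character counts. The helper `count_of` is hypothetical, made up for this example; it answers each query from the cached hash without rescanning the string. Passing a leading `0` to `sum` makes an empty query return 0 instead of `undef`, and `//` (Perl 5.10+) turns characters that never appeared into 0:

```perl
use strict;
use warnings;
use feature 'say';
use List::Util qw(sum);

my $string = 'concatenate';

# Scan the string exactly once and cache a count for every character.
my %counts;
$counts{$_}++ for split //, $string;

# Hypothetical helper: total occurrences of the given characters,
# read from %counts rather than recomputed from $string.
sub count_of {
    my @chars = @_;
    return sum( 0, map { $counts{$_} // 0 } @chars );
}

say count_of(qw(c a t));    # 6
say count_of(qw(e n));      # 4
say count_of('z');          # 0
```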

You might also use the transliteration operator. If you really only ever want to count characters, tr is going to be very, very fast. It is even optimized to go faster when it knows it doesn’t actually have to transliterate. It’s a bit better than the regex because you don’t have to use a list assignment to an empty list to get the count of the elements in the right-hand list (Item 9. Know the difference between lists and arrays):

my $count = $string =~ tr/cat//;
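With a concrete, made-up string, tr returns the number of characters it matched. Because the replacement list is empty and there is no `/d` modifier, each character is replaced with itself, so the string is left unchanged:

```perl
use strict;
use warnings;
use feature 'say';

my $string = 'concatenate';

# Count c, a, and t; the empty replacement list means nothing changes.
my $count = $string =~ tr/cat//;

# A longer list of characters works the same way: count the vowels.
my $vowels = $string =~ tr/aeiou//;

say $count;     # 6
say $vowels;    # 5
say $string;    # concatenate
```

The character list is fixed when the code is compiled (tr does not interpolate variables), which is one more reason this technique sits at the most specific end of the scale.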

This technique is the most restrictive because it can only ever count characters. If you wanted to solve a more general task, you’d have to completely replace this code. That doesn’t make it a wrong (or right) way to do it; you can only judge that in context. In any case, good practice is to hide these decisions behind an interface so the rest of the code doesn’t have to know how you did it:

sub count_characters { ... }
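One way to fill in that interface, as a sketch: the name `count_characters` comes from the text, but its argument list (a string followed by the characters to count) is an assumption. This version builds a character class with `quotemeta` so that characters such as `-`, `]`, or `^` match literally. You could swap in a different technique behind the same signature without touching any caller:

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical signature: count_characters( $string, @chars ) returns the
# total number of times any of @chars occurs in $string. Each element of
# @chars is assumed to be a single character.
sub count_characters {
    my ( $string, @chars ) = @_;
    return 0 unless @chars;    # an empty class would be a regex error

    # quotemeta escapes metacharacters so each one is literal
    # inside the character class.
    my $class = join '', map { quotemeta } @chars;
    my $count = () = $string =~ /[$class]/g;
    return $count;
}

say count_characters( 'concatenate', qw(c a t) );    # 6
say count_characters( 'MKV--LA-G', '-' );            # 3
```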

Some people don’t like the TMTOWTDI aspect of Perl because they focus on the “There’s More Than One Way” part instead of the “Common things should be easy” part. That is, Larry Wall made a deliberate decision that for likely, specific tasks, Perl wouldn’t make you do the work to solve the most general case. You can work at the level of generality that makes sense for the problem. If you think about it that way, maybe you’d change your mind. Does anyone really like the style guides and internal policies that make everyone do every task the same way no matter what they are doing? Should you program in C without pointers just because pointers don’t make sense for some problems (even if the reasons are more social than technical)?

Sure, with Perl you have to learn a little bit more to make your life easier, but you only have to learn it once to get its benefit for the rest of your life. Well, maybe you have to learn it twice if you have a really bad motorcycle accident and need a brain transplant. Just leave yourself good notes.


Comments are closed.