Find the new emojis in Perl’s Unicode support

Perl v5.26 updates itself to Unicode 9. That’s not normally exciting news, but people have been pretty enthusiastic about the 72 new emojis that come with it. As far as Perl cares, they are just valid code points like all of the other ones.

I went to Emojipedia to see what I could do. I’d already used the 🦋 in Learning Perl 6, although 🦖 would be more appropriate here. I was curious where these new emojis are in the UCD.

First, how many new characters show up in Unicode 9? You could look this up but it’s easier and more fun to do it in Perl.

use v5.26;  # enables say

my $count = 0;
foreach ( 0 .. 0x10FFFF ) {
	my $char = chr;
	next if $char =~ /\p{Present_In: 8.0}/;
	next if $char =~ /\p{Unassigned}/;
	$count++;
	}
say "There are $count new chars";

This skips anything that was present in the previous version (8.0) and then any code points that are not assigned. This gives exactly 7,500:

There are 7500 new chars

I could go check what those other 7,428 characters are. You could figure that out just as quickly. But who cares? They aren’t emojis.
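If you do care, here is a minimal sketch of one way to find out where the rest of them live. It assumes that \p{Age: 9.0} picks out the same code points as the two-step Present_In test above, and it uses charblock from the core Unicode::UCD module to tally them by block. The %by_block hash is just a name I picked for illustration.

```perl
use v5.26;
use Unicode::UCD qw(charblock);

# Tally the code points first assigned in Unicode 9.0 by block name
my %by_block;
foreach my $code ( 0 .. 0x10FFFF ) {
	next unless chr($code) =~ /\p{Age: 9.0}/;
	$by_block{ charblock($code) }++;
	}

# Biggest blocks first
foreach my $block ( sort { $by_block{$b} <=> $by_block{$a} } keys %by_block ) {
	printf "%5d %s\n", $by_block{$block}, $block;
	}
```

The counts should add up to the same total as the first program, as long as your perl knows about Unicode 9 or later.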

The 72 new emojis are listed on Emojipedia, so you can use Mojo::UserAgent to grab those. Since these are certainly wide characters you have to do some extra setup (see the “Unicode Primer” chapter in Learning Perl or the various perluni* docs):

use v5.26;
use utf8;
use strict;
use warnings;
use open qw(:std :utf8);
use charnames qw();

use Mojo::UserAgent;
my $ua = Mojo::UserAgent->new;

my $url = 'https://blog.emojipedia.org/new-unicode-9-emojis/';
my $tx = $ua->get( $url );

die "That didn't work!\n" unless $tx->result->is_success;

say $tx->result
	->dom
	->find( 'ul:not( [class] ) li a' )
	->map( 'text' )
	->map( sub {
		my $c = substr $_, 0, 1;
		[ $c, ord($c), charnames::viacode( ord($c) ) ]
		})
	->sort( sub { $a->[1] <=> $b->[1] } )
	->map( sub {
		sprintf '%s (U+%05X) %s', $_->@*
		} )
	->join( "\n" );

From this I see that the new emojis are not contiguous:

🕺 (U+1F57A) MAN DANCING
🖤 (U+1F5A4) BLACK HEART
🛑 (U+1F6D1) OCTAGONAL SIGN
🛒 (U+1F6D2) SHOPPING TROLLEY
🛴 (U+1F6F4) SCOOTER
🛵 (U+1F6F5) MOTOR SCOOTER
🛶 (U+1F6F6) CANOE
🤙 (U+1F919) CALL ME HAND
🤚 (U+1F91A) RAISED BACK OF HAND
🤛 (U+1F91B) LEFT-FACING FIST
🤜 (U+1F91C) RIGHT-FACING FIST
🤝 (U+1F91D) HANDSHAKE
🤞 (U+1F91E) HAND WITH INDEX AND MIDDLE FINGERS CROSSED
🤠 (U+1F920) FACE WITH COWBOY HAT
🤡 (U+1F921) CLOWN FACE
🤢 (U+1F922) NAUSEATED FACE
🤣 (U+1F923) ROLLING ON THE FLOOR LAUGHING
🤤 (U+1F924) DROOLING FACE
🤥 (U+1F925) LYING FACE
🤦 (U+1F926) FACE PALM
🤧 (U+1F927) SNEEZING FACE
🤰 (U+1F930) PREGNANT WOMAN
🤳 (U+1F933) SELFIE
🤴 (U+1F934) PRINCE
🤵 (U+1F935) MAN IN TUXEDO
🤶 (U+1F936) MOTHER CHRISTMAS
🤷 (U+1F937) SHRUG
🤸 (U+1F938) PERSON DOING CARTWHEEL
🤹 (U+1F939) JUGGLING
🤺 (U+1F93A) FENCER
🤼 (U+1F93C) WRESTLERS
🤽 (U+1F93D) WATER POLO
🤾 (U+1F93E) HANDBALL
🥀 (U+1F940) WILTED FLOWER
🥁 (U+1F941) DRUM WITH DRUMSTICKS
🥂 (U+1F942) CLINKING GLASSES
🥃 (U+1F943) TUMBLER GLASS
🥄 (U+1F944) SPOON
🥅 (U+1F945) GOAL NET
🥇 (U+1F947) FIRST PLACE MEDAL
🥈 (U+1F948) SECOND PLACE MEDAL
🥉 (U+1F949) THIRD PLACE MEDAL
🥊 (U+1F94A) BOXING GLOVE
🥋 (U+1F94B) MARTIAL ARTS UNIFORM
🥐 (U+1F950) CROISSANT
🥑 (U+1F951) AVOCADO
🥒 (U+1F952) CUCUMBER
🥓 (U+1F953) BACON
🥔 (U+1F954) POTATO
🥕 (U+1F955) CARROT
🥖 (U+1F956) BAGUETTE BREAD
🥗 (U+1F957) GREEN SALAD
🥘 (U+1F958) SHALLOW PAN OF FOOD
🥙 (U+1F959) STUFFED FLATBREAD
🥚 (U+1F95A) EGG
🥛 (U+1F95B) GLASS OF MILK
🥜 (U+1F95C) PEANUTS
🥝 (U+1F95D) KIWIFRUIT
🥞 (U+1F95E) PANCAKES
🦅 (U+1F985) EAGLE
🦆 (U+1F986) DUCK
🦇 (U+1F987) BAT
🦈 (U+1F988) SHARK
🦉 (U+1F989) OWL
🦊 (U+1F98A) FOX FACE
🦋 (U+1F98B) BUTTERFLY
🦌 (U+1F98C) DEER
🦍 (U+1F98D) GORILLA
🦎 (U+1F98E) LIZARD
🦏 (U+1F98F) RHINOCEROS
🦐 (U+1F990) SHRIMP
🦑 (U+1F991) SQUID
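To see the gaps at a glance, you could collapse the sorted code points into contiguous runs. This is only a sketch: @codes holds the first seven code points from the list, copied by hand, and format_run is a helper name I made up. In a real program you would feed it the numbers you pulled out of the page.

```perl
use v5.26;

# The first few code points from the list, copied by hand
my @codes = ( 0x1F57A, 0x1F5A4, 0x1F6D1, 0x1F6D2, 0x1F6F4, 0x1F6F5, 0x1F6F6 );

# Extend the current run if the next code point is adjacent,
# otherwise start a new [ first, last ] pair
my @runs;
foreach my $code ( sort { $a <=> $b } @codes ) {
	if( @runs and $code == $runs[-1][1] + 1 ) {
		$runs[-1][1] = $code;
		}
	else {
		push @runs, [ $code, $code ];
		}
	}

# Hypothetical helper: show a run as one code point or as a range
sub format_run {
	my( $first, $last ) = @_;
	return sprintf 'U+%05X', $first if $first == $last;
	return sprintf 'U+%05X..U+%05X', $first, $last;
	}

say format_run( $_->@* ) for @runs;
```

For those seven code points this prints two single code points and two short ranges:

U+1F57A
U+1F5A4
U+1F6D1..U+1F6D2
U+1F6F4..U+1F6F6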

Things to remember

  • v5.26 updates to Unicode 9 with 7,500 new characters.
  • You can check the Unicode version of characters with the Present_In, In, or Age properties.
  • ord gets you the code point number and charnames::viacode can use that number to get you the character name.
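
You can try the last two points without fetching anything. This is a small sketch, not part of the program above: the three code points are ones I picked for illustration (two from the new list and U+1F600, which has been around since before Unicode 9), and the '<no name>' fallback is my own choice for code points that viacode does not know.

```perl
use v5.26;
use open qw(:std :utf8);
use charnames qw();

foreach my $code ( 0x1F98B, 0x1F5A4, 0x1F600 ) {
	my $char = chr $code;

	# viacode returns undef if it has no name for the code point
	my $name = charnames::viacode( $code ) // '<no name>';

	# Age is the version where the code point was first assigned
	my $age = $char =~ /\p{Age: 9.0}/ ? 'new in 9.0' : 'not new in 9.0';

	printf "%s (U+%05X) %s [%s]\n", $char, ord($char), $name, $age;
	}
```

Present_In (or In) would match all three characters for 9.0, since it asks whether the character exists in that version, while Age asks when it first showed up.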

4 Comments.

  1. I have also enjoyed exploring the universe of Unicode characters and emoji. I built this browser based search tool (which does have a Perl component for generating the data file): https://www.mclean.net.nz/ucf/

    Using this tool you can search for a character by matching keywords in the description; or paste in a character to find out more details. Then explore the other characters around that one in the code chart. You can bookmark a specific character or search term and store your favourites in the scratchpad. Use the ‘Help’ button to get started.

  2. Matthew Persico

    Question: Are there official translations for the Emoji names or is English the only namespace supported? Can a Spanish programmer specify \N{CORAZON_NEGRO} or is she stuck typing \N{BLACK HEART}?

  3. Matthew, for Unicode itself, the names are only English. See sec. 4.8 of http://www.unicode.org/versions/Unicode11.0.0/ch04.pdf — “The character names in the Unicode Standard are identical to those of the English-language edition of ISO/IEC 10646.” I don’t know about other languages of 10646.

  4. perldoc charnames

    gives instructions for creating new aliases for code points, so that a Spanish (or whatever language) programmer could create a file of whatever names they want.
