Parsing JSON In Perl By Example – SouthParkStudios.com South Park Episodes
Updated: May 23rd, 2009
In this tutorial, I'll show you how to parse JSON using Perl. As a fun example, I'll use the new SouthParkStudios.com site released earlier this week, which contains full legal episodes of South Park. I guess the TV companies are finally getting a clue about what users want.
I will parse the first season's JSON and pull out information about individual episodes (title, description, air date, and so on) from http://www.southparkstudios.com/includes/utils/proxy_feed.php?html=season_json.jhtml%3fseason=1. Feel free to replace the trailing '1' with any valid season number.
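If you want to fetch more than one season, a small helper can build the feed URL for you. This is only a sketch: season_url and $FEED_BASE are names made up for illustration, and the check simply assumes that season numbers are positive whole numbers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Feed URL from the article, minus the trailing season number.
my $FEED_BASE = 'http://www.southparkstudios.com/includes/utils/proxy_feed.php'
              . '?html=season_json.jhtml%3fseason=';

# season_url is a hypothetical helper; it only accepts positive whole numbers.
sub season_url {
    my ($season) = @_;
    die "season must be a positive integer\n"
        unless defined $season && $season =~ /\A[1-9][0-9]*\z/;
    # Plain concatenation on purpose: the URL contains '%3f', which
    # sprintf would try to interpret as a format specifier.
    return $FEED_BASE . $season;
}

print season_url($_), "\n" for 1 .. 3;
```

Each resulting URL can then be passed to the fetching code in the script further down.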
Here's a short snippet of the JSON:
    {
      season:{
        episode:[
          {
            title:'Cartman Gets an Anal Probe',
            description:'While the boys are waiting for the school bus, Cartman explains the odd nightmare he had the previous night involving alien visitors.',
            thumbnail:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e01_480.jpg&width=55&quality=100',
            thumbnail_larger:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e01_480.jpg&width=63&quality=100',
            thumbnail_190:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e01_480.jpg&width=190&quality=100',
            id:'103511',
            airdate:'08.13.97',
            episodenumber:'101',
            available:'true',
            when:'08.13.97'
          },
          {
            title:'Weight Gain 4000',
            description:'When Cartman\'s environmental essay wins a national contest, America\'s sweetheart, Kathie Lee Gifford, comes to South Park to present the award.',
            thumbnail:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e02_480.jpg&width=55&quality=100',
            thumbnail_larger:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e02_480.jpg&width=63&quality=100',
            thumbnail_190:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e02_480.jpg&width=190&quality=100',
            id:'103516',
            airdate:'08.20.97',
            episodenumber:'102',
            available:'true',
            when:'08.20.97'
          }
          ...
        ]
      }
    }
Before you can parse JSON, you need to have a few libraries. Install them using CPAN, for example:
    cpan
    install JSON
    install JSON::XS
    install WWW::Mechanize # my favorite library for browsing
Now the script.
    #!/usr/bin/perl -w

    # $Rev: 11 $
    # $Author: artem $
    # $Date: 2009-05-23 23:09:47 -0700 (Sat, 23 May 2009) $

    use strict;
    use WWW::Mechanize;
    use JSON -support_by_pp;

    fetch_json_page("http://www.southparkstudios.com/includes/utils/proxy_feed.php?html=season_json.jhtml%3fseason=1");

    sub fetch_json_page
    {
      my ($json_url) = @_;
      my $browser = WWW::Mechanize->new();
      eval{
        # download the json page:
        print "Getting json $json_url\n";
        $browser->get( $json_url );
        my $content = $browser->content();
        my $json = JSON->new;

        # these are some nice json options to relax restrictions a bit:
        my $json_text = $json->allow_nonref->utf8->relaxed->escape_slash->loose->allow_singlequote->allow_barekey->decode($content);

        # iterate over each episode in the JSON structure:
        my $episode_num = 1;
        foreach my $episode(@{$json_text->{season}->{episode}}){
          my %ep_hash = ();
          $ep_hash{title} = "Episode $episode_num: $episode->{title}";
          $ep_hash{description} = $episode->{description};
          $ep_hash{url} = "http://www.southparkstudios.com/episodes/" . $episode->{id};
          $ep_hash{publish_date} = $episode->{airdate};
          $ep_hash{thumbnail_url} = $episode->{thumbnail_190} || $episode->{thumbnail_larger};

          # print episode information:
          while (my($k, $v) = each (%ep_hash)){
            print "$k => $v\n";
          }
          print "\n";

          $episode_num++;
        }
      };
      # catch crashes:
      if($@){
        print "[[JSON ERROR]] JSON parser crashed! $@\n";
      }
    }
Here's the output:
    Getting json http://www.southparkstudios.com/includes/utils/proxy_feed.php?html=season_json.jhtml%3fseason=1
    publish_date => 08.13.97
    url => http://www.southparkstudios.com/episodes/103511
    thumbnail_url => http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e01_480.jpg&width=190&quality=100
    title => Episode 1: Cartman Gets an Anal Probe
    description => While the boys are waiting for the school bus, Cartman explains the odd nightmare he had the previous night involving alien visitors.

    publish_date => 08.20.97
    url => http://www.southparkstudios.com/episodes/103516
    thumbnail_url => http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e02_480.jpg&width=190&quality=100
    title => Episode 2: Weight Gain 4000
    description => When Cartman's environmental essay wins a national contest, America's sweetheart, Kathie Lee Gifford, comes to South Park to present the award.

    ...
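One caveat about the output above: Perl does not guarantee any particular order for hash keys, so each may list the fields in a different order on your machine. If you want stable output, one option is to name the fields explicitly. Here is a minimal sketch; the %ep_hash below is a hand-written stand-in for the one the script builds, and @field_order is a name I made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-written stand-in for one %ep_hash as built by the script.
my %ep_hash = (
    title        => 'Episode 1: Cartman Gets an Anal Probe',
    url          => 'http://www.southparkstudios.com/episodes/103511',
    publish_date => '08.13.97',
);

# Name the keys explicitly so every run prints them in the same order.
my @field_order = qw(title publish_date url);
my @lines = map { "$_ => $ep_hash{$_}" } @field_order;
print "$_\n" for @lines;
```

Using sort keys %ep_hash instead would also give a repeatable (alphabetical) order without maintaining a field list.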
Of particular interest here is the way JSON accepts settings:
    my $json_text = $json->allow_nonref->utf8->relaxed->escape_slash->loose->allow_singlequote->allow_barekey->decode($content);
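I found that these settings fix most of the crashes and incompatibilities I ran into while parsing various JSON pages. For this feed, allow_singlequote and allow_barekey are the ones that visibly matter, because its strings are single-quoted and its keys are unquoted, neither of which strict JSON allows. These two options, along with loose, are features of the pure-Perl backend (JSON::PP), which is why the script loads JSON with -support_by_pp. Note that escape_slash only affects encoding, so it has no effect on decode.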
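To see the effect of these settings without hitting the network, here is a small self-contained sketch. It uses JSON::PP directly (it ships with Perl 5.14 and later) instead of the JSON wrapper, and $sample is a shortened, hand-written stand-in for the real feed, so treat it as an illustration and not as the feed's exact format.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;   # pure-Perl parser; in core since Perl 5.14

# Hand-written sample in the same relaxed style as the feed:
# bare keys and single-quoted strings.
my $sample = q({ season:{ episode:[ { title:'Weight Gain 4000', id:'103516' } ] } });

# A default parser follows the JSON grammar strictly and dies on this input.
my $strict_ok = eval { JSON::PP->new->decode($sample); 1 };
print "strict parser: ", ($strict_ok ? "accepted" : "rejected"), "\n";

# Enabling the two relaxations lets the same text decode.
my $data = JSON::PP->new->allow_singlequote->allow_barekey->decode($sample);
print "title: $data->{season}{episode}[0]{title}\n";
```

The same option chain works on a JSON object when the module is loaded with -support_by_pp, as in the script.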
Is there something I've missed? Do you know a better way to parse JSON in Perl? Unclear about something? Don't hesitate to share in the comments.