Parsing JSON In Perl By Example - SouthParkStudios.com South Park Episodes
Thursday, March 27th, 2008
In this tutorial, I'll show you how to parse JSON using Perl. As a fun example, I'll use the new SouthParkStudios.com site, launched earlier this week, which hosts full, legal episodes of South Park. I guess the TV companies are finally getting a clue about what users want. I'll fetch the first season's JSON feed from http://www.southparkstudios.com/includes/utils/proxy_feed.php?html=season_json.jhtml%3fseason=1 and pull out information about individual episodes (title, description, air date, etc.). Feel free to replace the '1' at the end of the URL with any valid season number.
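If you want to try several seasons, the URL can be built from the season number. This is only a small sketch: `season_url` is a hypothetical helper name, and I'm assuming the season number is the only part of the URL that changes.

```perl
use strict;
use warnings;

# Hypothetical helper: build the feed URL for a given season number.
# Assumes only the trailing season number varies between seasons.
sub season_url {
    my ($season) = @_;
    die "season must be a positive integer\n"
        unless defined $season && $season =~ /^[1-9][0-9]*$/;
    return "http://www.southparkstudios.com/includes/utils/proxy_feed.php"
         . "?html=season_json.jhtml%3fseason=$season";
}

print season_url(2), "\n";
```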
Here's a short snippet of the JSON:
{
  season:{
    episode:[
      {
        title:'Cartman Gets an Anal Probe',
        description:'While the boys are waiting for the school bus, Cartman explains the odd nightmare he had the previous night involving alien visitors.',
        thumbnail:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e01_480.jpg&width=55&quality=100',
        thumbnail_larger:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e01_480.jpg&width=63&quality=100',
        thumbnail_190:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e01_480.jpg&width=190&quality=100',
        id:'103511',
        airdate:'08.13.97',
        episodenumber:'101',
        available:'true',
        when:'08.13.97'
      }
      ,
      {
        title:'Weight Gain 4000',
        description:'When Cartman\'s environmental essay wins a national contest, America\'s sweetheart, Kathie Lee Gifford, comes to South Park to present the award.',
        thumbnail:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e02_480.jpg&width=55&quality=100',
        thumbnail_larger:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e02_480.jpg&width=63&quality=100',
        thumbnail_190:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e02_480.jpg&width=190&quality=100',
        id:'103516',
        airdate:'08.20.97',
        episodenumber:'102',
        available:'true',
        when:'08.20.97'
      }
      ...
    ]
  }
}
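Note that this feed is not strict JSON: the keys are unquoted and the strings use single quotes. The sketch below shows the difference on a shortened, made-up sample in the same style (not the real feed). It uses JSON::PP directly, which is an assumption on my part to keep the example dependency-free; the full script in this post goes through the JSON module instead.

```perl
use strict;
use warnings;
use JSON::PP;

# A tiny sample in the same style as the feed: bare keys, single-quoted strings.
my $sample = q({season:{episode:[{title:'Weight Gain 4000',id:'103516'}]}});

# A strict parser rejects the sample, so the eval block never reaches the 1.
my $strict_ok = eval { JSON::PP->new->decode($sample); 1 } ? 1 : 0;

# Enabling the two relevant JSON::PP options lets the same text decode.
my $data  = JSON::PP->new->allow_barekey->allow_singlequote->decode($sample);
my $title = $data->{season}{episode}[0]{title};

print "strict parse succeeded: $strict_ok\n";
print "relaxed title: $title\n";
```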
Before you can parse JSON, you need to have a few libraries. Install them using CPAN, for example:
cpan
install JSON
install JSON::XS
install WWW::Mechanize # my favorite library for browsing
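To confirm the modules are visible to your Perl before going further, something like the following should do. It is just a quick sketch: it tries to load each module and reports the version it finds, or notes that the module is missing.

```perl
use strict;
use warnings;

# Try to load each module and record its version, without aborting on failure.
my %status;
for my $module (qw(JSON JSON::XS WWW::Mechanize)) {
    # Turn a module name like JSON::XS into a path like JSON/XS.pm for require.
    (my $file = "$module.pm") =~ s{::}{/}g;
    if (eval { require $file; 1 }) {
        my $version = $module->VERSION;
        $status{$module} = defined $version ? $version : 'unknown version';
    }
    else {
        $status{$module} = 'NOT INSTALLED';
    }
}

print "$_: $status{$_}\n" for sort keys %status;
```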
Now the script.
#!/usr/bin/perl -w
# $Rev: 1 $
# $Author: artem $
# $Date: 2008-03-25 14:28:39 -0800 (Tue, 25 Mar 2008) $
use strict;
use WWW::Mechanize;
use JSON -support_by_pp;

fetch_json_page("http://www.southparkstudios.com/includes/utils/proxy_feed.php?html=season_json.jhtml%3fseason=1");

sub fetch_json_page
{
  my ($json_url) = @_;
  my $browser = WWW::Mechanize->new();
  eval{
    # download the json page:
    print "Getting json $json_url\n";
    $browser->get( $json_url );
    my $content = $browser->content();
    my $json = JSON->new;

    # these are some nice json options to relax restrictions a bit;
    # decode() returns a Perl data structure (here, a hash reference):
    my $json_text = $json->allow_nonref->utf8->relaxed->escape_slash->loose->allow_singlequote->allow_barekey->decode($content);

    # iterate over each episode in the JSON structure:
    my $episode_num = 1;
    foreach my $episode(@{$json_text->{season}->{episode}}){
      my %ep_hash = ();
      $ep_hash{title} = "Episode $episode_num: $episode->{title}";
      $ep_hash{description} = $episode->{description};
      $ep_hash{url} = "http://www.southparkstudios.com/episodes/" . $episode->{id};
      $ep_hash{publish_date} = $episode->{airdate};
      # prefer the 190px thumbnail, fall back to the larger one:
      $ep_hash{thumbnail_url} = $episode->{thumbnail_190} || $episode->{thumbnail_larger};

      # print episode information:
      while (my($k, $v) = each (%ep_hash)){
        print "$k => $v\n";
      }
      print "\n";
      $episode_num++;
    }
  };
  # catch crashes (anything that died inside the eval block):
  if($@){
    print "[[JSON ERROR]] JSON parser crashed! $@\n";
  }
}
Here's the output:
Getting json http://www.southparkstudios.com/includes/utils/proxy_feed.php?html=season_json.jhtml%3fseason=1
publish_date => 08.13.97
url => http://www.southparkstudios.com/episodes/103511
thumbnail_url => http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e01_480.jpg&width=190&quality=100
title => Episode 1: Cartman Gets an Anal Probe
description => While the boys are waiting for the school bus, Cartman explains the odd nightmare he had the previous night involving alien visitors.

publish_date => 08.20.97
url => http://www.southparkstudios.com/episodes/103516
thumbnail_url => http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e02_480.jpg&width=190&quality=100
title => Episode 2: Weight Gain 4000
description => When Cartman's environmental essay wins a national contest, America's sweetheart, Kathie Lee Gifford, comes to South Park to present the award.

...
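The fields come out in whatever order each() happens to walk the hash, and Perl makes no promise about that order, so your output may be arranged differently from mine. If you want a stable layout, one option is to print the keys in an explicit order. Here's a minimal sketch; `@field_order` is a name I made up, and the record is a shortened stand-in for `%ep_hash` from the script.

```perl
use strict;
use warnings;

# One episode record shaped like %ep_hash in the script (some fields left out).
my %ep_hash = (
    title        => 'Episode 1: Cartman Gets an Anal Probe',
    publish_date => '08.13.97',
    url          => 'http://www.southparkstudios.com/episodes/103511',
);

# Print fields in a fixed, chosen order instead of relying on each().
my @field_order = qw(title publish_date url description thumbnail_url);
my @lines;
for my $k (@field_order) {
    next unless defined $ep_hash{$k};    # skip fields this record lacks
    push @lines, "$k => $ep_hash{$k}";
}

print "$_\n" for @lines;
```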
Of particular interest here is the way JSON accepts settings:
my $json_text = $json->allow_nonref->utf8->relaxed->escape_slash->loose->allow_singlequote->allow_barekey->decode($content);
I found that these settings fix most of the crashes and incompatibilities while parsing various JSON pages.
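To get a feel for which options this particular feed depends on, here is a minimal sketch that decodes a made-up one-line sample with different option sets. It calls JSON::PP directly, which is my assumption for keeping it self-contained (these relaxed options are JSON::PP features that `-support_by_pp` exposes through JSON), and `decodes_with` is a hypothetical helper. As far as I can tell, `allow_barekey`, `allow_singlequote` and `loose` do the real work here: the first two handle the unquoted keys and single-quoted strings, and `loose` lets the backslash-escaped apostrophes (`\'`) through. `escape_slash`, by contrast, only affects encoding.

```perl
use strict;
use warnings;
use JSON::PP;

# Made-up sample in the feed's style: bare key, single quotes, and a \' escape.
my $sample = q({title:'Cartman\'s essay'});

# Hypothetical helper: returns 1 if $sample decodes with the given options on.
sub decodes_with {
    my @options = @_;
    my $json = JSON::PP->new;
    $json->$_() for @options;    # enable each option by calling its method
    return eval { $json->decode($sample); 1 } ? 1 : 0;
}

my %result = (
    all_three      => decodes_with(qw(allow_barekey allow_singlequote loose)),
    no_barekey     => decodes_with(qw(allow_singlequote loose)),
    no_singlequote => decodes_with(qw(allow_barekey loose)),
    no_loose       => decodes_with(qw(allow_barekey allow_singlequote)),
);

print "$_: $result{$_}\n" for sort keys %result;
```

Dropping options one at a time like this is also a handy way to find out what a new, misbehaving feed needs before you switch everything on.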
Is there something I've missed? Do you know a better way to parse JSON in Perl? Unclear about something? Don't hesitate to share in the comments.

beer planet is Artem Russakovskii's blog. Artem is a software engineer at
April 16th, 2008 at 7:52 pm
Well done .. this tutorial was really handy