Parsing JSON In Perl By Example – SouthParkStudios.com South Park Episodes
Updated: May 23rd, 2009
In this tutorial, I'll show you how to parse JSON using Perl. As a fun example, I'll use the new SouthParkStudios.com site released earlier this week, which contains full legal episodes of South Park. I guess the TV companies are finally getting a clue about what users want.
I will parse the first season's JSON and pull out information about individual episodes (title, description, air date, etc.) from http://www.southparkstudios.com/includes/utils/proxy_feed.php?html=season_json.jhtml%3fseason=1. Feel free to replace '1' with any valid season number.
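If you want to switch seasons without editing the URL by hand, a small helper can build it. This is only a sketch: `season_url` is a hypothetical name that does not appear in the script below, and it assumes the feed accepts any positive integer as the season number.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): build the feed URL
# for a given season number.
sub season_url {
    my ($season) = @_;
    die "season must be a positive integer\n"
        unless defined $season && $season =~ /^[1-9]\d*$/;
    return 'http://www.southparkstudios.com/includes/utils/proxy_feed.php'
         . '?html=season_json.jhtml%3fseason=' . $season;
}

print season_url(1), "\n";
```

Rejecting anything that is not a plain number keeps stray characters out of the query string.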
Here's a short snippet of the JSON:
{
season:{
episode:[
{
title:'Cartman Gets an Anal Probe',
description:'While the boys are waiting for the school bus, Cartman explains the odd nightmare he had the previous night involving alien visitors.',
thumbnail:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e01_480.jpg&width=55&quality=100',
thumbnail_larger:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e01_480.jpg&width=63&quality=100',
thumbnail_190:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e01_480.jpg&width=190&quality=100',
id:'103511',
airdate:'08.13.97',
episodenumber:'101',
available:'true',
when:'08.13.97'
}
,
{
title:'Weight Gain 4000',
description:'When Cartman\'s environmental essay wins a national contest, America\'s sweetheart, Kathie Lee Gifford, comes to South Park to present the award.',
thumbnail:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e02_480.jpg&width=55&quality=100',
thumbnail_larger:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e02_480.jpg&width=63&quality=100',
thumbnail_190:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e02_480.jpg&width=190&quality=100',
id:'103516',
airdate:'08.20.97',
episodenumber:'102',
available:'true',
when:'08.20.97'
}
...
]
}
}
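Note that this feed is not strict JSON: the keys are unquoted and the strings use single quotes. The sketch below shows the difference on a trimmed-down copy of the snippet. It assumes JSON::PP (bundled with recent Perls) rather than the JSON module used in the script, only so that it runs without installing anything; the variable names are made up for illustration.

```perl
use strict;
use warnings;
use JSON::PP;

# Trimmed-down copy of the feed: bare keys and single-quoted strings.
my $sample = q({season:{episode:[{title:'Weight Gain 4000',id:'103516'}]}});

# A strict parser refuses this input, so decode() dies inside the eval.
my $strict_ok = eval { JSON::PP->new->decode($sample); 1 } ? 1 : 0;

# With the two relevant relaxations enabled, the same text parses.
my $data  = JSON::PP->new->allow_barekey->allow_singlequote->decode($sample);
my $title = $data->{season}{episode}[0]{title};

print "strict parse ", ($strict_ok ? "succeeded" : "failed"), "\n";
print "relaxed title: $title\n";
```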
Before you can parse JSON, you need to have a few libraries. Install them using CPAN, for example:
cpan
install JSON
install JSON::XS
install WWW::Mechanize # my favorite library for browsing
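To confirm the modules are actually available before running the script, something like the following can help. It is a sketch only; `is_installed` is a hypothetical helper, and all it does is try to load each module.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, else 0.
sub is_installed {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # JSON::XS -> JSON/XS.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(JSON JSON::XS WWW::Mechanize)) {
    printf "%-16s %s\n", $module, is_installed($module) ? 'ok' : 'MISSING';
}
```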
Now the script.
#!/usr/bin/perl -w
# $Rev: 11 $
# $Author: artem $
# $Date: 2009-05-23 23:09:47 -0700 (Sat, 23 May 2009) $
use strict;
use WWW::Mechanize;
use JSON -support_by_pp;

fetch_json_page("http://www.southparkstudios.com/includes/utils/proxy_feed.php?html=season_json.jhtml%3fseason=1");

sub fetch_json_page
{
    my ($json_url) = @_;
    my $browser = WWW::Mechanize->new();
    eval {
        # download the json page:
        print "Getting json $json_url\n";
        $browser->get( $json_url );
        my $content = $browser->content();
        my $json = JSON->new;

        # these are some nice json options to relax restrictions a bit:
        my $json_text = $json->allow_nonref->utf8->relaxed->escape_slash->loose->allow_singlequote->allow_barekey->decode($content);

        # iterate over each episode in the JSON structure:
        my $episode_num = 1;
        foreach my $episode (@{$json_text->{season}->{episode}}) {
            my %ep_hash = ();
            $ep_hash{title} = "Episode $episode_num: $episode->{title}";
            $ep_hash{description} = $episode->{description};
            $ep_hash{url} = "http://www.southparkstudios.com/episodes/" . $episode->{id};
            $ep_hash{publish_date} = $episode->{airdate};
            $ep_hash{thumbnail_url} = $episode->{thumbnail_190} || $episode->{thumbnail_larger};

            # print episode information:
            while (my ($k, $v) = each(%ep_hash)) {
                print "$k => $v\n";
            }
            print "\n";
            $episode_num++;
        }
    };
    # catch crashes:
    if ($@) {
        print "[[JSON ERROR]] JSON parser crashed! $@\n";
    }
}
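If you want to try the parsing logic without hitting the network, one option is to pull it into its own function and feed it a string. The sketch below is an assumption-laden variant, not the script above: `parse_episodes` is a hypothetical helper, it uses JSON::PP directly instead of JSON with WWW::Mechanize, and the sample is a trimmed copy of the snippet shown earlier.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper (not in the original script): decode feed text and
# return one hash reference per episode.
sub parse_episodes {
    my ($content) = @_;
    my $parser = JSON::PP->new->relaxed->loose->allow_singlequote->allow_barekey;
    my $data   = $parser->decode($content);    # dies on malformed input

    my @episodes;
    my $episode_num = 1;
    foreach my $episode (@{ $data->{season}{episode} || [] }) {
        push @episodes, {
            title        => "Episode $episode_num: $episode->{title}",
            url          => "http://www.southparkstudios.com/episodes/" . $episode->{id},
            publish_date => $episode->{airdate},
        };
        $episode_num++;
    }
    return @episodes;
}

# Trimmed copy of the feed snippet, so no download is needed.
my $sample = <<'END_JSON';
{
season:{
episode:[
{ title:'Cartman Gets an Anal Probe', id:'103511', airdate:'08.13.97' },
{ title:'Weight Gain 4000', id:'103516', airdate:'08.20.97' }
]
}
}
END_JSON

# Same eval/$@ pattern as the script: a parser failure is reported, not fatal.
my @episodes = eval { parse_episodes($sample) };
if ($@) {
    print "[[JSON ERROR]] $@\n";
}
print "$_->{title} ($_->{publish_date})\n" for @episodes;
```

Keeping the download and the decoding separate also makes it easier to tell a network failure from a parser failure.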
Here's the output:
Getting json http://www.southparkstudios.com/includes/utils/proxy_feed.php?html=season_json.jhtml%3fseason=1
publish_date => 08.13.97
url => http://www.southparkstudios.com/episodes/103511
thumbnail_url => http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e01_480.jpg&width=190&quality=100
title => Episode 1: Cartman Gets an Anal Probe
description => While the boys are waiting for the school bus, Cartman explains the odd nightmare he had the previous night involving alien visitors.

publish_date => 08.20.97
url => http://www.southparkstudios.com/episodes/103516
thumbnail_url => http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e02_480.jpg&width=190&quality=100
title => Episode 2: Weight Gain 4000
description => When Cartman's environmental essay wins a national contest, America's sweetheart, Kathie Lee Gifford, comes to South Park to present the award.

...
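The fields come out in a different order than they were assigned because Perl hashes are unordered and `each` walks them in whatever order the hash stores them, so your output may not match the listing above line for line. If a stable order matters, one approach is to list the fields explicitly. A minimal sketch, with `@field_order` and `@lines` being names made up for this example:

```perl
use strict;
use warnings;

my %ep_hash = (
    title        => 'Episode 1: Cartman Gets an Anal Probe',
    url          => 'http://www.southparkstudios.com/episodes/103511',
    publish_date => '08.13.97',
);

# Print fields in a fixed order instead of relying on each().
my @field_order = qw(title publish_date url);
my @lines = map  { "$_ => $ep_hash{$_}" }
            grep { exists $ep_hash{$_} } @field_order;

print "$_\n" for @lines;
```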
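Of particular interest here is the way the JSON module accepts settings. Each option method returns the parser object, so the options can be chained in front of decode:
my $json_text = $json->allow_nonref->utf8->relaxed->escape_slash->loose->allow_singlequote->allow_barekey->decode($content);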
I found that these settings fix most of the crashes and incompatibilities while parsing various JSON pages.
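To get a feel for what some of these options do on their own, here is a small sketch. It assumes JSON::PP, the pure-Perl backend that provides `allow_singlequote` and `allow_barekey`, and the inputs are made-up strings rather than anything from the South Park feed.

```perl
use strict;
use warnings;
use JSON::PP;
use Scalar::Util qw(refaddr);

my $json = JSON::PP->new;

# Each option method returns the parser object itself, which is why the
# options can be chained.
my $chained = refaddr($json->relaxed) == refaddr($json) ? 1 : 0;

# relaxed: tolerates a trailing comma (and #-style comments).
my $list = JSON::PP->new->relaxed->decode('[1, 2, 3, ]');

# allow_barekey + allow_singlequote: unquoted keys and 'single' quotes.
my $hash = JSON::PP->new->allow_barekey->allow_singlequote->decode(q({name:'Kenny'}));

# allow_nonref: the top-level value may be a plain string or number.
my $scalar = JSON::PP->new->allow_nonref->decode('"hello"');

print "chained: $chained\n";
print "list has ", scalar(@$list), " items\n";
print "name: $hash->{name}\n";
print "scalar: $scalar\n";
```

Turning on only the options a given feed needs keeps the parser strict about everything else.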
Is there something I've missed? Do you know a better way to parse JSON in Perl? Unclear about something? Don't hesitate to share in the comments.
Artem Russakovskii is a San Francisco programmer, blogger, and future millionaire (that last part is in the works). Follow Artem on Twitter (@ArtemR) or subscribe to the RSS feed.
Well done .. this tutorial was really handy
With all due respect, this is a trivial example. The JSON structure didn't have any embedded objects or even an array. Did you even need a JSON parser for this? Nope… RegExp's would have been fine.
jl
Great article, thanks! I also needed HTTP/Response/Encoding.pm from http://search.cpan.org/~dankogai/HTTP-Response-Encoding-0.05/lib/HTTP/Response/Encoding.pm .
@dilino
Well, if we want to get technical, then you need all these: http://deps.cpantesters.org/?module=WWW::Mechanize;perl=latest but they should be installed automatically when you install WWW::Mechanize.
@jnlawton
Indeed, you're right, it wasn't a very complicated example (though 'episode' is actually an array, isn't it?). The purpose was to introduce the usage of the JSON module.
Hi there,
I appreciate the concise example you've provided. I was looking for a way to get something done in JSON quickly, and this looks like the ticket.
Thanks so much, and keep it up.
Farhan
This was really useful to me. I'm still new to JSON though, so I guess it'll take some getting used to. Thanks though buddy!
P.S. Also a massive fan of South Park, props!
Hey, it's a marvelous post. Wish you success.
Finally got the hang of this, works a treat thank you so much artem!
This is extremely useful! Thank you. South Park is awesome also so it's a great example. =)
[...] For those inclined, here's the perl script I used to query the API. It borrows heavily from this example. [...]