Updated: May 23rd, 2009

In this tutorial, I'll show you how to parse JSON using Perl. As a fun example, I'll use the new SouthParkStudios.com site released earlier this week, which contains full legal episodes of South Park. I guess the TV companies are finally getting a clue about what users want.

I will parse the first season's JSON and pull out information about individual episodes (title, description, air date, etc.) from http://www.southparkstudios.com/includes/utils/proxy_feed.php?html=season_json.jhtml%3fseason=1. Feel free to replace '1' with any valid season number.
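If you want to switch seasons without editing the URL by hand, something like the following could work. This is only a sketch: season_url is a hypothetical helper that is not part of the script in this post, and it simply assumes the feed URL keeps the same shape for every season.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): build the feed URL
# for a given season, assuming only the trailing number changes.
sub season_url {
    my ($season) = @_;

    # Reject anything that is not a positive integer.
    die "season must be a positive integer\n"
        unless defined $season && $season =~ /^[1-9][0-9]*$/;

    return "http://www.southparkstudios.com/includes/utils/proxy_feed.php"
         . "?html=season_json.jhtml%3fseason=$season";
}

print season_url(2), "\n";
```

The result can be passed straight to whatever does the download.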

Here's a short snippet of the JSON:

{
season:{
 
episode:[
 
{	
title:'Cartman Gets an Anal Probe',
description:'While the boys are waiting for the school bus, Cartman explains the odd nightmare he had the previous night involving alien visitors.',
thumbnail:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e01_480.jpg&width=55&quality=100',
thumbnail_larger:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e01_480.jpg&width=63&quality=100',
thumbnail_190:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e01_480.jpg&width=190&quality=100',	
id:'103511',
airdate:'08.13.97',
episodenumber:'101',
available:'true',
when:'08.13.97'
}
 
,
 
{	
title:'Weight Gain 4000',
description:'When Cartman\'s environmental essay wins a national contest, America\'s sweetheart, Kathie Lee Gifford, comes to South Park to present the award.',
thumbnail:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e02_480.jpg&width=55&quality=100',
thumbnail_larger:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e02_480.jpg&width=63&quality=100',
thumbnail_190:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e02_480.jpg&width=190&quality=100',	
id:'103516',
airdate:'08.20.97',
episodenumber:'102',
available:'true',
when:'08.20.97'
}
 
...
 
]
}
}
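Notice that this isn't strict JSON: the keys are bare words and the strings use single quotes. Here's a minimal sketch of what that means for a parser, assuming the JSON::PP module (bundled with Perl since 5.14) and a made-up, cut-down sample in the same style as the feed:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical sample shaped like the feed: bare keys, single-quoted strings.
my $sample = q({ season:{ episode:[ { title:'Weight Gain 4000', id:'103516' } ] } });

# A parser with default (strict) settings dies on this text,
# so wrap the call in eval and record whether it survived.
my $strict_ok = eval { JSON::PP->new->decode($sample); 1 } ? 1 : 0;

# With the two JSON::PP extensions enabled, the same text decodes.
my $data  = JSON::PP->new->allow_barekey->allow_singlequote->decode($sample);
my $title = $data->{season}{episode}[0]{title};

print "strict parse succeeded: $strict_ok\n";
print "relaxed title: $title\n";
```

That is the reason the script further down turns on a handful of non-default parser options.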


Before you can parse JSON, you need to have a few libraries. Install them using CPAN, for example:

cpan
install JSON
install JSON::XS
install WWW::Mechanize # my favorite library for browsing

Now the script.

#!/usr/bin/perl -w
# $Rev: 11 $
# $Author: artem $
# $Date: 2009-05-23 23:09:47 -0700 (Sat, 23 May 2009) $
 
use strict;
use WWW::Mechanize;
use JSON -support_by_pp;
 
fetch_json_page("http://www.southparkstudios.com/includes/utils/proxy_feed.php?html=season_json.jhtml%3fseason=1");
 
sub fetch_json_page
{
  my ($json_url) = @_;
  my $browser = WWW::Mechanize->new();
  eval{
    # download the json page:
    print "Getting json $json_url\n";
    $browser->get( $json_url );
    my $content = $browser->content();
    my $json = JSON->new;
 
    # these are some nice json options to relax restrictions a bit:
    my $json_text = $json->allow_nonref->utf8->relaxed->escape_slash->loose->allow_singlequote->allow_barekey->decode($content);
 
    # iterate over each episode in the JSON structure:
    my $episode_num = 1;
    foreach my $episode(@{$json_text->{season}->{episode}}){
      my %ep_hash = ();
      $ep_hash{title} = "Episode $episode_num: $episode->{title}";
      $ep_hash{description} = $episode->{description};
      $ep_hash{url} = "http://www.southparkstudios.com/episodes/" . $episode->{id};
      $ep_hash{publish_date} = $episode->{airdate};
      $ep_hash{thumbnail_url} = $episode->{thumbnail_190} || $episode->{thumbnail_larger};
 
      # print episode information:
      while (my($k, $v) = each (%ep_hash)){
        print "$k => $v\n";
      }
      print "\n";
 
      $episode_num++;
    }
  };
  # catch crashes:
  if($@){
    print "[[JSON ERROR]] JSON parser crashed! $@\n";
  }
}

Here's the output:

Getting json http://www.southparkstudios.com/includes/utils/proxy_feed.php?html=season_json.jhtml%3fseason=1
publish_date => 08.13.97
url => http://www.southparkstudios.com/episodes/103511
thumbnail_url => http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e01_480.jpg&width=190&quality=100
title => Episode 1: Cartman Gets an Anal Probe
description => While the boys are waiting for the school bus, Cartman explains the odd nightmare he had the previous night involving alien visitors.
 
publish_date => 08.20.97
url => http://www.southparkstudios.com/episodes/103516
thumbnail_url => http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e02_480.jpg&width=190&quality=100
title => Episode 2: Weight Gain 4000
description => When Cartman's environmental essay wins a national contest, America's sweetheart, Kathie Lee Gifford, comes to South Park to present the award.
 
...

Of particular interest here is the way JSON accepts settings:

my $json_text = $json->allow_nonref->utf8->relaxed->escape_slash->loose->allow_singlequote->allow_barekey->decode($content);

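I found that these settings fix most of the crashes and incompatibilities while parsing various JSON pages. For this feed, allow_barekey and allow_singlequote are what let the parser accept the unquoted keys and single-quoted strings shown in the snippet above; both are JSON::PP extensions, which is why the script loads JSON with -support_by_pp.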
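The chaining works because each option method hands back the parser object. A rough sketch of the same idea with the options set one per line, assuming JSON::PP is used directly (rather than JSON with -support_by_pp) and using a made-up one-episode sample in the feed's style:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;
use Scalar::Util qw(refaddr);

# Configure the parser one option at a time instead of in a single chain.
my $json = JSON::PP->new;
$json->relaxed;            # tolerate trailing commas and '#' comments
$json->loose;              # tolerate some malformed string contents
$json->allow_singlequote;  # accept 'single-quoted' strings
$json->allow_barekey;      # accept unquoted object keys

# Each setter returns the parser itself, which is what makes chaining possible.
my $returned    = $json->allow_nonref;
my $same_object = (refaddr($returned) == refaddr($json)) ? 1 : 0;

# Hypothetical sample in the feed's style, including an escaped apostrophe.
my $sample = q({ episode:[ { description:'When Cartman\'s essay wins' } ] });
my $data        = $json->decode($sample);
my $description = $data->{episode}[0]{description};

print "setter returned the parser: $same_object\n";
print "description: $description\n";
```

Setting options one per line makes it easier to comment out a single option and see which inputs start failing.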

Is there something I've missed? Do you know a better way to parse JSON in Perl? Unclear about something? Don't hesitate to share in the comments.

● ● ●
Artem Russakovskii is a San Francisco programmer and blogger. Follow Artem on Twitter (@ArtemR) or subscribe to the RSS feed.
