
Updated: May 23rd, 2009

In this tutorial, I'll show you how to parse JSON using Perl. As a fun example, I'll use the new SouthParkStudios.com site released earlier this week, which contains full legal episodes of South Park. I guess the TV companies are finally getting a clue about what users want.

I will parse the first season's JSON and pull out information about individual episodes (title, description, air date, etc.) from http://www.southparkstudios.com/includes/utils/proxy_feed.php?html=season_json.jhtml%3fseason=1. Feel free to replace '1' with any valid season number.

Here's a short snippet of the JSON:

{
season:{
 
episode:[
 
{
title:'Cartman Gets an Anal Probe',
description:'While the boys are waiting for the school bus, Cartman explains the odd nightmare he had the previous night involving alien visitors.',
thumbnail:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e01_480.jpg&width=55&quality=100',
thumbnail_larger:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e01_480.jpg&width=63&quality=100',
thumbnail_190:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e01_480.jpg&width=190&quality=100',
id:'103511',
airdate:'08.13.97',
episodenumber:'101',
available:'true',
when:'08.13.97'
}
 
,
 
{
title:'Weight Gain 4000',
description:'When Cartman\'s environmental essay wins a national contest, America\'s sweetheart, Kathie Lee Gifford, comes to South Park to present the award.',
thumbnail:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e02_480.jpg&width=55&quality=100',
thumbnail_larger:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e02_480.jpg&width=63&quality=100',
thumbnail_190:'http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e02_480.jpg&width=190&quality=100',
id:'103516',
airdate:'08.20.97',
episodenumber:'102',
available:'true',
when:'08.20.97'
}
 
...
 
]
}
}


Before you can parse JSON, you need to have a few libraries. Install them using CPAN, for example:

cpan
install JSON
install JSON::XS
install WWW::Mechanize # my favorite library for browsing

Now the script.

#!/usr/bin/perl -w
# $Rev: 11 $
# $Author: artem $
# $Date: 2009-05-23 23:09:47 -0700 (Sat, 23 May 2009) $
 
use strict;
use WWW::Mechanize;
use JSON -support_by_pp;
 
fetch_json_page("http://www.southparkstudios.com/includes/utils/proxy_feed.php?html=season_json.jhtml%3fseason=1");
 
sub fetch_json_page
{
  my ($json_url) = @_;
  my $browser = WWW::Mechanize->new();
  eval{
    # download the json page:
    print "Getting json $json_url\n";
    $browser->get( $json_url );
    my $content = $browser->content();
    my $json = JSON->new;
 
    # these are some nice json options to relax restrictions a bit:
    my $json_text = $json->allow_nonref->utf8->relaxed->escape_slash->loose->allow_singlequote->allow_barekey->decode($content);
 
    # iterate over each episode in the JSON structure:
    my $episode_num = 1;
    foreach my $episode(@{$json_text->{season}->{episode}}){
      my %ep_hash = ();
      $ep_hash{title} = "Episode $episode_num: $episode->{title}";
      $ep_hash{description} = $episode->{description};
      $ep_hash{url} = "http://www.southparkstudios.com/episodes/" . $episode->{id};
      $ep_hash{publish_date} = $episode->{airdate};
      $ep_hash{thumbnail_url} = $episode->{thumbnail_190} || $episode->{thumbnail_larger};
 
      # print episode information:
      while (my($k, $v) = each (%ep_hash)){
        print "$k => $v\n";
      }
      print "\n";
 
      $episode_num++;
    }
  };
  # catch crashes:
  if($@){
    print "[[JSON ERROR]] JSON parser crashed! $@\n";
  }
}

Here's the output:

Getting json http://www.southparkstudios.com/includes/utils/proxy_feed.php?html=season_json.jhtml%3fseason=1
publish_date => 08.13.97
url => http://www.southparkstudios.com/episodes/103511
thumbnail_url => http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e01_480.jpg&width=190&quality=100
title => Episode 1: Cartman Gets an Anal Probe
description => While the boys are waiting for the school bus, Cartman explains the odd nightmare he had the previous night involving alien visitors.
 
publish_date => 08.20.97
url => http://www.southparkstudios.com/episodes/103516
thumbnail_url => http://www.southparkstudios.com/includes/utils/proxy_resizer.php?image=/images/south_park/episode_thumbnails/s01e02_480.jpg&width=190&quality=100
title => Episode 2: Weight Gain 4000
description => When Cartman's environmental essay wins a national contest, America's sweetheart, Kathie Lee Gifford, comes to South Park to present the award.
 
...
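
If you'd rather experiment without hitting the site, here is a minimal offline sketch of the same hash -> array -> hash traversal. It assumes JSON::PP (the pure-Perl parser bundled with Perl 5.14 and later) instead of the JSON wrapper used above, and the inline sample is a hypothetical, trimmed-down copy of the snippet shown earlier, not the real feed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;    # pure-Perl parser, bundled with Perl 5.14 and later

# Hypothetical inline sample that mimics the feed's shape:
# bare (unquoted) keys and single-quoted strings.
my $sample = <<'END_JSON';
{
  season:{
    episode:[
      { title:'Cartman Gets an Anal Probe', id:'103511', airdate:'08.13.97' },
      { title:'Weight Gain 4000', id:'103516', airdate:'08.20.97' }
    ]
  }
}
END_JSON

# Only the two options this sample needs are switched on.
my $parser = JSON::PP->new->allow_barekey->allow_singlequote;
my $data   = $parser->decode($sample);

# Top-level hash -> 'season' hash -> 'episode' array reference
my @episodes = @{ $data->{season}{episode} };

# Build one summary line per episode hash
my @lines;
for my $i ( 0 .. $#episodes ) {
    my $ep = $episodes[$i];
    push @lines, sprintf( "Episode %d: %s (%s)", $i + 1, $ep->{title}, $ep->{airdate} );
}
print "$_\n" for @lines;
```

Each element of @episodes is a plain hash reference, so any field from the feed (description, thumbnail_190, and so on) can be read the same way as title and airdate here.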

Of particular interest here is the way JSON accepts settings:

my $json_text = $json->allow_nonref->utf8->relaxed->escape_slash->loose->allow_singlequote->allow_barekey->decode($content);

I found that these settings fix most of the crashes and incompatibilities while parsing various JSON pages.
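
To see which of these settings a particular feed actually needs, you can switch them on one at a time. The following is a rough sketch rather than a definitive recipe: it assumes JSON::PP, a hypothetical one-line sample in the feed's style, and a made-up helper named try_decode.

```perl
use strict;
use warnings;
use JSON::PP;    # pure-Perl parser; supports the relaxed options used here

# Hypothetical sample in the feed's style: bare keys and single-quoted strings.
my $text = "{ title:'Weight Gain 4000', episodenumber:'102' }";

# Hypothetical helper: decode $input with $parser and return the data,
# or undef if the parser died on malformed input.
sub try_decode {
    my ( $parser, $input ) = @_;
    my $data = eval { $parser->decode($input) };
    return $data;
}

# Add one relaxing option at a time and record when decoding starts to work.
my @attempts = (
    [ 'strict (no options)',               JSON::PP->new ],
    [ 'allow_barekey',                     JSON::PP->new->allow_barekey ],
    [ 'allow_barekey + allow_singlequote', JSON::PP->new->allow_barekey->allow_singlequote ],
);

my %worked;
for my $attempt (@attempts) {
    my ( $label, $parser ) = @$attempt;
    $worked{$label} = defined try_decode( $parser, $text ) ? 1 : 0;
    printf "%-36s %s\n", $label, $worked{$label} ? 'decoded' : 'failed';
}
```

For this sample only the last combination decodes, because the text uses both unquoted keys and single-quoted values. Keeping the option list as short as the data allows leaves the rest of the parser's checks in place.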

Is there something I've missed? Do you know a better way to parse JSON in Perl? Unclear about something? Don't hesitate to share in the comments.

● ● ●

Artem Russakovskii is a San Francisco programmer, blogger, and future millionaire (that last part is in the works). Follow Artem on Twitter (@ArtemR) or subscribe to the RSS feed.




  • kingpin

    Well done .. this tutorial was really handy

  • jnlawton

    With all due respect, this is a trivial example. The JSON structure didn't have any embedded objects or even an array. Did you even need a JSON parser for this? Nope… RegExp's would have been fine.

    jl

    • BartJ

      this is so not true…Read this answer on StackOverflow to find out why – http://stackoverflow.com/questions/408570/regular-expression-to-parse-an-array-of-json-objects/408641#408641
      Excerpt from the answer: "Balanced parentheses are literally a textbook example of a language that cannot be processed with regular expressions. JSON is essentially balanced parentheses plus a bunch of other stuff, with the braces replaced by parens. In the hierarchy of formal languages, JSON is a context-free language. Regular expressions can't parse context-free languages."

  • http://environment.dilino.org dilino

    Great article, thanks! I also needed HTTP/Response/Encoding.pm from http://search.cpan.org/~dankogai/HTTP-Response-Encoding-0.05/lib/HTTP/Response/Encoding.pm .

  • http://beerpla.net Artem Russakovskii

    @dilino
    Well, if we want to get technical, then you need all these: http://deps.cpantesters.org/?module=WWW::Mechanize;perl=latest but they should be installed automatically when you install WWW::Mechanize.

    @jnlawton
    Indeed, you're right, it wasn't a very complicated example (though 'episode' is actually an array, isn't it?). The purpose was to introduce the usage of the JSON module.

  • http://farhany.com Farhan

    Hi there,

    I appreciate the concise example you've provided. I was looking for a way to get something done in JSON quickly, and this looks like the ticket.

    Thanks so much, and keep it up.

    Farhan

  • South Park

    This was really useful to me. I'm still new to JSON though so i guess it'll take some getting used to. Thanks though buddy!

    P.S Also a massive fan of south park, props! ;)

  • http://www.spfanzone.com South Park

    Finally got the hang of this, works a treat thank you so much artem!

  • Ventrilo Servers

    This is extremely useful! Thank you. South Park is awesome also so it's a great example. =)

  • Parviz

    great! I tried it and it worked. Thanks for sharing this.

  • mangoo

    Hmm, doesn't seem to be working for me:
    Getting json http://www.southparkstudios.com/includes/utils/proxy_feed.php?html=season_json.jhtml%3fseason=1
    escape_slash is not supported in JSON::XS. at json.pl line 25
    loose is not supported in JSON::XS. at json.pl line 25
    allow_singlequote is not supported in JSON::XS. at json.pl line 25
    allow_barekey is not supported in JSON::XS. at json.pl line 25
    [[JSON ERROR]] JSON parser crashed! malformed JSON string, neither array, object, number, string or atom, at character offset 0 ["/* Generated locally..."] at json.pl line 25.

    • mangoo

      Looks like I missed "-support_by_pp;" ;) – working now, thanks.

  • Wes Garrison

    This was fantastic, thanks a bunch!

    I used LWP::UserAgent to POST the request, but the JSON library worked great, and your example made it easy to understand how the parsed data was structured ie. hash -> array -> hash, etc.

    -Wes

  • Goutham Raja

    You've done a great job… Thanks a lot…

  • Paul Maddox

    Very useful. Thanks.

  • spydox

    (unlike some others) I appreciate your effort to publish this. We're just getting started with JSON so it was a useful Primer.

    Sometimes I do wonder why we *invent another way* to do something – XML pretty much handles NV pairs. But I'm trying to keep an open mind, and I'm optimistic that JSON will be a great tool for us.

    Our goal is primarily parsing arbitrary order form data, then applying logic and rules, we interpret the data to populate thousands of Oracle fields.

    I'm not really sure which of those parts JSON helps us with. Perhaps as an object we can pass off to an SQL generator? That's really not a difficult role, but I'm hoping it standardizes that step.

    • http://beerpla.net Artem Russakovskii

      I was wondering the same thing until I realized that JSON doesn't have parameters like XML does, which makes its parsing, consumption, and direct translation into data structures very easy and straightforward. It's also a lot more compact, though harder to read if unformatted. JSON parsers seem to be simpler, more agile, and less buggy as they don't have to be as complex.

  • henq

    Or, for simple JSON of clean source, consider using an eval:

    my $res = $response->content;
    $res =~ s/:/ =>/g; # replace each colon with Perl's fat comma (=>)
    $res =~ s/null/0/g; #or " , depending on usage

    my $mydata = eval $res; # string eval; gives an array reference if the JSON is an array
    print $@ if ($@);

    print $mydata->[0]; # first element

  • Aneesh Karve

    thank you for the clear example.

    for general coding practices in the wild, it's safer to bypass the JSON object's integrity checks one-by-one, if and only if needed (i.e. allow_nonref, relaxed, escape_slash, loose, allow_singlequote, allow_barekey).

    if programmers use all of these bypass flags by default, they're missing out on the improved performance of JSON::XS, as well as missing out on data integrity checks.

  • Dan Hunter

    Thanks for this. I'm using JSON::XS instead, but working with a similar data structure so it helped a lot. Cheers!

  • whall

    Unfortunately for me and my hope to use your great example to learn JSON requests and such, it appears as though this feed is no longer supported by southparkstudios.com

    One thing I'm also looking to learn but haven't found yet is how to formulate a JSON request — all the examples I've seen so far just do a GET on a url, and the response is JSON-formatted, which you then parse. But I have a need to actually formulate the JSON-formatted request header. Any tips on that?

  • Marcin

    Fantastic tutorial. Thank you.

    Question – how do you handle special characters in JSON feed?

    My feed includes special characters which results in an error:

    "Byte "233" is not a member of the (7-bit) ASCII character set".

    -Marcin

  • Phil Pirozhkov

    You saved my day

  • Tony

    How do I parse a 3D JSON array?

    my $test = '{
    "name":"Tony",
    "body":[ {
    "arms":["hands:fingers", "muscles:biceps"],
    "stomach":["abs:sixpack", "noabs:onepack"]
    },
    {
    "arms":["fingers:nails", "knuckles:sharp"],
    "stomach":["gut:beer", "liver:liquor"]
    }]
    }';

    I'm trying this and it isn't working:

    my $decoded = decode_json($test);
    my @layer1 = @{ $decoded->{'body'} };
    foreach ( @layer1 ) {
    @layer2 = $_->{$decoded->{'arms'} };
    foreach( @layer2 ) {
    print $_->{$decoded->{'hands'}} . "\n";
    }
    }

    I expect the printout to be: fingers

  • Namratha

    Thanks for the great example :)