1. JSON and XML

1.1. Overview

• JSON (Javascript Object Notation) and XML (eXtensible Markup Language) are used for communicating data on the internet.
• The are especially useful for data that are not easily represented as a table of numbers.+
• Most programming languages have tools that can read JSON and XML data into an array or other data structure with only a few commands.

+ The best way to store a simple table of numbers is as CSV (Comma Separated Variables). Most programming languages have tools that can read CSV into an array or matrix with only a few commands.

1.2. Motivation

There are many ways of structuring character data in an ASCII-encoded file. A set of measurements of planetary positions could be written as

Planet, Date, Latitude, Longitude
Mars, 04/01/2014, 100.0, 99.1
Mars, 04/02/2014, 101.0, 99.1
Mars, 04/03/2014, 102.0, 99.1
Venus, 04/01/2014, 99.0, 0.1
Venus, 04/02/2014, 88.0, 1.1
Venus, 04/03/2014, 77.0, 2.1


or

 Mars
04/01/2014, 100.0, 99.1
04/02/2014, 101.0, 99.1
04/03/2014, 102.0, 99.1
Venus
04/01/2014, 99.0, 0.1
04/02/2014, 88.0, 1.1
04/03/2014, 77.0, 2.1


or

   04/01/2014
Mars, 100.0, 99.1
Venus, 99.0, 0.1
04/02/2014
Mars, 101.0, 99.1
Venus, 88.0, 1.1
04/03/2014
Mars, 102.0, 99.1
Venus, 77.0, 2.1


or the information could be stored in two files

README:

Columns are Planet, Date, Latitude, Longitude
For planet column, 0 = Mars and 1 = Venus


and Planets.txt:

0, 04/01/2014, 100.0, 99.1
0, 04/02/2014, 101.0, 99.1
0, 04/03/2014, 102.0, 99.1
1, 04/01/2014, 99.0, 0.1
1, 04/02/2014, 88.0, 1.1
1, 04/03/2014, 77.0, 2.1


The problem with each of these files is that to query the file for information, one would need to write code for each file format. (Think of the Bash, MATLAB, or Python commands that you would need to write in order to, say, list the latitude measurement of Mars).

For example, to display the latitudes of Mars using Bash for the first file, I would need to use

cat tmp.txt | head -4 | tail -3 | cut -f2 -d,


This works, but if the number of measurements of Mars changes, one would need to modify the command. In MATLAB, I could write a program that does not require modification when the number of measurements for Mars changes:

% Download remote planets.txt file and save local file as planets.txt
urlwrite('http://mag.gmu.edu/git-data/cds302/planets.txt','planets.txt');
fid = fopen('planets.txt');
while 1
tline = fgetl(fid)

% If at the end of the file stop looping
if ~ischar(tline),break,end

% If the line read contains Mars
if ~isempty(regexp(tline,'[A-Z]'))
tline = fgetl(fid); % Get another line

k = 1; % Intialize counter

% If tline does not contain A, B, ..., Z
while isempty(regexp(tline,'[A-Z]'))
tmp = regexprep(tline,'/',',');
M(k,:) = str2num(tmp);
tline = fgetl(fid); % Get another line
k = k+1;
end

end
end

% Display fourth column, which contains latitudes
M(:,4)


The situation can be improved by formatting the file as either JSON or XML. When this is done, much less code is required to extract information.

1.3. JSON

1.3.1. JSON and MATLAB

Suppose that you set up a web server that returns a list of integers and configured it to return JSON. The URL http://mag.gmu.edu/git-data/cds302/numbers.json contains a string that is formatted in JSON: [0,1,2,3] and http://mag.gmu.edu/git-data/cds302/numbers.txt contains a string that is not formatted in JSON: 0,1,2,3

In the following example, the information in each file is read into a MATLAB data structure.

MATLAB does not have a program for reading JSON, so a user-contributed file is needed. First, download http://mag.gmu.edu/git-data/cds302/parse_json/parse_json.m

To read the JSON file, execute in MATLAB

% Read the JSON-formatted content of the URL into a string
mlcell = parse_json(jsonstr); % Convert the string to a MATLAB cell array
% Display first two latitudes
mlcell{1}{1}
mlcell{1}{2}


Note that the function parse_json did not recognize that jsonstr could be represented as an array. To convert mlcell to an array, use mlarr = cat(2,mlcell{1}{:}).

To read the non-JSON file, execute in MATLAB

% Read the unformatted content of URL into a string
mlarr = str2num(str); % Convert the string to a MATLAB cell array
% Display first two latitudes
mlarr(1,1)
mlarr(1,2)


In the above example, the data file was simply a list of numbers, and getting access to the information was easy when the data was formatted in both JSON and ASCII. (In fact, an extra step was needed to turn the JSON data to a MATLAB array; this is consistent with the claim that for a simple list or table of numbers, CSV is often a better choice than JSON or XML.)

To see the real power of JSON, we return to the planet data.

The file http://mag.gmu.edu/git-data/cds302/planets.json contains

{
"Mars":
{
"Dates": ["04/01/2014","04/02/2014","04/03/2014"],
"Latitudes": [100.0,101.0,102.0],
"Longitudes": [99.1,99.1,99.1]
},
"Venus":
{
"Dates": ["04/01/2014","04/02/2014","04/03/2014"],
"Latitudes": [99.0,88.0,77.0],
"Longitudes": [0.1,1.1,2.1]

}
}


In MATLAB, the code required to read the JSON formatted file is

% Read the content of the file into a string.
% Convert the string to a MATLAB cell array
mlcell = parse_json(jsonstr);
% Display the latitudes (the following is an array)
mlcell{1}.Mars.Latitudes


To appreciate the power of JSON, compare the above three lines of MATLAB code with what was required (discussed previously) when the data were not formatted in JSON:

fid = fopen('planets.txt');
while 1
tline = fgetl(fid)

% If at the end of the file stop looping
if ~ischar(tline),break,end

% If the line read contains Mars
if ~isempty(regexp(tline,'[A-Z]'))
tline = fgetl(fid); % Get another line

k = 1; % Intialize counter

% If tline does not contain A, B, ..., Z
while isempty(regexp(tline,'[A-Z]'))
tmp = regexprep(tline,'/',',');
M(k,:) = str2num(tmp);
tline = fgetl(fid); % Get another line
k = k+1;
end

end
end

% Display fourth column, which contains latitudes
M(:,4)


1.3.2. JSON and Javascript

(The code in this section may be executed by loading [1] in a web browser.)

The URL http://mag.gmu.edu/git-data/cds302/numbers.json contains a string that is formatted in JSON [0,1,2,3]

The programming language available in all web browsers is Javascript (which is very different from Java: "Java and Javascript are similar like Car and Carpenter are similar": [2]).

To be able to use this using Javascript, you would simply execute+

arr = getdata("http://mag.gmu.edu/git-data/cds302/numbers.json"); // Get data from web resource


and now arr is a Javascript array with four elements.

The URL http://mag.gmu.edu/git-data/cds302/numbers.txt contains a string that is not formatted in JSON: 0 1 2 3

To be able to use this in a Javascript array, you would simply execute+

str = getdata("http://mag.gmu.edu/git-data/cds302/numbers.txt"); // Get data from web resource
arr = str.split(" "); // Convert the string to an array.


and now arr is a Javascript array with four elements. The key difference between the two approaches is that an extra processing step was needed in the second example. For more complex data files that are not formatted in JSON, the additional lines of code required is much larger than in this example.

In the above example, the data file was simply a list of numbers, and getting access to the information was easy when the data was formatted in both JSON and ASCII.

To see the real power of JSON, we return to the planet data.

The file http://mag.gmu.edu/git-data/cds302/planets.json contains

{
"Mars":
{
"Dates": ["04/01/2014","04/02/2014","04/03/2014"],
"Latitudes": [100.0,101.0,102.0],
"Longitudes": [99.1,99.1,99.1]
},
"Venus":
{
"Dates": ["04/01/2014","04/02/2014","04/03/2014"],
"Latitudes": [99.0,88.0,77.0],
"Longitudes": [0.1,1.1,2.1]

}
}


Executing the code in Javascript in the Javascript console:

// obj = getdata("http://mag.gmu.edu/git-data/cds302/planets.json"); // Get data from web resource
// The following is equivalent to the above, but will run in Javascript console.
obj = JSON.parse('{"Mars":{"Dates": ["04/01/2014","04/02/2014","04/03/2014"],"Latitudes": [100.0,101.0,102.0],"Longitudes": [99.1,99.1,99.1]},"Venus":{"Dates": ["04/01/2014","04/02/2014","04/03/2014"],"Latitudes": [99.0,88.0,77.0],"Longitudes": [0.1,1.1,2.1]}}');


Followed by

obj.Mars.Latitudes


would display

[100,101,102]


And entering

obj


displays a tree structure that can be navigated by clicking on triangles.

The interpretation of the Javascript object obj is that it is an object with two top-level elements (think folders): Mars and Venus. Each of these elements has three sub-elements (think subdirectories). Inside each subdirectory are three files.

Mars
Dates
04/01/2015
04/02/2015
04/03/2015
Latitudes
100.0
101.0
102.0
Longitudes
99.1
99.1
99.1
Venus
Dates
04/01/2015
04/02/2015
04/03/2015
Latitudes
99.0
88.0
77.0
Longitudes
0.1
1.1
2.1


1.4. XML

XML is another way of structuring text data. To see how it looks, we write the JSON structure

{
"Mars":
{
"Dates": ["04/01/2014","04/02/2014","04/03/2014"],
"Latitudes": [100.0,101.0,102.0],
"Longitudes": [99.1,99.1,99.1]
},
"Venus":
{
"Dates": ["04/01/2014","04/02/2014","04/03/2014"],
"Latitudes": [99.0,88.0,77.0],
"Longitudes": [0.1,1.1,2.1]

}
}


as XML:

<?xml version="1.0" encoding="utf-8"?>
<planets>
<planet>
<Name>
Mars
</Name>
<Dates>
04/01/2014,04/02/2014,04/03/2014
</Dates>
<Latitudes>
100.0,101.0,102.0
</Latitudes>
<Longitudes>
99.1,99.1,99.1
</Longitudes>
</planet>
<planet>
<Name>
Mars
</Name>
<Dates>
04/01/2014,04/02/2014,04/03/2014
</Dates>
<Latitudes>
99.0,88.0,77.0
</Latitudes>
<Longitudes>
0.1,1.1,2.1
</Longitudes>
</planet>
</planets>


 doc = xmlread('planets.xml');
doc.getDocumentElement.getChildNodes.item(1).getChildNodes.item(5).getTextContent


This is compared to the ~20 lines that it took to extract the same information from planets.txt.

(MATLAB has very poor native support for extracting XML. In the above example, it is was quite difficult to determine the appropiate command that displays the latitudes of Mars. To simplify the process, one needs to use MATLAB programs written by MATLAB users, e.g., [3]).

1.5. JSON or XML?

Suppose that you have data that can be represented as either JSON or XML? Which should you choose?

1. Information that can be represented in XML can be represented in JSON.
2. Many programs exist for converting from XML to JSON and vice-versa.

In general, the answer depends on the primary consumer of your data. If most users are familiar with JSON, you should use JSON. If most users are familiar with XML, use XML.

For a more detailed discussion, see [4].

In the context of scientific data, the primary advantage of XML is that one can specify a schema, which is a set of rules for how the file is to be structured. If someone gives you a file, you first test to see if the file is schema valid. If it is not, you abort processing. For example, a schema for the planets file could state that each planet must have a name.

(However, it is expected that soon JSON will have a schema specification [5].)

2. Problems

2.1. Real-time finance data

Write a MATLAB, Python or Bash program to read the contents of [6] and then print out

Stock: GOOG, Time: TIME, Price: PRICE


where TIME and PRICE are numbers extracted from [7]. Note that a description of the fields in the JSON string are given at [8].

If you are using MATLAB, use the function urlread to download the data and parse_json.m to convert the JSON string to a MATLAB data structure.

Note that parse_json returns data stored in a cell array with structures. For help in understanding how to manipulate and display information stored in cell arrays and structures, see

and the example where planets.json was read and some of its contents displayed in JSON_and_XML#JSON_and_MATLAB.

3. Part II

Do problem 7.1 of JSON_and_XML#Problems. Place your code in a file named HW9_II.EXT.

url = 'http://finance.google.com/finance/info?client=ig&q=NASDAQ:GOOG';
mlcell = parse_json(s(5:end)); % See email for discussion of this
var = mlcell{1}{1};
fprintf('Stock: %s, Time: %s, Price: %s\n',var.t,var.lt,var.l);


Better

sym = 'GOOG';
mlcell = parse_json(s(5:end));
fprintf('Stock: %s, Time: %s, Price: %s\n',var.t,var.lt,var.l);


Even Better - create getstock.m:

function [p,t] = getstock(sym)
mlcell = parse_json(s(5:end));

if (nargout == 0) % If no output argument is requested, print output
fprintf('Stock: %s, Time: %s, Price: %s\n',var.t,var.lt,var.l);
else % Otherwise return values
p = str2num(var.lt);
t = datenum(datevec(var.lt,30));
end


and use in a script watchstocks.m:

getstock('GOOG'); % Show line
[p,t] = getstock('GOOG'); % Put information in variables

% Get updates every 2 seconds
for while 1
getstock('GOOG');
getstock('YHOO');
pause(2);
end