Reading Binary Files

From ComputingForScientists

Contents

  1. Reading Binary Files
    1. High-Level Binary Read
    2. Low-Level Binary Read
  2. Problems
    1. High-Level Binary Read
    2. Low-Level Binary Read I
    3. Low-Level Binary Read II

1. Reading Binary Files

1.1. High-Level Binary Read

Each of the image and scientific file formats mentioned in IO#Binary has a complex set of rules for how information is stored. To get at the information in such a file, you need a software library that can read its bytes and turn them into something meaningful.

As an example, consider a 1-pixel PNG file [1]. To verify that this is indeed a single black pixel, we could read the documentation at [2] and use od to inspect the file.

Fortunately, MATLAB provides an easy way to turn the content of a PNG file into something meaningful:

First download a PNG image that is one black pixel by entering on the command line

# curl http://bobweigel.net/cds302/images/1x1black.png > 1x1black.png

Then start MATLAB or Octave and enter

>> A = imread('1x1black.png')

The result is A=0. The documentation of imread explains that each element of A corresponds to one pixel, and that for a grayscale image the value of the element is the pixel's intensity (0 is black and 255 is white).

This (A=0) is much easier to interpret than the output of

# od -t x1 1x1black.png 

which is

0000000 89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52
0000020 00 00 00 01 00 00 00 01 08 00 00 00 00 3a 7e 9b
0000040 55 00 00 00 07 74 49 4d 45 07 df 02 0b 12 32 05
0000060 00 c5 07 db 00 00 00 0a 49 44 41 54 08 99 63 60
0000100 00 00 00 02 00 01 f4 71 64 a6 00 00 00 00 49 45
0000120 4e 44 ae 42 60 82
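
The start of this dump can still be decoded by hand, which shows the kind of work imread does internally. According to the PNG specification, a file begins with a fixed 8-byte signature, followed by chunks that each start with a 4-byte big-endian length and a 4-byte type; the first chunk, IHDR, begins with the image width and height. The following is a minimal sketch using fread (see section 1.2). To stay self-contained it writes the first 24 bytes of the dump to a scratch file; the variable names and the scratch file are illustrative only, and with the downloaded file you would open '1x1black.png' instead.

```matlab
% First 24 bytes of the od output above (signature + start of IHDR chunk).
hexbytes = {'89','50','4e','47','0d','0a','1a','0a', ...
            '00','00','00','0d','49','48','44','52', ...
            '00','00','00','01','00','00','00','01'};
fname = [tempname() '.png'];               % hypothetical scratch file
fid = fopen(fname, 'w');
fwrite(fid, hex2dec(hexbytes), 'uint8');
fclose(fid);

% PNG stores multi-byte integers big-endian, hence 'ieee-be'.
fid = fopen(fname, 'r', 'ieee-be');
signature = fread(fid, 8, 'uint8')';        % fixed 8-byte PNG signature
chunklen  = fread(fid, 1, 'uint32');        % number of data bytes in the chunk
chunktype = char(fread(fid, 4, 'uint8')');  % chunk type as text
width     = fread(fid, 1, 'uint32');        % image width in pixels
height    = fread(fid, 1, 'uint32');        % image height in pixels
fclose(fid);
delete(fname);

fprintf('%s chunk (%d data bytes): %d x %d pixels\n', ...
        chunktype, chunklen, width, height);
```

This prints "IHDR chunk (13 data bytes): 1 x 1 pixels", which agrees with the single pixel returned by imread.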

Consider next the 10x10 image [3]. First download the PNG file by entering on the command line

# curl http://bobweigel.net/cds302/images/10x10image.png > 10x10image.png

Then start MATLAB or Octave and enter

>> A = imread('10x10image.png')

The result is

A =

     0     0     0   255     0     0     0     0     0     0
     0     0     0   255     0     0     0     0     0     0
     0     0     0   255     0     0     0     0     0     0
     0     0     0   255     0     0     0     0     0     0
     0     0     0   255     0     0     0     0     0     0
     0     0     0   255     0     0     0     0     0     0
     0     0     0   255     0     0     0     0     0     0
     0     0     0   255     0     0     0     0     0     0
     0     0     0   255     0     0     0     0     0     0
     0     0     0   255     0     0     0     0     0     0
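
Once the image is an ordinary matrix, the usual array functions apply to it. The sketch below builds a stand-in for A rather than reading the file, and it assumes the matrix has class uint8, which is what imread typically returns for 8-bit images; the variable names are illustrative.

```matlab
% Stand-in for the matrix returned by imread('10x10image.png') above.
A = zeros(10, 10, 'uint8');
A(:, 4) = 255;

[nrows, ncols] = size(A);             % image height and width in pixels
whitecols = find(all(A == 255, 1));   % columns that are white in every row
fprintf('%d x %d image, white column(s): %s\n', ...
        nrows, ncols, mat2str(whitecols));
```

For this matrix the only white column is column 4, that is, the image is a vertical white line on a black background.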

You will encounter many binary scientific file formats. The general procedure for getting at the information in the file is

  1. Search for programs that can read the specified file format. Select a program that you are familiar with.
  2. Read the documentation and find examples.

Example:

We are given the file [4]. What is in it? A web search indicates that a .fits file is most likely encoded according to the FITS format described at http://fits.gsfc.nasa.gov/. Searching for "MATLAB read FITS" leads to http://www.mathworks.com/help/matlab/ref/fitsread.html

First, download the file on the command line

# curl http://fits.gsfc.nasa.gov/samples/UITfuv2582gc.fits > UITfuv2582gc.fits

Next, start MATLAB and enter

>> info = fitsinfo('UITfuv2582gc.fits')
>> data = fitsread('UITfuv2582gc.fits');
>> whos data

1.2. Low-Level Binary Read

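When no program or library is available to read a given binary format, the low-level function fread can be used to read the bytes directly. You must then know from the format's documentation how the bytes are laid out. See http://www.mathworks.com/help/matlab/ref/fread.html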
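
As a minimal sketch of the fopen/fread/fclose pattern, the lines below write three 16-bit integers to a scratch file and read them back twice with different precision arguments. The file name and values are made up for illustration.

```matlab
fname = tempname();                 % hypothetical scratch file
fid = fopen(fname, 'w');
fwrite(fid, [10 20 30], 'int16');   % three values, 2 bytes each
fclose(fid);

fid = fopen(fname, 'r');
vals = fread(fid, Inf, 'int16');    % interpret the bytes as 16-bit integers
fclose(fid);

fid = fopen(fname, 'r');
raw = fread(fid, Inf, 'uint8');     % same file, one value per byte
fclose(fid);
delete(fname);

disp(vals');                        % the three values that were written
disp(numel(raw));                   % 6, the size of the file in bytes
```

The precision argument tells fread how many bytes make up each value and how to interpret them, so reading with the wrong precision returns the wrong number of values. By default fread returns its result as a column of doubles.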

2. Problems

2.1. High-Level Binary Read

Find a file encoded in HDF5 (usually with the extension ".h5").

Find a program that lets you inspect the numbers in this file and save them as an ASCII file.

2.2. Low-Level Binary Read I

The program

fid = fopen('file.bin','wb');
fwrite(fid,[1:10],'double');
fclose(fid);

writes a binary file named file.bin. Use fread to read this file into a variable named data so that entering data on the command line displays the numbers 1 through 10.

2.3. Low-Level Binary Read II

A binary file format has the following specification:

  • The first 32 bits (4 bytes) are an unsigned 32-bit integer corresponding to the time of a measurement in milliseconds since January 1, 1970 at 00:00:00.000.
  • The next 64 bits (8 bytes) are a double (written using MATLAB's fwrite function). This value corresponds to a measured temperature.
  • The next 32 bits (4 bytes) are an unsigned 32-bit integer corresponding to the time of the next measurement in milliseconds since January 1, 1970 at 00:00:00.000.
  • The next 64 bits (8 bytes) are a double (written using MATLAB's fwrite function). This value corresponds to a measured temperature.
  • etc.
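
An interleaved layout like this can be easier to picture from the writing side. The following is a minimal sketch, assuming each time stamp is written with the fwrite precision 'uint32' (which occupies 4 bytes on disk) and each temperature with 'double' (8 bytes). The sample values and the scratch file are made up and are not taken from the real data file.

```matlab
% Hypothetical sample values, not taken from the real data file.
t_ms  = [0 1000 2000];        % times in ms since 1970-01-01 00:00:00.000
temps = [20.5 20.7 21.0];     % temperatures (units not given in the spec)

fname = tempname();           % hypothetical scratch file
fid = fopen(fname, 'w');
for k = 1:numel(t_ms)
    fwrite(fid, t_ms(k),  'uint32');   % time stamp of record k
    fwrite(fid, temps(k), 'double');   % temperature of record k
end
fclose(fid);

d = dir(fname);
nbytes = d.bytes;             % total size of the file in bytes
delete(fname);
fprintf('%d records occupy %d bytes\n', numel(t_ms), nbytes);
```

Because the two fields have different types, a file written this way cannot be read back with a single fread call that uses one precision for the whole file.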

The file is located at [5].

Based on the size of the file reported by ls -l, how many temperature measurements are in the file?

What are the temperature measurements, and at what times were they made?
