# 1. Objective

Introduce basic techniques for parallelizing data reduction operations.

# 2. Introduction

Suppose that you have two arrays x and y, and your task is to compute the sum of all elements in each array by hand. You can do the addition on your own, or you could give one array to a friend and you can both do the addition at the same time. In this case, you can cut the time required to do the task in half (provided that your friend is as fast as you and that you don't, for example, need to mail the list to a friend and wait for him to send it back).

This is an example of a problem that is "Embarrassingly Parallel" (EP) - the task was easily split up, and no communication was required between the workers while they were doing their task.

(An example of a non-EP task is if, for example, x and y each had 100 elements and the task was to add the first ten numbers of each array, multiply the two sums, and place the result of the multiplication in array z and then to repeat this for the next block of ten numbers, etc. In this case, a total of 10 multiplications were needed and the participants in the task needed to communicate 10 times.)

# 3. Code Examples

Create a file named A5.m with a single line:

fprintf('a = %d\n',a);


Normally, one would start Octave and enter

a = 99
A5


to execute this program. Octave commands may also be executed from the command line. The following is equivalent to the above:

# The following is equivalent to starting Octave, evaluating the quoited commands, and then typing "exit".
octave --eval "a = 1;A5"

# The --silent flag suppresses the display of start-up information
octave --silent --eval "a = 1;A5"

# For reference, the following is equivalent to starting Octave and entering "A5" on the command line:
octave A5.m
# (You will get an error because the variable "a" is not defined.)


Suppose that we would like to execute the program A5.m four times. On the command line, we could enter

# Enter these commands in A5a.sh.  Execute with bash A5a.sh.

# Run each command sequentially
echo "Starting Sequential Run"
octave --silent --eval "a = 1;A5"
octave --silent --eval "a = 2;A5"
octave --silent --eval "a = 3;A5"
octave --silent --eval "a = 4;A5"
echo "Finished Sequential Run"
# Or
# octave --silent --eval "a = 1;A5" ; octave --silent --eval "a = 2;A5" ; octave --silent --eval "a = 3;A5" ; octave --silent --eval "a = 4;A5"


In this case, the next command is executed only after the previous command is finished.

Suppose that we would like to execute the program A5.m four times, and we don't care what order they finish in:

# Enter these commands in A5b.sh.  Execute with bash A5b.sh.
# Last three octave commands are executed even if the previous command is not finished.
# The second echo command will not execute until the previous command is finished.
echo "Starting Parallel Run"
octave --silent --eval "a = 1;A5" &
octave --silent --eval "a = 2;A5" &
octave --silent --eval "a = 3;A5" &
octave --silent --eval "a = 4;A5"
echo "Finished Parallel Run"
# Or
# octave --silent --eval "a = 1;A5" & octave --silent --eval "a = 2;A5" & ...


In this case, the & following a command means "continue to the next command prior to finishing this command".

The following is the same as above, except for an additional & after the fourth octave command.

# Enter commands in A5c.sh.  Execute with bash A5c.sh.
# Continue to next command without waiting for previous command to finish
# (Except for last one - echo line won't execute until previous line finished.)
echo "Starting Parallel Run"
octave --silent --eval "a = 1;A5" &
octave --silent --eval "a = 2;A5" &
octave --silent --eval "a = 3;A5" &
octave --silent --eval "a = 4;A5" &
echo "Finished Parallel Run"


Note that "Finished Parallel Run" shows up before some or all of the Octave commands have finished. We'll remedy this later.

Suppose that we wanted to store all of the results in a file like

a = 1
a = 2
a = 4
a = 4


We could try this:

# Enter commands in A5d.sh.  Execute with bash A5d.sh.
echo "Starting Parallel Run"
octave --silent --eval "a = 1;A5" > out1.txt &
octave --silent --eval "a = 2;A5" > out2.txt &
octave --silent --eval "a = 3;A5" > out3.txt &
octave --silent --eval "a = 4;A5" > out4.txt &
echo "Finished Parallel Run"
cat out1.txt out2.txt out3.txt out4.txt > out.txt


but out.txt contains garbled lines for the same reason that A4c.sh produced garbled lines.

To fix this, we need to write a program that periodically checks to see if all of the output files have been written. When all files are written, it concatenates the output.

# Enter commands in A5e.sh.  Execute with bash A5e.sh.
rm -f out*.txt
echo "Starting Parallel Run"
octave --silent --eval "a = 1;A5" > out1.txt &
octave --silent --eval "a = 2;A5" > out2.txt &
octave --silent --eval "a = 3;A5" > out3.txt &
octave --silent --eval "a = 4;A5" > out4.txt &

alldone="no"
while [ "\$alldone" = "no" ]; do
echo "Checking if all files exist."
if [[ -s out1.txt && -s out2.txt  && -s out3.txt  && -s out4.txt ]]; then
echo "All files exist.  Concatenating into out.txt"
cat out1.txt out2.txt out3.txt out4.txt > out.txt
alldone="yes"
fi
done


The above program only checks to see if the size of the file is greater than zero (using test like -s out1.txt). (If, instead of using -s, -e was used, we would find that the parallel code always took about the same amount of time, independent of N. This is due to the fact that the files out*.txt are created immediately when the command is executed, and then content is written to the file. An even better approach is to verify that the files are the correct size, but this requires code beyond the scope of this demonstration.)

The above method is a crude method for determining when all programs have completed. A much cleaner method for handling this is to use the Bash command wait:

# Enter commands in A5e2.sh.  Execute with bash A5e2.sh.
echo "Starting Parallel Runs"
octave --silent --eval "a = 1;A5" > out1.txt &
octave --silent --eval "a = 2;A5" > out2.txt &
octave --silent --eval "a = 3;A5" > out3.txt &
octave --silent --eval "a = 4;A5" > out4.txt &
wait
echo "All Parallel Runs Complete"


# 4. Problems

## 4.1. Timing

Create the files

H10.m

% Modify the number 50000 so that when you execute
% time octave --silent A5.m
% the time reported (real) is between 0.5 and 1.5 seconds.
for i = 1:50000
x(i) = randn();
end
fprintf('%f\n',mean(x));


H10parallel.sh

rm -f out*.txt
rm -f tmp*.txt
echo "Starting Parallel Run"
octave --silent --eval "HW10" > out1.txt &
octave --silent --eval "HW10" > out2.txt &

wait


H10serial.sh

rm -f out*.txt
rm -f tmp*.txt
echo "Starting Serial Run"
octave --silent --eval "HW10" > out1.txt
octave --silent --eval "HW10" > out2.txt

wait


Modify H10serial.sh and H10parallel.sh so that they each execute the Octave programs N = 1, 2, 4, 8, and 16 times (you may do this by just typing in the extra commands needed). The codes samples above are for N = 2.

To test how long the programs take, use

time bash H10serial.sh
time bash H10parallel.sh


and record the value displayed for real (see man time for an explanation of user and sys). Create a table with format

H10.txt

N, Serial, Parallel
1, ..., ...
2, ..., ...
4, ..., ...
8, ..., ...
16 ..., ...


and fill in the timing information (replace the ...s with times in seconds).

## 4.2. GNU Parallel

Repeat the previous problem using http://www.gnu.org/software/parallel/