# 1. Validation

## 1.1. Objective

• To introduce the concept of Validation of a computer simulation

## 1.2. Motivation

1. Every science model is an approximation of reality.
2. Every mathematical representation of a science model is an approximation of the science model.
3. Every computer representation of a mathematical model is an approximation of the mathematical model.

It is important to understand how well the computer simulation predicts the actual behavior of the system that is being modeled.

## 1.3. Definitions

There are two basic stages in testing computer simulations:

• Verification - Are you solving the equations correctly?
  • This does not address the question of whether the model is a reasonable reflection of reality.
• Validation - Are you solving the correct equations?
  • A science model requires assumptions and assertions about how a system works. The validation process is used to determine whether these assumptions and assertions are valid.
  • The process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model. (AIAA G-077-1998)

## 1.4. Definitions cont.

Compare with other definitions

• Validation - In the context of science: are you solving the correct equations?
• In the context of business, it is a statement about a product or service [1]:
Validation is a Quality assurance process of establishing evidence that provides a high degree of assurance that a product, service, or system accomplishes its intended requirements. This often involves acceptance of fitness for purpose with end users and other product stakeholders. It is sometimes said that validation can be expressed by the query "Are you building the right thing?" and verification by "Are you building it right?" "Building the right thing" refers back to the user's needs, while "building it right" checks that the specifications be correctly implemented by the system. In some contexts, it is required to have written requirements for both as well as formal procedures or protocols for determining compliance.

## 1.5. Correct Equations?

• Data for population (S-curve)
• Prediction from computer simulation (J-curve)

## 1.6. Correct Equations? cont.

• How do you know that you are not solving the correct equations?
• Equations are correct, but the growth rate in the simulation was not correct

## 1.7. Parts of a Model

• The equations
  • Are usually specified by the science model
• The parameters
  • Are quantities like growth rate, interest rate, etc.
  • Are usually restricted to fall in a range. What is an acceptable range for growth rate?

## 1.8. Validation Steps

Usually performed after Verification

1. Run the simulation with reasonable estimates of the parameters.
2. Do sanity checks - Does the simulation predict behavior that would never happen in reality? (Does population ever go negative? Does the mass stay positive?)
3. If measurements of the modeled system are available, compare simulation with data

Name some!
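The sanity checks in step 2 above can be automated. The following is a minimal sketch, assuming the simulation output is stored in a vector. The values in `N` and the variable names are made up for illustration and are not measurements.

```matlab
% Hypothetical simulation output (made-up values, for illustration only)
N = [1.0, 1.5, 2.2, 3.1, 4.0];

% Sanity checks: look for values that could never occur in reality
neverNegative = all(N >= 0);       % a population cannot be negative
allFinite     = all(isfinite(N));  % no Inf or NaN from a runaway calculation

if neverNegative && allFinite
    disp('Sanity checks passed');
else
    disp('Sanity checks failed: the model has a wart');
end
```

Passing these checks does not make a model valid. It only rules out predictions that are obviously impossible.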

## 1.10. Validation vs. Development

Model validation is intimately related to model development.

1. A model is proposed and then validation reveals "warts".
2. The model is revised until warts are gone.

## 1.11. Relation to Scientific Method

Validation is a key component of the scientific method as applied to computational science:

1. Characterization of existing data
   • Identify features using a table and/or plot
   • Estimate uncertainty using information about how the data were collected
2. Formulation of a hypothesis (the science model is the hypothesis)
3. Formulation of a predictive test (both the mathematical and computational models produce quantitative predictions)
4. Experimental testing (verify the computational model, then compare its predictions with existing data)
5. Validate
   1. If "valid", report and peer review. If peer review reveals "warts", go back to the hypothesis step.
   2. If "invalid", go back to the hypothesis step.
6. Wait until more data come in. Validate again.

## 1.12. Example

• In the following, we give an example of how the scientific method could be applied to some data involving a flu outbreak.
• Although the steps taken are made up, it is an illustrative example of how science sometimes proceeds.

## 1.13. Model 1, part 1

1. Characterization of existing data

• Data are from article: http://dx.doi.org/10.1136/bmj.1.6112.586
• Actual N values not given in paper.
• N (measured values) determined by zooming in on PDF and overlaying grid. Then numbers were plotted using a spreadsheet.
• Uncertainty due to zoom method = +/- 3
• Uncertainty in N measured by doctors in school = ???

## 1.14. Model 1, part 2

2. Formulation of a hypothesis

Model 1

The number infected equals 10 times the day number, with day number = 1 corresponding to the day the first student was infected.

## 1.15. Model 1, part 3

3. Formulation of a predictive test

Create mathematical and computational representation of the model and plot predictions versus data.

Mathematical Model

N(i) = 10*i
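A computational representation of this model could look like the following sketch. The number of days simulated (`numDays`) and the variable names are assumptions made for illustration.

```matlab
% Model 1: N(i) = 10*i, where i is the day number (i = 1 is the first day)
c = 10;                  % infections added per day (the model's one parameter)
numDays = 5;             % hypothetical number of days to simulate
N = zeros(1, numDays);   % preallocate the result vector
for i = 1:numDays
    N(i) = c*i;
end
disp(N)                  % displays 10 20 30 40 50
```

The vector `N` can then be plotted against the day number alongside the measured values.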


Plot: [2]

## 1.16. Model 1, part 4

4. Experimental testing

Verify the computational model, then compare its predictions with existing data.

Suggest ways to

• Verify
• Compare

Plot: [3]

## 1.17. Model 1, part 5

5. Validate

Validation - Are you solving the correct equations?

1. Run the simulation with reasonable estimates of the parameters.
   • We guess the parameters. The growth rate is positive, so it seems reasonable.
2. Do sanity checks - Does the simulation predict behavior that would never happen in reality? Does population ever go negative? Does the mass stay positive?
   • No negative behavior, but ...
3. If measurements of the modeled system are available, compare simulation with data
   • We did this, and the match is "not good".
   • We will cover what "not good" means later in the semester.

What if you don't want to give up on your equations?

• Try same equations with different parameters

## 1.18. Model 2

Repeat Model 1 with a different parameter.

Model 2

The number infected equals 20 times the day number, with day number = 1 corresponding to the day the first student was infected.

Plot: [4]

Repeating this process with other parameter values leads to the realization that the model will never be able to reproduce the curve (basic math happens to tell us this too, but that is rarely the case in the real world).
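The "basic math" argument can be written out. For any model of the form N(i) = c*i, where c is the parameter (10 in Model 1, 20 in Model 2), the change from one day to the next is

$N(i+1) - N(i) = c(i+1) - ci = c$

The predicted daily change is the same constant c on every day, so the prediction is always a straight line. Changing c changes the slope of the line but can never produce a curve.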

Are you solving the correct equations? No

## 1.19. Model 3, part 1

1. Characterization of existing data

Same as Model 1, part 1

## 1.20. Model 3, part 2

2. Formulation of a hypothesis

Model 3

The number of new people infected on a given day is proportional to the number of people infected on the previous day.

or

The change in the number of people infected from one day to the next is proportional to the number of people already infected.

## 1.21. Model 3, part 3

3. Formulation of a predictive test

Create mathematical and computational representation of the model and plot predictions versus data.

Mathematical Model

If N(i) is the number of infected on a given day, then the number of new people infected is a*N(i):

N(i+1) = N(i) + a*N(i)


or

N(i+1) - N(i) = a*N(i)
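The recurrence can be iterated with a for loop. The sketch below uses a made-up value of `a` that is not fitted to the data, and assumes one infected student on day 1. It also shows one possible verification test: the recurrence has the exact solution N(i) = N(1)*(1+a)^(i-1), so the loop output can be compared against it.

```matlab
% Model 3: N(i+1) = N(i) + a*N(i)
a = 0.5;                 % hypothetical growth parameter (not fitted to data)
numDays = 5;             % hypothetical number of days to simulate
N = zeros(1, numDays);
N(1) = 1;                % assumed: one student infected on day 1
for i = 1:numDays-1
    N(i+1) = N(i) + a*N(i);
end

% Verification: compare the loop with the exact solution of the recurrence
Nexact = N(1)*(1 + a).^(0:numDays-1);
maxError = max(abs(N - Nexact));   % should be zero or very close to zero
```

A small `maxError` says the equations were solved correctly. It says nothing about whether they are the correct equations.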


## 1.22. Model 3, part 4

4. Experimental testing

Verify the computational model, then compare its predictions with existing data.

Enter equations and plot: [5]

## 1.23. Model 3, part 5

5. Validate

1. Run the simulation with reasonable estimates of the parameters.
   • We guess the parameters. The growth rate is positive, so it seems reasonable.
2. Do sanity checks - Does the simulation predict behavior that would never happen in reality? Does population ever go negative? Does the mass stay positive?
   • No negative behavior although ...
   • ... If we ask what will happen on day 32, we get a number that is larger than the number of available students. This is a problem. We need to revise our statement about the model to say that it seems to be valid in the first few days of an outbreak. This is a wart!
3. If measurements of the modeled system are available, compare simulation with data
   • We did this, and the match is "good".
   • We will cover what "good" means later in the semester.
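The sanity check in step 2 can be run as a short loop that finds the first day on which Model 3 predicts more infected students than exist. This is a sketch under stated assumptions: the school size of 763 is the sum of the initial conditions S(1) + I(1) + R(1) used for Model 5 in these notes, and `a` is a made-up value, so the day found here differs from the day quoted above.

```matlab
a = 0.5;                 % hypothetical growth parameter (not fitted to data)
numStudents = 763;       % assumed school size
N = 1;                   % assumed: one student infected on day 1
day = 1;
while N <= numStudents
    N = N + a*N;         % Model 3 update
    day = day + 1;
end
fprintf('Model exceeds %d students on day %d (N = %.1f)\n', numStudents, day, N);
```

With these values the loop stops on day 18. A smaller `a` pushes that day later, but for any positive `a` the model eventually fails this check.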

## 1.24. Model 3, Conclusion

• Is it a good representation of the available data?
  • Yes, with the caveat that very few data are available.
• Is it a good predictor of how an influenza outbreak will spread?
  • With the caveat that it has only been validated on the initial stages of an outbreak, and that it would fail a sanity check for a longer time.
• Is it an exact representation of an influenza outbreak?
  • No. No model is ever an exact representation of a system. There are always approximations and uncertainty.

## 1.25. A few more days pass

And more data are discovered! Need to validate the model on the new data.

## 1.26. Model 4

Need something to "pull down" model curve. Guess:

N(i+1) = N(i) + a*N(i) - b*N(i)*N(i)

• How would you go about answering the question: "Are you solving the correct equations?"

Result: [6]
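One way to start is to iterate Model 4 and look at how it behaves. In the sketch below both `a` and `b` are made-up values that are not fitted to the data, and one infected student on day 1 is assumed.

```matlab
% Model 4: N(i+1) = N(i) + a*N(i) - b*N(i)*N(i)
a = 0.5;                 % hypothetical growth parameter
b = 0.001;               % hypothetical "pull down" parameter
numDays = 40;            % hypothetical number of days to simulate
N = zeros(1, numDays);
N(1) = 1;                % assumed: one student infected on day 1
for i = 1:numDays-1
    N(i+1) = N(i) + a*N(i) - b*N(i)*N(i);
end

% N stops changing when a*N = b*N*N, that is, when N = a/b
plateau = a/b;
```

With these values the curve grows like Model 3 at first, then bends over and levels off near a/b = 500. Whether that shape matches the new data is the validation question.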

## 1.27. Summary

How to proceed?

"Inverse modeling" approach

1. Continue to guess mathematical models
2. If one passes all validation tests, then ask "Can we think of a science model that explains this?".

Sometimes failure in passing the validation tests results in discovery - the "guess" mathematical model actually has properties that explain other data!

## 1.28. Summary cont.

How to proceed?

"Forward modeling" approach

1. Go back and think more about the science (how the system works)
2. Try validation on new science models until one passes all validation tests.

## 1.29. Summary question

Think of a science discovery that came about using

• The inverse modeling approach
• The forward modeling approach

## 1.30. Model 5

Instead of trying to remove warts ad-hoc, go back to school and gain a better understanding of how disease spreads. Write a paper when you get it right.

The S-I-R model

• S = Susceptibles (neither infected nor immune)
• I = Infectives (infected and can transmit)
• R = Recovered (have been infected and are now immune)

$N = S + I + R$

We can use this fact for Validation! If the model does not predict this, we have a wart!

$\frac{\Delta S}{\Delta t} = -aSI$

$\frac{\Delta I}{\Delta t} = aSI-bI$

$\frac{\Delta R}{\Delta t} = bI$

a = 0.00218/day (infection rate per susceptible-infective pair)

b = 0.441/day (recovery rate; the mean infectious period is 1/b, about 2.3 days)

S(1) = 762

I(1) = 1

R(1) = 0
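These equations can be iterated in the same way as the earlier models. The sketch below assumes a time step of one day (Δt = 1), which is not stated in the notes, and an arbitrary number of days. The parameter values and initial conditions are the ones listed above. The last lines check that S + I + R stays equal to N.

```matlab
% S-I-R model, assuming a time step of 1 day (Delta t = 1)
a = 0.00218;             % per day, from the notes
b = 0.441;               % per day, from the notes
numDays = 15;            % hypothetical number of days to simulate
S = zeros(1, numDays); I = zeros(1, numDays); R = zeros(1, numDays);
S(1) = 762; I(1) = 1; R(1) = 0;
for i = 1:numDays-1
    S(i+1) = S(i) - a*S(i)*I(i);
    I(i+1) = I(i) + a*S(i)*I(i) - b*I(i);
    R(i+1) = R(i) + b*I(i);
end

% Check: S + I + R should equal N = 763 on every day
N = S + I + R;
maxDrift = max(abs(N - 763));   % should be zero up to rounding error
```

The terms added to I and R are exactly the terms removed from S and I, so any drift larger than rounding error would point to a mistake in the code.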

## 1.31. Model 5 cont.

Do these equations make sense? For the first equation, assume S=constant and I=0. What does the equation predict?

$\frac{\Delta I}{\Delta t} = aSI-bI$

$\frac{\Delta R}{\Delta t} = bI$

## 1.32. SIR References

• Original paper: W.O. Kermack and A.G. McKendrick, A Contribution to the Mathematical Theory of Epidemics, Proc. Roy. Soc. London A 115, 700-721, 1927.
• Textbook with extensive discussion of model: J.D. Murray, Mathematical Biology I, An Introduction, p. 325-326, Springer-Verlag, 2002.
• Study of the SIR model using Mathematica (parameters used in lecture were taken from these notes) [7]

# 2. Problems

## 2.1. Validation in the Wild

In one of your science courses, you have been exposed to a science model. In the description of the model, was there any discussion of how it was validated? If you can't find a discussion there, do some searching on the web. Write two or three sentences about how the model was validated. If you could not find any discussion of validation, describe the research that you did in an attempt to find a discussion of validation.

## 2.2. Kinematics

In basic physics, we learn that the velocity of an object dropped near the surface of Earth increases in direct proportion to the time since it was dropped:

v(t) = vo + at

A computational model for this is

v(i) = vo + a*(i-1)


where i is an integer greater than or equal to one, and the difference between i and i+1 is 0.1 seconds. In this case, we can determine the velocity at time 0.2 seconds using

a  = 0.1;  % Proportionality constant
vo = 8.0;  % Start velocity
for i = 1:3  % i = 1, 2, 3 correspond to t = 0, 0.1, 0.2 seconds
    v(i) = vo + a*(i-1);
end


Using iteration with a for loop, and assuming the time difference between i and i+1 is 0.1 seconds, what will the velocity be at 10 seconds if

1. The start velocity was 5, and
2. The start velocity was 12?

Suppose measurements of an object were taken at the following times

1. t=1.0, v=20
2. t=1.2, v=22
3. t=1.4, v=24

How would you go about using your computational model to argue that v(t) = vo + at is the correct/incorrect equation that describes the measurements? If you argued that it was the correct equation, what are the values of a and vo?

Suppose more measurements of the object were made available:

1. t=1.6, v=29
2. t=1.8, v=33
3. t=2.0, v=36

How would you go about arguing that v(t) = vo + at is the correct/incorrect equation that describes the data? If you argued that it was the correct equation, what are the values of a and vo?

## 2.3. Neuron Model

In the above, we gave a procedure for validating a model. In reality, these steps are rarely explicitly presented, and sometimes steps are left out.

In the paper "Simple Model of Spiking Neurons" by Izhikevich (pdf), simulation results from a model are presented and the author argues that the model reproduces the spiking and bursting observed in cortical neurons. You are not expected to understand many of the science or computational details given in this work. However, you should be able to answer the following basic questions that should be specified in any description of the results from a computational simulation:

1. What was the mathematical model?
2. Was the computational model given?
3. What are the model's adjustable parameters?
4. Write out any sentences in the paper that you feel are related to validation.

## 2.4. Dog Shaking Frequency Model

Read the article "Physicists Discover Universal 'Wet-Dog Shake' Rule - How fast should a wet dog rotate its body to dry its fur?" [8]

• What was the science/conceptual model for how fast a dog shakes?
• What was the mathematical model for how fast a dog shakes?
• How was the mathematical model validated?
• How could a computational model be used to figure out the reason the mathematical model did not match the data?

## 2.5. In-class discussion

In-class validation discussion given after the lecture was presented.

• 1 class participation point; (If you don't like to talk in class, you may earn participation points by posting notes on the wiki; instructions for this will be in the email I send this afternoon.)
• Read the two papers (pdf | pdf) (only read the first two pages of each and the conclusions of the neuron paper, you can skim the rest)
• You are not expected to understand many of the science or computational details given in this work. However, you should be able to answer the following basic questions that should be specified in any description of the results from a computational simulation and be prepared to discuss:
• What is the conceptual/science model?
• What is the mathematical model?
• What is the computational model?
• How many adjustable parameters does each model have?
• How was the model verified?
• How was the model validated?
• Suggest your own verification test
• Suggest your own validation test

See Ethics.

# 4. References

• A discussion of Verification and Validation in the context of Computational Fluid Dynamics (the study of the behavior of liquids and gases with a computer): [9].
• Validation and Verification in the context of models of social behavior: [10]
• Book: "Verification and validation of complex systems: human factors issues" [11]