Validation
From ComputingForScientists
1. Validation
1.1. Objective
 To introduce the concept of Validation of a computer simulation
1.2. Motivation
 Every science model is an approximation of reality.
 Every mathematical representation of a science model is an approximation of the science model.
 Every computer representation of a mathematical model is an approximation.
It is important to understand how well the computer simulation predicts the actual behavior of the system that is being modeled.
1.3. Definitions
There are two basic stages in testing computer simulations:
 Verification - Are you solving the equations correctly?
 This does not address whether the model is a reasonable reflection of reality.
 Validation - Are you solving the correct equations?
 A science model requires assumptions and assertions about how a system works. The validation process is used to determine if these assumptions and claims were valid.
 The process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model. (AIAA G-077-1998)
1.4. Definitions cont.
Compare with other definitions
 Validation - In the context of science: are you solving the correct equations?
 In the context of business, it is a statement about a product or service [1]:
Validation is a Quality assurance process of establishing evidence that provides a high degree of assurance that a product, service, or system accomplishes its intended requirements. This often involves acceptance of fitness for purpose with end users and other product stakeholders. It is sometimes said that validation can be expressed by the query "Are you building the right thing?" and verification by "Are you building it right?" "Building the right thing" refers back to the user's needs, while "building it right" checks that the specifications be correctly implemented by the system. In some contexts, it is required to have written requirements for both as well as formal procedures or protocols for determining compliance.
1.5. Correct Equations?
 Data for population (S-curve)
 Prediction from computer simulation (J-curve)
1.6. Correct Equations? cont.
 How do you know that you are not solving the correct equations?
 What about this argument:
 Equations are correct, but the growth rate in the simulation was not correct
1.7. Parts of a Model
 The equations
 Are usually specified by the science model
 The adjustable parameters
 Are parameters like growth rate, interest rate, etc.
 Parameters are usually restricted to fall in a range. What is an acceptable range for growth rate?
1.8. Validation Steps
Usually performed after Verification
 Run the simulation with reasonable estimates of the parameters.
 Do sanity checks - Does the simulation predict behavior that would never happen in reality? (Does population ever go negative? Does the mass stay positive?)
 If measurements of the modeled system are available, compare simulation with data
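The three steps above can be sketched in a few lines of MATLAB/Octave. This is only an illustration under stated assumptions: the growth model, the parameter value `a`, and the "measurements" are all hypothetical and are not taken from the lecture.

```matlab
% Step 1: run the simulation with a reasonable parameter estimate.
a = 0.3;                 % assumed growth rate per step (hypothetical)
n_steps = 10;            % assumed number of steps
N = zeros(1, n_steps);
N(1) = 1;                % assumed starting population
for i = 1:n_steps-1
    N(i+1) = N(i) + a*N(i);
end

% Step 2: sanity checks for behavior that could never happen in reality.
is_nonnegative = all(N >= 0);
is_finite = all(isfinite(N));

% Step 3: if measurements exist, compare simulation with data.
% These numbers are made up for illustration; they are not real measurements.
measured_steps = [1 4 7];
measured_N = [1.0 2.5 5.0];
residuals = N(measured_steps) - measured_N;
max_abs_error = max(abs(residuals));

fprintf('non-negative: %d, finite: %d, max |error|: %.3f\n', ...
        is_nonnegative, is_finite, max_abs_error);
```

Whether a given maximum error counts as a good or bad match depends on the uncertainty in the measurements, so the last number is a starting point for discussion rather than a verdict.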
1.9. Validation Limitations
Name some!
1.10. Validation vs. Development
Model validation is intimately related to model development.
 A model is proposed and then validation reveals "warts".
 The model is revised until warts are gone.
1.11. Relation to Scientific Method
Validation is a key component of the scientific method as applied to computational science:
 Characterization of existing data
 Identify features using a table and/or plot
 Estimate uncertainty using information about how the data were collected
 Formulation of a hypothesis (the science model is the hypothesis)
 Formulation of a predictive test (both the mathematical and computational models produce quantitative predictions)
 Experimental testing (verify the computational model, then compare its predictions with existing data)
 Validate
 If "valid", report and peer review. If peer review reveals "warts", go back to hypothesis step
 If "invalid", go back to hypothesis step
 Wait until more data come in. Validate again.
1.12. Example
 In the following, we give an example of how the scientific method could be applied to some data involving a flu outbreak.
 Although the steps taken are made up, it is an illustrative example of how science sometimes proceeds.
1.13. Model 1, part 1
1. Characterization of existing data

1.14. Model 1, part 2
2. Formulation of a hypothesis
Model 1
The number infected equals 10 times the day number, with day number = 1 corresponding to the day the first student was infected.
1.15. Model 1, part 3
3. Formulation of a predictive test
Create mathematical and computational representations of the model and plot predictions versus data.
Mathematical Model
N(i) = 10*i
Plot: [2]
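A computational representation of Model 1 could look like the following MATLAB/Octave sketch. The number of days, `n_days`, is an assumption made for illustration; the rule `N(i) = 10*i` is the one stated above.

```matlab
n_days = 10;              % assumed number of days to predict
N = zeros(1, n_days);     % N(i) = predicted number infected on day i
for i = 1:n_days
    N(i) = 10*i;          % Model 1: ten times the day number
end
disp(N);
```

Calling `plot(1:n_days, N, 'o-')` afterwards would draw the prediction so that it can be overlaid on the data.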
1.16. Model 1, part 4
4. Experimental testing
Verify the computational model, then compare its predictions with existing data.
Suggest ways to
 Verify
 Compare
Plot: [3]
1.17. Model 1, part 5
5. Validate
Validation - Are you solving the correct equations?
 Run the simulation with reasonable estimates of the parameters.
 We guess the parameters. The growth rate is positive, so it seems reasonable.
 Do sanity checks - Does the simulation predict behavior that would never happen in reality? Does population ever go negative? Does the mass stay positive?
 No negative behavior, but ...
 If measurements of the modeled system are available, compare simulation with data
 We did this, and the match is "not good".
 We will cover what "not good" means later in the semester.
What if you don't want to give up on your equations?
 Try same equations with different parameters
1.18. Model 2
Repeat Model 1 with a different parameter.
Model 2
The number infected equals 20 times the day number, with day number = 1 corresponding to the day the first student was infected.
Plot: [4]
Repeating this process with different parameter values leads to the realization that the model will never be able to reproduce the curve (basic math happens to tell us this too, but that is rarely the case in the real world).
Are you solving the correct equations? No
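The "basic math" argument can be checked numerically. In the MATLAB/Octave sketch below, the slopes 10 and 20 come from Models 1 and 2, while the slope 40 and the ten-day window are hypothetical additions. For every slope the daily increase is constant, so a model of this form cannot follow data whose daily increase changes from day to day.

```matlab
days = 1:10;                    % assumed ten-day window
slopes = [10 20 40];            % 10 and 20 from Models 1 and 2; 40 is hypothetical
max_curvature = zeros(size(slopes));
for k = 1:numel(slopes)
    N = slopes(k)*days;                   % N(i) = slope*i
    daily_increase = diff(N);             % N(i+1) - N(i), always equal to the slope
    max_curvature(k) = max(abs(diff(daily_increase)));  % change in the daily increase
end
disp(max_curvature);
```

Every entry of `max_curvature` is zero, no matter which slope is tried.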
1.19. Model 3, part 1
1. Characterization of existing data
Same as Model 1, part 1
1.20. Model 3, part 2
2. Formulation of a hypothesis
Model 3
The number of new people infected on a given day is proportional to the number of people infected on the previous day.
or
The change in the number of people infected from one day to the next is proportional to the number of people already infected.
1.21. Model 3, part 3
3. Formulation of a predictive test
Create mathematical and computational representations of the model and plot predictions versus data.
Mathematical Model
If N(i) is the number of infected on a given day, then the number of new people infected is a*N(i):
N(i+1) = N(i) + a*N(i)
or
N(i+1) - N(i) = a*N(i)
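A minimal MATLAB/Octave sketch of the computational model for Model 3 follows. The growth rate `a`, the number of days, and the starting value `N(1) = 1` are assumptions for illustration; the lecture does not state the values it used. The sketch also compares the loop with the closed-form solution `N(1)*(1+a)^(i-1)`, which is one possible verification test.

```matlab
a = 0.5;                  % hypothetical growth rate per day
n_days = 10;              % assumed number of days
N = zeros(1, n_days);
N(1) = 1;                 % assumed: one student infected on day 1
for i = 1:n_days-1
    N(i+1) = N(i) + a*N(i);       % new infections are a*N(i)
end

% Verification: the iteration should agree with the closed-form solution.
N_closed = N(1)*(1 + a).^(0:n_days-1);
max_difference = max(abs(N - N_closed));
fprintf('largest difference from closed form: %g\n', max_difference);
```

Agreement with the closed form says the equations are being solved correctly; it says nothing about whether they are the correct equations.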
1.22. Model 3, part 4
4. Experimental testing
Verify the computational model, then compare its predictions with existing data.
Enter equations and plot: [5]
1.23. Model 3, part 5
5. Validate
 Run the simulation with reasonable estimates of the parameters.
 We guess the parameters. The growth rate is positive, so it seems reasonable.
 Do sanity checks - Does the simulation predict behavior that would never happen in reality? Does population ever go negative? Does the mass stay positive?
 No negative behavior although ...
 ... If we ask what will happen on day 32, we get a number that is larger than the number of available students. This is a problem. We need to revise our statement about the model to say that it seems to be valid in the first few days of an outbreak. This is a wart!
 If measurements of the modeled system are available, compare simulation with data
 We did this, and the match is "good".
 We will cover what "good" means later in the semester.
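The "more infected than available students" sanity check can be automated. In this MATLAB/Octave sketch the growth rate `a` and the school size `n_students` are hypothetical; they were picked only so that the check fails about a month into the run, and they are not the lecture's values.

```matlab
a = 0.25;                 % hypothetical growth rate per day
n_students = 1000;        % hypothetical number of available students
n_days = 40;              % assumed length of the run
N = zeros(1, n_days);
N(1) = 1;                 % assumed: one student infected on day 1
for i = 1:n_days-1
    N(i+1) = N(i) + a*N(i);
end

% Sanity check: the number infected can never exceed the number of students.
first_bad_day = find(N > n_students, 1);
if isempty(first_bad_day)
    fprintf('Sanity check passed for all %d days.\n', n_days);
else
    fprintf('Sanity check fails on day %d.\n', first_bad_day);
end
```

The model passes every check on the early days and still fails later, which is why a statement about validity should say over what range the model was tested.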
1.24. Model 3, Conclusion
 Is it a good representation of the available data?
 Yes, with the caveat that very few data are available
 Is it a good predictor of how an influenza outbreak will spread?
 With the caveat that it has only been validated on the initial stages of an outbreak, and that it would fail a sanity check over a longer time.
 Is it an exact representation of an influenza outbreak?
 No. No model is ever an exact representation of a system. There are always approximations and uncertainty.
1.25. A few more days pass
And more data are discovered! Need to validate the model on the new data.
1.26. Model 4
Need something to "pull down" model curve. Guess:
N(i+1) = N(i) + a*N(i) - b*N(i)*N(i)
 How would you go about answering the question: "Are you solving the correct equations?"
Result: [6]
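One way to start answering that question is to run Model 4 and repeat the sanity checks. In the MATLAB/Octave sketch below, `a`, `b`, the run length, and the starting value are hypothetical. The update rule has a subtracted quadratic term, so growth stops where a*N equals b*N*N, that is, at N = a/b.

```matlab
a = 0.5;                  % hypothetical growth rate per day
b = 0.001;                % hypothetical strength of the "pull down" term
n_days = 60;              % assumed length of the run
N = zeros(1, n_days);
N(1) = 1;                 % assumed: one student infected on day 1
for i = 1:n_days-1
    N(i+1) = N(i) + a*N(i) - b*N(i)*N(i);
end

plateau = a/b;            % value at which the growth and pull-down terms cancel
fprintf('final N = %.2f, expected plateau = %.2f\n', N(end), plateau);
```

With these values the curve rises quickly and then levels off near 500, so the model no longer predicts more infections than a suitably large school could hold. Whether the plateau and the shape match the new data is a separate comparison.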
1.27. Summary
How to proceed?
"Inverse modeling" approach
 Continue to guess mathematical models
 If one passes all validation tests, then ask "Can we think of a science model that explains this?".
That is, start with the mathematical model and figure out the science later.
Sometimes failure in passing the validation tests results in discovery - the "guess" mathematical model actually has properties that explain other data!
1.28. Summary cont.
How to proceed?
"Forward modeling" approach
 Go back and think more about the science (how the system works)
 Try validation on new science models until one passes all validation tests.
That is, start with the science and work out the mathematical model later.
1.29. Summary question
Think of a science discovery that came about using
 The inverse modeling approach
 The forward modeling approach
1.30. Model 5
Instead of trying to remove warts ad hoc, go back to school and gain a better understanding of how disease spreads. Write a paper when you get it right.
The SIR model
 S = Susceptibles (neither infected nor immune)
 I = Infectives (infected and can transmit)
 R = Recovered (have been infected and are now immune)
We can use this fact for Validation! If the model does not predict this, we have a wart!
a = 0.00218/day (infection rate; each susceptible is infected at rate a*I per day)
b = 0.441/day (recovery rate; 1/b, about 2.3 days, is the mean infectious period)
S(1) = 762
I(1) = 1
R(1) = 0
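The equations themselves do not appear in the text here, so the following MATLAB/Octave sketch assumes the standard Kermack-McKendrick form: dS/dt = -a*S*I, dI/dt = a*S*I - b*I, dR/dt = b*I. The forward Euler time step `dt` and the run length `n_days` are also assumptions. One property of this form is that S + I + R never changes, which gives a built-in validation check; whether that is the fact these notes refer to is a guess.

```matlab
a = 0.00218;              % infection parameter (from the notes)
b = 0.441;                % recovery parameter (from the notes)
dt = 0.1;                 % assumed time step in days
n_days = 14;              % assumed length of the run in days
n_steps = round(n_days/dt) + 1;

S = zeros(1, n_steps);  I = zeros(1, n_steps);  R = zeros(1, n_steps);
S(1) = 762;  I(1) = 1;  R(1) = 0;

for i = 1:n_steps-1
    new_infections = a*S(i)*I(i)*dt;    % susceptibles who become infectives
    new_recoveries = b*I(i)*dt;         % infectives who recover
    S(i+1) = S(i) - new_infections;
    I(i+1) = I(i) + new_infections - new_recoveries;
    R(i+1) = R(i) + new_recoveries;
end

% Sanity checks: nobody appears or disappears, and no group goes negative.
total = S + I + R;
max_drift = max(abs(total - total(1)));
all_nonnegative = all(S >= 0) && all(I >= 0) && all(R >= 0);
fprintf('max drift in S+I+R: %g, all non-negative: %d\n', max_drift, all_nonnegative);
```

A smaller `dt` brings the result closer to the solution of the differential equations; trying two step sizes and comparing the curves is a verification test, while comparing `I` with the outbreak data is a validation test.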
1.31. Model 5 cont.
Do these equations make sense? For the first equation, assume S=constant and I=0. What does the equation predict?
1.32. SIR References
 Original paper: W.O. Kermack and A.G. McKendrick, A Contribution to the Mathematical Theory of Epidemics, Proc. Roy. Soc. London A 115, 700-721, 1927.
 Textbook with extensive discussion of model: J.D. Murray, Mathematical Biology I, An Introduction, pp. 325-326, Springer-Verlag, 2002.
 Study of the SIR model using Mathematica (parameters used in lecture were taken from these notes) [7]
2. Problems
2.1. Validation in the Wild
In one of your science courses, you have been exposed to a science model. In the description of the model, was there any discussion of how it was validated? If you can't find a discussion there, do some searching on the web. Write two or three sentences about how the model was validated. If you could not find any discussion of validation, describe the research that you did in an attempt to find a discussion of validation.
2.2. Kinematics
In basic physics, we learn that the velocity of an object dropped near the surface of Earth increases in direct proportion to the time since it was dropped:
v(t) = v_{o} + at
A computational model for this is
v(i) = vo + a*(i-1)
where i is an integer greater than or equal to one, and the difference between i and i+1 is 0.1 seconds. In this case, we can determine the velocity at time 0.2 seconds using
a = 0.1; % Proportionality constant
vo = 8.0; % Start velocity
for i = [1,2]
  v(i) = vo + a*(i-1);
end
Using iteration with a for loop, and assuming the time difference between i and i+1 is 0.1 seconds, what will the velocity be at 10 seconds if
 The start velocity was 5, and
 The start velocity was 12?
Suppose measurements of an object were taken at the following times
 t=1.0, v=20
 t=1.2, v=22
 t=1.4, v=24
How would you go about using your computational model to argue that
v(t) = v_{o} + at
is the correct/incorrect equation that describes the measurements? If you argued that it was the correct equation, what are the values of a and vo?
Suppose more measurements of the object were made available:
 t=1.6, v=29
 t=1.8, v=33
 t=2.0, v=36
How would you go about arguing that
v(t) = v_{o} + at
is the correct/incorrect equation that describes the data? If you argued that it was the correct equation, what are the values of a and vo?
2.3. Neuron Model
In the above, we gave a procedure for validating a model. In reality, these steps are rarely explicitly presented, and sometimes steps are left out.
In the paper "Simple Model of Spiking Neurons" by Izhikevich (pdf), simulation results from a model are presented and the author argues that the model reproduces the spiking and bursting observed in cortical neurons. You are not expected to understand many of the science or computational details given in this work. However, you should be able to answer the following basic questions that should be specified in any description of the results from a computational simulation:
 What was the mathematical model?
 Was the computational model given?
 What are the model's adjustable parameters?
 Write out any sentences in the paper that you feel are related to validation.
2.4. Dog Shaking Frequency Model
Read this: "Physicists Discover Universal 'Wet-Dog Shake' Rule - How fast should a wet dog rotate its body to dry its fur?" [8]
 What was the science/conceptual model for how fast a dog shakes?
 What was the mathematical model for how fast a dog shakes?
 How was the mathematical model validated?
 How could a computational model be used to figure out the reason the mathematical model did not match the data?
2.5. In-class discussion
In-class validation discussion given after the lecture was presented.
 1 class participation point; (If you don't like to talk in class, you may earn participation points by posting notes on the wiki; instructions for this will be in the email I send this afternoon.)
 Read the two papers (pdf  pdf) (only read the first two pages of each and the conclusions of the neuron paper, you can skim the rest)
 You are not expected to understand many of the science or computational details given in this work. However, you should be able to answer the following basic questions that should be specified in any description of the results from a computational simulation and be prepared to discuss:
 What is the conceptual/science model?
 What is the mathematical model?
 What is the computational model?
 How many adjustable parameters does each model have?
 How was the model verified?
 How was the model validated?
 Suggest your own verification test
 Suggest your own validation test
3. Activity
See Ethics.
4. References
 A discussion of Verification and Validation in the context of Computational Fluid Dynamics (the study of the behavior of liquids and gases with a computer): [9].
 Validation and Verification in the context of models of social behavior: [10]
 Book: "Verification and validation of complex systems: human factors issues" [11]