This short post is for reference. I will point back to it from time to time.
Reason 1: to say something about the past
Examples: counting seasonal numbers of wins by the Detroit Tigers, or the number of Republican state senators, or how many people you had over last Christmas.
All are raw numbers, counts, tallies, collected to say something about a historical circumstance and for no other reason.
No probability models are needed here, or they are all trivial. For example: what is the probability the Tigers won more than 90 games in 2008? It is either 0 or 1 just in case they either did win more than 90 games or they did not (they did not).
In order to say something about the past—about data we have already collected—we just need to look and count and nothing more.
Most sports statistics fits here, as do other areas of trivia. Any kind of record keeping counts.
Reason 2: to say something about things not yet seen
If you have not yet seen a thing, you are uncertain about what state that thing will take.
If you are uncertain, you quantify that uncertainty using probability. All probability statements are conditional on some evidence.
Evidence usually consists of two things: (1) historical data, together with a probability model that accounts for that data, and (2) a probability model said to explain the thing we have not yet seen.
(1) and (2) are frequently the same; sometimes we do not need (1); we always need (2).
For example, given just the evidence that “This is a six-sided die, and just one side is labeled a 3” then the probability of the thing “We see a 3 when the die is tossed” is 1/6. No historical data was needed to make this statement.
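The deduction can be spelled out by enumeration. This is only a sketch of the die example above, with the six faces written out explicitly:

```python
# Enumerate the six equally likely faces implied by the evidence
# "this is a six-sided die, and just one side is labeled a 3".
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
p_three = Fraction(sum(1 for f in faces if f == 3), len(faces))
print(p_three)  # 1/6
```

No historical tosses appear anywhere in the calculation; the probability follows from the stated evidence alone.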
To quantify the probability of other unseen things, historical data is typically used. For example, the thing "The Detroit Tigers will win more than 90 games in 2009" is unknown as yet. To say what its probability is, we can collect historical data, assign a probability model to that data, and then compute the quantification.
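One deliberately naive model, sketched below, takes the empirical frequency of 90-plus-win seasons in past data as the probability of the unseen 2009 season exceeding 90 wins. The win totals here are invented placeholders for illustration, not real Tigers records:

```python
# Hypothetical season win totals -- made up for illustration only.
past_wins = [71, 95, 88, 74, 91, 79, 72, 88, 74, 74]

# Naive empirical-frequency model: P(2009 wins > 90) = fraction of
# past seasons with more than 90 wins.
p_over_90 = sum(1 for w in past_wins if w > 90) / len(past_wins)
print(p_over_90)  # 0.2
```

Note the model choice is itself part of the evidence: a different model applied to the same data would give a different, equally correct, probability statement.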
More than one probability model can be assigned to the historical data and the thing. This leads to two consequences, both crucial to remember.
(a) If the evidence that implies which probability model to use is ambiguous, then the evidence that leads to the model you use should be made explicit; and
(b) The probability statements made by conflicting models are all correct (assuming no computational errors, of course).
If model A says the probability of a thing is x and model B says it has a probability of y, and x does not equal y, neither probability is wrong before we see the thing.
After we have seen the thing, we can compute the probability that model A or model B is correct.
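One way to compute that, sketched here under the assumption that the two models start with equal weight, is Bayes' rule. The probabilities 0.6 and 0.2 below are invented for illustration:

```python
# Model A said P(thing) = 0.6; model B said P(thing) = 0.2.
# Suppose the thing then happened. Assuming equal prior weight on
# the two models (an assumption), Bayes' rule scores each model.
p_thing_given_A = 0.6
p_thing_given_B = 0.2
prior_A = prior_B = 0.5

evidence = p_thing_given_A * prior_A + p_thing_given_B * prior_B

post_A = p_thing_given_A * prior_A / evidence  # 0.75
post_B = p_thing_given_B * prior_B / evidence  # 0.25
print(post_A, post_B)
```

Before the thing was seen, neither model was wrong; after, the data favor model A without proving model B incorrect.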
All that is found in statistics books falls under this branch. Anytime a prediction, or forecast, or prognostication is made, it is this type of statistics.
To specify a probability model means specifying the values of certain parameters. In the die example, the value of the parameter was deduced. In models that use historical data, most or all parameters cannot be specified exactly and usually remain unknown to some extent.
Do not be fooled by the fact that most statistical procedures revolve around finding estimates of the parameters of probability models. These estimates are not necessary and are at best proxies for what is of interest: real, tangible, observable things.
Modern statistical methods are designed to make probability statements about observable things (like the number of Tigers wins) in such a way that the uncertainty in the parameters is accounted for.
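A minimal sketch of what "accounting for parameter uncertainty" means: with a uniform (Beta(1,1)) prior on an unknown success chance and s successes in n past trials, the predictive probability of a success next time is (s + 1) / (n + 2), Laplace's rule of succession, which integrates over the uncertain parameter rather than plugging in the point estimate s/n. The counts below are arbitrary:

```python
# Predictive probability under a uniform prior on the success chance:
# posterior is Beta(s+1, n-s+1), and the predictive probability of a
# success on the next trial is its mean, (s+1)/(n+2).
from fractions import Fraction

def predictive_prob(s, n):
    return Fraction(s + 1, n + 2)

# With 2 successes in 10 trials, the prediction is 1/4, not the
# plug-in point estimate 2/10 -- the gap reflects parameter uncertainty.
print(predictive_prob(2, 10))
```

The statement produced is about the observable next trial, not about the parameter itself.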
Suppose you have observed global mean temperatures (suppose, too, this quantity is unambiguously and suitably defined) from 1900 up through 2009. What branch of statistics can answer the following:
(i) What is the probability the temperature increased since 1900?
(ii) What is the probability that the temperature in 2009 will be larger than that in 2008?
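On one reading, (i) is a Reason 1 question and (ii) is a Reason 2 question. A sketch, using invented placeholder temperatures rather than real data:

```python
# (i): a Reason 1 question. With the record in hand, just look.
# These global means are hypothetical placeholders, not real data.
temps = {1900: 13.8, 2008: 14.3}  # degrees C, invented

p_increase = 1.0 if temps[2008] > temps[1900] else 0.0
print(p_increase)  # 0 or 1 -- here 1.0 for these made-up numbers

# (ii): a Reason 2 question. The 2009 value is unseen, so answering
# requires a probability model, with its evidence made explicit --
# counting alone cannot do it.
```

The first question is settled by looking; the second cannot be.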
If anything above is ambiguous, let me know and I’ll fix it. In a big hurry today.