“No single fit statistic is either necessary or sufficient.” (David Andrich)
“You won’t get famous by inventing the perfect fit statistic.” (Benjamin Wright)[1]
That’s funny . . . or when the model reveals something we didn’t know
You say goodness of fit; Rasch said control. The important distinction is this: once you have extracted, through the sufficient statistics, all the information in the data relevant to measuring the aspect you are after, you shouldn’t care what, or how much, gets left in the trash. Whatever it is, it doesn’t contribute to the measurement … directly. It’s of no more than passing interest to us how well the estimated parameters reproduce the observed data, but very much our concern that we have all the relevant information, and nothing but the relevant information, for our task. Control, not goodness of fit, is the emphasis.
Rasch, very emphatically, did not mean that you run your data through some fashionable software package to calculate its parameter estimates for a one-item-parameter IRT model and call it Rasch. Going beyond the sufficient statistics and parameter estimates to validate the model’s requirements is where the control is; that’s how one establishes Specific Objectivity. If Specific Objectivity holds, then we have a pretty good idea what the residuals will look like. They are governed by the binomial variance $p_{vi}(1-p_{vi})$, where $p_{vi}$ is the model probability that person v succeeds on item i, and they should be just noise, with no patterns related to person ability or item difficulty, nor to gender, format, culture, type, sequence, or any of the other factors we keep harping on (but not restricted to the ones that have occurred to me this morning) as potential threats. If the residuals do look like $p_{vi}(1-p_{vi})$, then we are on reasonably solid ground for believing Specific Objectivity does obtain, but even that’s not good enough.
It does not matter if there are other models out there that can “explain” a particular data set “better”, in the rather barren statistical sense of explain that means smaller mean residual deviates. Rasch recognized that models can exist on three planes, in increasing order of usefulness[2]:
- Models that explain the data,
- Models that predict the future, and
- Models that reveal something we didn’t know about the world.
Models that only try to maximize goodness of fit are stuck at the first level and are perfectly happy fitting something other than the aspect you want. This mind-set is better suited to trying to explain the stock market, the weather, or Oscar winners, and to generating statements like “The stock market goes up when hemlines go up.” Past performance does not ensure future performance. Such models try to go beyond the information in the sufficient statistics, using anything in the data that might have been correlated, and, to appropriate a comment by Rasch, correlation coefficients are population dependent and therefore scientifically rather uninteresting.
Models that satisfy Rasch’s principle of Specific Objectivity have reached the second level, and we can begin real science, possibly at the third level. Control of the models often points directly toward the third level, when the agents or objects didn’t interact the way we intended or anticipated[3]. “The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’ but ‘That’s funny.’” (Isaac Asimov.)
Continue reading . . . Model Control ala Choppin
[1] I chose to believe Ben’s comment reflected his attitude toward hypothesis testing, not his assessment of my prospects, although in that sense, it was prophetic.
[2] Paraphrasing E. D. Ford.
[3] “In the best designed experiments, the rats will do as they damn well please.” (Murphy’s Law of Experimental Psychology.)