Control

When the experts all agree, it doesn’t necessarily follow that the converse is true. When the experts don’t agree, the average person has no business thinking about it. B. Russell

The experts don’t agree on the topic of reversed thresholds and I’ve been thinking about it anyway. But I may be even less lucid than usual.

The categories, whether rating scale or partial credit, are always ordered: 0 always implies less than 1; 1 always implies less than 2; 2 always implies less than 3 . . . The concentric circle for k on the archery target is always inside (smaller, thus harder to hit than) the circle for k-1. In baseball, you can’t get to second without touching first first. The transition points, or thresholds, might or might not be ordered in the data. Perhaps the circle for k-1 is so close in diameter to the circle for k that it is almost impossible to be inside k-1 without being inside k. Category k-1 would then be very rarely observed, unless you have very sharp arrows and very consistent archers. Perhaps four-base hits actually require less of the aspect than three-base hits.
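To make the distinction concrete, here is a minimal sketch, assuming the usual partial credit form in which the probability of category x is proportional to exp of the sum of (theta - tau_j) over the first x thresholds. The function names and the threshold values are mine, chosen only for illustration. The categories stay ordered in both cases; what changes is whether the middle category is ever the most likely one.

```python
import math

def category_probabilities(theta, thresholds):
    """Partial credit probabilities for categories 0..m at ability theta."""
    # Running sums of (theta - tau_j) are the log-numerators;
    # category 0 has log-numerator 0 by convention.
    log_num = [0.0]
    for tau in thresholds:
        log_num.append(log_num[-1] + (theta - tau))
    peak = max(log_num)  # subtract the max for numerical stability
    nums = [math.exp(v - peak) for v in log_num]
    total = sum(nums)
    return [n / total for n in nums]

def modal_categories(thresholds, lo=-6.0, hi=6.0, steps=241):
    """Categories that are the single most probable one somewhere on an ability grid."""
    seen = set()
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        probs = category_probabilities(theta, thresholds)
        seen.add(probs.index(max(probs)))
    return seen

# Illustrative (hypothetical) thresholds: same two values, opposite order.
ordered = modal_categories([-1.0, 1.0])
disordered = modal_categories([1.0, -1.0])

print("modal categories, ordered thresholds:   ", sorted(ordered))
print("modal categories, disordered thresholds:", sorted(disordered))
```

With the thresholds reversed, category 1 is still a legitimate score between 0 and 2, but there is no ability at which it is the most probable outcome: the circle for 1 is nearly the same size as the circle for 2.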

Continue . . . Ordered categories, disordered thresholds

No single fit statistic is either necessary or sufficient. David Andrich

You won’t get famous by inventing the perfect fit statistic. Benjamin Wright[1]

That’s funny or when the model reveals something we didn’t know

You say goodness of fit; Rasch said control. The important distinction in the words is that, for the measure, once you have extracted, through the sufficient statistics, all the information in the data relevant to measuring the aspect you are after, you shouldn’t care what or how much gets left in the trash. Whatever it is, it doesn’t contribute to the measurement … directly. It’s of no more than passing interest to us how well the estimated parameters reproduce the observed data, but very much our concern that we have all the relevant information and nothing but the relevant information for our task. Control, not goodness of fit, is the emphasis.
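A small sketch of what "extracted through the sufficient statistics" means for the dichotomous model, assuming the standard logistic form with person ability theta and item difficulty b. The item difficulties and the two abilities below are made up for illustration. Once the raw score is known, the probability of any particular response pattern is the same for a weak person and a strong one, so the pattern carries no further information about ability.

```python
import itertools
import math

def pattern_probability(pattern, theta, difficulties):
    """Probability of one 0/1 response pattern under the dichotomous Rasch model."""
    prob = 1.0
    for x, b in zip(pattern, difficulties):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        prob *= p if x == 1 else 1.0 - p
    return prob

def conditional_given_score(theta, difficulties, score):
    """Probability of each pattern, given that the raw score equals `score`."""
    patterns = [
        pt for pt in itertools.product((0, 1), repeat=len(difficulties))
        if sum(pt) == score
    ]
    joint = {pt: pattern_probability(pt, theta, difficulties) for pt in patterns}
    total = sum(joint.values())
    return {pt: v / total for pt, v in joint.items()}

# Hypothetical three-item test and two very different persons.
difficulties = [-1.0, 0.0, 1.5]
weak = conditional_given_score(-2.0, difficulties, score=2)
strong = conditional_given_score(2.0, difficulties, score=2)

for pt in weak:
    print(pt, round(weak[pt], 6), round(strong[pt], 6))
```

The two columns agree pattern by pattern. Which patterns turn up, given the score, depends only on the items, and that is exactly the part of the data that control examines.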

Rasch, very emphatically, did not mean that you run your data through some fashionable software package to calculate its estimates of parameters for a one-item-parameter IRT model and call it Rasch. Going beyond the sufficient statistics and parameter estimates to validate the model’s requirements is where the control is; that’s how one establishes Specific Objectivity. If it holds, then we have a pretty good idea what the residuals will look like. Their variance is the binomial variance p_vi(1-p_vi), where p_vi is the model’s probability that person v succeeds on item i, and they should be just noise, with no patterns related to person ability or item difficulty, nor to gender, format, culture, type, sequence, or any of the other factors we keep harping on (but not restricted to the ones that have occurred to me this morning) as potential threats. If the residuals do behave like noise with variance p_vi(1-p_vi), then we are on reasonably solid ground for believing Specific Objectivity does obtain, but even that’s not good enough.
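As a rough sketch of that residual check, the simulation below generates dichotomous responses that really do follow the model and standardizes each residual by the square root of p_vi(1-p_vi). Everything here is an assumption made for illustration: the sample sizes, the parameter ranges, the seed, and above all the use of the generating parameters in place of estimates, which real control would not have. The mean square of the standardized residuals should sit near 1 overall and within any subgroup, here a split into low and high ability.

```python
import math
import random

def rasch_probability(theta, b):
    """Probability of success for ability theta on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def standardized_residual(x, p):
    """(observed - expected) divided by the binomial standard deviation."""
    return (x - p) / math.sqrt(p * (1.0 - p))

rng = random.Random(2024)  # fixed seed so the sketch is repeatable
abilities = [rng.uniform(-1.5, 1.5) for _ in range(500)]    # hypothetical persons
difficulties = [-1.0 + 2.0 * i / 19 for i in range(20)]     # hypothetical items

records = []  # (theta, b, standardized residual)
for theta in abilities:
    for b in difficulties:
        p = rasch_probability(theta, b)
        x = 1 if rng.random() < p else 0
        records.append((theta, b, standardized_residual(x, p)))

def mean_square(rows):
    """Average squared standardized residual; near 1 when only noise is left."""
    return sum(z * z for _, _, z in rows) / len(rows)

overall = mean_square(records)
low = mean_square([r for r in records if r[0] < 0.0])
high = mean_square([r for r in records if r[0] >= 0.0])

print(f"overall {overall:.3f}  low ability {low:.3f}  high ability {high:.3f}")
```

The same split can be made by item difficulty, gender, format, or sequence. A group whose mean square drifts well away from 1 is the "that’s funny" worth chasing.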

It does not matter if there are other models out there that can “explain” a particular data set “better”, in the rather barren statistical sense in which “explain” means having smaller mean residual deviates. Rasch recognized that models can exist on three planes in increasing order of usefulness[2]:

Models that explain the data,
Models that predict the future, and
Models that reveal something we didn’t know about the world.

Models that only try to maximize goodness of fit are stuck at the first level and are perfectly happy fitting something other than the aspect you want. This mind-set is better suited to trying to explain the stock market, weather, or Oscar winners and to generating statements like “The stock market goes up when hemlines go up.” Past performance does not ensure future performance. Such models try to go beyond the information in the sufficient statistics, using anything in the data that might have been correlated and, to appropriate a comment by Rasch, correlation coefficients are population dependent and therefore scientifically rather uninteresting.

Models that satisfy Rasch’s principle of Specific Objectivity have reached the second level and we can begin real science, possibly at the third level. Control of the models often points directly toward the third level, when the agents or objects didn’t interact the way we intended or anticipated[3]. “The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’ but ‘That’s funny.’” (Isaac Asimov.)

Continue reading . . . Model Control ala Choppin

[1] I chose to believe Ben’s comment reflected his attitude toward hypothesis testing, not his assessment of my prospects, although in that sense, it was prophetic.

[2] Paraphrasing E. D. Ford.

[3] “In the best designed experiments, the rats will do as they damn well please.” (Murphy’s Law of Experimental Psychology.)


The Trouble with Rasch: the Rasch Model Exposed

Measurement solutions too simple to publish.
