V. Control of Rasch’s Models: Beyond Sufficient Statistics

 No single fit statistic is either necessary or sufficient.  David Andrich

You won’t get famous by inventing the perfect fit statistic. Benjamin Wright[1]

That’s funny or when the model reveals something we didn’t know

You say goodness of fit; Rasch said control. The important distinction in the words is that, for the measure, once you have extracted, through the sufficient statistics, all the information in the data relevant to measuring the aspect you are after, you shouldn’t care what or how much gets left in the trash. Whatever it is, it doesn’t contribute to the measurement … directly. It’s of no more than passing interest to us how well the estimated parameters reproduce the observed data, but very much our concern that we have all the relevant information and nothing but the relevant information for our task. Control, not goodness of fit, is the emphasis.

Rasch, very emphatically, did not mean that you run your data through some fashionable software package to calculate its estimates of parameters for a one-item-parameter IRT model and call it Rasch. Going beyond the sufficient statistics and parameter estimates to validate the model’s requirements is where the control is; that’s how one establishes Specific Objectivity. If it holds, then we have a pretty good idea what the residuals will look like. They are governed by the binomial variance p_vi(1 - p_vi), where p_vi is the model probability that person v succeeds on item i, and they should be just noise, with no patterns related to person ability or item difficulty, nor to gender, format, culture, type, sequence, or any of the other factors we keep harping on (but not restricted to the ones that have occurred to me this morning) as potential threats. If the residuals do look like p_vi(1 - p_vi), then we are on reasonably solid ground for believing Specific Objectivity does obtain, but even that’s not good enough.
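For readers who like to see the arithmetic, here is a minimal sketch of what "residuals that look like binomial noise" means. It is an illustration under stated assumptions, not Rasch's procedure or any package's: the function names are made up for this example, the data are simulated, and the true abilities and difficulties are used in place of estimates. Each response is compared with its model probability p and scaled by the binomial standard deviation sqrt(p(1 - p)); if the model holds, the squared standardized residuals should average about 1.

```python
import math
import random


def rasch_probability(ability, difficulty):
    """Probability of success under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def standardized_residual(response, p):
    """Observed minus expected, scaled by the binomial standard deviation."""
    return (response - p) / math.sqrt(p * (1.0 - p))


def simulate_residuals(n_persons=200, n_items=50, seed=0):
    """Simulate Rasch data and return a persons-by-items residual matrix.

    Assumption for illustration: true parameters are known, so no
    estimation step is involved.
    """
    rng = random.Random(seed)
    abilities = [rng.gauss(0, 1) for _ in range(n_persons)]
    difficulties = [rng.gauss(0, 1) for _ in range(n_items)]
    residuals = []
    for b in abilities:
        row = []
        for d in difficulties:
            p = rasch_probability(b, d)
            x = 1 if rng.random() < p else 0  # simulated 0/1 response
            row.append(standardized_residual(x, p))
        residuals.append(row)
    return residuals


def mean_square(values):
    """Average of squared standardized residuals; near 1 if only noise."""
    values = list(values)
    return sum(z * z for z in values) / len(values)


residuals = simulate_residuals()
overall = mean_square(z for row in residuals for z in row)
print(round(overall, 2))
```

The same mean square can be taken within any subset of the matrix (one item, one person, one gender, one format). A value near 1 overall says little by itself; the control is in checking that no subset you can think of departs from it systematically.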

It does not matter if there are other models out there that can “explain” a particular data set “better”, in the rather barren statistical sense of explain meaning they have smaller mean residual deviates. Rasch recognized that models can exist on three planes in increasing order of usefulness[2]:

  1. Models that explain the data,
  2. Models that predict the future, and
  3. Models that reveal something we didn’t know about the world.

Models that only try to maximize goodness of fit are stuck at the first level and are perfectly happy fitting something other than the aspect you want. This mind-set is better suited to trying to explain the stock market, weather, or Oscar winners and to generate statements like “The stock market goes up when hemlines go up.” Past performance does not ensure future performance. They try to go beyond the information in the sufficient statistics, using anything in the data that might have been correlated and, to appropriate a comment by Rasch, correlation coefficients are population dependent and therefore scientifically rather uninteresting.

Models that satisfy Rasch’s principle of Specific Objectivity have reached the second level and we can begin real science, possibly at the third level. Control of the models often points directly toward the third level, when the agents or objects didn’t interact the way we intended or anticipated[3]. “The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’ but ‘That’s funny.’” (Isaac Asimov.)

Continue reading . . . Model Control ala Choppin

[1] I chose to believe Ben’s comment reflected his attitude toward hypothesis testing, not his assessment of my prospects, although in that sense, it was prophetic.

[2] Paraphrasing E. D. Ford.

[3] “In the best designed experiments, the rats will do as they damn well please.” (Murphy’s Law of Experimental Psychology.)


IVb. The Point is Measurement

To measure the person, not the test

In spite of most of what has been said up to this point, we did not undertake this project with the hope of building better thermometers. The point is to measure the person. Because of the complete symmetry of the model, everything we have done for items, we can do again for people just by reversing the subscripts. For any two people who took some of the same items, count the number N12 that person 2 answered correctly and person 1 missed; also the number N21 that person 1 passed and person 2 missed. The relative abilities of the people will parallel expressions 23 and 25:
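The excerpt stops before the expressions themselves, and they are not reproduced here. As a rough sketch of the counting just described, and assuming only the standard Rasch result that, among items where exactly one of the two people succeeds, the odds favoring person 1 do not depend on item difficulty, the log of N21/N12 estimates the ability of person 1 minus that of person 2. The function names and the small data set are invented for illustration.

```python
import math


def pairwise_counts(responses_1, responses_2):
    """Count discordant items among those both persons attempted.

    Responses are 1 (right), 0 (wrong), or None (item not taken).
    Returns (n12, n21): n12 = person 2 right and person 1 wrong,
    n21 = person 1 right and person 2 wrong.
    """
    n12 = n21 = 0
    for x1, x2 in zip(responses_1, responses_2):
        if x1 is None or x2 is None:
            continue  # only items taken by both persons count
        if x2 == 1 and x1 == 0:
            n12 += 1
        elif x1 == 1 and x2 == 0:
            n21 += 1
    return n12, n21


def log_odds_difference(n12, n21):
    """Estimate (ability of person 1) - (ability of person 2) in logits."""
    if n12 == 0 or n21 == 0:
        raise ValueError("need at least one discordant item in each direction")
    return math.log(n21 / n12)


# Hypothetical responses to eight items; person 1 skipped the fifth.
person_1 = [1, 1, 0, 1, None, 1, 0, 1]
person_2 = [0, 1, 0, 0, 1, 1, 1, 0]

n12, n21 = pairwise_counts(person_1, person_2)
print(n12, n21, log_odds_difference(n12, n21))
```

Items that both people passed or both missed drop out entirely, which is the point: the comparison uses only the information that distinguishes the two people, whatever the difficulties of the items happen to be.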

Continue reading . . .The Point is Measurement

 


IV. Doing the math (and a little algebra)

Estimates and Estimators: connecting model to data

The essential attributes of a Rasch model are sufficient statistics and separable parameters, which allow, but don’t guarantee, specific objectivity. Well, actually sufficient statistics come pretty close if they really are sufficient to capture all the relevant information in the data. We will come back to this in the discussion of what Rasch called control of the model and most of us call goodness of fit. The current topic is a demonstration, more intuitive than mathematical, of how to manipulate the model to estimate item difficulties.

The process begins with the basic Rasch model for how likely the person wins when one person takes one dichotomous item: . . . The Disappearing Beta Trick

 


IIIf. Another Aspect, Reading Aloud

Truth emerges more readily from error than from confusion. Bacon

There is no such thing as measurement absolute; there is only measurement relative. Jeanette Winterson.

The Case of the Missing Person Parameters

Eliminating nuisance parameters and #SpecificObjectivity

It was a cold and snowy night when, while trying to make a living as a famous statistical consultant, Rasch was summoned to the isolated laboratory of a renowned reading specialist to analyze data related to the effect of extra instruction for poor readers. There may be better ways to make a statistician feel a valued and respected member of the team than to ask for an analysis of data collected years earlier, but Rasch took it on (Rasch, 1977, p. 63).

If we could measure, in the strictest sense, reading proficiency, measurements could be made before the intervention, after the intervention, and perhaps several points along the way. Then the analysis is no different, in principle, than if we were investigating the optimal blend of feed for finishing hogs or concentration of platinum for re-forming petroleum.

Continue reading . . .IIIf. Reading Aloud


IIIe: A Spectrum of Math Proficiency and the Specter of Word Problems

In mathematics, you don’t understand things. You just get used to them. John von Neumann

Defining mile posts along the way from counting your toes to doing calculus

The world has divided itself into two factions: those who think they don’t understand math and those who think they do. But we’re not talking about proving Fermat’s Last Theorem or correcting Stephen Hawking’s tensor algebra; we’re talking about counting, applying the four basic operators, and solving the dreaded word problems using basic algebra, geometry, and perhaps a little calculus. That just about covers the range from counting your toes to determining the spot in the outfield where a player should stand to catch a fly ball and should be good enough to get you through freshman math.

Continue reading . . . A Spectrum of Math Proficiency


IIId. On Any Given Sunday

May the better team have better odds

Pair-wise comparisons and arbitrary labels

All of us have probably thought sometime during the football season that there must be a Rasch analysis in here somewhere. Every team in the National Football League plays a different selection of opponents but rankings are based, mostly[1], on a simple count of games won. We certainly know better than that. Here’s my answer.

Click here to continue reading On Any Given Sunday.


 

[1]There are more rules for resolving ties in the rankings but these are designed more to create excitement and sell tickets than to ensure that the best team wins.