Vb. Mean Squares; Outfit and Infit

A question can only be valid if the students’ minds are doing the things we want them to show us they can do. Alastair Pollitt

Able people should pass easy items; unable people should fail difficult ones. Everything else is up for grabs.

One can liken progress along a latent trait to navigating a river; we can treat it as a straight line, but the pilot had best remember sandbars and meanders.

More about what could go wrong and how to find it

However one validates the items (a plethora of sliced and diced matrices; between-group analyses based on gender, ethnicity, SES, age, instruction, etc.; enough editing, tweaking, revising, and discarding to ensure a perfectly functioning item bank and to placate any Technical Advisory Committee), there is no guarantee that the next kid to sit down in front of the computer won’t bring something completely unanticipated to the process. After the items have all been “validated,” we still must validate the measure for every new examinee.

The residual analysis that we are working our way toward is a natural approach to validating any item and any person. But we should know what we are looking for before we get lost in the swamps of arithmetic. First, we need to make sure that we haven’t done something stupid, like score the responses against the wrong key or post the results to the wrong record.

Checking the scoring for an examinee is no different from checking for miskeyed items, only with less data; either problem would produce both surprising misses and surprising passes in the response string. Having gotten past that minefield, we can then check for differences by item type, content, and sequence, to note just the easy ones. Then, depending on what we discover, we proceed with doing the science, either with the results of the measurement process or with its anomalies.
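Here is a minimal sketch of that first check in Python. It assumes the person's ability and the item difficulties (in logits) are already known from a calibration. Each response is compared with its dichotomous Rasch probability, and the standardized residual $z = (x - p)/\sqrt{p(1-p)}$ flags surprising passes (large positive $z$) and surprising misses (large negative $z$). The function names and the cutoff of 2 are illustrative assumptions, not a standard.

```python
import math


def rasch_p(beta, delta):
    """Dichotomous Rasch probability of success for ability beta on an item of difficulty delta (logits)."""
    return 1.0 / (1.0 + math.exp(-(beta - delta)))


def surprising_responses(beta, deltas, responses, threshold=2.0):
    """Return (item index, label, z) for every response whose |z| exceeds threshold.

    z = (x - p) / sqrt(p * (1 - p)) is the standardized residual; the
    threshold of 2.0 is an illustrative assumption.
    """
    flags = []
    for i, (delta, x) in enumerate(zip(deltas, responses)):
        p = rasch_p(beta, delta)
        z = (x - p) / math.sqrt(p * (1.0 - p))
        if z > threshold:
            flags.append((i, "surprising pass", z))  # passed an item the model says was too hard
        elif z < -threshold:
            flags.append((i, "surprising miss", z))  # missed an item the model says was easy
    return flags


# Hypothetical able examinee and item difficulties.
flags = surprising_responses(2.0, [-3.0, -2.0, 0.0, 1.0, 4.5], [0, 1, 1, 1, 1])
for item, label, z in flags:
    print(f"item {item}: {label} (z = {z:.2f})")
```

Consider an able examinee at 2 logits who misses the easiest item (difficulty -3) and passes one at 4.5. The sketch flags item 0 as a surprising miss and item 4 as a surprising pass. A miskeyed item would show the same signature, but spread across many examinees rather than within one response string.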



V. Control of Rasch’s Models: Beyond Sufficient Statistics

No single fit statistic is either necessary or sufficient. David Andrich

You won’t get famous by inventing the perfect fit statistic. Benjamin Wright[1]

“That’s funny,” or when the model reveals something we didn’t know

You say goodness of fit; Rasch said control. The important distinction in the words is that, for the measure, once you have extracted, through the sufficient statistics, all the information in the data relevant to measuring the aspect you are after, you shouldn’t care what or how much gets left in the trash. Whatever it is, it doesn’t contribute to the measurement … directly. It’s of no more than passing interest to us how well the estimated parameters reproduce the observed data, but very much our concern that we have all the relevant information and nothing but the relevant information for our task. Control, not goodness of fit, is the emphasis.
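The Python check below makes the role of the sufficient statistic concrete. It is a sketch that assumes the dichotomous Rasch model and made-up item difficulties. For a person with ability $\beta$, the raw score is sufficient. Once we condition on it, the probability of any particular response pattern depends only on the item difficulties, because the factor involving $\beta$ is the same for every pattern with that score and cancels.

```python
import math
from itertools import product


def pattern_prob(beta, deltas, pattern):
    """Probability of a whole 0/1 response pattern under the dichotomous Rasch model."""
    prob = 1.0
    for delta, x in zip(deltas, pattern):
        p = 1.0 / (1.0 + math.exp(-(beta - delta)))
        prob *= p if x == 1 else 1.0 - p
    return prob


def conditional_prob(beta, deltas, pattern):
    """P(pattern | raw score): divide by the total probability of all patterns with that score."""
    r = sum(pattern)
    same_score = [q for q in product((0, 1), repeat=len(deltas)) if sum(q) == r]
    total = sum(pattern_prob(beta, deltas, q) for q in same_score)
    return pattern_prob(beta, deltas, pattern) / total


deltas = [-1.0, 0.0, 1.5]  # hypothetical item difficulties
for beta in (-2.0, 0.0, 3.0):
    print(beta,
          round(pattern_prob(beta, deltas, (1, 1, 0)), 4),       # changes with beta
          round(conditional_prob(beta, deltas, (1, 1, 0)), 10))  # does not
```

With difficulties of -1, 0, and 1.5, the conditional probability of the pattern (1, 1, 0), given a score of 2, is the same for abilities of -2, 0, and 3. The unconditional pattern probability, by contrast, changes considerably. Enumerating every pattern is only practical for a handful of items; conditional estimation in practice computes the same denominators with elementary symmetric functions.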

Rasch, very emphatically, did not mean that you run your data through some fashionable software package to calculate its estimates of parameters for a one-item-parameter IRT model and call it Rasch. Going beyond the sufficient statistics and parameter estimates to validate the model’s requirements is where the control is; that’s how one establishes Specific Objectivity. If it holds, then we have a pretty good idea what the residuals will look like. They are governed by the binomial variance $p_{vi}(1 - p_{vi})$, where $p_{vi}$ is the model probability that person $v$ succeeds on item $i$. They should be just noise, with no patterns related to person ability or item difficulty, nor to gender, format, culture, type, sequence, or any of the other factors we keep harping on as potential threats (and not only the ones that have occurred to me this morning). If the residuals do behave the way $p_{vi}(1 - p_{vi})$ says they should, then we are on reasonably solid ground for believing Specific Objectivity does obtain, but even that’s not good enough.
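Because the residuals are governed by $p_{vi}(1 - p_{vi})$, a natural summary compares the observed squared residuals with that binomial variance. Below is a Python sketch of the two mean squares conventionally computed for an item. It assumes the person abilities and the item difficulty are already estimated.

- Outfit is the unweighted mean of the squared standardized residuals $z_{vi}^2$.
- Infit is the information-weighted version, $\sum_v (x_{vi} - p_{vi})^2 / \sum_v p_{vi}(1 - p_{vi})$.

Both have an expected value near 1 when the data fit. The function names are hypothetical.

```python
import math


def rasch_p(beta, delta):
    """Dichotomous Rasch probability of success (logits)."""
    return 1.0 / (1.0 + math.exp(-(beta - delta)))


def item_fit(betas, delta, responses):
    """Outfit and infit mean squares for one item across persons.

    outfit = mean of z^2, with z = (x - p) / sqrt(p(1 - p))
    infit  = sum of (x - p)^2 divided by sum of p(1 - p)
    """
    sq_resid, variances = [], []
    for beta, x in zip(betas, responses):
        p = rasch_p(beta, delta)
        sq_resid.append((x - p) ** 2)
        variances.append(p * (1.0 - p))  # binomial variance of the response
    outfit = sum(s / w for s, w in zip(sq_resid, variances)) / len(sq_resid)
    infit = sum(sq_resid) / sum(variances)
    return outfit, infit


betas = [-2.0, -1.0, 0.0, 1.0, 2.0]            # hypothetical person measures
print(item_fit(betas, 0.0, [0, 0, 1, 1, 1]))  # ordered as the model expects
print(item_fit(betas, 0.0, [1, 1, 0, 0, 0]))  # reversed
```

Take five persons at -2, -1, 0, 1, and 2 logits responding to an item at 0. The perfectly ordered string 0, 0, 1, 1, 1 gives both mean squares below 1, meaning the responses are more predictable than the model expects. The reversed string 1, 1, 0, 0, 0 gives both well above 1. Outfit reacts most to surprising responses far from the item, since those get large $z^2$. Infit's weighting by $p_{vi}(1 - p_{vi})$ emphasizes responses from persons near the item's difficulty.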

It does not matter if there are other models out there that can “explain” a particular data set “better”, in the rather barren statistical sense in which explaining means having smaller mean residual deviates. Rasch recognized that models can exist on three planes in increasing order of usefulness[2]:

  1. Models that explain the data,
  2. Models that predict the future, and
  3. Models that reveal something we didn’t know about the world.

Models that only try to maximize goodness of fit are stuck at the first level and are perfectly happy fitting something other than the aspect you want. This mind-set is better suited to trying to explain the stock market, weather, or Oscar winners, and to generating statements like “The stock market goes up when hemlines go up.” Past performance does not ensure future performance. Such models try to go beyond the information in the sufficient statistics, using anything in the data that might have been correlated; and, to appropriate a comment by Rasch, correlation coefficients are population dependent and therefore scientifically rather uninteresting.
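Rasch's point about correlation shows up clearly in a toy construction (hypothetical data, not drawn from any study). Every individual obeys exactly the same law, y = x + 1 or y = x - 1. The only thing that changes between the two populations is the spread of x. The Python sketch computes a plain Pearson correlation for each.

```python
import math


def pearson(xs, ys):
    """Ordinary Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def population(lo, hi):
    """Toy population: every integer x from lo to hi appears twice, once with y = x - 1 and once with y = x + 1."""
    xs, ys = [], []
    for x in range(lo, hi + 1):
        for e in (-1, 1):
            xs.append(x)
            ys.append(x + e)
    return xs, ys


r_wide = pearson(*population(-10, 10))
r_narrow = pearson(*population(-2, 2))
print(round(r_wide, 3), round(r_narrow, 3))
```

The wide population (x from -10 to 10) gives r of about 0.99, and the narrow one (x from -2 to 2) gives about 0.82. The relation between x and y is identical in both, so the coefficient is describing the population as much as the relation.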

Models that satisfy Rasch’s principle of Specific Objectivity have reached the second level and we can begin real science, possibly at the third level. Control of the models often points directly toward the third level, when the agents or objects didn’t interact the way we intended or anticipated[3]. “The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’ but ‘That’s funny.’” (Isaac Asimov.)


[1] I chose to believe Ben’s comment reflected his attitude toward hypothesis testing, not his assessment of my prospects, although in that sense, it was prophetic.

[2] Paraphrasing E. D. Ford.

[3] “In the best designed experiments, the rats will do as they damn well please.” (Murphy’s Law of Experimental Psychology.)


IIIf. Another Aspect, Reading Aloud

Truth emerges more readily from error than from confusion. Bacon

There is no such thing as measurement absolute; there is only measurement relative. Jeanette Winterson

The Case of the Missing Person Parameters

Eliminating nuisance parameters and #SpecificObjectivity

It was a cold and snowy night when, while trying to make a living as a famous statistical consultant, Rasch was summoned to the isolated laboratory of a renowned reading specialist to analyze data related to the effect of extra instruction for poor readers. There may be better ways to make a statistician feel like a valued and respected member of the team than to ask for an analysis of data collected years earlier, but Rasch took it on (Rasch, 1977, p. 63).

If we could measure reading proficiency, in the strictest sense, we could make measurements before the intervention, after the intervention, and perhaps at several points along the way. The analysis would then be no different, in principle, from investigating the optimal blend of feed for finishing hogs or the concentration of platinum for reforming petroleum.


IIIe: A Spectrum of Math Proficiency and the Specter of Word Problems

In mathematics you don’t understand things. You just get used to them. John von Neumann

Defining mileposts along the way from counting your toes to doing calculus

The world has divided itself into two factions: those who think they don’t understand math and those who think they do. But we’re not talking about proving Fermat’s Last Theorem or correcting Stephen Hawking’s tensor algebra; we’re talking about counting, applying the four basic operations, and solving the dreaded word problems using basic algebra, geometry, and perhaps a little calculus. That just about covers the range from counting your toes to determining the spot in the outfield where a player should stand to catch a fly ball, and it should be good enough to get you through freshman math.


IIIb. The Aspect of Color

Roy G. Biv: How many bands in your rainbow?

Art is the imposing of a pattern on experience. Alfred North Whitehead

Performance bands are arbitrary but useful

Qualitative meaning and quantitative precision

We all know our basic colors before we start school. We learn early on that there are three primary colors (red, yellow, and blue), from which all others can be created, although designers of color printers apparently missed that lesson. The ancients saw five colors (red, yellow, green, blue, violet) in the rainbow. Newton saw seven, adding orange and indigo (perhaps to align with the natural harmony of the universe found in the number of musical notes, days of the week, and known planets; or perhaps he was just buying some vowels).



III. Abstracting Some Aspects

Measure what is measurable, and make measurable what is not so. Galileo Galilei

Measuring rocks and the significance of being sufficient

The process of measurement begins long before any data are collected. The starting point is a notion, or even better, a theory about an aspect of a class of things we want to understand better, maybe even do some science on. Successful measurement depends on clear thinking about the aspect and clever ideas for the agents. This is much more challenging, and much more rewarding, than any mathematical gymnastics that might be performed to fit a model to data.

All analogies are limited, but some are useful. Considering aspects of things far removed from cognitive traits may help avoid some of the pitfalls encountered when working too close to home. Hardness is a property of materials that is hard to define, but we all know what it is when it hits us. Color is a narrow region of a continuous spectrum that non-physicists tend to think about as discrete categories. Temperature is an intimate part of our daily lives, which we are quite adept at sensing and, more recently, at measuring; but the closely connected idea, heat, may actually be more real, less bound to conventions and populations. If I could scale the proficiency of professional football teams and reliably predict the outcomes of games, I wouldn’t be writing this.


II. Measurement in Science

Measurement is the breaking up of a quantum of energy into equal units. George Herbert Mead

What does it mean “to measure” and #RaschMeasurement as a foreign language

If, in a discussion about buying a new table, your spouse were to say to you, “I measured the width of the room and …” you would not expect the conversation to degenerate immediately into a discussion about what width is, what measured means, who made your yardstick, or what units you used. But if, in a discussion with the school guidance counselor, you are told, “I measured the intelligence of your child and …” you could, and probably should, ask those same questions, although they probably won’t be any more warmly received in the guidance office than they were in the dining room.
