What’s the trouble?

[Rasch Measurement Theory, more intuitive than mathematical: Rasch’s Theory of Relativity]

The real trouble with Rasch measurement is that it leads to solutions to measurement problems that are too simple to publish. Over-generalizing quite a bit: on the one hand, those of us doing it think what we do is too obvious to write down; on the other hand, those not doing it think what we do is not worth writing down.

There are a couple of troubles for Rasch as well. First, it doesn’t fit (no pun intended) the standard statistical paradigm for the social sciences, which seems more concerned with fitting models to data than data to models. That’s more Pearson than Fisher. Second, constructing measuring instruments that conform to Rasch’s principles is hard. The easy way out is to give up on the possibility of measurement, accept the data at face value, and hide in the mathematical brambles of Item Response Theory.

The alternative is to try to understand why, for example, this item is harder for boys than for girls, or why fifth graders order some items differently than sixth graders do. Rasch Measurement Theory provides a framework for reaching Thurstone’s measurement ideal, but, perhaps equally compelling, it is a framework for controlling, understanding, and improving the measurement process.

I intended to write a book, but my attention span is too short. I think at the level of bumper stickers and fortune cookies, not chapters and volumes. I ponder things like:

  • You won’t get famous by inventing the perfect fit statistic (B. Wright),
  • No single fit statistic is either necessary or sufficient (D. Andrich),
  • If your data have something to tell you, your statistics won’t stop them (G. Box),
  • Models must be used but never believed (M. Wilk),
  • A model without parameters is de-testable (G. Rasch),
  • Correlations are population-dependent and therefore scientifically rather uninteresting (G. Rasch),
  • The experiment is simply a demonstration for those too slow to follow the argument (Galileo?),
  • If the experts all agree, it doesn’t necessarily follow that the converse is true (B. Russell).

I prefer to believe that Wright’s comment to me forty years ago about perfect fit statistics had more to do with his view of significance testing than with his assessment of my prospects, although it was certainly prophetic in that sense. By any measure, “outfit” isn’t the answer.

And shaping my view of the “analysis of fit”,

  • The most exciting thing to hear in science, the one that heralds new discoveries, is not “Eureka” but “That’s funny.” (Isaac Asimov)

and the goal of Rasch’s philosophy of measurement, which would take a large bumper,

  • The reasonable man strives to adapt himself to the world; the unreasonable man persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man (G. B. Shaw).

or one that doesn’t exactly rise to the level of the others in this series,

  • Putting discrimination in your model is like putting the bath water to bed with the baby (R. Mead).

[For a more general listing of what I think I know, try reading collected wisdom.]

If the Rasch Model has been introduced to you as the “one-parameter logistic” item response theory model (1PL IRT), then you haven’t been properly introduced. This dialog is an antidote to the so-called measurement classes that dismiss Rasch as a trivial, restricted case of more general and esoteric IRT models.

I consider myself professionally a double grandson of Ronald Fisher. I learned statistics from Kempthorne and Bancroft at Iowa State and measurement from Rasch via Ben Wright at the University of Chicago. Kempthorne, Bancroft, and Rasch all knew and worked with Fisher at some time. For me, that has made all the difference.

I am, of course, Ron Mead. Beyond Bancroft and Wright, I have known, worked with, and learned from David Andrich, Joe Ryan, George Engelhard, Mark Wilson, Richard Smith, Geoff Masters, Graham Douglas, Larry Ludlow, and many others at some point in those forty years: some mathematicians, some philosophers, some educators, none of which describes me.

I won’t hope for definitive but fantasize about seminal, which may mean I would rather pose questions than resolve them. There are more than enough loose ends here to keep us supplied with dissertation topics for many years. If I have an intended audience, it is either beginning graduate students in measurement or myself. These entries will rely more on argument by analogy or metaphor than on derivation or proof, and tend more toward philosophy than math, statistics, or arithmetic. If you need those, try Wright & Stone, Smith & Smith, Bond & Fox, or Fischer & Molenaar. Maybe I needed a co-author. Or I may just be cleaning out the attic.

It would make the most sense to start at the bottom with Rasch’s Theory of Relativity and read up, because that’s the order in which I wrote them, and I occasionally assume you remember something I said earlier. My intent is to publish something relevant and not terribly irreverent on Fridays. Other days I am less constrained.

To find the bibliography I generally rely on, see Rasch Related References; read those and you won’t need me. Any omissions are probably due to ignorance rather than rejection. A paper reviewer once said, “If he read more, Mead would discover less.” I think that reviewer plagiarized something he had read.

And here are PDF versions of almost everything.

Measurement in Science

Hard Headedness

The Aspect of Color

Hot and Cold

On Any Given Sunday

Spectrum of Math Proficiency

IIIf. Reading Aloud

The Disappearing Beta Trick

The Point is Measurement

Model Control à la Choppin

Model Control à la Panchapakesan

Control after Infit, Outfit

Vc. Diagnosis with and from the Model

Linking and Equating

Multiple Link Forms

Longitudinal Scales

Measuring Bowmanship

Polytomous Rasch Models

Ordered categories, disordered thresholds

Rules of Thumb, Short Cuts, Loose Ends

Doing the Arithmetic Redux

Useful reporting

New Report

Computerized Adaptive Testing

Simplistic Statistics

Answer until Correct

Using Lexiles Safely

4 thoughts on “What’s the trouble?”

  1. Being a granddaughter of Ben Wright as a student of Mark Wilson, I had the joy of being introduced to Rasch family models and the simple solutions they present. It wasn’t until I ventured out into testing companies or academia that I realized how mucked-up measurement can become. Many of Mark’s students have done the same (see Derek Briggs, Ou Lydia Liu, Kathleen Scalise, and Andrew Maul, to name a few). These folks have documented, and reliably continue to document, Rasch solutions and procedures.

    I also had the pleasure of meeting the Rasch folks in the UK. They host a lovely conference every year (http://www.rasch.org.uk/); I encourage you to check it out. They also regularly publish in JAM. Also check out the AERA Rasch SIG, of course.


  2. What would happen, or not happen, if we did not Rasch it? My guess is not a lot. And the little that would happen is not significant.


    • On one level, you have a point; on another, you seem to be missing the point. Using Rasch as a verb usually means running the data through a Rasch analysis package and checking the fit mean squares, either in- or out-. That isn’t going to change the world; you rarely come to different conclusions than you would by just looking at point-biserials. Even going to the next step and using Rasch methods to build banks and equate forms isn’t going to be much different from 3PL, because, when you really get down to it, 3PL is a pretty good approximation to Rasch. And in fact, if you know the item discriminations well enough, 2PL is a Rasch model because it has sufficient statistics for item locations, which is the sine qua non for measurement.

      Estimating the locations with a sufficient statistic ensures (assuming adequate fit to the model) that you have all the information in the data relevant to the task and nothing else. Having all the information takes you beyond the non-linearity of true score theory; having nothing else stops you short of the over-fitting trap of IRT. Once you’ve got the information in the sufficient statistics, you’re done. Anything beyond that will reduce the residual error, but the estimates become sample-specific and less generalizable. That’s not the way to build measuring instruments.

      In short, validity trumps reliability.

      Of course, in almost all educational measurement, we are nowhere near obtaining “adequate fit to the model” and being able to claim we are making measurements. Still things to do.
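A minimal sketch of what “all the information and nothing else” means for a sufficient statistic, assuming the dichotomous Rasch model P(x = 1) = exp(θ − b) / (1 + exp(θ − b)) and local independence. The item difficulties and the helper names (rasch_prob, pattern_prob, conditional_pattern_probs) are hypothetical, chosen only for illustration. Given a person’s raw score, the probability of any particular response pattern comes out the same whatever θ is, so once the score is known, the pattern carries no further information about ability. The same argument with persons and items swapped applies to item totals and item difficulties.

```python
import itertools
import math


def rasch_prob(theta, b):
    """P(correct) under the dichotomous Rasch model: exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def pattern_prob(theta, difficulties, pattern):
    """Probability of one 0/1 response pattern, assuming local independence."""
    p = 1.0
    for b, x in zip(difficulties, pattern):
        pi = rasch_prob(theta, b)
        p *= pi if x == 1 else 1.0 - pi
    return p


def conditional_pattern_probs(theta, difficulties, score):
    """P(pattern | raw score = score, theta) for every pattern with that raw score."""
    patterns = [
        pat
        for pat in itertools.product((0, 1), repeat=len(difficulties))
        if sum(pat) == score
    ]
    total = sum(pattern_prob(theta, difficulties, pat) for pat in patterns)
    return {pat: pattern_prob(theta, difficulties, pat) / total for pat in patterns}


# Hypothetical item difficulties, in logits.
difficulties = [-1.0, 0.0, 0.5, 1.5]

# Two very different abilities, same raw score of 2 out of 4.
low = conditional_pattern_probs(-2.0, difficulties, score=2)
high = conditional_pattern_probs(2.0, difficulties, score=2)

for pat in low:
    # The two columns agree (up to rounding): once the raw score is known,
    # theta tells us nothing more about which items were answered correctly.
    print(pat, round(low[pat], 6), round(high[pat], 6))
```

Swapping in a 2PL-style probability, with a discrimination multiplying θ − b, and unequal discriminations would break this agreement for the unweighted raw score; with the discriminations known, it is the discrimination-weighted score that plays the same role.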

