LLTM

Archery as an example of decomposing item difficulty and validating the construct

The practical definition of the aspect is the tasks we use to provoke the person into providing evidence. Items that are hard to get right, tasks that are difficult to perform, statements that are distasteful, targets that are hard to hit will define the high end of the scale; easy items, simple tasks, or popular statements will define the low end. The order must be consistent with what would be expected from the theory that guided the design of the instrument in the first place. Topaz is always harder than quartz regardless of how either is measured. If not, the items may be inappropriate or the theory wrong[1]. The structure that the model provides should guide the content experts through the analysis, with a little help from their friends.

Table 5 shows the results of a hypothetical archery competition. The eight targets are described in the center panel. It is convenient to set the difficulty of the base target (i.e., largest bull’s-eye, shortest distance and level range) to zero. The scale is a completely arbitrary choice; we could multiply by 9/5 and add 32, if that seemed more convenient or marketable. The most difficult target was the smallest bull’s-eye, longest distance, and swinging. Any other outcome would have raised serious questions about the validity of the competition or the data.

Table 5. Definition of Bowmanship

The relative difficulties of the basic components of target difficulty are shown just to the right of the numeric logit scale in Table 5: a moving target added 0.5 logits to the base difficulty; moving the target from 30 m to 90 m added 1.0 logits; and reducing the diameter of the bull’s-eye from 122 cm to 60 cm added 2.0 logits.
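As a rough illustration of how the three component effects combine, the sketch below builds the difficulty of each of the eight targets by simple addition. It assumes the effects are purely additive with no interactions and that the base target sits at zero; the eight values it prints are implied by the three quoted effects, not copied from Table 5, and the names `EFFECTS` and `target_difficulty` are invented for the example.

```python
from itertools import product

# Component effects in logits, as quoted in the text.
EFFECTS = {"moving": 0.5, "far": 1.0, "small": 2.0}


def target_difficulty(moving, far, small, base=0.0):
    """Difficulty of one target: base plus the effect of each feature present.

    Assumes strict additivity (no interaction terms), which is an assumption
    of this sketch rather than something the table is known to confirm.
    """
    flags = {"moving": moving, "far": far, "small": small}
    return base + sum(EFFECTS[name] for name, on in flags.items() if on)


# All eight combinations of the three features, sorted easiest to hardest.
targets = sorted(
    (target_difficulty(m, f, s), m, f, s)
    for m, f, s in product([False, True], repeat=3)
)

for difficulty, m, f, s in targets:
    print(f"{difficulty:4.1f}  moving={m!s:5}  90m={f!s:5}  60cm={s!s:5}")
```

Under these assumptions the easiest target is the base target at 0.0 logits and the hardest is the small, distant, moving target at 3.5 logits, with the other six spaced half a logit apart in between.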

The role of specific objectivity in this discussion is subtle but crucial. We have arranged the targets according to our estimated scale locations and are now debating among ourselves if the scale locations are consistent with what we believe we know about bowmanship. We are talking about the scale locations of the targets, period, not about the scale locations of the targets for knights or pages, for long bows or crossbows, for William Tell or Robin Hood. And we now know that William Tell is about a quarter logit better than Robin Hood, but maybe we should take the difference between a long bow and a crossbow into consideration.

While it may be interesting to measure and compare the bowmanship of any and all of these variations and we may use different selections of targets for each, those potential applications do not change the manner in which we define bowmanship. The knights and the pages may differ dramatically in their ability to hit targets and in the probabilities that they hit any given target, but the targets must maintain the same relationships, within statistical limits, or we do not know as much about bowmanship as we thought.

The symmetry of the model allows us to express the measures of the archers in the same metric as the targets. Thus, after a competition that might have used different targets for different archers, we would still know who won, we would know how much better Robin Hood is than the Sheriff, and we would know what each is expected to do and not do. We could place both on the bowmanship continuum and make defensible statements about what kinds of targets they could or could not hit.
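To make the shared metric concrete, here is a minimal sketch that assumes the dichotomous Rasch model, in which the log-odds of a hit is the archer's measure minus the target's difficulty. The archer measures are hypothetical; only the quarter-logit gap between William Tell and Robin Hood comes from the text.

```python
import math


def p_hit(ability, difficulty):
    """Probability of a hit under the dichotomous Rasch model (logit units)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def log_odds(p):
    """Convert a probability back to logits."""
    return math.log(p / (1.0 - p))


# Hypothetical measures; only the 0.25 logit gap is taken from the text.
robin_hood = 2.0
william_tell = robin_hood + 0.25

for difficulty in (0.0, 2.0, 3.5):
    p_robin = p_hit(robin_hood, difficulty)
    p_tell = p_hit(william_tell, difficulty)
    gap = log_odds(p_tell) - log_odds(p_robin)
    print(f"target {difficulty:3.1f}: Robin {p_robin:.2f}, Tell {p_tell:.2f}, gap {gap:.2f} logits")
```

The hit probabilities change from target to target, but the gap between the two archers stays at a quarter logit whichever target is used, which is the sense in which the comparison does not depend on the selection of targets.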

[1] A startling new discovery, like quartz scratching topaz, usually means that the data are miscoded.


#LLTM

Rather than treating each balloon as a unique “fixed effect” and estimating a difficulty specific to it, there may be other types of effects for which it is more effective and certainly more parsimonious to represent the difficulty as a composite, i.e., a linear combination of more basic factors like size, distance, and drafts. With estimates of the relevant effects in hand, we would have a good sense of the difficulty of any target we might face in the future. This is the idea behind Fischer’s (1973) Linear Logistic Test Model (LLTM), which dominates the Viennese school and has been almost totally absent in Chicago.
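The sketch below illustrates the decomposition idea in the simplest way I can think of: it takes a set of already-calibrated target difficulties and recovers the component effects by least squares. This two-step shortcut is a stand-in, not the LLTM proper, which estimates the basic parameters directly from the response data. The difficulties are invented for the example (the three features echo the archery targets), and because the design is a balanced 2 x 2 x 2 layout, the least-squares effect of each feature reduces to a difference of means.

```python
from itertools import product
from statistics import mean

# Hypothetical calibrated difficulties (logits) for eight targets, keyed by
# (moving, far, small) flags. Invented for illustration only.
calibrated = {
    (0, 0, 0): 0.0, (1, 0, 0): 0.6, (0, 1, 0): 0.9, (1, 1, 0): 1.5,
    (0, 0, 1): 2.1, (1, 0, 1): 2.4, (0, 1, 1): 3.0, (1, 1, 1): 3.5,
}
FACTORS = ("moving", "far", "small")


def main_effect(position):
    """Mean difficulty with the feature present minus mean with it absent.

    For a balanced factorial design this equals the least-squares estimate.
    """
    on = mean(d for key, d in calibrated.items() if key[position] == 1)
    off = mean(d for key, d in calibrated.items() if key[position] == 0)
    return on - off


effects = {name: main_effect(i) for i, name in enumerate(FACTORS)}

# Least-squares intercept: grand mean minus each effect times its mean flag (0.5).
intercept = mean(calibrated.values()) - sum(effects.values()) / 2


def predicted(key):
    """Difficulty implied by the intercept and the effects of the active features."""
    return intercept + sum(e for e, on in zip(effects.values(), key) if on)


print({name: round(e, 3) for name, e in effects.items()})
for key in product((0, 1), repeat=3):
    residual = calibrated[key] - predicted(key)
    print(key, calibrated[key], round(predicted(key), 3), round(residual, 3))
```

With the effects in hand, `predicted` gives a difficulty for any combination of features, including one that was never calibrated; the residuals show how far each individually calibrated target departs from the composite.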

Poisson

Rasch (1960) started with the Poisson model circa 1950 for his original problem in reading remediation, applying it to the seconds needed to read a passage or to the errors made in the process. Andrich (1973) used it for errors in written essays. It could also be appropriate for points scored in almost any game. The Poisson can be viewed as a limiting case of the binomial (see Wright, 2003, and Andrich, 1988), where the probability of any particular error becomes small enough (i.e., b_n - d_i large and positive) that the d_i and the probabilities are all essentially equal.
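The limiting argument can be checked numerically. The sketch below assumes a passage with n opportunities for error, each with the same small probability p, and holds the expected number of errors n * p fixed while n grows. The value of 2 expected errors is hypothetical, chosen only to make the comparison visible.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k errors in n independent chances, each with probability p."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Probability of exactly k errors when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 2.0  # hypothetical expected number of errors per passage

# Largest pointwise gap between the two distributions over counts 0..10,
# for passages with more and more (and individually less likely) chances to err.
gaps = {}
for n in (10, 100, 10_000):
    p = lam / n
    gaps[n] = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))
    print(f"n={n:6d}  p={p:.4f}  largest gap={gaps[n]:.6f}")
```

As n grows and p shrinks, the largest gap between the binomial and Poisson probabilities falls toward zero, which is the sense in which the Poisson is the limiting case.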

