Archery as an example of decomposing item difficulty and validating the construct
The practical definition of the aspect is the tasks we use to provoke the person into providing evidence. Items that are hard to get right, tasks that are difficult to perform, statements that are distasteful, targets that are hard to hit will define the high end of the scale; easy items, simple tasks, or popular statements will define the low end. The order must be consistent with what would be expected from the theory that guided the design of the instrument in the first place. Topaz is always harder than quartz regardless of how either is measured. If not, the items may be inappropriate or the theory wrong. The structure that the model provides should guide the content experts through the analysis, with a little help from their friends.
Table 5 shows the results of a hypothetical archery competition. The eight targets are described in the center panel. It is convenient to set the difficulty of the base target (i.e., largest bull’s-eye, shortest distance, and level range) to zero. The origin and unit of the scale are completely arbitrary choices; we could multiply by 9/5 and add 32, if that seemed more convenient or marketable. The most difficult target was the one with the smallest bull’s-eye, at the longest distance, and swinging. Any other outcome would have raised serious questions about the validity of the competition or the data.
The relative difficulties of the basic components of target difficulty are shown just to the right of the numeric logit scale: a moving target added 0.5 logits to the base difficulty; moving the target from 30 m to 90 m added 1.0 logits; and reducing the diameter of the bull’s-eye from 122 cm to 60 cm added 2.0 logits.
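The decomposition can be sketched in a few lines of Python. This is only an illustration under a loud assumption: that the three component effects quoted above combine strictly additively, which is the idealized reading and not necessarily what the estimated values in Table 5 show exactly. The function and variable names are hypothetical.

```python
from itertools import product

# Component effects in logits, as quoted in the text.
# ASSUMPTION: effects are strictly additive with no interactions.
SWINGING = 0.5   # moving target instead of stationary
FAR = 1.0        # 90 m instead of 30 m
SMALL = 2.0      # 60 cm bull's-eye instead of 122 cm


def target_difficulty(swinging, far, small):
    """Difficulty in logits relative to the base target (fixed at 0)."""
    return SWINGING * swinging + FAR * far + SMALL * small


# Enumerate all eight targets (2 x 2 x 2 combinations of the components).
targets = {
    combo: target_difficulty(*combo)
    for combo in product([False, True], repeat=3)
}

# List the targets from easiest to hardest.
for (swinging, far, small), logits in sorted(targets.items(), key=lambda kv: kv[1]):
    print(f"swinging={swinging!s:5} far={far!s:5} small={small!s:5} {logits:4.1f}")
```

Under that assumption the base target sits at 0 and the small, distant, swinging target at 3.5 logits, with the other six spread between them. Estimated difficulties that departed noticeably from these sums would be the kind of result that sends the content experts back to the theory or the data.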
The role of specific objectivity in this discussion is subtle but crucial. We have arranged the targets according to our estimated scale locations and are now debating among ourselves if the scale locations are consistent with what we believe we know about bowmanship. We are talking about the scale locations of the targets, period, not about the scale locations of the targets for knights or pages, for long bows or crossbows, for William Tell or Robin Hood. And we now know that William Tell is about a quarter logit better than Robin Hood, but maybe we should take the difference between a long bow and a crossbow into consideration.
While it may be interesting to measure and compare the bowmanship of any and all of these variations and we may use different selections of targets for each, those potential applications do not change the manner in which we define bowmanship. The knights and the pages may differ dramatically in their ability to hit targets and in the probabilities that they hit any given target, but the targets must maintain the same relationships, within statistical limits, or we do not know as much about bowmanship as we thought.
The symmetry of the model allows us to express the measures of the archers in the same metric as the targets. Thus, after a competition that might have used different targets for different archers, we would still know who won, we would know how much better Robin Hood is than the Sheriff, and we would know what each is expected to do and not do. We could place both on the bowmanship continuum and make defensible statements about what kinds of targets they could or could not hit.
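To make the shared metric concrete, here is a minimal sketch assuming the dichotomous Rasch model, which the references to logits and specific objectivity suggest but which this passage does not spell out. The archer measures are hypothetical; only the quarter-logit gap between William Tell and Robin Hood comes from the text.

```python
import math


def p_hit(ability, difficulty):
    """Probability of a hit under the assumed Rasch model; both arguments in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def log_odds(p):
    """Convert a probability back to logits."""
    return math.log(p / (1.0 - p))


# HYPOTHETICAL archer measures; only the 0.25 logit gap is from the text.
robin_hood = 1.00
william_tell = robin_hood + 0.25

# The two archers differ in hit probability by a different amount on each
# target, but the difference in log-odds is the same for every target.
for difficulty in (0.0, 1.5, 3.5):
    p_rh = p_hit(robin_hood, difficulty)
    p_wt = p_hit(william_tell, difficulty)
    gap = log_odds(p_wt) - log_odds(p_rh)
    print(f"target {difficulty:3.1f}: RH {p_rh:.3f}  WT {p_wt:.3f}  gap {gap:.2f} logits")
```

An archer whose measure equals a target's difficulty has an even chance of hitting it. The comparison between two archers, expressed in logits, does not depend on which targets they happened to shoot at, which is why a competition using different targets for different archers can still be scored on one continuum.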
A startling new discovery, like quartz scratching topaz, usually means that the data are miscoded.