
The Journal of General Education 55.1 (2006) 17-39


What Are You Thinking?
Postsecondary Student Think-Alouds of Scientific and Quantitative Reasoning Items
Amy D. Thelk
Emily R. Hoole
Abstract

To investigate the cognitive validity of scientific and quantitative reasoning items, "think-alouds" (verbal solutions) were elicited for a general education instrument. Several items did not align in terms of anticipated versus actual content, which raises questions about the instrument's accuracy. We discuss the study's weaknesses and the merits of this framework and analysis.

Assessment, a key component of program evaluation, is often accomplished through multiple-choice testing. This method has many advantages (mostly related to efficiency), but one significant drawback is the lack of information about how examinees interpret, react to, and solve the items. A deeper understanding of the test-taker experience provides stronger evidence for the validity of the scores an instrument produces. Here, validity means that a test measures what it is intended to measure. Throughout the stages of assessment (item development, test administration, and interpretation of results), attention to validity is both a professional and an ethical responsibility of test developers. For instance, the consequences of testing (consequential validity) have received more attention in recent years, drawing scrutiny to the uses and abuses of test results. Before one can even consider reporting test data, test score validity (structural validity) must be examined through statistical analysis and evaluation of administrative conditions. Deconstructing validity evidence further, one cannot trust validity evidence at the test level until examinee performance on each individual item (cognitive validity) has been considered. Our research targets validity at this "building-block" level (Ferrara, Duncan, Perie, Freed, McGivern, & Chilukuri, 2003). Think-aloud procedures allow us to make well-informed statements about the cognitive validity of the items used to measure scientific and quantitative reasoning.

A "think-aloud" is an evidence-collecting activity that takes place during test completion in a laboratory setting. The examinee sits with the researcher and is instructed to "think out loud" while solving the test items. The verbalizations are recorded and later analyzed by the researcher. By hearing students' thought processes during testing, researchers can determine whether examinees solve the items in the way the item developers intended.

This research is valuable because, to date, a literature search has revealed no other think-aloud studies with a postsecondary population in the area of scientific and quantitative reasoning. Additionally, cognitive validity is an essential, and often overlooked, consideration in validity studies. Establishing a method to investigate the validity of individual items can support claims about the accuracy of the assessment in a specific and relevant manner. Finally, the novice–expert design in our research is unique because we used pre- and post-treatment groups to distinguish the two skill levels. A novice–expert approach can reveal change over time, thereby demonstrating the value of higher education and providing a key component for program evaluation.

Validity has long been recognized as the most important aspect of testing and psychological assessment (Standards for Educational and Psychological Testing, 1999). Whenever a test user wishes to make an inference from test scores, the validity of that inference must be verified. Current conceptions of validity are best represented by Messick's (1989) unified theory, which places all other types of validity under the umbrella of construct validity. In this framework, each piece of evidence strengthens the argument that the scores actually represent the construct of interest. A key point in understanding validity is that it is not the test itself that is valid or invalid but, rather, the test scores and the inferences the test user proposes to draw from them.

Given that validity is so important in testing, researchers have developed a wide array of techniques to investigate it...
