Abstract

Ensuring that university students can demonstrate and express critical thinking through competent writing is vital to their future success. In this article, the authors describe one university's efforts to respond to external pressures to assess the critical-thinking skills of upper-division students. They present the implementation of a critical-thinking assessment strategy through the lens of three distinct stages of development and argue that successful assessment of critical thinking requires acceptance of, and patience with, an evolutionary approach to implementation. They conclude by discussing the factors that appear to be necessary conditions for implementing critical-thinking assessment successfully in practice.

Introduction

At many institutions, the assessment of students' critical thinking is fraught with complexities because of varying policy mandates and accreditation requirements, most of which are outside the institution's control. Our search of the literature also suggests varying philosophies regarding what constitutes critical thinking: subject disciplines have their own parochial preferences, employers have theirs, and professional organizations such as the Association of American Colleges and Universities (AAC&U) have still others.1 Further complicating matters are testing organizations such as the RAND Corporation, American College Testing, Inc., and Educational Testing Service, as well as survey organizations such as the National Survey of Student Engagement (NSSE), each of which offers its own instrument-based assertion of what critical thinking is and is not.2

Universities and colleges are often caught within this maelstrom of competing definitions, preferences, and mandates. Our search of the critical-thinking literature also suggests that few institutions have engaged in critical-thinking assessment because it is not easy to implement at the campus level. One reason for this is that faculty members not only differ in their motivation to define and integrate critical-thinking learning goals into their course curricula, but also teach very different student populations (e.g., two- vs. four-year institutions). Faculty members, therefore, have discrepant notions of how critical thinking should be assessed.3 Faculty, staff, and administrators also worry that assessment will devolve into a commentary on individual professors' teaching skills or on the institution's overall effectiveness in educating its students.4

Some state universities, in our case the State University of New York (SUNY), have assembled faculty experts and developed critical-thinking rubrics for their own member campuses in an effort to assess critical thinking on their own terms.5 In addition, the AAC&U has developed a "VALUE" rubric in critical thinking through a process in which faculty throughout the United States discussed what learning outcomes are most appropriate in critical thinking, created a common instrument, and then required their member institutions to use either this instrument or other standardized tests to assess critical thinking.6 All of this suggests that campus-based attempts at defining these outcomes have provided a renewed impetus toward assessing critical thinking because campus administrators and faculty are able to do so on their own terms.7

Purpose

In this article, we describe our experience with assessing critical thinking, positing that a good way to proceed is to recognize that implementation should be evolutionary. We have learned that assessment is less likely to succeed as a hurried process in which an assessment approach is agreed to, implemented, and completed within a very limited time frame. Indeed, our experience has shown that successful and legitimate critical-thinking assessment (at least in the opinion of faculty) is an evolutionary process that occurs more in "fits and starts," in a way that incorporates faculty feedback over several stages of implementation. We also assert that entering into the complexities of critical-thinking assessment requires institutional leaders to delve into a complicated world of social psychology in which it is rarely clear which faculty will be supportive and which will not, even when an institution is far into the process. In addition, it is our opinion that experimenting with different methodologies and approaches is an important part of the process, and institutions and their administrators, faculty, and staff may accept or reject approaches not because of their research worthiness, but because of their usefulness. Last, we believe that institutions (and their faculty) should be devoted to "institutional learning," in which they accept the risk of adverse consequences to reputation, are willing to pay the medium-term expense of participating in the assessment process, and, above all, realize that a longer time horizon is needed to make the resulting information useful and meaningful for student learning.8

In what follows, we share Binghamton University's experience with critical-thinking assessment, describing the three stages of evolutionary progression it used to accomplish this task. The first was a developmental stage, in which our institution responded and adapted to external mandates to assess critical thinking while heeding internal (faculty senate) pressures to approach assessment meaningfully rather than focusing on mere compliance. The second was an enculturation stage, in which administrators, staff, and faculty experimented with various methods of implementation, responding all the while to internal and external pressures to adapt to various, often competing goals and expectations with regard to assessment. The third, a refinement stage, involved continued adaptation to environmental pressures, with a focus on making meaning out of various experimental approaches in order to define strengths and weaknesses in students' critical thinking and to act on those findings. We highlight the successes and failures of the evolution of our assessment program and conclude by sharing the lessons we have learned. We delineate six factors that enabled the development of our assessment program and discuss implications for future research on implementing critical-thinking assessment at other campuses.

Stages of Evolutionary Progression

Stage 1: Development

In addition to being required by the SUNY Board of Trustees to adopt critical-thinking learning goals and to select a method of critical-thinking assessment, campuses were required to submit to SUNY a plan for assessing critical thinking, to be approved by SUNY systems administration with the help of SUNY's newly formed General Education Assessment Review (GEAR) group. This group had been formed by SUNY to develop a critical-thinking rubric in collaboration with a number of SUNY-wide faculty members. Campuses were then required to submit reports every three years describing how critical thinking was assessed, what the results of such assessments were, and what the campus planned to do to address weaknesses in student learning discovered through these assessments.9

Our campus chose to use the GEAR group's critical-thinking rubric because it gave faculty more freedom to choose samples of student work from their own courses and therefore more control over the process. As table 1 indicates, it was also easier for faculty and administrators to accept the rubric because it aligned with an existing board of trustees resolution that defined student-learning outcomes in critical thinking for member campuses. Although the use of the rubric was still unpopular, faculty felt that it might produce more relevant information for them than other standardized assessment measures while still satisfying the overall mandate to assess critical thinking. The campus therefore developed a plan of critical-thinking assessment whose central feature was the use of a rubric. The faculty senate approved the plan, which was subsequently approved by the GEAR group and by SUNY.

Table 1. Critical-thinking rubric arranged by critical-thinking outcome

These actions took time to discuss, formulate, and finally implement. Assessment information for the first critical-thinking assessment report was due to SUNY in 2006, but the campus did not have time to implement the rubric assessment before that deadline. At that time, the newly formed assessment office searched for critical-thinking assessments that could be obtained quickly but would still produce more than merely anecdotal data. The assessments in this developmental stage included a faculty Delphi study and survey results.

  1. Faculty Delphi study. As an initial step, the campus assessment office conducted a Delphi study on critical thinking. In the Delphi procedure, key individuals are sent an open-ended survey in which they are asked to write down their opinions on a specific topic. They are then asked to complete a second, closed-ended survey in which they rate one another's answers. Answers with means exceeding 4.0 (on a five-point Likert scale) and standard deviations of less than 1.0 are deemed "items of consensus" intended for further discussion (a minimal computational sketch of this rule appears after this list). Instructors of jointly designated writing/oral communication courses were invited to participate in the study because it was assumed that they have the best knowledge of students' critical-thinking skills, given that their courses require written and oral assignments involving critical thinking.10

    The thirteen instructors who agreed to participate were first asked to identify strengths and weaknesses in students' critical-thinking skills using an open-ended written response format. In the second, closed-ended survey, developed from the responses to the first, eleven of the thirteen instructors rated their level of agreement with one another's answers using a five-point Likert scale (1 = strongly disagree to 5 = strongly agree). There was consensus (mean > 4.0, SD < 1.0) that students were good at making arguments but needed to improve with respect to supporting those arguments factually. There was also a high degree of consensus that students knew how to use computers to access articles via online search engines and databases but needed to improve at critically evaluating the information they used in their papers and oral presentations. The highest degree of consensus, finally, was that students failed to use enough peer-reviewed sources in their writing. Faculty believed that students often relied on sources of unknown reliability found via basic Internet search tools such as Google or Wikipedia rather than using the library's resources to obtain peer-reviewed materials.

  2. Survey results. Results from the National Survey of Student Engagement (NSSE) and the university's two internship supervisor surveys were also used to assess students' critical thinking. All three surveys contained questions about student learning with regard to the synthesis and analysis of information. Respondents indicated that students' preparation for thinking critically and analytically was between "average" and "good" on a five-point Likert-type scale ranging from "poor" to "excellent." Alumni, students, and internship supervisors reported that students had strong critical-thinking skills (M > 4.5 on questions about critical thinking). In contrast, faculty reported on a separate learning outcomes survey that students had deficits (M < 4.0 on questions about critical thinking) in the ability to gather and analyze information from various sources. Faculty also suggested in their open-ended responses on that survey that students would benefit from more extensive instruction in understanding the broader relevance of their thinking and in considering different points of view when developing arguments.

    The overall assessment plan developed by our university faculty required that assessment results be reported to faculty committees on a regular basis. The results of these initial assessments were therefore presented to academic affairs administration and to the faculty senate committee responsible for overseeing general education assessment. The faculty senate committee members largely agreed that the measures generated interesting information and that the findings validated current perceptions about strengths and weaknesses in student learning in the area of critical thinking. They accepted the information as being accurate but also felt dissatisfied that they had not been more involved in the assessment process. They believed that the results were limited by the fact that they were based on surveys that merely assessed attitudes and perceptions, not actual student performance. Finally, they noted that the rubric had not yet been implemented and that they anticipated receiving the results of the critical-thinking rubric assessment in future years.
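To make the consensus rule concrete, the following is a minimal sketch (in Python; the item wordings and ratings are hypothetical, not the study's data) of how second-round Likert ratings could be screened for items of consensus using the mean > 4.0 and SD < 1.0 criteria described above.

```python
import statistics

# Hypothetical second-round Delphi ratings: each statement was rated 1-5
# ("strongly disagree" to "strongly agree") by eleven panelists.
ratings = {
    "Students construct arguments but under-support them with evidence": [5, 4, 5, 4, 4, 5, 4, 5, 4, 4, 5],
    "Students rely on web sources of unknown reliability": [5, 5, 4, 5, 5, 4, 5, 5, 4, 5, 5],
    "Students struggle to locate articles in library databases": [3, 2, 4, 3, 2, 3, 4, 2, 3, 3, 2],
}

def items_of_consensus(ratings, mean_cutoff=4.0, sd_cutoff=1.0):
    """Return (item, mean, SD) tuples whose mean exceeds the cutoff and whose
    sample standard deviation falls below the cutoff."""
    consensus = []
    for item, scores in ratings.items():
        mean = statistics.mean(scores)
        sd = statistics.stdev(scores)
        if mean > mean_cutoff and sd < sd_cutoff:
            consensus.append((item, round(mean, 2), round(sd, 2)))
    return consensus

for item, mean, sd in items_of_consensus(ratings):
    print(f"Consensus item (M = {mean}, SD = {sd}): {item}")
```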

Despite the weaknesses of the approaches described above, these assessments succeeded in providing information for discussion by faculty and administrators, and they satisfied the SUNY mandates to engage in critical-thinking assessment. In this developmental stage, however, assessment activity resulted from policies and procedures developed in response to external pressures exerted on the campus by SUNY. The university's faculty senate responded by developing its own policies regarding assessment in general education and in its various educational offerings (subject to approval by SUNY), and administrators mediated the process by working with the faculty senate and system leaders to ensure that the university selected an approach that was somewhat palatable to its faculty. In the next stage, efforts to make the process more meaningful to its own campus constituents were the primary focus.

Stage 2: Enculturation

Prior to this point, emphasis was placed on reacting to external mandates and developing an assessment approach that was acceptable to external agencies. Once a plan was submitted and approved and the intent to meet external standards was successfully communicated, it became clear that the primary challenge of developing a critical-thinking assessment program had more to do with satisfying internal demands that the effort be less an empty exercise in compliance and more a process that would provide useful information to faculty. In this stage, relevance was key. Efforts had been made to create policies and procedures that would support meaningful assessment, but it was unclear whether the campus culture would accept them. We therefore focused on three primary questions in the enculturation phase: How might faculty attitudes about the overall worth of assessment become more positive? How might faculty expectations shift from mere response to a mandate toward intrinsically motivated participation in anticipation of meaningful information for their own use? How could assessment information be used to advance faculty's own goals for student learning?

Two choices were made to address these questions. First, decisions were made to continue or discontinue assessments used in the first stage based on how useful they were to faculty. The Delphi results were interesting, but the method was discontinued because the sample sizes were too small and the results were difficult to compare over time. Because the results from the NSSE led to some interesting conversations about the quality of student learning in critical thinking, administrators and faculty decided to continue participating in that survey. The assessment office then worked with the university's career development center to include questions on the senior survey about how well graduating students felt they had been prepared in general education areas (including critical thinking). We also worked with the university alumni relations office to develop an alumni survey that included the same questions as those found on the senior survey. The assessment office hoped that comparisons might be made between senior and alumni responses that would shed light on how well the university had prepared students. Faculty also seemed interested in the information because the results enabled them to converse about what they perceived to be weaknesses in students' learning of critical thinking.

The assessment office also focused on facilitating the SUNY-mandated critical-thinking rubric assessment. SUNY had already begun to plan rubric training sessions for those faculty selected to participate and had again communicated its expectation that participating campuses would submit the results of this assessment to its systems office.

The critical-thinking rubric presents guidelines for evaluating seven elements of critical thinking in students' writing samples (see table 1 above). Its guidelines and the elements it focuses on were developed based on the same critical-thinking learning objectives previously adopted by the university (see table 2). The rubric focuses on the following seven elements:

Target argument assesses the extent to which the student identifies the primary argument.

Conclusion assesses the student's articulation of the argument's conclusion.

Logical support assesses the degree to which the argument's premises provide logical support for the conclusion (regardless of whether the premises are true).

Reasonableness of premises measures the extent to which the student correctly assesses the reasonableness of premises and the credibility of sources.

Development of argument assesses the student's use of logical reasoning in support of a point of view.

Identification of qualifications/objections assesses the student's identification and analysis of alternative points of view.

Broader relevance assesses the degree to which the student describes the significance of the argument or applies the reasoning to a novel problem.

Table 2. Alignment of SUNY Board of Trustees' critical-thinking learning outcomes with the critical-thinking rubric
The rubric provides achievement criteria for each of four ratings: "does not meet expectations," "approaches expectations," "meets expectations," and "exceeds expectations." At the training workshops, evaluators did not have significant concerns about the quality of the rubric or their ability to measure critical thinking using it.11 In accordance with the faculty senate-approved procedure, the assessment office randomly selected four courses from a pool of 300-level courses, and instructors were asked to submit student research papers. All of the selected instructors elected to participate. A total of nineteen junior-level papers were randomly selected for review by the evaluators.
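Purely as an illustration of the rubric's structure, the sketch below represents the seven elements and four achievement levels in Python; the element names and level labels come from the rubric, while the 1-4 numeric coding is our assumption, consistent with the results reported below (where 3 corresponds to "meets expectations").

```python
# Rubric elements from table 1.
RUBRIC_ELEMENTS = [
    "target argument",
    "conclusion",
    "logical support",
    "reasonableness of premises",
    "development of argument",
    "identification of qualifications/objections",
    "broader relevance",
]

# Assumed numeric coding of the four achievement levels.
LEVEL_SCORES = {
    "does not meet expectations": 1,
    "approaches expectations": 2,
    "meets expectations": 3,
    "exceeds expectations": 4,
}

def score_paper(ratings: dict) -> dict:
    """Convert one rater's level labels (one per rubric element) to numeric scores."""
    return {element: LEVEL_SCORES[ratings[element]] for element in RUBRIC_ELEMENTS}

example = score_paper({element: "meets expectations" for element in RUBRIC_ELEMENTS})
print(example)
```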

The faculty senate selected three faculty members (an assessment category team) to oversee critical-thinking assessment. These faculty members were subsequently trained to conduct the rubric assessment. In training sessions conducted after the critical-thinking papers were randomly selected, the evaluators were introduced to the rubric and invited to discuss any concerns they had with the four rating levels. Language used within each rubric element was discussed, and then the raters practiced evaluating samples of student work using the rubric. Papers were reviewed until the evaluators felt confident that they understood the rubric and how to apply it consistently.

To assess interrater reliability, we calculated intraclass correlations, which index the degree of agreement among raters' scores.12 An intraclass correlation of .7 or higher reflects a high level of reliability, a correlation between .3 and .7 reflects a moderate level, and a correlation of less than .3 indicates a low level of interrater reliability.13 Table 3 contains the interrater reliability statistics for all the elements of the critical-thinking rubric used in this assessment. The results indicate acceptable reliability for the rubric elements; agreement among the raters was moderate to high for all of them. Because the faculty senate procedure required choosing papers from a strictly defined set of courses that included laboratory-based science courses, some of the submitted papers were lab reports, which were fundamentally different from the research papers collected from the social sciences and humanities. As a result, the evaluators felt they could not appropriately evaluate these papers on two elements of the rubric, most notably "identification of qualifications and objections" and, to a lesser extent, "broader relevance."
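The article does not specify which form of the intraclass correlation was used; as a rough illustration only, the sketch below computes a one-way random-effects ICC, ICC(1), from a small hypothetical matrix of rubric scores (papers by raters).

```python
import numpy as np

def icc_oneway(scores: np.ndarray) -> float:
    """One-way random-effects intraclass correlation, ICC(1).

    `scores` is an (n_papers, k_raters) matrix: each row holds the ratings
    the k raters gave one paper on a single rubric element.
    """
    n, k = scores.shape
    grand_mean = scores.mean()
    row_means = scores.mean(axis=1)

    # Between-papers and within-papers mean squares (one-way ANOVA).
    ms_between = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((scores - row_means[:, None]) ** 2) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical scores: six papers rated by three evaluators on one element
# (1 = does not meet expectations ... 4 = exceeds expectations).
scores = np.array([
    [3, 3, 4],
    [2, 2, 2],
    [4, 4, 3],
    [1, 2, 1],
    [3, 3, 3],
    [2, 3, 2],
])
print(f"ICC(1) = {icc_oneway(scores):.2f}")  # < .3 low, .3-.7 moderate, > .7 high
```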

Table 3. Interrater reliability for critical-thinking rubric in 2006 and 2009

Results of the rubric analyses indicated that the average Binghamton University student was able to identify, analyze, and evaluate arguments (see table 4). Means for the rubric elements "target argument," "conclusion," "logical support," and "reasonableness of premises" ranged between 2.90 and 3.02 (3 = "meets expectations"). Results regarding students' ability to develop well-reasoned arguments were mixed. The mean for "development of argument" was satisfactory (M = 3.04), but the means for "identification of qualifications/objections" and "broader relevance" were both indicative of a failure to meet expectations (M = 2.20 and M = 2.63, respectively).

As previously noted, the validity of the results for "identification of qualifications/objections" was suspect because the sample included writing samples that did not require the identification of qualifications or objections. Following conversations about the heterogeneity of the papers included in this initial rubric assessment, administrators warned that the results should be interpreted cautiously. The faculty responsible for overseeing the critical-thinking assessment process directed the assessment office to adopt specific selection criteria and to ensure that the papers collected for future rubric administrations were sufficiently similar to permit adequate evaluation.

The faculty senate had charged the assessment category team in critical thinking with reviewing the results of the rubric assessment and making recommendations for improvements in student learning. Members of this team discussed the implications of the findings and the adequacy of the critical-thinking assessment program. In their report, they expressed concern that the definition of critical thinking remained oblique at best, although they appreciated that the rubric elements themselves might realistically serve as specific learning outcomes for students at the university. They were also pleased that the results gave them some impetus and guidance to take specific actions to improve student learning in critical thinking.

After discussing the rubric results, as well as the aforementioned survey data dealing with student learning in critical thinking, the assessment category team recommended that the assessment office and University Libraries conduct critical-thinking workshops and enhance faculty's awareness of the many research resources available to students. They subsequently conducted a workshop entitled "How to Use Information Management Resources to Empower Students to Master Critical Thinking." They also suggested that the University Libraries staff collaborate with instructors teaching first-year student courses to incorporate further instruction on critical-thinking research practices into the course requirements. The library staff also created two electronic tutorials to be used by students: "Finding Scholarly Journal Articles" and "Finding Books." These tutorials were designed to increase students' ability to access reliable information. Finally, the University Libraries developed a "Web Page Checklist" to assist students in evaluating the usefulness and reliability of web pages and created a website that describes the differences between trade, popular, and scholarly journals.14 In addition to the University Libraries' involvement in attempts to enhance students' critical-thinking skills, the university provost responded to the rubric evaluation by commissioning a special critical-thinking course that would be taught by an adjunct faculty member who was well versed in teaching critical-thinking skills.

Table 4. Results of 2006 and 2009 critical-thinking rubric evaluations

We began this section by describing the hope that we would be able to foster a campus culture of assessment by positively affecting the attitudes, goals, and expectations that both faculty and administrators had with regard to student-learning assessment. In this stage, efforts were particularly focused on gaining faculty acceptance of our assessment program. This was partially accomplished via methodological changes and faculty involvement in the program. The inclusion of formal statistical analyses and the use of multiple measures added to the legitimacy of the information among the empirically minded faculty, most of whom were also researchers. Many of the faculty participated in the assessment program by suggesting questions for inclusion in the alumni or graduating senior surveys, providing rubric data, serving as rubric evaluators, or taking part in the University Libraries' workshops. There was also a focus on discussing the implications of the results with the assessment category teams, which were staffed by faculty. Finally, there was an attempt to implement recommendations made by these teams, and to periodically inform faculty of our progress. This communication took many forms, including announcing assessment findings in newsletters, discussing the findings with faculty in assessment workshops, and gathering feedback on ways to improve the quality of assessments. At the conclusion of the enculturation phase, it was clear that the critical-thinking assessment program was directing educational changes at Binghamton University and gaining more faculty support. However, it was also apparent that further refinement and adaptation were necessary. In the next stage, attempts were made to continue evolving the critical-thinking assessment program, with an eye toward continually refining the assessments in order to optimize their utility for guiding critical-thinking instruction.

Stage 3: Refinement

In this stage, the central questions became "How do we make the prior work useful to various faculty audiences?" and "How can we make the process sustainable?" During the enculturation stage, faculty responded favorably to the fact that the critical-thinking rubric was in line with student-learning goals; however, they still believed that parts of the assessment program were irrelevant. Therefore, it became important to identify other assessment approaches to include in the program that were also applicable to the university's mission, vision, goals, and expectations.15 Because the institution is a research university, faculty had clear expectations that the methods used to collect assessment information be rigorous enough to pass academic review.

Meanwhile, external pressures to assess continued, including increasing expectations from the university's regional accreditor that universities assess student learning through evaluations of student work. The university's membership in the National Association of State Universities and Land Grant Colleges (NASULGC, now the APLU) and that organization's strong suggestion that member institutions publicly report critical-thinking assessment results pressured the university to participate in a standardized test of critical thinking. The university also submitted its rubric findings to SUNY as instructed but received little or no feedback about the results, and there were signals from the systems office that staff members were so overburdened by budget and other staffing issues that little review of this information occurred at all. As a result, the university's assessment office felt free to adapt the wording within the critical-thinking rubric to fit campus needs. Although SUNY still required critical-thinking assessment and review by GEAR, faculty and administrators grew increasingly insistent that the information be accurate and meaningful rather than the product of mere compliance with SUNY mandates. Although external pressures continued, administrators and faculty appeared more willing to suggest ways to modify assessments because the external pressure had become a general requirement that assessment be conducted, with less pressure to use specific methods suited to these external audiences. This context resulted in the following three specific refinements.

Modifying Survey Modalities to Facilitate Assessment Discussions

In the enculturation stage, surveys such as the NSSE and the alumni and senior surveys were administered, and attempts were made to align specific questions with general education student-learning objectives. In addition to these efforts, the assessment office discussed survey questions that could be placed on both the alumni and senior surveys to address questions posed by magazine ranking services and by the Voluntary System of Accountability (VSA) template. A newly formed university president's commission requested that several questions be placed on the alumni survey for the purpose of understanding students' attitudes and perceptions about technology and critical thinking. After being promised that they would receive survey data pertinent to their students, academic departments helped recruit students to complete the senior survey, and response rates on that survey increased.

Using a Standardized Test to Assess Critical Thinking

In order to meet VSA requirements and to further strengthen our assessment program, we integrated a standardized and normed measure, the ETS Proficiency Profile, into our assessment battery. This allowed us to compare the critical-thinking skills of Binghamton University students to those of other university students across the country. We administered the abbreviated form of the ETS Proficiency Profile to approximately 100 first-year students and 80 senior students. In the abbreviated form, the standard 108 multiple-choice questions are split across three forms. Each student only completes one form (i.e., 36 questions). When scores are aggregated across students, the ETS Proficiency Profile provides a measure of student achievement in critical thinking as well as reading, writing, mathematics, humanities, social sciences, and natural sciences.

Results showed that Binghamton University students' critical-thinking proficiency is above the national average. This was particularly true of entering freshmen, whose critical-thinking scores placed them in the 100th percentile when compared to entering freshmen at other doctoral/research universities (BU first-year students: M = 115.53, SD = 6.74 vs. normative sample of first-year students: M = 109.27, SD = 1.84). Binghamton University seniors scored in the 86th percentile when compared to seniors at other doctoral/research universities (BU senior students: M = 115.24, SD = 1.84 vs. normative sample of senior students: M = 112.90, SD = 2.21). Although the data suggest that the critical-thinking skills of both freshmen and seniors are quite strong, it is interesting that performance relative to the normative sample was lower for seniors than for freshmen. This finding could reflect the fact that Binghamton University has seen an increase in applications and has made its admissions criteria more stringent over the past few years. In other words, when current seniors enrolled as freshmen, they might not have possessed critical-thinking skills as strong as those of the freshmen currently entering BU. Alternatively, students' education at BU might not have developed their critical-thinking skills to the same extent that the curricula at other universities had.
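The percentile figures above can be reproduced, approximately, by locating each campus mean within the distribution of institutional means in the normative sample; the sketch below does so under our assumption (not stated by ETS) that those institutional means are roughly normally distributed.

```python
from statistics import NormalDist

def institutional_percentile(campus_mean: float, norm_mean: float, norm_sd: float) -> float:
    """Percentile of a campus mean within an assumed-normal distribution
    of institutional means from the normative sample."""
    return 100 * NormalDist(mu=norm_mean, sigma=norm_sd).cdf(campus_mean)

# Figures reported above for the ETS Proficiency Profile critical-thinking scale.
print(f"First-year students: {institutional_percentile(115.53, 109.27, 1.84):.0f}th percentile")
print(f"Seniors: {institutional_percentile(115.24, 112.90, 2.21):.0f}th percentile")
```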

Faculty members were interested in this assessment of critical thinking because it gave them an externally referenced source of information about their students' critical-thinking skills, other than the self-report data provided by the NSSE. Although most faculty were (and remain) suspicious of standardized tests, they were generally reassured that this was not the only assessment of critical thinking at the university. The School of Business was pleased to receive the information because it was preparing for an upcoming accreditation visit, and standardized test scores could help faculty discuss the status of critical-thinking skills among their students, especially when given a chance to compare the performance of their students with those in other units within the university. Another academic department, after hearing that several faculty felt their students were not as academically qualified as those in other units, was interested to see what the test scores said about their students' critical-thinking abilities when compared to others at the university. The provost expressed her concern that although the scores were higher than the national average, they suggested that students needed to improve their critical-thinking abilities. She directed her office to begin engaging faculty in discussions about how to do so and to continue working with the library to help improve students' use of database resources.

Refining Use of the Critical-Thinking Rubric

The assessment category team in critical thinking stressed a need to continue the rubric assessment, both because the campus was required to do so under SUNY guidelines and because they wanted to improve on some of the weaknesses discovered in the first cycle. We therefore used the same procedure when administering the critical-thinking rubric in 2009 that we had in 2006, with two exceptions. First, papers were submitted by instructors of both freshman-level and junior/senior-level classes. First-year student papers were selected from a course that only first-semester first-year students were allowed to enroll in, and senior papers were selected from 300- and 400-level courses.16 We included freshman-level papers because the assessment office felt that it might be interesting to the assessment category team to compare the results of rubric assessment for first-year students with those of senior students (i.e., to experiment with formative assessment). Second, in 2009, we only selected papers that met the inclusion criteria proposed after the first rubric administration:

  1. The papers were submitted in response to a requirement that students complete a research paper. Lab reports were excluded.

  2. The papers were at least eight pages in length.

  3. The papers contained a works-cited page or a bibliography.

  4. The papers were written by a native English speaker.

Ten of thirteen instructors agreed to provide the identities of the students submitting the papers, so that we could check school records to see whether or not they were native English speakers. A random number generator was used to select 30 first-year student papers from freshman-level classes and 32 senior student papers from 300/400-level classes. Information regarding the student's native language was available for 27 first-year students and 19 senior students. If data were not available, we assumed that the person was a native English speaker. To preserve confidentiality, the papers were de-identified prior to their evaluation so that raters were unaware of the author and the class he or she was taking.
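As an illustration of how the selection and de-identification steps could be carried out, the following sketch applies the four inclusion criteria and draws a random sample; the record fields and helper names are ours, not part of the university's actual procedure.

```python
import random

# Hypothetical paper records; the fields mirror the inclusion criteria above.
papers = [
    {"student_id": "A0001", "course_level": 100, "is_lab_report": False,
     "pages": 10, "has_bibliography": True, "native_english_speaker": True},
    # ... one record per submitted paper ...
]

def eligible(paper: dict) -> bool:
    """Apply the four inclusion criteria from the 2009 administration."""
    return (not paper["is_lab_report"]
            and paper["pages"] >= 8
            and paper["has_bibliography"]
            and paper["native_english_speaker"])  # assumed True when records were unavailable

def draw_deidentified_sample(papers, levels, n, seed=2009):
    """Randomly select up to n eligible papers from the given course levels
    and strip identifying fields before the papers go to the raters."""
    rng = random.Random(seed)
    pool = [p for p in papers if eligible(p) and p["course_level"] in levels]
    sample = rng.sample(pool, min(n, len(pool)))
    return [{"anon_id": f"paper-{i + 1:03d}", "course_level": p["course_level"]}
            for i, p in enumerate(sample)]

first_year_sample = draw_deidentified_sample(papers, levels={100}, n=30)
senior_sample = draw_deidentified_sample(papers, levels={300, 400}, n=32)
```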

Using the freshman- and junior/senior-level papers, we were able to infer the effect of a Binghamton University education on critical thinking. By utilizing the previous administration of the critical-thinking rubric, we were also able to examine changes in interrater reliability to see if the adjustments to the critical-thinking rubric were effective. Furthermore, by comparing the 2009 results to the baseline scores from the first administration of the critical-thinking rubric in 2006, we were able to assess the efficacy of the curriculum changes designed to help correct critical-thinking weaknesses identified in 2006.

Although interrater reliability was acceptable in both 2006 and 2009, there were some significant differences between the two administrations (see table 3 above). For example, on the rubric element, "target argument," agreement decreased significantly from 2006 to 2009. In contrast, agreement increased substantially on the rubric element, "broader relevance." This finding matches what raters said in postevaluation interviews. In 2006 evaluators complained that it was difficult to evaluate papers not only on the rubric element, "identification of qualifications and objections," but also on "broader relevance" because the course assignment prompts did not require students to explain the broader relevance of their research. Limiting the heterogeneity of the papers included in our sample in 2009 enabled evaluators to adequately use the critical-thinking rubric to evaluate both of these elements.

In order to infer the effect of a Binghamton University education on critical-thinking skills, we compared rubric scores of first-year students (i.e., freshman-level papers) to those of senior students. The freshman-level papers were collected from courses that enrolled only first-year students, which ensured that we were observing the scores of first-year students rather than juniors and seniors. The average scores on the junior/senior-level papers were higher than those on the freshman-level papers, suggesting that students' critical-thinking skills improved over the course of their education. However, the only statistically significant difference was that scores for "target argument" were higher on junior/senior-level papers. The lack of statistical significance on the other elements might be due to our inadequate sample size. Nonetheless, these results gave comparative information to the faculty committee overseeing the assessment process.
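The article does not name the significance test used for these comparisons; purely as a sketch of one plausible approach, the example below runs an independent-samples (Welch's) t-test on hypothetical element scores using SciPy.

```python
from scipy import stats

# Hypothetical "target argument" scores for the two groups
# (1 = does not meet expectations ... 4 = exceeds expectations).
first_year_scores = [2, 3, 3, 2, 3, 2, 3, 3, 2, 3]
senior_scores = [3, 4, 3, 3, 4, 3, 3, 4, 3, 3]

t_stat, p_value = stats.ttest_ind(first_year_scores, senior_scores, equal_var=False)
print(f"Welch's t = {t_stat:.2f}, p = {p_value:.3f}")
```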

The results of the NSSE, the ETS Proficiency Profile, and the alumni and senior surveys, as well as the results of the comparative critical-thinking rubric analyses, were included on a visual dashboard for members of the faculty committee to view. This medium was intended both for ease of presentation and to facilitate a comparative discussion about what might be concluded about strengths and weaknesses in student learning in the area of critical thinking. The faculty assessment team for critical thinking was particularly interested in the finding that students were in fact improving in their ability to explain the broader relevance of their arguments. They did again express some concern about the rubric evaluation; however, unlike before, members of the committee did not hesitate to voice their support for the use of rubric assessment. Instead, they focused on refining the wording of the rubric to more clearly highlight the differences among the elements for the purpose of more accurate assessment. In addition, members of the faculty committee and administrators worked on developing ways to implement the committee's recommendations. Some of these recommendations owed their existence to an impending decennial team visit from the university's regional accreditor. The committee also discussed the need for faculty to understand the university's critical-thinking learning goals and suggested that the assessment office work with the university's Center for Learning and Teaching to help enhance faculty's understanding and support of these learning goals. The assessment category team also mentioned a need to do more at new student orientation to educate incoming first-year students about what critical thinking is and why it is important to them, surmising that the teaching of critical thinking should be a more overt exercise.

Discussion

Binghamton University's experience demonstrates a critical-thinking assessment process that was initially spurred by external mandates but moved forward on its own terms, thanks to a reasonably cooperative campus culture willing to participate in critical-thinking assessment as we strove to meet public policy and accreditation demands while also addressing the local needs of our campus. Concerted efforts were made to harmonize the process with local attitudes and expectations regarding the general use of assessment in evaluating student learning on campus. SUNY had to approve all campus assessment plans, but campuses were given the freedom to select from a limited array of strategies for assessing critical thinking. Because a good amount of time was afforded for experimenting with various approaches, the campus was able to move through, first, a developmental stage, in which it reacted directly to external pressures to implement a preliminary assessment program by a specific deadline; second, an enculturation stage, in which campus goals, attitudes, and expectations had time to play a more direct role in shaping assessment policies and methods; and, third, a refinement stage, in which the campus culture went through a process of modification to make the assessment process useful and sustainable while continuing to satisfy dynamic external mandates.

As figure 1 depicts, external pressures were at first exerted on the campus in the form of a critical-thinking assessment mandate. These pressures prompted conversations between administrators and the faculty senate about which approaches were most appropriate for meaningfully assessing critical thinking. The resulting plans had to be sent to the SUNY administration and approved in order to comply with the mandate initially, but the leeway and time the campus was then given to experiment with different methods allowed this evolutionary process to run its course. It should be noted that the process occurred in "fits and starts" as the campus developed a critical-thinking approach that had intrinsic value and relevance. Furthermore, the process is ongoing: our critical-thinking assessment program continues to evolve in response to the ever-changing demands of external agencies and the local needs of campus faculty.

Fig 1. An Evolutionary Assessment Process, 2001-2010.

In retrospect, several factors appear to have been necessary to the successful evolution of our critical-thinking assessment program; they might also serve as advice to other institutions currently endeavoring to assess critical thinking.

  1. Include a reasonable time frame in which to implement the assessment procedure. Had the State of New York mandated a critical-thinking assessment policy with strictly defined procedures to be completed within only a few years, a critical-thinking assessment program would still have been developed, but this shorter time frame would have come at the expense of the enculturation and refinement stages and would likely have been met with significantly more resistance from faculty. Furthermore, without the time and freedom to develop an assessment program that met the specific interests of our university community, the resulting information might not have been relevant enough to lead to effective changes in curriculum and student services. If assessment is to be meaningful and useful, it requires at least a minimum degree of acceptance by faculty. This in turn depends on whether there is adequate time for administrators to collaborate with faculty in the development and refinement of an assessment program that balances internal and external demands.

  2. Attempt to include faculty, preferably the faculty senate, in the initial development of the assessment process. Important to our process was a faculty senate willing to cooperate in the development of a critical-thinking assessment process and to work with administrators on implementing it over time. Also important was the creation of an assessment office able to facilitate discussions about critical-thinking assessment and to wade through the complexities of warning faculty about the consequences of noncompliance while assisting in the development of an assessment process acceptable to campus audiences, a dynamic Levi called "quasi-voluntary compliance."17

  3. Learn about various assessment approaches and be willing to utilize them, even if they are less than perfect. In our case, the critical-thinking assessment process became more sophisticated over time. The use of rubrics to assess critical thinking went through fits and starts, with some successes and failures. However, it was also important that staff in the assessment office were knowledgeable about the rubric, knew how to use it given the campus culture and its constraints, and were patient enough to do so despite its imperfections and the uncertainty of how the results would be received. Had assessment office personnel criticized various approaches without first hearing out the rationale for each, rubric evaluators and other faculty might have effectively withdrawn from the process, fearing recriminations from the office rather than constructive feedback. Central to the evolutionary process is a degree of patience with imperfection (at least in the eyes of anyone wanting methodological perfection), with an eye aimed squarely at attaining more valid and reliable information in the future. The critical-thinking rubric evaluation described above grew more refined and reliable over time, thanks to the willingness of campus constituencies to experiment with various approaches with the aim of producing a more complete picture of strengths and weaknesses in students' critical-thinking abilities.

  4. Devote sufficient resources to assessment. Even if external mandates had intensified and faculty and administrators had been eager to participate in the assessment of students' critical-thinking skills during their work hours, this evolution toward more meaningful and useful critical-thinking assessment would not have been sustainable without campus resources (e.g., funding for the initiative, clerical support). In our case, it required the campus's monetary support for the administration of various surveys and generous staff, administrators, and faculty willing to devote committee and personal time to help critical-thinking assessment find its sea legs. Without the devotion of these resources, movement from one evolutionary stage to another would have been difficult, if not impossible.

  5. Constantly work on developing or sustaining a campus culture accepting of assessment. Bergquist's assertion that successful management relies on an understanding of various institutional cultures also applies to assessment in higher education.18 Community colleges and for-profit institutions have expectations, goals, and attitudes about assessment that differ from those at four-year research institutions (the type of institution featured in this case study) or four-year undergraduate-only liberal arts institutions. Institutions wishing to move along an evolutionary trajectory from the developmental stage to the refinement stage will have to take into account what kind of institution they are and what sociopolitical environment they operate in. Because each institution is unique and because internal and external demands are ever-changing, specific assessment techniques and measures that thrived at Binghamton University might fail to flourish when implemented by an institution with different needs, resources, and obstacles to overcome.

  6. Resist believing that methodological perfection is a necessary condition for successful assessment. We hope that our description above does not lead to the conclusion that our experience produced a perfect, or even laudable, critical-thinking assessment process. There have been, and remain, many limitations and bumps in the road to a nirvana of perfect assessment experiences. Had we expected methodological perfection, we would still be discussing how to go about assessing students' strengths and weaknesses in critical thinking and would have failed to do any actual assessment. A constantly evolving approach to assessment has led us to use different methods that speak to each of the critical-thinking student-learning outcomes and to focused conversations by faculty that have led to specific recommendations to enhance student learning in critical thinking. Since that time, the assessment office has been able to develop student-learning dashboards that contain various assessments regarding critical thinking, and it looks forward to further faculty-based discussions about critical thinking in the future.

The purpose of this article has been to share Binghamton University's approach to assessing critical thinking, a process that has taken time and that remains imperfect and subject to change. From the beginning, faculty at our campus agreed that the assessment of student learning was an initiative worth supporting; however, many faculty were concerned that governments and other organizations outside the campus might interpret the collected information without taking our institutional context and culture into account. Refining assessments over a longer period than some policy leaders in higher education might have desired created opportunities to revise assessment procedures and enabled faculty to reflect upon and contribute to an evolving assessment strategy that balances internal and external demands. As the research literature contains very little information on institution-wide assessment strategies in general education, particularly in critical thinking, we hope that our experience might add to a discussion about which assessment strategies might work on other campuses.

Sean A. McKitrick
Office of Institutional Research and Assessment, Binghamton University (SUNY)
Sean M. Barnes
Binghamton University (SUNY) and Veterans Administration VISN 19 Mental Illness Research, Education, and Clinical Center
Sean A. McKitrick

Sean McKitrick is Assistant Provost and Director of the Office of Institutional Research & Assessment at Binghamton University (State University of New York). His duties include overseeing the institutional research office, strategic planning, and student learning assessment.

Sean M. Barnes

Sean M. Barnes received his doctorate in clinical psychology from Binghamton University (State University of New York) and is now a postdoctoral fellow at the Mental Illness Research, Education, and Clinical Center of the VA Rocky Mountain Network. His current research focuses on suicide prevention.

Correspondence concerning this article should be addressed to Sean A. McKitrick, State University of New York at Binghamton, PO Box 6000, Binghamton, NY 13902-6000.

Notes

This research was supported in part by financial assistance from the State University of New York and the Provost's Office of the State University of New York at Binghamton.

1. C. F. Hobaugh, "Critical Thinking Skills: Do We Have Any? Critical Thinking Skills of Faculty Teaching Medical Subjects in a Military Environment," U.S. Army Medical Department Journal 1 (October-December 2010): 48-62; P. Finn, "Critical Thinking: Knowledge and Skills for Evidence-Based Practice," Language, Speech, and Hearing Services in Schools 42, no. 1 (2011): 69-72; M. E. Carey and M. McCarolle, "Field Note: Can an Observational Field Model Enhance Critical Thinking and Generalist Practice Skills?" Journal of Social Work Education 47, no. 2 (2011): 357-66; D. A. Bensley, D. S. Crowe, P. Bernhardt, C. Buckner, and A. L. Allman, "Teaching and Assessing Critical Thinking Skill for Argument Analysis in Psychology," Teaching of Psychology 37, no. 2 (2010): 91-96; S. A. McKitrick, "Developing an Assessment Procedure to Enhance Student Learning Outcomes in Critical Thinking/Information Management," Re-engineering Assessment Practices (REAP), Assessment Design for Learner Responsibility Conference, May 29-31, 2007, http://www.reap.ac.uk.

2. T. B. Erwin and K. W. Sebrell, "Assessment of Critical Thinking: ETS' Tasks in Critical Thinking," Journal of General Education 52, no. 1 (2003): 50-70.

3. M. Lloyd and N. Bahr, "Thinking Critically about Critical Thinking in Higher Education," International Journal for the Scholarship of Teaching and Learning 4, no. 2 (2010): 1-16; E. Krupat, J. M. Sprague, D. Wolpaw, P. Haidet, D. Hatem, and B. O'Brien, "Thinking Critically about Critical Thinking: Ability, Disposition, or Both?" Medical Education 45, no. 6 (2011): 625-35.

4. F. Fendrich, "Pedagogical Straitjacket," Chronicle Review 53, no. 40 (2007): B6; T. Banta, "Can Assessment for Accountability Complement Assessment for Improvement?" Peer Review 9, no. 2 (2007): 9-12.

5. P. L. Francis, P. D. Salins, and A. E. Huot, "The SUNY Assessment Initiative: Meeting Standards of Good Practice," Assessment Update 18, nos. 1-2 (2006): 1-2.

6. T. L. Rhodes, "VALUE: Valid Assessment of Learning in Undergraduate Education," New Directions for Institutional Research (2008): 59-70.

7. L. S. Almeida and A. H. R. Franco, "Critical Thinking: Its Relevance for Education in a Shifting Society," Revista de Psicologia 29, no. 1 (2011): 175-95.

8. Z. Spicer, "Institutional Policy Learning and Formal Federal-Urban Engagement in Canada," Commonwealth Journal of Local Governance 7, no. 1 (2010): 99-119; G. Gibbs, T. Habeshaw, and M. Yorke, "Institutional Learning and Teaching Strategies in English Higher Education," Higher Education 40, no. 3 (2000): 351-72.

9. This process has since been discontinued by SUNY. See the SUNY Board of Trustees resolution, SUNY document 1151, July 1, 2010, http://www.suny.edu/sunypp/documents.cfm?doc_id=179.

10. P. K. Weerakoon and D. N. Fernando, "Self-Evaluation of Skills as a Method of Assessing Learning Needs for Continuing Education," Medical Teacher 13, no. 1 (1991): 1-4; J. K. Rao, L. A. Anderson, B. Sukumar, D. A. Beaushesne, T. Stein, and R. M. Frankel, "Engaging Communication Experts in a Delphi Process to Identify Patient Behaviors That Could Enhance Communication in Medical Encounters," BMC Health Services Research 10, no. 1 (2010): 91-113.

11. M. L. Blommel and M. A. Abate, "Instructional Design and Assessment: A Rubric to Assess Critical Literature Evaluation Skills," American Journal of Pharmaceutical Education 71, no. 4 (2007): 1-8; P. Dlugos, "Using Critical Thinking to Assess the Ineffable," Community College Journal of Research and Practice 27 (2003): 613-29; A. Kan, "An Alternative Method in the New Educational Program from the Point of Performance-Based Assessment: Rubric Scoring Scales," Educational Sciences: Theory and Practice 7, no. 1 (2007): 144-52; J. F. Schamber and S. L. Mahoney, "Assessing and Improving the Quality of Group Critical Thinking Exhibited in the Final Projects of Collaborative Learning Groups," JGE: The Journal of General Education 55, no. 2 (2006): 103-37.

12. L. Lu and N. Shara, Reliability Analysis: Calculate and Compare Intra-Class Correlation Co-Efficients in SAS (Baltimore: SAS, NESUG, 2007); M. R. Stanford, L. Gras, A. Wade, and R. E. Gilbert, "Reliability of Expert Interpretation of Retinal Photographs for the Diagnosis of Toxoplasmosis and Retinochoroiditis," British Journal of Ophthalmology 86, no. 6 (2002): 636-39.

13. L. Beyreli and G. Ari, "The Use of Analytic Rubrics in the Assessment of Writing Performance: Inter-Rater Reliability Concordance Study," Educational Sciences: Theory and Practice 9, no. 1 (2009): 105-25.

14. B. Mulligan, K. Bouman, S. Currie, S. McKitrick, and S. Fellows, "Critical Research Practices at Binghamton University: A Case Study in Collaboration," College and Research Libraries News 69, no. 7 (2008): 382-85.

15. W. H. Bergquist, Four Cultures of the Academy (San Francisco: Jossey-Bass, 1992); S. A. McKitrick, "Engaging Faculty as a Strategic Choice in Assessment," in Handbook of Research in Assessment Technologies (Hershey, PA: IGI Global, 2009).

16. A limitation of this study was that it was not possible to determine if first-year students had enrolled in 300- and 400-level courses because of drawbacks associated with the university's electronic database system. However, a review of student records determined that first-year and sophomore student enrollment in such courses was a rare event.

17. M. Levi, Of Rule and Revenue (Berkeley: University of California Press, 1988).

18. W. H. Bergquist, Four Cultures of the Academy (San Francisco: Jossey-Bass, 1992).
