Effects of Inconsistent Relevance Judgments on Information Retrieval Test Results: A Historical Perspective

Tefko Saracevic

doi:10.1353/lib.0.0000

Library Trends

Johns Hopkins University Press
Volume 56, Number 4, Spring 2008
pp. 763-783

Abstract

The main objective of information retrieval (IR) systems is to retrieve information or information objects relevant to user requests and possible needs. In IR tests, retrieval effectiveness is established by comparing IR systems' retrievals (systems relevance) with users' or user surrogates' assessments (user relevance), where user relevance is treated as the gold standard for performance evaluation. Relevance is a human notion, and establishing relevance by humans is fraught with a number of problems, inconsistency in judgment being one of them. The aim of this critical review is to explore the relationship between relevance on the one hand and testing of IR systems and procedures on the other. Critics of IR tests have questioned the validity of those tests because they were based on inconsistent relevance judgments. This review traces and synthesizes experimental studies dealing with (1) inconsistency of relevance judgments by people, (2) effects of such inconsistency on results of IR tests, and (3) reasons for retrieval failures. A historical context for these studies and for IR testing is provided, including an assessment of Lancaster's (1969) evaluation of MEDLARS and its unique place in the history of IR evaluation.
