Why are notions about voice and race that are no longer supported by research still reproduced? Through ethnographic work on classical vocal training in southern California in the early twenty-first century, this article demonstrates that listeners (teachers and audiences) project intention and identity onto vocal timbres and entrain the voices accordingly. As such, this research is concerned with the cultural-historical formation of one specific category of vocal timbre. More broadly, it argues that paying attention to the microscopic nuances of vocal timbre can draw awareness to the politics of listening that come into play each time vocal timbre is assessed.


The racialization of timbre pervades Western discourses on voice. While vocal timbre is habitually naturalized and racially differentiated through enculturation, this racialization has not thus far been subjected to systematic critical examination.1 Why do centuries-old sentiments about vocal timbre and difference persist? What supports the assertions and assumptions that essentialized qualities, including race and ethnicity, are not only evidenced through vocal timbre but also confirm racial, ethnic, or other differential markers as innate? In this article, I consider why notions about voice and race that are no longer supported by research continue to be reproduced. I posit that it is not necessarily attitudes about race per se that give rise to this evaluation of voices and people. Instead, I suggest that broader notions of sound and voice entrain and support a more general listening for difference, and, by extension, that values and beliefs, including those regarding race, are identified and, as a result, seemingly confirmed, like self-fulfilling prophecies.

As a musicologist and scholar of voice and sound studies, I consider the racialization of voice within the vocal tradition and genre of Western opera. In this article I focus on operatic vocal pedagogy. Through my ethnographic work on this discipline, I demonstrate that listeners project intention and identity onto vocal timbres—and that the projected meanings and identities are derived from received value systems and contexts. Generally speaking, teachers’ perceptions of students’ ethnicities shape their understanding of how the students might develop timbrally.

While I consider vocal timbre here in a contemporary context, this inquiry is also set within a broader historical awareness of the racial dynamics surrounding physiology and the ways in which they are connected to notions of voice. Engaging perspectives from performance studies, I address concerns in critical race studies and sound studies and extend them to the site of vocal timbre. Thus, my questions find a parallel in theater scholarship’s inquiry into the performed spoken voice. Faedra Chatard Carpenter is also “struck” by the phenomenon according to which, “despite the widely accepted recognition that race is a social construct, Americans still talk about what sounds black or sounds white in simplified racial terms” (195). I share goals with scholars of avant-garde music, jazz, and literature such as Fred Moten, who is concerned with the rematerialization of the visual through sound, and the objectification of persons based on the way in which their visual presentation is understood. I also share objectives with Daphne Brooks’s critical catalogue of the African-American experience, archived in the form of vocal micro-sonorities and inflections in popular music production, representation, and reception (“All that You Can’t,” “There Must Be,” and “Bring the Pain”).

The setting of classical vocalists’ training in general, and the ways voice teachers listen to student voices in particular, offer poignant examples of a broader phenomenon: that people pay very detailed attention to vocal timbre, and form assessments based on impressions they gain through such close listening. By paying attention to the microscopic nuances of vocal timbre, I wish to draw awareness to the politics of listening that come into play each time vocal timbre is assessed—that is, each time a voice is understood as female or male, old or young, healthy or unhealthy, white or not-white. Although I am concerned with the cultural-historical formation of one specific category of vocal timbre, I use perspectives from sound studies to address a broader concern which researchers in the humanities have no method to explain adequately: the timbral micropolitics of difference to which the voice is subjected.

Hearing Race, Teaching Race

Classical vocal artists undergo intense training. A decade of daily practice, weekly (or more frequent) private lessons, monthly or quarterly master classes, summer workshops or university or conservatory training with classical singers and musicians, and opera apprenticeship programs constitute the pedagogical structure as well as the business model for this world. The path towards a professional vocal career is an immersive experience and lifestyle. I was intensely involved in this world for a decade and a half. While the following discussion draws on specific examples from the world of classical vocal training, this project seeks to examine the ways in which general attitudes about sound play out when voices are listened to within the context of deeply held assumptions regarding difference. The discussion is based on observations drawn from sixteen years of deep and direct participant observation of selected classical vocal music communities and training.2 While I am still in touch with the classical vocal world, my immersion in the community, including what I refer to as my period of participant observation, took place in Norway and Denmark (1991–1999), New York City (1995–1999), and Southern California (1999–2007).

In addition, over a period of one year, I conducted thirteen interviews with voice teachers.3 In these conversations, I asked what constitutes vocal timbre, how vocal timbre is developed, and what kinds of information vocal timbre is able to convey about the singer. When we discussed correct singing—in terms of vocal weight and color (both are crucial issues in vocal pedagogy)—issues of race, ethnicity and vocal timbre arose. All but two teachers told me that they can always tell the ethnicity of the singer by his or her vocal timbre. In the following discussion, I will focus on two different interviews drawn from the thirteen I conducted.

It is worthwhile to acknowledge that my participation in and knowledge of classical vocal communities are not comprehensive, but are limited to the places and times noted above. My observations have also been affected by what I, and my life circumstances, brought to the scene. On the one hand, having followed the opera world generally, I find that the sentiments I have observed and the dynamics in which my visual presentation (Korean), accent within the context of the English-speaking United States (Norwegian), and vocal school (Scandinavian) have participated seem quite representative of the contemporary classical vocal world. On the other hand, in this article I limit my inquiry to ask what might form the basis of the observations made by the voice teachers I interviewed. As such, I do not purport to make broad statements about voice teachers per se. While some readers may find my interviewees’ observations extreme or at least out of the ordinary, based on my many years as a participant observer, they did not strike me as outliers,4 whether as statements by voice teachers or as statements that might be made by the general public. Therefore, while these specific case studies come from the world of classical vocal practice, I believe my observations here have broader applications and ramifications.

The following inquiry and argument are based on the general practice of listening for sameness and difference, and on the perceived implication of the listener in these processes. Such listening springs from the assumed connection between a given sound’s source and its apparent meaning. Therefore, while some of the following statements from my interviews may seem provocative, I choose these because they are helpful in identifying choices to hear timbral phenomena as personal and innate rather than as stylistic performances. Observations such as “this is a soprano,” “this is a woman’s voice,” “this person is happy,” or “this person is sad” are not driven by the same urgency. However, there are no technical differences between these seemingly innocuous observations and the types of observations made by voice teachers recounted below. Indeed, through deconstructing what some might consider extreme observations, and connecting these observations to what could be considered innocent or common observations, I wish to advance our knowledge of the cultural-historical formation of the timbral micropolitics of difference.

Overall, my interviews with teachers revealed two prevalent concerns around guiding vocal timbral development: first, the question of what constitutes healthy and natural singing for the student; and, second, the need to avoid homogenizing students’ voices and allow each singer’s “true timbre” to emerge. In my conversations with voice teachers, we discussed what constitutes vocal timbre, how it is developed, and what kinds of information it conveys about a singer. When we discussed the “correctness” of vocal weight and tone color, which are crucial problems in vocal pedagogy, issues of race and ethnicity consistently arose.

Voice teachers returned to notions of the “correctness” of vocal weight and tone color in our discussions of the maintenance of healthy, authentic, and beautiful voices. Interestingly, practices that the teachers I interviewed considered “healthy” and “honest” were ultimately correlated with each student’s race and ethnicity (Allison).5 Because race has been thoroughly naturalized, what I describe as racialized vocal timbre is conceived by voice teachers as simply the result of a healthy way of singing that promotes a nonhomogenized sound and that allows students to be “themselves” (Allison). Voices with health problems are commonly conceived as unrealized or repressed due to any number of causes, from bad vocal habits, often conceptualized as “tensions,” to evidence of underlying health issues. In short, a “healthy”-sounding voice is assumed to be a voice freed from blockages, and thus is assumed to be an unmediated sonorous conduit for the subject. Voice teachers equate what they understand as the singer’s “inner essence” with a healthy voice, and listen for it during vocal “diagnosis.”6

For example, Dorothy, a soprano and professor of voice for seventeen years, told me that she can invariably identify whether a student is, for example, Armenian, Russian, or Korean from the student’s vocal timbre, but she frames her classification of students as a concern about vocal health:

There are principles of what is healthy, a balanced sound and all of that, and if [voice teachers] observe that rule, then how can they not hear an Armenian sound or Korean sound and cultivate it?

In this statement, Dorothy reasons that if the voice is trained along principles designed to promote a healthy, balanced sound, it will inevitably display its inherent ethnicity—conflating race, national identity, and vocal health.7 Rather than considering this strategy as a race- or ethnicity-based categorization of voices, Allison, another longtime teacher, views what she calls “ethnic timbre” simply as the “unique color” and vocal “fingerprint” of the student, one that is nevertheless associated with a racially categorized group. Pedagogy, then, becomes a matter of bringing out the “true sound” of the student’s voice—and that true sound happens to be connected to his or her perceived race or ethnicity (Allison). Allison regards this pedagogical philosophy as a means of allowing each student to maintain an element of individuality within the highly cultivated and stylized world of classical singing. During the interview process, I frequently heard such statements regarding the individuality of a voice, by which my interviewees meant, I believe, the opposite: “an ethnic vocal timbre,” a timbre determined by socially constructed notions of ethnicity. Indeed, an ethic of multiculturalism permeates vocal pedagogy: Allison goes so far as to criticize ignorant teachers, who have not been exposed to a variety of “ethnic timbres,” for “homogenizing” their students’ sounds. Most teachers with whom I spoke did stress the importance of being literate readers of “ethnic” vocal timbres.

When we began to discuss what might cause the varied timbres of different ethnicities, Allison explained that the Central and South American timbre is influenced by Latin people’s connection to their bodies; in her view, inhabitants of Latin cultures are motivated by bodily drives, while North American inhabitants are moved by cerebral concerns. She explained that singers’ connections to their bodies affect their sounds:

The Mexican culture, for example, is, to me, a very visceral culture. It’s not a super heady culture. I think we in the United States of America tend to be more cognitive. You know the whole Puritan ethics where sex is bad and you just disallow that you have anything below your waist. You know, that is a primary drive in people.

I asked Allison whether she believed that some cultures come by that body-voice connection more naturally, so that even if a singer from one of those cultures studies with an American teacher, or a teacher who is not particularly focused on the development of the body-voice connection, his or her voice would still sound the connection that was “in” him or her from the beginning, and thus would differ from the voice of an Anglo-American growing up in the US. Allison responded:

Yes. I think [Latin Americans] naturally have that connection […] They’re […] connected to their bodies […] and their guts [said with throaty, “gut sound”] and they make music from their hearts. In European repertoire they talk about that “she broke my heart, I will just lay down and die now” [said with a very “proper” voice], and in Hispanic music, the Latino music: “She broke my heart, she ripped it out of my chest and stomped it on the floor!” [nearly screaming]. And that’s how their music sounds. It’s very gut. Americans—we don’t operate on that level, we tend to be a visual or cognitive society.

Allison expressed her claims in compassionate language and avowed a commitment to allowing the “natural” and “individual” voice to remain untouched through intense classical vocal training. Yet several interviewees used these notions of “naturalness” and “individuality” synonymously (if unconsciously) with ethnic, national, or racial difference. The sentiments that attend such notions echo those of musicologist Marius Schneider in a 1957 encyclopedia entry on “Primitive Music,” where he posits that “Every being has its own sound or its own song, the timbre and rhythm of which embody the mystic substance of the owner” (42). As Alan P. Merriam and Valerie Merriam observe, “Races are held to have special and mystic abilities, and what the anthropologist attributes to learning and to culture, Schneider attributes to race” (255). In Schneider’s own words, some musical characteristics are “bound up with certain racial factors. … In fact, the innermost essence of the more intensely specialized types of song cannot be transmitted at all … since the dynamic and vocal timbre which is inseparably bound up with it cannot be acquired by learning” (27). According to Schneider, vocal qualities that are heard to express “certain racial factors” are understood within this listening framework as non-negotiable expressions. The consequence of such a listening position is that meaning is formed within a rigid and closed cycle.

Historically speaking, the investment in race on the part of my informants, as classical vocal pedagogues, is far from anomalous, although their frankness on this question is worth noting.8 As mentioned earlier, all but two teachers claimed to hear singers’ ethnicities in their vocal timbres. That is, teachers’ perceptions of students’ ethnicities shape their understanding of how the students might develop as singers, and further direct teachers’ ears.9 The classical vocal pedagogy practiced today in southern California (and elsewhere in the US) can be traced back to the formation, during the mid-nineteenth century, of what John Potter has called the modern classical voice. For Potter, the formalization of vocal pedagogy grounded in scientific principles marks the transition from the premodern to the modern classical voice (54). Modern classical vocal pedagogy’s advances, and its questionable notion of “the natural,” were aided in part by findings encouraged and enabled by colonial racial dynamics and research resting on colonial power structures. While Potter has observed that the ideologies powering the formation of the modern classical voice are still present in current vocal practices, I add that classical vocal training fosters a racial-vocal microtimbral discourse.10 Despite having substantial specific knowledge about voice, voice teachers—like most people—hear race (or health, or authenticity) as communicated through essential timbral qualities that are presumed to tell us something unmediated about a person’s internal state (Miller, Solutions; Rubenstein 90; Miller, National Schools 220).

Perhaps surprisingly, I would posit that these racialized assessments do not flow directly from, nor are they solely enabled by, racial sentiment in a given culture and society. Instead, I would argue that this type of listening emerges from general assumptions about the nature of sound and its ontology, i.e., assumptions regarding what we can know about sound and its meaning. That is, the type of listening reported above is supported by assumptions regarding what meaning sound is capable of communicating. I suggest that the vocal teachers’ way of listening does not directly arise from racism, sexism, or other prejudice. Instead, what underpins such assessments is the general belief that we can identify and know sound. Once the assumption is in place that sound is knowable, what then becomes “knowable” through sound are values and beliefs in a given society—concerning, say, race, ethnicity, gender, or class. In other words, when the beliefs are in place that we can know sound, and that the meaning we infer from it is stable, then whatever we believe is projected onto the sound.

In order to move towards an understanding of specific racialized listening, I propose that we must investigate general assumptions about sound and meaning. Returning to the above interviews, I’d like to frame the following discussion by reiterating that the interviews were originally carried out as part of an earlier study that examined the question of the reproduction of racialized notions through vocal pedagogy (Eidsheim, “Race”; “Marian Anderson”). That was a satisfying study, one that participated in a broader conversation emphasizing the relation of body and voice. However, as I returned to its findings over the years and considered them in the context of my work on the way that voice, music, and sound are heard and understood within preconceived values and categories, I was prompted to think more broadly about the question. For example, where I had previously asked, “What is the musical-historical and vocal pedagogical context for racialized listening, hearing, and teaching?” I now expand that question by also asking: “What, more broadly, are the frames of listening and the assumptions about sound’s ontology that lie at the foundation of listening, and that not only associate timbre with race, but also believe these traits to be innate?”

Racialized Timbral Judgments Are Based on the Assumption that We Can Know Sound

Within such a constellation of beliefs in a stable, knowable sound—what I call the Figure of Sound (FoS)—we are conditioned to hear what we listen for, and to assume that what we hear is indisputable. As I see it, the dominant Western notions of music making and listening are founded on this paradigm of the FoS. Listening that is formed and that takes place within this paradigm is listening that only knows how to listen for and through difference from a fixed referent. Because the FoS paradigm assumes a fixed referent, it fosters a specific kind of listening where the primary goal is to identify difference from that referent. In other words, within this paradigm, making sound and listening are about degrees of fidelity to an imagined a priori sound and our ability to identify that fidelity. For example, then, we note observations such as:

—This is “ma” (as opposed to “pa,” and “ma” is different from “pa”).

—This is Bb (different from other pitches).

—This is a too high or “out of tune” G# (it is not faithful to the a priori G#).

However, the paradigm of the FoS does not end with the drive to know and identify a sound such as, say, G# as the second scale degree of the key of F# major. Nor does it end with the drive to know and identify sounds on a unit level, such as syllables, words, or pitches. The paradigm of the FoS extends into timbre, and such timbral assessments are used to establish basic information about a sound source:

—This is a flute (different from other instruments: say, a clarinet).

However, beyond making basic distinctions such as between flute and clarinet, judgments of timbre are often bound up with the assessment of value and identity. For example, in the FoS paradigm, listening to human voices can lead to appraisals, such as “This is the sound of a woman’s voice,” based on perceived similarities between a given sound and other, specifically female, human voices, and their dissimilarities to male and children’s voices.11 Likewise, the observation that someone is “talk[ing] white” has at least two layers: the assumption that the speaker is not white, and the assumption that the unexpected racialized vocal style is out of place, necessitating attention to the perceived clash of identity and timbre (“Nader”). In other words, this observation exemplifies assumptions that race is quantifiable and knowable, and that race is timbrally conveyed. This is but one example of how the FoS is also bound up with the assumed meaning of an identity, which is often derived from values and assumptions related to visual cues.12

Listening within the FoS framework has a circular logic. This logic is akin to a self-fulfilling prophecy. Per Robert Merton’s description, the self-fulfilling prophecy is “in the beginning, a false definition of the situation evoking a new behavior which makes the original false conception come true.” He continues, “This specious validity of the self-fulfilling prophecy perpetuates a reign of error. For the prophet will cite the actual course of events as proof that he was right from the very beginning” (195). In the case of racialized vocal timbre, the “false definition” is the belief in sound as stable and knowable, which causes us to fail to attend to the many ways in which timbre is learned and performed, including those we associate with race, ethnicity, or authenticity (Eidsheim, “Voice” 12–13, 19–21). We then listen for those phenomena that we believe to exist; we subsequently hear them; and because we hear them, we believe the perceived meaning to be verified.

For example, “black voice” is an observation born from an encultured notion of sound that expects fidelity to a referent and listens for difference. When voices are reduced to fixed sounds and undergo assessment, they cannot but be heard within binaries or scale-degrees of fidelity and difference. Moreover, due to the ways in which vocal timbre has historically been aligned with and metaphorized as interiority and truth, the stakes and ramifications of such assessment involve more than just the sounds. What is measured is a person’s degree of fidelity to, and difference from, a dominant category. I bring two observations to this reading of a timbral micropolitics.13 Firstly, the persistence of the metaphor of vocal timbre as the unmediated sound of selfhood and subjectivity means that a given society’s beliefs lie at the core of its citizens’ personhood. (For example, where race is believed to be a stable category, it is then believed to be audible in vocal timbre.) Secondly, culturally trained ears assume, and thus perceive, only formalized vocal practices as encultured; moreover, they tend to perceive enculturation only in certain components of the trained voice. The naturalization of the untrained voice as an expression of “essential identity” and the naturalization of aspects of the trained voice according to racial categories are both functions of the micropolitics of timbre.

Independent of the “actual” or intended sound, what a given listener hears depends to a large extent on his or her assumptions regarding the ontology of sound. For example, the belief that it is possible to know something firm about a sound and its source deeply affects the meaning the listener will form based on that sound. Such belief arises from assumptions that sound can be known, is stable, and can be unequivocally recognized and unambiguously named. Furthermore, such belief assumes a deep connection between sound and its assumed signification—an assumption regarding significance that is taken on through enculturation. The statements made by voice teachers in light of the broader listening framework help us to see how, when listening through the FoS, we will listen for and, indeed, hear according to categories, such as race, that are aligned with values in a given society.

Moreover, the assumption that it is possible to know sound leads to an overarching listening stance that seeks fidelity. When people assume that it is possible to know sound, their primary tenet in listening is identification. The basic tenet of identification is comparison with an “original,” an actual sound or the idea of a sound in the mind’s ear. The success of such listening then depends on the listener’s ability to distinguish similarity to or difference from the ideal. In other words, on a basic yet profound level, such listening entrains listening for sameness and difference.

As argued above, the sounds we ultimately produce and hear are based on enculturation, and are not essential qualities expressed through timbre in an unfiltered manner. I now wish to clarify further that because of assumptions around the FoS, we are unlikely to critically examine listening processes and the meaning they produce on a fundamental level (for example, the assumption that sound can be identified). Basic assessments regarding a given sound’s resemblance to or difference from the ideal are also strengthened. The given categories, which offer the basis for listening for sameness and difference, are of course culturally dependent. The sound categories that can be further identified include distinct pitches; adult versus, say, children’s voices; male versus female; “ethnic” versus “non-ethnic”; and authentic versus inauthentic. However, because the premise of listening is identification, the likelihood of the a priori existence of whichever categories are identified is not questioned. Thus, due to a basic belief in something as seemingly innocuous as the possibility of knowing sound, we do not ask whether it is possible to identify social categories by listening. And, when listening within the ontology of the FoS, what is heard is then understood as evidencing essential and non-negotiable traits.14

The Role of the Interpretant within the Figure of Sound Listening Framework

It is not timbre per se that is believed to signal only essential elements. For example, the singers trained by the teachers discussed above can easily be heard to follow different operatic styles (say, baroque versus romantic) or even different genres (say, rock rather than opera). That is, while the singers’ vocal timbres are taken as indisputable evidence of aspects that are considered essential traits within a given society—including race and ethnicity—other aspects of the same singers’ timbres can be recognized as performed. What is the difference between these two interpretative situations?

The contingent relationship between sign, object, and interpreter is indeed well recognized for certain aspects of vocal timbre. For example, it is accepted that one singer can successfully sing multiple genres, while it is also noted when a voice unsuccessfully performs a given genre. We can compare this to the recognition of the performative (and non-innate) aspect of vocal performance, including the phenomenon of vocal adaptation to different situations (e.g., talking to a baby or to an adult). The contingency of the relationship between the sound source (sign) and the signal (signified) is the very premise of these complex assessments of acquired and performed timbres. Moreover, the socialized and performative aspects of these timbral presentations form the basis of the relational contingencies.

The classical vocal world also recognizes these dynamic relations. For example, the phenomenon of national schools of singing is understood as contextually contingent and acquired. The processes involved in forming vocal timbre are formalized and recognized in detail, even if the resulting timbres are understood within the signifying process only by those who know the cues. While most people can recognize operatic timbral characteristics in general, not everyone can distinguish between the various national schools of singing. However, for those initiated into operatic timbre, the different national schools are quite distinct.

Classical vocal pedagogy is built upon the notion that it is possible to construct timbre. While, for most people, all classically trained voices might simply sound “classical” or “operatic,” tone quality is refined in specific ways within subgroups. A national school of singing is understood as both a preferred tone quality and the technique that produces that quality. Tone quality and technique function symbiotically on a national and regional scale, and result in differing pedagogical schemes and corresponding shapings of the voice according to national tone ideals. (Perhaps the most commonly known national schools of singing are the English, French, German, and Italian, but there are also the Nordic and Slavic.15) We know that the sounds of these various “schools” are the result of aesthetic preferences and of vocal techniques designed to accommodate those preferences.16 We also know that they are not recognized as the unmediated expression of a people, contra nineteenth-century Romantic nationalism.17 A national school of singing simply refers to a region’s preferred tonal quality (and the vocal technique that engenders it) and does not, of course, necessarily indicate the nationality of the singer. A Norwegian singer may be educated in a conservatory in Germany and thus develop a German tone. An Italian teacher might teach in Paris and pass on his or her Italian technique and tone ideal.

This preferred national tone is not a casual matter. The French Ministry of Culture, for example, has employed official inspectors to observe regional conservatories of music in order to evaluate their vocal pedagogy. Richard Miller reports that, in the post-World War II decades, some inspectors were especially adamant that their concept of proper onset be taught in French conservatories.18 The preferred onset among these inspectors was an “attack,” a very strong beginning that is created by a powerful inward thrust of the abdomen. As a result, the vocal folds are forced to deal with a high level of airflow, and in response the larynx resists the excess airflow by fixing the vocal folds in a single position. The result is a “held” sound that is slightly above pitch, with a pushed and sharp-sounding phonation. This sound is now characteristic of the French onset and, because the attack sets up a tense position of the vocal folds, of the French line.19

It is also important to note that within the geographical area of a single national school there will be many different spoken dialects. In some areas these dialects are so different that they are close to separate languages. However, phonation and, as a result, pronunciation differ in song and speech, and singers learn very carefully how to pronounce words when singing, even in their first language. Even singers with different mother tongues or dialects are unified under a single national school or a single teacher’s tonal ideal. In summary, the presence of national schools of singing not only exemplifies the malleability of the human voice and the enormous impact that teachers’ and institutions’ tonal ideals and pedagogical practices generally have on the sound of a trained classical singer’s voice, it also shows that we are fully aware of, and acknowledge, the constructedness of vocal timbre in formally trained voices.

But isn’t it contradictory that, while a singer is understood by a vocal community simply to show timbral evidence of her, say, “ethnicity,” the same vocal community also has the capacity to recognize that the same singer, at will and with practice, is able to perform across a wide timbral range? It is not contradictory. I understand these different listening outcomes as arising from a split in listening, while both branches arise from the FoS listening framework. That is, while within a Western listening context all sounds are heard through the FoS, what we know based on timbre falls into two broad categories. Some aspects are understood as essential, while others are understood as acquired, performed, and somewhat open to interpretation.

I find Charles S. Peirce’s work useful in thinking through a bifurcated listening process. Of Peirce’s many definitions of a sign that capture this listening process, the one below is particularly relevant to our discussion:

I define a sign as anything which is so determined by something else, called its Object, and so determines an effect upon a person, which effect I call its interpretant, that the latter is thereby immediately determined by the former.


While we recall that Peirce’s theory of the sign delineates different stages and levels of complexity, what is important for the purpose of this discussion is that three interrelated parts (a sign, an object, and an interpretant) together make up a sign. Simplifying this triadic model, the sign can be conceived of as that which is presented: for example, a waving hand, a word uttered, or a vocal timbre produced. The object can be imagined as whatever is signified: for example, the “hello!” that the hand is waving; the object to which a spoken word attaches; or the vocal genre to which the timbre refers. The interpretant represents the observer’s understanding that there is a sign/object relationship. In other words, Peirce explains that there is no clear one-to-one relationship between sign and object; the sign signifies only in the interpreted moment.

When a voice is heard as “ethnic” it appears that there is a non-negotiable relationship between sign and object, and thus there is no interpretant creating an understanding of the sign/object relationship. Identifying essential qualities communicated through vocal timbre is the expression of a triadic model with one of the three parts presented by Peirce omitted: the interpretant. In the resulting dyadic model, there is a direct and unquestionable connection between sign and object, and in this relationship the listener or interpretant is impotent. His or her only task is to be literate in these connections and to correctly identify the pairs. That is, when someone listens to and trains voices within an overarching understanding of the FoS framework, the FoS sets up a belief in the possibility of non-interpretation of sound. Innateness is the particularity of that category, and the listener is passive in the assessment. FoS listening assumes one-to-one correlation, i.e., a dyad; and the assumed dyadic relationship is understood as indisputably true and real.

In contrast, I posit that there is always an interpretant naming timbre. However, when the communicated meaning is understood as innate, any possible involvement of the interpretant is deflated. Because of the assumption regarding innate qualities evidenced through vocal timbre, each resulting “assessment” (in the form of an “ethnic” assessment, say) makes the interpretant seem to be inactive. Moreover, this assumption makes it appear as though the Peirce-like analysis is only applied to those aspects that we believe are performed through choice. This erroneous assessment explains the seeming split in awareness in our own involvement as listeners and interpretants, varying from aspects we believe are innate to those we believe are not: it is the assumption that a quality is innate, and that we can know sound, that disempowers decision-making by interpretant and performer alike.

These three parts are always present in signification, but the third is made impotent by the rhetorical frame of assumed innate qualities and assumed ability to know sound. Therefore, what seem to be two contrasting modes of listening are actually the same. People who see only two parts (sign and object) are not wrong per se, but they have attempted to filter out the dimension constituted by their own active and contextually dependent listening. Therefore, what seems to be a dyadic form is, instead, a triadic situation, one in which we lack sufficient perspective to notice the third part: the interpretant. Because sound and “meaning” seem to confirm each other so closely within the FoS framework, there is no analytical space for asserting the interpretant’s role. Therefore, rather than examining what is purported to be heard, I suggest we step back to examine the listening practice and the frames around it that yield any given “outcomes.” We can apply Peirce-like operations to acknowledge the third party: I propose that we examine racialized vocal timbre (and any other qualities that are understood as essential) in order to move from analysis of sound to analysis of the way that sound is listened to.

Conclusion: Listening to Listening

Listening is always already active. It is through deconstructing the process within which listening takes place—through listening to how we listen—that the listening framework becomes apparent, and that we can grasp the very politics of listening. By shifting our analytical lens from the so-called sound to observing and understanding the process of listening, we may listen against the FoS. Within a given context, there is only the triad: the consortium of sound, meaning, and listener. Within this consortium, the listener is the point of origin for meaning production.20 By understanding the relationship between the FoS and the way in which general and essentialist assumptions about sound are acted out within a given society, we may begin to grasp some of the ways in which listening is always already political. By breaking down the consequences of FoS listening, we can understand its potential power. More importantly, by enumerating the consequences of the FoS it is once again confirmed that it is false.21 That is, since sound is not always already static and knowable, the “identified and its meaning” are listener-derived. And, while the “identified and its meaning” are listener-derived, the assessments produced are assumed to be so indisputable that they are used as evidence of everyday observations (such as those referred to at the beginning of this essay), and their validity is extended to the American court system.

For example, in a 1999 ruling, a Kentucky Supreme Court judge decreed that since no one would find it inappropriate for an officer to identify the voice of a woman, “we perceive no reason why a witness could not likewise identify a voice as being that of a particular race or nationality, so long as the witness is personally familiar with the general characteristics, accents or speech patterns of the race or nationality in question” (Clifford 371). With this pronouncement, the Kentucky Supreme Court ruled that a conviction was appropriately based solely on a police officer’s identification of a suspect whose voice he had heard. The officer identified the suspect as a black male, testifying that during his thirteen years as a policeman he had had several conversations with black men and therefore was able to identify a black male voice. We see here an assumption that the speaker in question did not completely control his body, and therefore could not help but sound in a way that identified him.

When people assume that “what you hear is what he is,” what the listener thinks she hears is generally unexamined (Clifford 375–6). While the expression of many commonly held sentiments about race (or any other category important to a given society) is often curtailed, people routinely act and report on what they hear, and that is deemed sound evidence, even in a Supreme Court. Therefore, even in cases where racialized politics are not expressed in broad gestures by institutions that underpin a society, like a Supreme Court, they are enacted “under the radar” through practices such as listening to and reporting on vocal timbre.

While, as we have seen, listening has been used to apply and maintain intense power dynamics, I posit that merely realizing that this is the case can open a space within which we may detect the politicization of listening, while also applying critical listening in a kind of counter-movement. In much the same way that words used in hate speech can be reclaimed by the targeted group, listening can be applied against conventional meaning, and thus can lead us to hear and perceive vocal timbre anew. Accepting that listening and the meaning derived from it are never stable—even in areas that were previously naturalized—opens a new area of interrogation by applying the politics of listening. Adopting the mind frame that listening is always already political has the potential to put intense pressure on the positionality of the listener. That is, the listener is not let off the hook, as he or she is now. Keeping in mind that listening is always already political, the listener would examine any interpretation or judgment, acknowledge that it is the process of listening and interpreting that willed that particular meaning into being, and interrogate why it was projected onto a particular vocal timbre.

An examination of meaning, then, would lie not only in the “objective” sound, or in disputes about its possible meanings, but also in a meeting between the sound and the listening stance of the listener who derived those meanings. And, most importantly, such a critical inquiry would find new sites to deconstruct the process of signification.22 For example, how and why were aspects of vocal timbre such as health and race areas of signification that were understood as innate, and to which no interrogative or deconstructive pressure had been applied? For my part, uncovering the overall pedagogy of FoS listening, and understanding that it is involved in each and every act of listening, made the performativity of these areas very apparent. When the political aspects of listening are acknowledged, meaning is, by definition, no longer naturalized. Instead, each and every meaning is assumed to be the result of a process of meaning formation, and each meaning thus formed is interrogated. Within such a framework, assessments such as “healthy,” “authentic,” and “ethnic” in relation to a given voice would be openly traced. Rather than judging and dismissing beliefs such as naturalized notions about race, one can clearly articulate and understand them—and thus reckon with them. For me, the take-away is that each meaning that arises for a given listener or listening community is an opportunity for reflection. And, in a way, this article is a demonstration of seizing such an opportunity.

Nina Sun Eidsheim
University of California, Los Angeles

Nina Sun Eidsheim is Assistant Professor in the Department of Musicology, University of California, Los Angeles. Her first book, Sensing Sound: Singing and Listening as Vibrational Practice, is forthcoming with Duke University Press (Fall 2015). In her second book project, Measuring Race: Listening to Vocal Timbre and Vocality in African-American Music, Eidsheim deals with the cultural, social, and material projection and perception of vocal timbre.


1. William Labov’s concept of the linguistic variable accounts for accents and social factors, including gender and cultural affiliation, expressed through vocal variation. These variations are also referred to as speech communities—for example, West coast compared to East coast English, or rural compared to urban. But they are not always geographically bound: communities exceeding geographic boundaries also form speech communities. Speech variation can be extended to timbral variation, since the particularities of vowel and consonant enunciation participate in forming vocal timbre (Labov; Labov, Ash, and Boberg). See also Shuy; Fadden and LaFrance; Scollon and Scollon; and Butcher.

2. As a classical singer, I trained in music conservatories in Norway and Denmark, and took voice lessons in New York for five years. I then moved to Los Angeles and subsequently to San Diego, and in both places I participated in higher education vocal training communities. While I am still a member of these communities, I stopped taking and giving lessons in 2007, so my account of sixteen years ends at that point.

3. Most of the interviews took place in teachers’ private studios. In one case the interview was conducted in a coffee shop, and on one occasion the interview took place by telephone. I conducted the interviews during the 2005–2006 academic year, enabled in part by The UCSD Center for Study of Race and Ethnicity’s Chicano/Latino Studies and Ethnic Studies Summer Fellowship.

4. In contrast, based on my experience with this community, the two teachers who did not claim to consider race or ethnicity in relation to vocal timbre seemed unusual to me.

5. I use pseudonyms throughout to refer to voice teachers and students who participated in this research.

6. I use the word “diagnosis” in a loose way here, but with a wink towards the porous boundaries between a voice teacher’s aesthetic and medical listening. Because the body is the singer’s instrument, it is quite common that voice teachers and students discuss overall health issues, offer health advice, and refer the students to be examined by specific medical doctors—more so than in other music conservatory teacher-student relationships formed around other instrument learning. I make this observation based on personal and anecdotal experience. Additionally, otolaryngologists, who mainly serve people who are not professionally dependent on their voices, stress general body care (hydration, work environment, drugs, gastroesophageal reflux, and so on) in their work on vocal health, which also goes by the term “vocal hygiene.”

7. What might produce an Armenian or Korean versus an American sound? The question is asked here within the context of the United States. While a distinct timbre might be attributed to the singer’s mother tongue, this timbre is also believed to be retained when singers sing in other languages. More importantly, I have not heard this observation regarding vocal health made about singers who appear to be European American. The parallel between Armenian Americans (who may or may not share Armenian as a first language) and European Americans is that both sing in foreign languages (say, Italian), but the connection between vocal health and ethnicity is only made in regard to those appearing as Armenian American.

8. Their rhetoric also evidences similar assumptions about nationality and ethnicity, but the scope of this paper only allows for a discussion of race.

9. The early twentieth-century teacher-student relationship between Chinese-American vaudevillian Lee Tung Foo and his voice teacher Margaret Blake Alverson captures this dynamic. Theirs was a relationship that “broke racial barriers but never transcended their limits,” Kristyn Moon notes (23). See also Blake Alverson’s account of the story ([1913] 2006).

10. While past pedagogical texts connected race and vocal timbre, some current respected pedagogical texts do not. For my discussion of racial formation in classical vocal training, see Nina Eidsheim, “Race,” “Voice,” and “Marian Anderson.” For the latter, see Potter, Vocal Authority 47.

11. The assessments here, in terms of gender, are infinitely complex. Why is it left unexamined whether the voice in question could even be the sound of a male imitating a female voice? Or a female impersonating a male voice, a child’s, an animal’s, and so on?

12. This concern is at the center of the discussion in my book-in-progress, Measuring Race: Listening to Vocal Timbre and Vocality in African-American Popular Music.

13. This phrase is close to Steven Goodman’s “the micropolitics of frequency” (187). While Goodman’s terminology is useful, I do have some strong reservations about it, as I understand the term/concept “frequency” to imply stability—akin to the figure of sound. I discuss this further in Sensing Sound.

14. When we move beyond mono-sensory ideas of music, we easily sense the “cracks” in these beliefs (Eidsheim, Sensing Sound).

15. What is now referred to as the international style of singing is based in the Italian bel canto school, but is also flexible enough to be well received in several other regions of the world. Indeed, the “international school of singing” generally refers to the style practiced by singers who travel among the most prominent world opera stages.

16. Richard Miller notes that, although there are recognizable national tonal preferences and techniques, no nation exhibits monolithic conformity. Miller estimates that over half of the teachers within a given national school adhere to the national tonal preference, while the remaining singers and teachers are devoted to international practices (Miller, National Schools xix). Tone preference is also influenced by teacher migration and relocation. For example, many German teachers associate themselves with the historic international Italianate School as a result of the legacy of the master vocal pedagogue G. B. Lamperti, an Italian expatriate who taught in Munich.

17. In addition, one’s preference for a particular repertoire can affect the sound of one’s voice, as the repertoire’s method of “setting the voice” and demanding certain techniques from it will shape the voice.

18. A vocal onset is the way in which a singer performs the beginning of a musical phrase. This may be accomplished with an attack, or by “easing” more softly into the note. To those unfamiliar with vocal technique this might not seem like such a radical difference, but for vocal pedagogues and singers it is very important. Listeners who are not voice professionals might not consciously register these different onset practices, but attentive listeners can develop an awareness of an overall difference in the sound.

19. In contrast, there is the Nordic “soft” onset wherein airflow precedes sound, the German weicher Einsatz (whisper onset), a reaction against the earlier Sprengeinsatz (hard onset), and so on (Miller, National Schools xix–xx).

20. We can address this from an intensely material point of view (exemplified in Eidsheim, Sensing Sound; Moten, and Stras), but we can also discuss it on the symbolic level, which I do here.

21. The “work” carried out through the FoS becomes clear when we consider it in contrast to “multisensorial listening”—see Eidsheim, “Voice”; Sensing Sound.

22. Of course, much of the hermeneutic work that is carried out deconstructs the process of interpretation through interpretation itself. While much of that work is invaluable to understanding the process of racial construction, the challenge remains in areas that cannot easily undergo hermeneutic analysis, including certain aspects of vocal timbre and categories that, due to their naturalization, are impenetrable to any kind of critical analysis.

