Speech Perception: The End Is the Beginning

Blog #6 in the Phonology Means Nothing and Other Astounding and Very Practical Facts about Speech Sound Disorders Blog Series

For more information about this series, see the Phonology Means Nothing Series welcome page

By Jaimie L. Gilbert, PhD and Ken Bleile, PhD
July 15, 2020

Speech reception entails sound traveling via air, bone and tissue, fluid, and electricity. Where speech reception ends, speech perception begins, turning electrical impulses into meaningful speech and communication. 

Splice It

An end goal of speech sound production is to communicate a message or intent. That is, listeners need to not only receive speech but also perceive speech. So how do humans make sense of the received coded auditory information? My suggestion is that we “SPLICE” it, using Sensory cues, Perceptual decisions, Linguistic cues, Indexical cues, Cognitive skills, and Environmental cues.

S: Sensory Cues

The auditory pathway and the encoding of sound obviously play a critical role in speech reception and speech perception. However, remember that other senses are simultaneously providing sensory input and information that can influence speech reception and speech perception. To illustrate, while a listener hears speech, they may also see the speaker, may feel the ground beneath their feet, or smell a cup of coffee in their hand, all of which may influence their speech perception.

P: Perceptual Decisions

A percept is the meaning assigned to the sensory input. The decision of what meaning to assign to sensory input is influenced by past knowledge and experiences. Given this, we interpret sensory input and make decisions about what our senses are telling us, which may or may not be the same as the intended message.

L: Linguistic Cues

Language and linguistics impose a structure onto which received sound can be mapped to determine which possible meaning (allowed according to linguistic rules) it best matches. That is, when making perceptual decisions regarding speech, listeners apply their knowledge of language(s). Specifically, knowing which sounds are present in a language and how they can be connected (phonology) limits the possibilities of what a speaker said. Similarly, knowing how the sequencing of sounds in a language alters their acoustic characteristics helps listeners correctly interpret and perceive speech. At the syllabic or morphemic level, knowing which prefixes or endings words can take (morphology), the meanings they represent (semantics), how morphemes are sequenced (syntax), and how language is shaped by context (discourse-pragmatics) all influence perceptual decisions.

I: Indexical Cues

Different talkers have differently shaped vocal tracts, leading to variations in the acoustic characteristics of their voices. Who produced the speech sound can influence a listener’s perception of the speech in different ways (Kreiman & Sidtis, 2011). For example:

•  A speaker may have specific quirks in how they speak.

•  A listener may be very familiar with the speaker, or perhaps doesn’t know the speaker at all (see, for example, Nygaard, Sommers, & Pisoni, 1994).

•  A speaker may have a familiar or an unfamiliar accent or dialect (see, for example, Bradlow & Bent, 2008).

C: Cognitive Skills

Connected speech entails rapid processing, even when the speech is slow, requiring our speech reception and perception systems to keep pace with masses of incoming information. Memory and attention, among many other cognitive skills, help us process speech and make use of linguistic knowledge.


Some information needs to be stored in memory as we make the best match, or the best guess, as to what a speaker said (see, for example, Pisoni, 1973, 1993). Receiving and perceiving speech requires several memory systems:

•  Short-term memory (what was said just prior, what were the characteristics of the prior speech)

•  Working memory (ability to manipulate short-term memory to, for example, weigh different possibilities of the intended message based on speech characteristics)

•  Long-term memory (knowledge of semantics and syntax)


Attention skills influence our ability to focus on incoming auditory information, especially when the message competes for a listener’s attention. For example, speech perception occurs while a listener might also be checking notifications on their phone, texting, or wondering what they will have for lunch.

E: Environmental Cues

Do you remember that the sound wave travels through air? That it then travels through bone, tissue, and fluid before becoming electricity? Now imagine that sound wave mixing with one, two, three, or a dozen other sound waves occurring in the environment at the same time. Speech sounds travel through air before reaching the ear, traversing the auditory pathway, and being perceived. However, speech sounds are not the only sounds that reach the ear. Perhaps you can picture a family gathering where many people are all talking at once, or a restaurant with sounds from the kitchen, utensils clattering on dishes, and multiple conversations. These examples illustrate that the speech you want to hear rarely occurs in an isolated environment: it competes with other sounds.

Another environmental feature (although it could also be considered another branch of linguistics) is the pragmatic environment. What is the purpose of the communication? Is it personal or professional? What is the social context? For example, are you interviewing for a job, trying to impress someone, talking with your mom on the phone, or goofing off with your best friend? All these varied contexts may influence speech perception.

And, if all this SPLICING seems complicated, remember that the human brain processes 10 to 14 speech sounds per second. A hearing loss means that listeners lose information about sound, complicating matters even further.  


Bradlow, A. R., & Bent, T. (2008). Perceptual adaptation to non-native speech. Cognition, 106, 707–729.

Kreiman, J., & Sidtis, D. (2011). Foundations of voice studies: An interdisciplinary approach to voice production and perception. Malden, MA: Wiley-Blackwell.

Nygaard, L. C., Sommers, M. S., & Pisoni, D. B. (1994). Speech perception as a talker-contingent process. Psychological Science, 5, 42–46.

Pisoni, D. B. (1973). Auditory and phonetic memory codes in the discrimination of consonants and vowels. Perception & Psychophysics, 13, 253–260.

Pisoni, D. B. (1993). Long-term memory in speech perception: Some new findings on talker variability, speaking rate and perceptual learning. Speech Communication, 13, 109–125.