Transcription: Creating a Record of Speech

By Ken Bleile, PhD
November 11, 2020

The topic of this blog is transcription. I suspect you may be thinking, “I don’t think that sounds very interesting.” When you finish the blog, I believe you will say, “Yup. Not very interesting.” However, I hope you also add (whether you say it aloud or not), “But valuable” (or maybe just “useful”).

Here are four practical suggestions to help create a reliable transcription of speech.

1. Development and Transcription

Infants: Either transcribe the entire vocalization phonetically or use a checklist. Typically, an infant checklist displays five developmental levels of vocalizations (vegetative, cooing, vocal play, canonical babbling, variegated babbling), and the clinician checks off the category to which a vocalization belongs.

Toddlers and preschoolers: Transcribe the whole word, because the presence of a sound in one part of a word can affect the production of another sound elsewhere in the word.

Students: Typically, transcribe isolated sounds or use checkmarks if errors affect individual late-acquired sounds.
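For clinicians who keep records electronically, the infant checklist described above could be kept as a simple tally. The following is only a minimal sketch: the five level names come from the list above, while the function name `tally_checklist` and the sample session are hypothetical and used purely for illustration.

```python
from collections import Counter

# The five developmental levels named above, earliest first.
LEVELS = [
    "vegetative",
    "cooing",
    "vocal play",
    "canonical babbling",
    "variegated babbling",
]

def tally_checklist(observations):
    """Count how many observed vocalizations fall in each level."""
    unknown = [o for o in observations if o not in LEVELS]
    if unknown:
        raise ValueError(f"not a checklist level: {unknown}")
    counts = Counter(observations)
    # Report every level, including those with no checkmarks.
    return {level: counts.get(level, 0) for level in LEVELS}

# Hypothetical session: one checkmark per vocalization heard.
session = ["cooing", "vocal play", "cooing", "canonical babbling"]
tally = tally_checklist(session)
```

Reporting every level, even those with no checkmarks, keeps the developmental order visible at a glance.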

2. Recording Speech

If you believe recording speech improves your clinical work, then record. Remember, though, that listening to an audio recording or watching a video takes at least as long as making the recording (and usually much longer!), and that transcribing is likely to require considerably more time on top of that. Because of the time it takes, extensive recording of children’s speech is probably not practical outside of university settings. For this reason, many clinicians restrict their recordings to challenging clinical cases, to educational purposes, and, less often, to documenting clinical progress.

A suggestion: If recording a child’s speech, repeat the word after the child so that you can later identify what the child said. Also say in a quiet voice any characteristics that might be difficult to identify on the tape (e.g., “Oh, I saw your tongue between your teeth”).

3. Special Symbols and Diacritics

Extensive use of special symbols and diacritics consumes time and may lead to transcriptions in which phonetic symbols become lost among the accompanying wiggles, wavy lines, and circles. On the other hand, too few diacritics result in missing clinically significant aspects of speech. The following general rules may help in deciding which diacritics to include in transcriptions:

•  Include only clinically relevant diacritics. Do not attempt the nearly impossible task of transcribing all phonetic details in a child’s speech.

•  Exclude diacritics that describe relatively minor, predictable aspects of speech production. For example, you would not use diacritics to indicate a child produces [s] with lip rounding when preceding [w] in sweet or that a student usually produces [r] without voicing when following voiceless consonants in such words as pride.

•  Exclude diacritics that describe aspects of speech that a child produces in an adult manner. For example, do not use diacritics to indicate stress patterns and syllable boundaries that conform to those of the adult language. To illustrate, do not use a diacritic to indicate stress if the child says begin with stress on the second syllable, but do so if the child says the same word with stress on the first syllable.

•  If you like, on the top of the first page of the transcription sheet, list the diacritics you are likely to need based on a child’s level of speech development.

•  While transcribing a child’s speech, if you hear a sound pronounced in a way that you cannot readily describe, transcribe the closest approximation you can, and place an “X” under it. For example, if you hear something “[s]-like,” but somehow different from standard [s], transcribe it as [s] with an “X” underneath. Continue placing “X” under the [s] until you identify how the child is producing the sound. When this occurs, define the “X.” For example, you might indicate at the bottom of the page that “X = voiceless lateral postalveolar fricative.”

4. Common Errors

Labiodental Stops: Labiodental stops typically replace bilabial stops. They sound like bilabial stops, but the upper teeth touch the lower lip, similar or identical to the position for [f] and [v]. The labiodental symbol is a little “m” under the consonant.

Bilabial Fricatives: Bilabial fricatives typically replace labiodental fricatives ([f] and [v]). Bilabial fricatives often sound like a cross between a fricative and a glide. The lips come together as for [p] and [b]. The bilabial symbols for unvoiced and voiced consonants are [ɸ] and [β].

Unaspirated Stops:  Unaspirated stops typically replace aspirated stops. Voiceless unaspirated stops often are mistaken for voiced stops. For a person who is unfamiliar with voiceless unaspirated stops, the sound often "jumps" between a voiceless and voiced sound. To illustrate, if a [b] "jumps" in perception between [p] and [b], it may be a voiceless unaspirated stop. The symbol for unaspirated stops is a raised little open circle to the right of the consonant [p˚].

If you want to learn to hear these sounds, ask a speaker of a language that has voiceless, unaspirated stops to produce [p], [t], or [k]. Alternatively, making a sound yourself sometimes helps you hear it better. To do this, place your hand in front of your mouth and say [pʰ]. Feel the puff of air on your hand. Now, keep repeating [pʰ], working to reduce the puff of air on your hand. When the puff of air is gone, but the sound is not quite [b], you probably have made a voiceless unaspirated [p].

Glottal Stops: Glottal stops may replace a variety of consonants. Before a vowel in the same syllable (for example, replacing [t] in two), a glottal stop may sound like a sudden onset of the vowel. Between vowels (for example, replacing [d] in middle), it may sound like someone imitating a Cockney accent. At the end of a syllable following a vowel (for example, replacing [d] in hid), it may sound like the preceding vowel was suddenly cut off. The symbol for a glottal stop is [ʔ].

[w]-coloring: [w]-coloring of [r] replaces consonantal [r]. With rounded lips, [r] sounds like a cross between [w] and [r]. The symbol for [w]-coloring of consonantal [r] is a small [w] under the consonant. Alternatively, some place a small raised [w] to the right of [r].

Lisped Sounds: Lisped sounds typically replace [s] and [z]. A child can make a lisped sound either with the tongue touching the teeth or protruding slightly between the teeth as for [θ] or [ð]; sometimes the tongue is either more forward or more retracted than for [θ] and [ð]. The symbol for lisped [s] or [z] is a small tooth under the affected consonant.

Bladed Sounds: Bladed sounds typically replace [s] and [z]. A bladed consonant is made with the tongue blade raised and may sound like the sound that begins “she” or the consonant in “azure.” The symbol for bladed consonants is a small half oval (like a frown) under the affected consonant.

Lateral Sounds: Lateral sounds typically replace [s] and [z]. A lateral sound is made with one or both sides of the tongue lowered so that, similar to [l], air flows over the sides of the tongue rather than over its center (Gibbon, 1999; Goozée et al., 2007). Sometimes you can feel air near the sides of a person’s mouth as they make a lateralized sound. A lateralized sound can be voiced or voiceless and sometimes sounds wet. A lateralized sound sometimes is difficult to distinguish from a bladed sound. The symbol for lateral [s] is [ʪ] and for lateral [z] is [ʫ].

Wet Sounds: Wet sounds may accompany one or several sounds, especially in the speech of a person with oral motor difficulties. Wet sounds sound, well, wet. They also sound slushy and may be accompanied by drooling. Any sound may be made wet. The symbol for wet sounds is a little open circle under the symbol. To illustrate, the symbol for a wet [t] is a small open circle under the [t].
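If you type transcriptions rather than write them by hand, several of the symbols above exist as Unicode characters, and marks placed under or beside a consonant can be typed as a second character after the base letter. The sketch below is only an illustration under stated assumptions: the dictionary name `SYMBOLS` and the helper `with_mark` are hypothetical, and the combining ring below is used here simply to draw the small open circle described for wet sounds (in standard IPA the same mark indicates voicelessness, so define your usage on the transcription sheet).

```python
import unicodedata

# Stand-alone symbols mentioned above, written as Unicode escapes.
SYMBOLS = {
    "voiceless bilabial fricative": "\u0278",  # ɸ
    "voiced bilabial fricative": "\u03b2",     # β
    "glottal stop": "\u0294",                  # ʔ
    "lateral s": "\u02aa",                     # ʪ
    "lateral z": "\u02ab",                     # ʫ
}

# Marks that attach to a base consonant.
RING_BELOW = "\u0325"  # small open circle under the consonant (assumed choice)
RAISED_W = "\u02b7"    # small raised w to the right of the consonant

def with_mark(base, mark):
    """Return the base consonant followed by its mark."""
    return base + mark

wet_t = with_mark("t", RING_BELOW)     # t with a circle underneath
w_colored_r = with_mark("r", RAISED_W)  # r with a raised w

# Look up the official Unicode name of a mark to confirm what was typed.
ring_name = unicodedata.name(RING_BELOW)
```

Checking the Unicode name of a mark is a quick way to confirm that the character on the page is the one you intended, because several diacritics look alike in small type.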


Gibbon, F. E. (1999). Undifferentiated lingual gestures in children with articulation/phonological disorders. Journal of Speech, Language, and Hearing Research, 42, 382–397.

Goozée, J. V., Murdoch, B., Ozanne, A., Cheng, Y., Hill, A., & Gibbon, F. (2007). Lingual kinematics and coordination in speech-disordered children exhibiting differentiated versus undifferentiated lingual gestures. International Journal of Language & Communication Disorders, 42, 703–724.