Phonetic Transcription for Speech-Language Pathology: Where It’s Been, Where We’re Going
The study of phonetics reaches back at least two millennia, making it one of the oldest areas of behavioral science (Ohala, 1991). Over time, the need for phonetics has expanded to include speech science, speech-language pathology, sociolinguistics, second language instruction, literacy education, and speech technologies such as automatic speech recognition and speech-to-text. An important component of phonetics is phonetic transcription, the written representation of speech sounds using a standardized set of alphabet symbols such as the International Phonetic Alphabet. Transcription is a skill acquired through extensive practice with either live or recorded speech. In speech-language pathology, transcription typically is introduced in the early years of professional education and is a mainstay of clinical practice, especially for articulation or phonological disorders/differences. Transcription takes two major forms, broad and narrow. Broad transcription is suited to the phonemic elements of a language, enabling recognition of words and morphemes, whereas narrow transcription is used to capture more subtle variations such as within-segment modifications (e.g., allophones).
Although phonetic transcription has long been a part of clinical education and application in speech-language pathology, it is not a dormant topic. Within the last two decades, several books, chapters, and journal articles have been published on clinical education and clinical application of phonetic transcription. These publications will shape the future of clinical phonetics because they call for enhancements in education and practice, as discussed in the following sections.
Basic Sciences
Typically, phonetics instruction includes basic aspects of anatomy, physiology, and acoustics, which are the foundation for speech production. Phonetics textbooks vary widely in depth and scope of coverage of basic science. The usual approach is to describe speech produced by adults, often adult males only. However, speech is highly sexually dimorphic and varies substantially across development from infancy to adulthood. A lifespan perspective is important in providing clinical services to a diverse population.
Clinical Application
Phonetic transcription is a basic tool of the SLP (speech-language pathologist) or SLT (speech-language therapist). White et al. (2022) comment that “SLTs are unique amongst healthcare and primary/secondary education professionals in their ability to transcribe speech phonetically, and indeed to listen to speech objectively” (p. 1072). However, the nature of transcription varies across clinical settings and practitioners, and this is especially so for the use of narrow transcription (Shaw & Yanushevskaya, 2021; Windsor, 2011). Effective use of phonetic transcription entails consideration of several factors described in the following sections.
Reliability of Phonetic Transcription
Reports of inter-transcriber reliability in phonetic transcription are highly variable, ranging from a low of about 50% to better than 90% agreement (Amorosa et al., 1985; Gooch et al., 2001; Mallaband, 2024; Pitt et al., 2005; Seifert et al., 2020; Shriberg & Lof, 1991). Reliability is affected by several factors, including expertise of the transcriber, characteristics of the speaker, purpose and method of transcription, and type of speech material. Therefore, reliability is specific to the circumstances of a particular transcription, and generalization to other circumstances can be uncertain. The rather low values of reliability in some studies have raised concerns about the application of phonetic transcription to speech disorders and sociolinguistics (Kerswill & Wright, 1990; Mallaband, 2024; Munson et al., 2010; Oller & Eilers, 1975). Added to this concern are reports indicating low levels of confidence in phonetic transcription, especially narrow transcription, by speech-language pathologists/therapists (Knight et al., 2018; Shaw & Yanushevskaya, 2021).
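To make the agreement percentages above concrete, the following is a minimal sketch of a point-by-point percent agreement calculation. It assumes the two transcriptions have already been aligned segment by segment; the function name and the sample transcriptions are hypothetical illustrations, not data or procedures from any of the cited studies, which differ in how they align and score transcriptions.

```python
from typing import Sequence


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Point-by-point agreement (in percent) between two aligned transcriptions.

    Assumption: both sequences hold one symbol per segment and are already
    aligned, so position i in `a` corresponds to position i in `b`.
    """
    if len(a) != len(b):
        raise ValueError("transcriptions must be aligned to the same length")
    if not a:
        raise ValueError("transcriptions must not be empty")
    # Count the positions where both transcribers chose the same symbol.
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Hypothetical broad transcriptions of one utterance by two transcribers.
transcriber_1 = ["p", "ə", "g", "ɛ", "t", "i"]
transcriber_2 = ["b", "ə", "g", "ɛ", "d", "i"]

# Four of the six segments match.
print(f"{percent_agreement(transcriber_1, transcriber_2):.1f}% agreement")
```

A strict symbol-for-symbol comparison like this one treats every mismatch as equally serious, which is one reason agreement figures depend so heavily on the method of comparison.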
Shriberg and Kent (2002) proposed an approach to measuring reliability of transcription based on the functional equivalence of transcriptions, defined as “essentially equivalent phonetic transcriptions of a target behaviour that uses alternative symbolization” (pp. 372–373). They suggested that transcriptions can be compared and categorized as to whether they are identical, functionally equivalent, or nearly functionally equivalent. Seifert et al. (2020) used this analysis, concluding that it (a) identifies differences that are deemed important and may hold clinical value, (b) allows for flexibility in accounting for the influence of surrounding sounds in connected speech samples, and (c) can be used to measure the reliability of broad and narrow transcriptions, or a mixture of the two.
Teaching Methods
Suggested procedures to improve reliability and validity of clinical transcription include the following educational approaches: fieldwork-like practice (Ashby, 2007; Mompeán et al., 2011), concurrent acoustic analysis (Mompeán et al., 2011; Vaissiere, 2003), real-language and real-life materials (Mompeán et al., 2011), active learning opportunities (Roberts & Lemoncello, 2023), enhanced interactivity and feedback with online instruction (Luque et al., 2024; Mompeán et al., 2011; Titterington & Bates, 2017), and peer assessment (Krueger, 2022). From this menu of choices, instructors can select a method that is judged to be most appropriate for available resources, student backgrounds, and teaching objectives. Methods can be combined or sequenced to provide guided instruction.
Speaker Variation
The sounds of English can differ across speakers from different regions, social groups, or language backgrounds, and the language variety that clinicians speak can affect the way they transcribe and analyze other varieties. Clinicians must avoid overdiagnosis and underdiagnosis of speech disorder when working with clients whose language variety differs from their own.
Technological Advances
Human audition performs very well for general purposes of speech communication, extracting sufficient information from the acoustic signal to recognize phonemes and words, stress patterns, and other features. However, it does not always detect smaller acoustic variations that may have potential clinical value. These include covert contrasts, undifferentiated gestures, and coarticulation, all of which have been ascertained primarily by acoustic or physiological methods. Several authors have noted the potential of instrumental data to supplement transcription (Amorosa et al., 1985; Babatsouli, 2015; Stemberger & Bernhardt, 2020; Titterington & Bates, 2017). Babatsouli (2015) classified speech technologies as either non-intrusive or intrusive, depending on whether the analyses are based on the emitted speech signal (non-intrusive) or on the physiological processes of speech production (intrusive). Non-intrusive methods are more easily implemented and are generally well tolerated by speakers at different levels of development or mental/physical status.
Swift advances in artificial intelligence and automatic speech recognition (ASR) hold promise for automated phonetic transcription (Li et al., 2025). Multilingual ASR could be helpful in the clinical description of speech from individuals with different language varieties (e.g., dialects or multilingual backgrounds). Training of these systems with an increasing number of languages may lead to virtually universal phonetic transcription, beyond the capability of highly practiced human transcribers. Some available systems include: https://autoipa.org/, https://autophon.org/, https://www.mimikstudio.com/. However, available systems are primarily developed using adult speech and face challenges when used with young children. Although these tools represent advances, clinicians will still need to be critical evaluators of automated technologies and their applied use in clinical practice (human in the loop), making the training of clinical phoneticians a continued priority.
Clinical Phonetics Textbook
The sixth edition of Clinical Phonetics integrates information on the various aspects of phonetics in clinical services, including topics such as articulatory and acoustic descriptions of speech sounds, coarticulation, suprasegmentals, narrow transcription, and language variations. Emphasis is given to the rich diversity of language backgrounds and communication styles across the human population. The text features opportunities for online and in-person instruction, providing sound files representing a wide range of speakers with different language variations and different speech-language disorders. Students can retrieve sound files with a click of a button and re-listen as often as desired.
The new edition builds on previous editions and the inspired work of Lawrence D. Shriberg, Ph.D. (1939–2024), who made profound clinical and research contributions to children’s speech sound disorders. We honor his memory by retaining his name as first author.
References
Amorosa, H., von Benda, U., Wagner, E., & Keck, A. (1985). Transcribing phonetic detail in the speech of unintelligible children: A comparison of procedures. International Journal of Language & Communication Disorders, 20(3), 281–287. https://doi.org/10.3109/13682828509012268
Ashby, P. (2007). Fieldwork for success. In Proceedings of the Phonetics Teaching and Learning Conference: PTLC 2007, 24–26 Aug 2007, London, UK.
Babatsouli, E. (2015). Technologies for the study of speech: Review and an application. Themes in Science and Technology Education, 8(1), 17–32.
Gooch, J. L., Hardin-Jones, M., Chapman, K. L., Trost-Cardamone, J. E., & Sussman, J. (2001). Reliability of listener transcriptions of compensatory articulations. The Cleft Palate-Craniofacial Journal, 38(1), 59–67. https://doi.org/10.1597/1545-1569_2001_038_0059_roltoc_2.0.co_
Kerswill, P., & Wright, S. (1990). The validity of phonetic transcription: Limitations of a sociolinguistic research tool. Language Variation and Change, 2(3), 255–275. https://doi.org/10.1017/S0954394500000363
Knight, R. A., Bandali, C., Woodhead, C., & Vansadia, P. (2018). Clinicians’ views of the training, use and maintenance of phonetic transcription in speech and language therapy. International Journal of Language & Communication Disorders, 53(4), 776–787. https://doi.org/10.1111/1460-6984.12381
Krueger, B. I. (2022). Using peer-assessment to mitigate variability in graduate students’ phonetic transcription skills. Clinical Linguistics & Phonetics, 36(6), 579–596. https://doi.org/10.1080/02699206.2021.1955300
Li, Y., Pham, D. S., Ward, R., Hennessey, N., & Tan, T. (2025). Using AI to automate phonetic transcription and perform forced alignment for clinical application in the assessment of speech sound disorders. In Workshop on Large Language Models and Generative AI for Health at AAAI 2025.
Luque, J. S., Tessel, C. A., & Grover, V. (2024). Teaching phonetics online: Lessons from before and during the pandemic. Perspectives of the ASHA Special Interest Groups, 9(6), 1829–1835. https://doi.org/10.1044/2024_PERSP-23-00146
Mallaband, L. J. (2024). The agreement of phonetic transcriptions between paediatric speech and language therapists transcribing a disordered speech sample. International Journal of Language & Communication Disorders, 59(5), 1981–1995. https://doi.org/10.1111/1460-6984.13043
Mompeán, J. A., Ashby, M., & Fraser, H. (2011). Phonetics teaching and learning: An overview of recent trends and directions. In Proceedings of the International Congress of Phonetic Sciences (pp. 96–99).
Munson, B., Edwards, J., Schellinger, S. K., Beckman, M. E., & Meyer, M. K. (2010). Deconstructing phonetic transcription: Covert contrast, perceptual bias, and an extraterrestrial view of Vox Humana. Clinical Linguistics & Phonetics, 24(4–5), 245–260. https://doi.org/10.3109/02699200903532524
Ohala, J. J. (1991, August). The integration of phonetics and phonology. In Proceedings of the 12th International Congress of Phonetic Sciences (Vol. 1, pp. 1–16).
Oller, K., & Eilers, R. E. (1975). Phonetic expectation and transcription validity. Phonetica, 31(3–4), 288–304. https://doi.org/10.1159/000259675
Pitt, M. A., Johnson, K., Hume, E., Kiesling, S., & Raymond, W. (2005). The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability. Speech Communication, 45(1), 89–95.
Roberts, T., & Lemoncello, R. (2023). Partnering with students to increase engagement and inclusion in an undergraduate phonetics & acoustics course. Teaching and Learning in Communication Sciences & Disorders, 7(1). https://doi.org/10.30707/TLCSD7.1.1675490380.893709
Seifert, M., Morgan, L., Gibbin, S., & Wren, Y. (2020). An alternative approach to measuring reliability of transcription in children’s speech samples: Extending the concept of near functional equivalence. Folia Phoniatrica et Logopaedica, 72(2), 84–91. https://doi.org/10.1159/000502324
Shaw, Á., & Yanushevskaya, I. (2021). Students’ views and experiences of the training and use of phonetic transcription in speech and language therapy–the Irish perspective. Clinical Linguistics & Phonetics, 36(2–3), 276–291. https://doi.org/10.1080/02699206.2021.1874055
Shriberg, L. D., & Kent, R. D. (2002). Clinical phonetics (3rd ed.). Pearson.
Shriberg, L. D., & Lof, G. L. (1991). Reliability studies in broad and narrow phonetic transcription. Clinical Linguistics & Phonetics, 5(3), 225–279. https://doi.org/10.3109/02699209108986113
Stemberger, J. P., & Bernhardt, B. M. (2020). Phonetic transcription for speech-language pathology in the 21st century. Folia Phoniatrica et Logopaedica, 72(2), 75–83. https://doi.org/10.1159/000500701
Titterington, J., & Bates, S. (2017). Practice makes perfect? The pedagogic value of online independent phonetic transcription practice for speech and language therapy students. Clinical Linguistics & Phonetics, 32(3), 249–266. https://doi.org/10.1080/02699206.2017.1350882
Vaissiere, J. (2003). New tools for teaching phonetics. In Proceedings of the 15th International Congress of Phonetic Sciences (pp. 309–312).
White, S., Hurren, A., James, S., & Knight, R. A. (2022). ‘I think that's what I heard? I'm not sure’: Speech and language therapists’ views of, and practices in, phonetic transcription. International Journal of Language & Communication Disorders, 57(5), 1071–1084. https://doi.org/10.1111/1460-6984.12740
Windsor, F. (2011). The broad or narrow way? Speech and Language Therapy in Practice, Winter, 14–16.
