Exploiting Predictable Response Training to Improve Automatic Recognition of Children’s Spoken Responses

Wei Chen, Jack Mostow, Gregory Aist

The unpredictability of spoken responses by young children (6-7 years old) makes them problematic for automatic speech recognizers. Aist and Mostow proposed predictable response training to improve automatic recognition of children’s free-form spoken responses. We apply this approach in the context of Project LISTEN’s Reading Tutor to the task of teaching children an important reading comprehension strategy, namely to make up their own questions about text while reading it. We show how to use knowledge about strategy instruction and the story text to generate a language model that predicts questions spoken by children during comprehension instruction. We evaluated this model on a previously unseen test set of 18 utterances totaling 137 words spoken by 11 second-grade children in response to prompts the Reading Tutor inserted as they read. Compared to a baseline trigram language model that does not incorporate this knowledge, speech recognition using the generated language model achieved concept recall 5 times higher, a difference large enough to be statistically significant despite the small sample size.
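
The abstract reports concept recall without defining it. As a rough illustration only, the sketch below assumes one plausible reading: the fraction of concepts in the human transcript that also appear in the recognizer output. The function name `concept_recall`, the choice of concepts, and the example utterance are all hypothetical and are not taken from the paper, whose actual metric may differ.

```python
def concept_recall(reference_concepts, hypothesis_words):
    """Fraction of reference concepts that appear in the recognizer output.

    Assumed definition for illustration; not the paper's formula.
    """
    reference = {c.lower() for c in reference_concepts}
    if not reference:
        return 0.0  # no concepts to recall
    hypothesis = {w.lower() for w in hypothesis_words}
    return len(reference & hypothesis) / len(reference)


# Hypothetical example, not data from the evaluation.
reference = ["why", "dog", "run"]            # concepts in the transcript
hypothesis = "why did the dog sit".split()   # recognizer output
recall = concept_recall(reference, hypothesis)
print(f"concept recall = {recall:.2f}")
```

Under this reading, a recognizer can score well on concept recall even when its word-level transcript contains errors, as long as the words that carry the child's question survive.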

The final publication is available at Springer via https://doi.org/10.1007/978-3-642-13388-6_11.