Voice assistants don’t work for kids: The problem with speech recognition in the classroom


Before the pandemic, more than 40% of new internet users were children. Estimates now suggest that children’s screen time has surged by 60% or more, with children 12 and under spending upward of five hours per day on screens (with all of the associated benefits and perils).

Although it’s easy to marvel at the technological prowess of digital natives, educators (and parents) are painfully aware that young “remote learners” often struggle to navigate the keyboards, menus and interfaces required to make good on the promise of education technology.

Against that backdrop, voice-enabled digital assistants hold out hope of a more frictionless interaction with technology. But while kids are fond of asking Alexa or Siri to beatbox, tell jokes or make animal sounds, parents and teachers know that these systems have trouble comprehending their youngest users once those users deviate from predictable requests.

The challenge stems from the fact that the speech recognition software that powers popular voice assistants like Alexa, Siri and Google was never designed for use with children, whose voices, language and behavior are far more complex than those of adults.

It is not just that kids’ voices are squeakier: their vocal tracts are thinner and shorter, their vocal folds are smaller and their larynxes have not yet fully developed. This results in speech patterns very different from those of an older child or an adult.

From the graphic below it is easy to see that simply changing the pitch of adult voices used to train speech recognition fails to reproduce the complexity of information required to comprehend a child’s speech. Children’s language structures and patterns vary greatly. They make leaps in syntax, pronunciation and grammar that need to be taken into account by the natural language processing component of speech recognition systems. That complexity is compounded by interspeaker variability among children at a wide range of developmental stages, a factor that does not have to be accounted for with adult speech.

[Image: vocal pitch changes with age]

Changing the pitch of adult voices used to train speech recognition fails to reproduce the complexity of information required to comprehend a child’s speech. Image Credits: SoapBox Labs