
iPhones can now automatically recognize and label buttons and UI features for blind users

Apple has always gone out of its way to build features for users with disabilities, and VoiceOver on iOS is an invaluable tool for anyone with a vision impairment, assuming every element of the interface has been manually labeled. But the company just unveiled a brand-new feature that uses machine learning to identify and label every button, slider and tab automatically.

Screen Recognition, available now in iOS 14, is a computer vision system that has been trained on thousands of images of apps in use, learning what a button looks like, what icons mean and so on. Such systems are very flexible — depending on the data you give them, they can become expert at spotting cats, facial expressions or, as in this case, the different parts of a user interface.

The result is that in any app now, users can invoke the feature and a fraction of a second later every item on screen will be labeled. And by “every,” they mean every: screen readers need to be aware of everything that a sighted user would see and be able to interact with, from images (which iOS has been able to create one-sentence summaries of for some time) to common icons (home, back) and context-specific ones like “…” menus that appear just about everywhere.

The idea is not to make manual labeling obsolete; developers know best how to label their own apps. But updates, changing standards and challenging situations (in-game interfaces, for instance) can lead to things not being as accessible as they could be.
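For context, that manual labeling happens through UIKit's accessibility API. A minimal sketch of what a developer writes for a custom control (the specific button and label here are illustrative, not from Apple's paper):

```swift
import UIKit

// A custom image-only button exposes no text for VoiceOver to read,
// so the developer supplies the label and trait by hand.
let playButton = UIButton(type: .custom)
playButton.setImage(UIImage(systemName: "play.fill"), for: .normal)

// Without these lines, VoiceOver has nothing meaningful to announce;
// this is the gap Screen Recognition fills when labels are missing.
playButton.isAccessibilityElement = true
playButton.accessibilityLabel = "Play"
playButton.accessibilityTraits = .button
```

When a developer skips this step, or an interface is drawn outside the standard view hierarchy, Screen Recognition can infer the same information from the rendered pixels instead.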

I chatted with Chris Fleizach from Apple’s iOS accessibility engineering team, and Jeff Bigham from the AI/ML accessibility team, about the origin of this extremely helpful new feature. (It’s described in a paper due to be presented next year.)

[Image: A phone showing a photo of two women smiling, with VoiceOver describing the photo]

Image Credits: Apple

“We looked for areas where we can make inroads on accessibility, like image descriptions,” said Fleizach. “In iOS 13 we labeled icons automatically — Screen Recognition takes it another step forward. We can look at the pixels on screen and identify the hierarchy of objects you can interact with, and all of this happens on device within tenths of a second.”

The idea is not a new one, exactly; Bigham mentioned a screen reader, Outspoken, which years ago attempted to use pixel-level data to identify UI elements. But while that system needed precise matches, the fuzzy logic of machine learning systems and the speed of iPhones’ built-in AI accelerators means that Screen Recognition is much more flexible and powerful.

It wouldn’t have been possible just a couple of years ago — the state of machine learning and the lack of a dedicated unit for executing it meant that something like this would have been extremely taxing on the system, taking much longer and probably draining the battery all the while.

But once this kind of system seemed possible, the team got to work prototyping it with the help of their dedicated accessibility staff and testing community.

“VoiceOver has been the standard-bearer for vision accessibility for so long. If you look at the steps in development for Screen Recognition, it was grounded in collaboration across teams — Accessibility throughout, our partners in data collection and annotation, AI/ML, and, of course, design. We did this to make sure that our machine learning development continued to push toward an excellent user experience,” said Bigham.