
    Voice-First UX: Designing AI for Blind Users

    by Deep Parmar

    CTO at Sunbots Innovations LLP | Director at Xwits Developers Pvt Ltd


    The Screen-Free Design Constraint

    Most UX design happens on screens. Even voice assistants like Siri and Alexa are often designed with a parallel visual interface — the phone screen, the Echo Show display, the card on a smart TV. The design process assumes someone will eventually see something.

    SmartON has no such assumption. Our users are visually impaired. They interact with the device entirely through voice: questions spoken, responses heard. There is no fallback to a visual interface. This constraint forces design decisions that most voice UI projects never have to make.

    Here are the design principles we developed through testing sessions with visually impaired users in Ahmedabad.

    Principle 1: Responses Must Be Scannable by Ear

    Sighted users scan visual content — they glance at headings, skip to the relevant section, ignore irrelevant details. Blind users listen linearly. This means every piece of unnecessary information in a response is time the user must sit through before getting to what they need.

    The response format we converged on for currency detection: "[denomination]. [orientation]." Two pieces of information. No preamble. No explanation. The user asked "what note is this?" — they don't need to hear "I have identified the currency note you're pointing at as..." before the answer.

    A useful exercise: read your responses aloud and measure how long it takes to hear the actionable information. If it takes more than 3 seconds to hear something the user can act on, the response is too long.
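    The exercise above can be roughly approximated in code before any read-aloud session. The following is a minimal sketch, not SmartON's implementation: the function names are hypothetical, and the speaking rate of 150 words per minute is an assumed figure that should be replaced with a measurement from your own speech engine.

```python
WORDS_PER_MINUTE = 150  # assumed speaking rate; measure your own TTS engine


def format_currency_response(denomination: str, orientation: str) -> str:
    """Build the answer-first response: "[denomination]. [orientation]."."""
    return f"{denomination.strip()}. {orientation.strip()}."


def seconds_until_actionable(response: str, actionable_phrase: str,
                             words_per_minute: int = WORDS_PER_MINUTE) -> float:
    """Estimate how long a listener waits until the actionable phrase
    has been fully spoken, based on a simple word count."""
    idx = response.lower().find(actionable_phrase.lower())
    if idx < 0:
        raise ValueError("actionable phrase not found in response")
    # Count every word up to and including the actionable phrase.
    words_heard = len(response[: idx + len(actionable_phrase)].split())
    return words_heard * 60.0 / words_per_minute


terse = format_currency_response("500 rupees", "Front side up")
verbose = ("I have identified the currency note you're pointing at "
           "as 500 rupees.")

print(seconds_until_actionable(terse, "500 rupees"))
print(seconds_until_actionable(verbose, "500 rupees"))
```

    Under these assumptions the terse response delivers the denomination in under a second, while the version with a preamble crosses the 3-second threshold. A word count is only a proxy, so it complements listening to the real audio rather than replacing it.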

    Principle 2: Assume Nothing About Spatial Awareness

    Visual interfaces use spatial metaphors constantly: "the button in the top right corner," "swipe left," "the icon next to the search bar." These metaphors are meaningless without vision.

    In MIRA's scene descriptions, we replaced spatial metaphors with action-oriented instructions. Not "glass door at the 11 o'clock position from your current orientation" but "glass door slightly to your left, push bar at waist height." The first is accurate; the second is actionable.
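    One way to produce this kind of phrasing is to translate a detected bearing into body-relative words and attach an action hint. The sketch below is illustrative only: it assumes the detector reports a horizontal bearing in degrees (negative for left, positive for right), and the function names and angle thresholds are hypothetical choices, not MIRA's actual logic.

```python
from typing import Optional


def describe_direction(bearing_deg: float) -> str:
    """Map a horizontal bearing to a body-relative phrase.

    Assumed convention: 0 is straight ahead, negative is left,
    positive is right. Thresholds are illustrative.
    """
    side = "left" if bearing_deg < 0 else "right"
    magnitude = abs(bearing_deg)
    if magnitude <= 10:
        return "straight ahead"
    if magnitude <= 45:
        return f"slightly to your {side}"
    if magnitude <= 120:
        return f"to your {side}"
    return f"behind you, to your {side}"


def describe_object(name: str, bearing_deg: float,
                    action_hint: Optional[str] = None) -> str:
    """Combine the object name, a body-relative direction, and an
    optional hint about what the user can do with the object."""
    phrase = f"{name} {describe_direction(bearing_deg)}"
    if action_hint:
        phrase += f", {action_hint}"
    return phrase


print(describe_object("glass door", -30, "push bar at waist height"))
```

    The thresholds are worth tuning with users: what counts as "slightly" is a judgment about how people actually turn their bodies, not a geometric fact.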

    This applies to navigation as well. "Turn right in 50 meters" is a spatial instruction that works for sighted users. "Walk forward until the texture changes, then face left toward the sound of traffic" is an instruction calibrated to non-visual perception.

    Principle 3: Explicit State, Always

    Sighted users can see when an application is loading, processing, or waiting for input. Visual affordances — spinning indicators, highlighted buttons, grayed-out states — communicate application state without words.

    In a voice interface, application state must be communicated explicitly. When MIRA receives a request and starts processing, it says "checking..." immediately — before the inference completes. When it's waiting for input, a soft chime signals "I'm listening." When it encounters an error, it says what happened and what the user can do next.
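    The sequence described above (acknowledge first, then answer or explain the failure, then signal readiness) can be sketched as a small request handler. This is a hedged illustration, not MIRA's code: `speak`, `chime`, and `run_inference` are hypothetical callables standing in for the speech engine, the listening cue, and the model call, and the error wording is a placeholder.

```python
from typing import Callable

# Placeholder wording: says what happened and what the user can do next.
ERROR_MESSAGE = "I couldn't process that. Please try again."


def handle_request(run_inference: Callable[[], str],
                   speak: Callable[[str], None],
                   chime: Callable[[], None]) -> None:
    """Voice every state transition so the user never has to guess."""
    speak("checking...")          # acknowledge before inference starts
    try:
        result = run_inference()  # potentially slow model call
    except Exception:
        speak(ERROR_MESSAGE)      # failure is announced, never silent
    else:
        speak(result)
    chime()                       # back to listening


# Usage with stand-in callables that record what the user would hear.
heard = []
handle_request(lambda: "500 rupees. Front side up.",
               heard.append,
               lambda: heard.append("<chime>"))
print(heard)
```

    In a real device the acknowledgement would need to reach the speaker before the inference call blocks, which may mean running inference on a separate thread or process.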

    Silence is particularly dangerous in a voice interface for blind users. A sighted user who gets no response from an app can see whether the app is frozen, loading, or has crashed. A blind user who hears silence has no way to distinguish between processing and failure.

    Principle 4: Interrupt and Override Must Be Instant

    If MIRA is reading a long document section and the user wants to stop, they need a way to interrupt immediately. In a visual interface, users can click anywhere because interactive elements are always visible. In a voice interface, interruption requires a wake word, a physical button, or barge-in detection (detecting that the user is speaking while the system is speaking).

    We implemented a hardware button on SmartON's Jetson Nano unit that immediately halts any current speech output and returns to listening mode. The button is positioned at a consistent location (top center of the unit) and has a distinct tactile feel. Software-only interruption, whether by wake word or barge-in detection, is less reliable than hardware for critical use cases.
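    On the software side, a button like this needs speech output that can be cancelled mid-stream. The following is a minimal sketch under stated assumptions, not SmartON's firmware: `InterruptibleSpeaker` and `play_chunk` are hypothetical names, the GPIO wiring that would call `on_button_press` is not shown, and audio is assumed to be played in short chunks so the stop flag is checked often.

```python
import threading
from typing import Callable, Iterable


class InterruptibleSpeaker:
    """Plays speech chunk by chunk and stops when the button is pressed."""

    def __init__(self, play_chunk: Callable[[str], None]) -> None:
        self._play_chunk = play_chunk      # hypothetical audio backend
        self._stop = threading.Event()     # safe to set from another thread

    def on_button_press(self) -> None:
        """Call this from the hardware button's interrupt handler."""
        self._stop.set()

    def speak(self, chunks: Iterable[str]) -> bool:
        """Play chunks in order.

        Returns True if everything was played, False if interrupted.
        """
        self._stop.clear()
        for chunk in chunks:
            if self._stop.is_set():
                return False
            self._play_chunk(chunk)
        return not self._stop.is_set()


# Usage: simulate a button press arriving while the second chunk plays.
played = []
speaker = InterruptibleSpeaker(lambda chunk: None)


def fake_play(chunk: str) -> None:
    played.append(chunk)
    if len(played) == 2:
        speaker.on_button_press()


speaker._play_chunk = fake_play
finished = speaker.speak(["Section one.", "Section two.", "Section three."])
print(finished, played)
```

    With this design the worst-case delay before speech stops is the length of one chunk, so shorter chunks give a faster halt. A production version would likely also stop the audio device directly rather than waiting for the current chunk to end.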

    Principle 5: Test with Real Users, Not Simulated Impairment

    The most important principle is the simplest: test with visually impaired users, not with sighted testers using blindfolds. Sighted users who simulate blindness during testing use learned strategies for navigating without vision — "I'll just count steps" — that aren't representative of users who have developed different adaptive strategies over years or decades.

    Our most important product insights came from sessions where users found failure modes we'd never considered. One user discovered that currency detection was unreliable when the note was held at arm's length (we'd tested only at close range). Another found that the audio feedback was inaudible in a typical market environment (we'd tested in a quiet office).

    Real-world testing in authentic environments is irreplaceable for accessibility products. Lab testing with simulated impairment produces products that work in labs.

    Building accessible AI? I'm happy to talk through voice UX design. Reach out, or read more about SmartON at getsmartonai.com.
