Designing a voice AI that callers can just talk to — in Hindi, English or Hinglish — and the builder that lets a business launch one from a single sentence.
ROLE
Sole IC designer
TEAM
PM (AI) · Voice Engineering
SCOPE
Conversation design · call UX · bot builder
STATUS
In production · 24/7
MY SLICE
I led conversation design, the call experience and the bot-builder UX. Voice engineering owned the speech stack and latency; PM (AI) owned scope and rollout.
3 languages
Hindi · English · Hinglish, mid-call switching
Sub-second
turn latency budget — silence is the enemy
Barge-in
interrupt the bot mid-sentence, it yields
Full context
carried into every human handoff
( THE PROBLEM )
“Press 1 to repeat these options”
For twenty years, calling a business meant navigating a keypad maze designed around the company's org chart rather than the caller's problem. People don't think in menus. They just say what they want.
THE OLD IVR
> Press 1 for orders
> Press 2 for payments
> Press 3 for delivery
> Press 9 to repeat…
╳ caller hangs up at level 3
THE GEN-AI VOICEBOT
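“Haan, mera order abhi tak nahi aaya — refund chahiye.” (Yes, my order still hasn't arrived; I need a refund.)
Samajh gayi — order #48210, 45 minute late hai. Refund abhi process kar doon, ya reschedule better rahega? (Understood. Order #48210 is 45 minutes late. Should I process the refund now, or would rescheduling be better?)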
✓ intent + language + emotion understood, zero menus
Unlike a rigid IVR, the voicebot understands varied phrasing, handles follow-ups mid-flow, and — when it can't resolve — escalates to a human with the full conversation attached, so nobody repeats themselves.
( RESEARCH — BEFORE A SINGLE SCREEN )
We Listened to a Thousand Calls First
A voicebot has two users who never meet: the caller on the line, and the business team who builds and babysits it. The research had to cover both — before we designed either side.
01 · CALL RECORDINGS
Hours of real IVR recordings, mapped for the exact moment callers gave up — mashing 0, repeating themselves, hanging up at menu level three.
02 · FLOOR SHADOWING
Sat beside agents taking the calls bots escalated — to learn what a “good handoff” actually needs on the receiving end.
03 · BUILDER INTERVIEWS
Interviewed the ops and product people who'd actually configure the bot — how they describe a policy, what they fear going live, what “testing” means to them.
04 · WIZARD-OF-OZ CALLS
Before the AI was ready, humans played the bot over real phone lines — scripted tone, deliberate pauses — to measure how much latency and formality callers would forgive.
WHAT WE HEARD → WHAT IT BECAME
“The moment I hear a menu, I start pressing zero.”
— caller, 34 · shadowed IVR session
→ No menus. Intent understood from the first sentence.
“If it goes silent for two seconds, I think the call dropped.”
— Wizard-of-Oz pilot participant
→ The sub-second latency budget became a hard design requirement.
“I don't want to design a flowchart. I want to tell it the policy.”
— ops lead · builder interview
→ The sentence-first builder: identity and objective as plain editable text.
“I won't put this in front of customers until I've called it myself.”
— product owner · pilot rollout
→ “Talk to your bot” — test on your own phone before a single customer hears it.
( THE CALL, DESIGNED )
A Voice You Can Interrupt
The hardest part of voice isn't what the bot says — it's the timing. Every design decision serves the rhythm of real speech.
LISTENING · 00:41
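“EMI due date badha sakte ho is month?” (Can you push my EMI due date back this month?)
Bilkul. Aapki EMI ₹4,250 hai, due 10th ko. Main 17th tak extend kar sakti hoon — ek baar confirm… (Of course. Your EMI is ₹4,250, due on the 10th. I can extend it to the 17th; just confirm once…)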
⚡ caller interrupts — bot yields mid-sentence
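“Haan haan, 17th theek hai — link bhej do.” (Yes, yes, the 17th is fine. Send me the link.)
Done — 17th lock ho gaya. Payment link SMS pe aa raha hai. 😊 (Done, the 17th is locked in. The payment link is on its way by SMS.)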
↳ 640ms turn · Hinglish detected · commitment captured
DECISION 01 · LATENCY IS UX
On voice, 2 seconds of silence feels broken. We treated the sub-second turn as a design requirement — streaming speech recognition, noise cancellation and response queues all serve that budget.
DECISION 02 · INTERRUPTION IS NORMAL
Humans talk over each other. Barge-in handling lets the caller cut the bot off at any word — the bot stops, listens, and picks the thread back up without restarting its script.
DECISION 03 · SPEAK THE CALLER'S LANGUAGE
Most Indian callers code-switch mid-sentence. The bot detects Hindi, English or Hinglish per utterance and mirrors it — no “For Hindi, press 2.”
DECISION 04 · EMPATHY OVER EFFICIENCY
Intent and emotion drive the response. A frustrated repeat caller gets a shorter path and a softer tone — not the standard script, faster.
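Decision 02 can be pictured as a small turn controller. The sketch below is a toy model under stated assumptions: real barge-in runs on streaming audio with voice-activity detection (VAD), which is reduced here to the index of the word during which the caller starts talking, and the names `BotTurn`, `speak_with_barge_in` and `unsaid` are hypothetical, not the production speech stack. It shows the two behaviours the decision describes: the bot stops the moment the caller speaks, and it keeps what it never said so the next turn can pick up the thread instead of restarting the script.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BotTurn:
    """One bot reply: what it planned to say versus what it actually said."""
    planned: list[str]
    spoken: list[str] = field(default_factory=list)
    interrupted: bool = False


def speak_with_barge_in(words: list[str], caller_starts_at: Optional[int]) -> BotTurn:
    """Emit the reply word by word, yielding the floor as soon as the caller talks.

    caller_starts_at stands in for a VAD event (assumption): the index of the word
    during which caller speech is detected, or None if the caller stays silent.
    """
    turn = BotTurn(planned=list(words))
    for i, word in enumerate(words):
        if caller_starts_at is not None and i >= caller_starts_at:
            turn.interrupted = True  # stop mid-sentence instead of finishing the script
            break
        turn.spoken.append(word)
    return turn


def unsaid(turn: BotTurn) -> list[str]:
    """What the bot never got to say, kept as context for the next turn."""
    return turn.planned[len(turn.spoken):]


if __name__ == "__main__":
    reply = "I can extend it to the 17th, just confirm once".split()
    turn = speak_with_barge_in(reply, caller_starts_at=7)
    print(" ".join(turn.spoken))  # I can extend it to the 17th,
    print(unsaid(turn))           # ['just', 'confirm', 'once']
```

Keeping the unsaid remainder, rather than discarding it, is what lets the next turn ask only for the missing confirmation when the caller says “17th theek hai”.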
( WHAT THE SYSTEM HAD TO DO )
Six Capabilities, One Voice
01
Deep personalization
Responses draw on customer data, history and intent signals — a repeat caller is never greeted like a stranger.
02
Flexible across use cases
Sales, support, collections — tone, workflows and guardrails configure to the business, its policies and its brand voice.
03
Learns from your data
Documents, rules and structured inputs ground every answer — and the knowledge keeps improving with each conversation.
04
Human-like conversation
Understands intent and emotion, responds with empathy, and moves naturally across languages and follow-ups.
05
Low-latency voice engine
Streaming speech recognition, noise cancellation and interruption handling keep the rhythm of real speech.
06
Enterprise-ready scale
API integrations, high availability, fallbacks, and continuous optimization — designed for millions of calls, not demos.
( ONE BOT, MANY JOURNEYS )
Designed as a System, Deployed as Journeys
◆ Sales pitches that adapt & convert
◆ Instant support, seamless escalation
◆ Lead qualification at scale
◆ Healthcare appointment booking
◆ Service scheduling
◆ EMI & payment reminders
◆ Multilingual by default
The design challenge wasn't any single journey — it was a conversation framework where a collections reminder and a sales pitch share the same voice engine, guardrails and handoff patterns, differing only in workflow and tone.
( THE OTHER USER: THE BUSINESS )
Built From a Sentence
A voicebot is only as good as the team that configures it. So instead of a flowchart editor, the builder starts with plain language — describe the callers, the goal and the guardrails, and the AI designs the call flow, tests it and launches.
exotel · build an AI voicebot
↑ step 1 · describe — one sentence in, a working voicebot out
build bot · configuration, refined with the co-pilot
↑ step 2 · refine — a co-pilot rewrites greeting, objective and tone; identity & objective are plain editable text, not flowchart nodes
talk to your bot · test with a real call before launch
↑ step 3 · test — call your own bot on a real phone line or in the browser, before a single customer hears it
( WHEN THE BOT STEPS ASIDE )
The Handoff Is the Product
A voicebot that traps people is worse than an IVR. When a case is complex or emotional, the bot transfers to a human — with the conversation summary, captured details and sentiment already on the agent's screen.
1 · BOT DETECTS THE LIMIT
Fraud dispute — sensitive, emotional, policy-bound. The bot doesn't improvise; it routes.
2 · CONTEXT PACKAGED
Summary, intent, captured details and a “frustrated” sentiment flag travel with the call.
3 · HUMAN PICKS UP WARM
The agent's Customer Context Panel is pre-filled. Nobody says “please repeat that.”
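The three steps amount to a routing check plus a data contract between bot and agent. The sketch below assumes a hypothetical `HandoffPacket` shape and a hypothetical set of policy-bound intents (`POLICY_BOUND_INTENTS`); the case study does not specify the real schema or escalation rules, and the sample values are illustrative. It shows the point of step 2: everything the agent needs travels as one object, so the panel can be pre-filled before the agent says a word.

```python
from dataclasses import dataclass, field

# Hypothetical: intents the bot routes to a human instead of improvising (step 1).
POLICY_BOUND_INTENTS = {"fraud_dispute"}


@dataclass
class HandoffPacket:
    """Hypothetical context bundle that travels with an escalated call (step 2)."""
    summary: str
    intent: str
    captured: dict[str, str]
    sentiment: str
    transcript: list[str] = field(default_factory=list)


def should_route_to_human(intent: str) -> bool:
    """Step 1: sensitive, policy-bound intents go to a human."""
    return intent in POLICY_BOUND_INTENTS


def render_panel(packet: HandoffPacket) -> str:
    """Step 3: pre-fill the agent's context panel so nobody asks the caller to repeat."""
    lines = [
        f"Intent: {packet.intent}",
        f"Sentiment: {packet.sentiment}",
        f"Summary: {packet.summary}",
    ]
    # Captured details, sorted by key for a stable layout.
    lines += [f"{key}: {value}" for key, value in sorted(packet.captured.items())]
    lines.append(f"Transcript turns: {len(packet.transcript)}")
    return "\n".join(lines)


if __name__ == "__main__":
    packet = HandoffPacket(
        summary="Caller reports a card charge they did not make.",
        intent="fraud_dispute",
        captured={"disputed_amount": "4250"},
        sentiment="frustrated",
        transcript=["caller: ...", "bot: ..."],
    )
    if should_route_to_human(packet.intent):
        print(render_panel(packet))
```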
( THE NEXT LEVEL: HUMANS ON THE LOOP )
The Agent-Monitored Contact Center
Bots don't replace the floor — they change what the floor watches. Agents supervise live bot conversations, whisper corrections mid-call, and take over the moment a conversation turns critical.
live monitoring · one agent, many bot conversations
↑ the monitor — a crisis-flagged call: AI-ranked queue, live summary, transcript, whisper & takeover
transcript search · live highlight
Search a transcript while the call is still running — every match highlighted, jump between results.
the queue · resolved & empty states
The queue's quieter moments — conversations resolve out of the list, and the empty state invites the next pick.
RECOMMENDED QUEUE
One agent can't watch a hundred calls. AI ranks live conversations by sentiment, urgency and crisis signals — the queue shows the five that need eyes now.
WHISPER MODE
Agents coach the bot silently mid-call — “offer the voucher, skip the survey” — and the bot adjusts without the caller ever hearing a second voice.
TRANSCRIPT SEARCH
Live transcripts are searchable while the call is still running — jump to every mention of “refund” before deciding to step in.
ONE-TAP TAKEOVER
When a call is flagged Crisis, the agent takes over mid-sentence — the bot bows out, the human inherits transcript, summary and sentiment.
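The recommended queue is, at heart, a top-k ranking over live calls. The sketch below is illustrative only: the signal names (`negativity`, `urgency`, `crisis`), the 0.6/0.4 weights and the rule that crisis-flagged calls always rank first are assumptions, not the production model. Python's `heapq.nlargest` keeps the selection cheap even when many calls are live, since it only tracks the current top k.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class LiveCall:
    call_id: str
    negativity: float  # 0..1, higher means more negative sentiment (assumed signal)
    urgency: float     # 0..1 (assumed signal)
    crisis: bool       # crisis flag raised on the call


# Illustrative weights, not the production model.
W_NEGATIVITY, W_URGENCY = 0.6, 0.4


def priority(call: LiveCall) -> tuple[int, float]:
    """Crisis-flagged calls sort ahead of everything; the rest by weighted score."""
    score = W_NEGATIVITY * call.negativity + W_URGENCY * call.urgency
    return (1 if call.crisis else 0, score)


def recommended_queue(calls: list[LiveCall], k: int = 5) -> list[LiveCall]:
    """Return the k calls that most need a human's eyes, highest priority first."""
    return heapq.nlargest(k, calls, key=priority)


if __name__ == "__main__":
    calls = [
        LiveCall("a", negativity=0.2, urgency=0.1, crisis=False),
        LiveCall("b", negativity=0.9, urgency=0.7, crisis=False),
        LiveCall("c", negativity=0.1, urgency=0.0, crisis=True),
    ]
    print([c.call_id for c in recommended_queue(calls)])  # ['c', 'b', 'a']
```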
tenant analytics · every bot in one view
The tenant pulse: combined call volume, blended resolution rate and bot health at a glance.
bot performance leaderboard
Every bot ranked by volume and resolution rate, so bots in a warning or critical state get attention before callers feel it.
Voice is the oldest interface. We just made software fluent in it.
The measure of this design isn't how smart the bot sounds; it's how rarely the caller notices they're talking to one, and how gracefully it steps aside when they shouldn't be.