GEN-AI VOICEBOT · EXOTEL

Conversations,
Not Call Trees

Designing a voice AI that callers can just talk to — in Hindi, English or Hinglish — and the builder that lets a business launch one from a single sentence.

ROLE
Sole IC designer
TEAM
PM (AI) · Voice Engineering
SCOPE
Conversation design · call UX · bot builder
STATUS
In production · 24/7
MY SLICE
I led conversation design, the call experience and the bot-builder UX. Voice engineering owned the speech stack and latency; PM (AI) owned scope and rollout.
3 languages
Hindi · English · Hinglish, mid-call switching
Sub-second
turn latency budget — silence is the enemy
Barge-in
interrupt the bot mid-sentence, it yields
Full context
carried into every human handoff
( THE PROBLEM )

“Press 1 to repeat these options”

For twenty years, calling a business meant navigating a keypad maze designed around the company's org chart — not the caller's problem. People don't think in menus. They just say what they want.

THE OLD IVR
> Press 1 for orders
> Press 2 for payments
> Press 3 for delivery
> Press 9 to repeat…
╳ caller hangs up at level 3
THE GEN-AI VOICEBOT
“Haan, mera order abhi tak nahi aaya — refund chahiye.”
Samajh gayi — order #48210, 45 minute late hai. Refund abhi process kar doon, ya reschedule better rahega?
✓ intent + language + emotion understood, zero menus

Unlike a rigid IVR, the voicebot understands varied phrasing, handles follow-ups mid-flow, and — when it can't resolve — escalates to a human with the full conversation attached, so nobody repeats themselves.

( RESEARCH — BEFORE A SINGLE SCREEN )

We Listened to a Thousand Calls First

A voicebot has two users who never meet: the caller on the line, and the business team who builds and babysits it. The research had to cover both — before we designed either side.

01 · CALL RECORDINGS
Hours of real IVR recordings, mapped for the exact moment callers gave up — mashing 0, repeating themselves, hanging up at menu level three.
02 · FLOOR SHADOWING
Sat beside agents taking the calls bots escalated — to learn what a “good handoff” actually needs on the receiving end.
03 · BUILDER INTERVIEWS
Interviewed the ops and product people who'd actually configure the bot — how they describe a policy, what they fear going live, what “testing” means to them.
04 · WIZARD-OF-OZ CALLS
Before the AI was ready, humans played the bot over real phone lines — scripted tone, deliberate pauses — to measure how much latency and formality callers would forgive.
WHAT WE HEARD → WHAT IT BECAME
“The moment I hear a menu, I start pressing zero.”
— caller, 34 · shadowed IVR session
→ No menus. Intent understood from the first sentence.
“If it goes silent for two seconds, I think the call dropped.”
— Wizard-of-Oz pilot participant
→ The sub-second latency budget became a hard design requirement.
“I don't want to design a flowchart. I want to tell it the policy.”
— ops lead · builder interview
→ The sentence-first builder: identity and objective as plain editable text.
“I won't put this in front of customers until I've called it myself.”
— product owner · pilot rollout
→ “Talk to your bot” — test on your own phone before a single customer hears it.
( THE CALL, DESIGNED )

A Voice You Can Interrupt

The hardest part of voice isn't what the bot says — it's the timing. Every design decision serves the rhythm of real speech.

LISTENING 00:41
“EMI due date badha sakte ho is month?”
Bilkul. Aapki EMI ₹4,250 hai, due 10th ko. Main 17th tak extend kar sakti hoon — ek baar confirm…
⚡ caller interrupts — bot yields mid-sentence
“Haan haan, 17th theek hai — link bhej do.”
Done — 17th lock ho gaya. Payment link SMS pe aa raha hai. 😊
↳ 640ms turn · Hinglish detected · commitment captured
DECISION 01 · LATENCY IS UX
On voice, 2 seconds of silence feels broken. We treated the sub-second turn as a design requirement — streaming speech recognition, noise cancellation and response queues all serve that budget.
DECISION 02 · INTERRUPTION IS NORMAL
Humans talk over each other. Barge-in handling lets the caller cut the bot off at any word — the bot stops, listens, and picks the thread back up without restarting its script.
DECISION 03 · SPEAK THE CALLER'S LANGUAGE
Most Indian callers code-switch mid-sentence. The bot detects Hindi, English or Hinglish per utterance and mirrors it — no “For Hindi, press 2.”
DECISION 04 · EMPATHY OVER EFFICIENCY
Intent and emotion drive the response. A frustrated repeat caller gets a shorter path and a softer tone — not the standard script, faster.
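A minimal sketch of the barge-in behaviour in Decision 02, assuming the bot's reply is played word by word. The `BotTurn` class and its method names are hypothetical illustrations, not the production speech stack, which voice engineering owned.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BotTurn:
    """One bot utterance, tracked word by word so it can be cut off (hypothetical)."""
    words: list[str]
    spoken: int = 0            # number of words already played to the caller
    interrupted: bool = False  # set once the caller starts talking

    def play_next(self) -> str | None:
        """Return the next word to play, or None if finished or interrupted."""
        if self.interrupted or self.spoken >= len(self.words):
            return None
        word = self.words[self.spoken]
        self.spoken += 1
        return word

    def barge_in(self) -> None:
        """Caller spoke over the bot: stop playback and yield the floor."""
        self.interrupted = True

    def unspoken(self) -> str:
        """The part the caller never heard, kept so the bot need not restart its script."""
        return " ".join(self.words[self.spoken:])


# Usage: the caller interrupts after seven words of the bot's sentence.
turn = BotTurn("Main 17th tak extend kar sakti hoon ek baar confirm".split())
for _ in range(7):
    turn.play_next()
turn.barge_in()
print(turn.unspoken())
```

Tracking how far playback got is what lets the bot pick the thread back up instead of repeating the whole sentence.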
( WHAT THE SYSTEM HAD TO DO )

Six Capabilities,
One Voice

01
Deep personalization

Responses draw on customer data, history and intent signals — a repeat caller is never greeted like a stranger.

02
Flexible across use cases

Sales, support, collections — tone, workflows and guardrails configure to the business, its policies and its brand voice.

03
Learns from your data

Documents, rules and structured inputs ground every answer — and the knowledge keeps improving with each conversation.

04
Human-like conversation

Understands intent and emotion, responds with empathy, and moves naturally across languages and follow-ups.

05
Low-latency voice engine

Streaming speech recognition, noise cancellation and interruption handling keep the rhythm of real speech.

06
Enterprise-ready scale

API integrations, high availability, fallbacks, and continuous optimization — designed for millions of calls, not demos.

( ONE BOT, MANY JOURNEYS )

Designed as a System,
Deployed as Journeys

Sales pitches that adapt & convert · Instant support, seamless escalation · Lead qualification at scale · Healthcare appointment booking · Service scheduling · EMI & payment reminders ◆ Multilingual by default

The design challenge wasn't any single journey — it was a conversation framework where a collections reminder and a sales pitch share the same voice engine, guardrails and handoff patterns, differing only in workflow and tone.

( THE OTHER USER: THE BUSINESS )

Built From a Sentence

A voicebot is only as good as the team that configures it. So instead of a flowchart editor, the builder starts with plain language — describe the callers, the goal and the guardrails, and the AI designs the call flow, tests it and launches.

exotel · build an AI voicebot
Builder start — Agent: Lead Conversion, describe your audience and goal, composer with suggestion chips
↑ step 1 · describe — one sentence in, a working voicebot out
build bot · configuration, refined with the co-pilot
Build Bot — co-pilot chat beside Bot Configuration with AI Agent tabs, activation status, identity and objective
↑ step 2 · refine — a co-pilot rewrites greeting, objective and tone; identity & objective are plain editable text, not flowchart nodes
talk to your bot · test with a real call before launch
Talk to your bot — phone call and web call test, daily limit, bot overview with languages and voice model
↑ step 3 · test — call your own bot on a real phone line or in the browser, before a single customer hears it
( WHEN THE BOT STEPS ASIDE )

The Handoff Is the Product

A voicebot that traps people is worse than an IVR. When a case is complex or emotional, the bot transfers to a human — with the conversation summary, captured details and sentiment already on the agent's screen.

1 · BOT DETECTS THE LIMIT
Fraud dispute — sensitive, emotional, policy-bound. The bot doesn't improvise; it routes.
2 · CONTEXT PACKAGED
Summary, intent, captured details and a frustrated sentiment flag travel with the call.
3 · HUMAN PICKS UP WARM
The agent's Customer Context Panel is pre-filled. Nobody says “please repeat that.”
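The three steps above can be sketched as a small data package that travels with the call. This is an illustrative assumption about shape only: `HandoffContext`, `ROUTE_TO_HUMAN`, the intent labels and the sample values are all made up for the example and are not the actual Exotel schema.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field

# Hypothetical set of intents the bot never improvises on; it routes them.
ROUTE_TO_HUMAN = {"fraud_dispute"}


@dataclass
class HandoffContext:
    """What the agent sees before saying hello (field names are assumptions)."""
    summary: str
    intent: str
    sentiment: str
    captured_details: dict[str, str] = field(default_factory=dict)


def should_hand_off(intent: str) -> bool:
    """Step 1: the bot detects its limit from the recognised intent."""
    return intent in ROUTE_TO_HUMAN


def package_for_agent(ctx: HandoffContext) -> str:
    """Step 2: serialise the context so it can pre-fill the agent's panel."""
    return json.dumps(asdict(ctx), ensure_ascii=False)


# Usage with made-up values.
ctx = HandoffContext(
    summary="Caller disputes a card transaction they do not recognise.",
    intent="fraud_dispute",
    sentiment="frustrated",
    captured_details={"card_last4": "0000"},
)
if should_hand_off(ctx.intent):
    payload = package_for_agent(ctx)
```

Sending one self-contained package, rather than having the agent query for each piece, is what makes step 3 possible: the panel is filled in before the human picks up.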
( THE NEXT LEVEL: HUMANS ON THE LOOP )

The Agent-Monitored Contact Center

Bots don't replace the floor — they change what the floor watches. Agents supervise live bot conversations, whisper corrections mid-call, and take over the moment a conversation turns critical.

live monitoring · one agent, many bot conversations
Live monitoring — crisis-flagged interaction with takeover, live summary, transcript and whisper to bot
↑ the monitor — a crisis-flagged call: AI-ranked queue, live summary, transcript, whisper & takeover
transcript search · live highlight
Transcript search with live result highlighting across a running bot conversation
Search a transcript while the call is still running — every match highlighted, jump between results.
the queue · resolved & empty states
Interactions recommended for you with sentiment tags, a resolved conversation, and the empty monitoring state
The queue's quieter moments — conversations resolve out of the list, and the empty state invites the next pick.
RECOMMENDED QUEUE
One agent can't watch a hundred calls. AI ranks live conversations by sentiment, urgency and crisis signals — the queue shows the five that need eyes now.
WHISPER MODE
Agents coach the bot silently mid-call — “offer the voucher, skip the survey” — and the bot adjusts without the caller ever hearing a second voice.
TRANSCRIPT SEARCH
Live transcripts are searchable while the call is still running — jump to every mention of “refund” before deciding to step in.
ONE-TAP TAKEOVER
When a call is flagged Crisis, the agent takes over mid-sentence — the bot bows out, the human inherits transcript, summary and sentiment.
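As a rough sketch of how a recommended queue might rank live conversations, the example below scores each call on sentiment, urgency and a crisis flag, then keeps the top five. The `LiveCall` fields, the scoring weights and the function names are assumptions for illustration; the real ranking model is not described here.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass


@dataclass
class LiveCall:
    """A running bot conversation as the monitor might see it (hypothetical)."""
    call_id: str
    negative_sentiment: float  # 0.0 (calm) to 1.0 (very upset)
    urgency: float             # 0.0 (can wait) to 1.0 (act now)
    crisis: bool               # crisis signal detected on the call


def attention_score(call: LiveCall) -> float:
    """Assumed weighting: a crisis flag outranks any sentiment/urgency mix."""
    crisis_boost = 10.0 if call.crisis else 0.0
    return crisis_boost + call.negative_sentiment + call.urgency


def recommended_queue(calls: list[LiveCall], limit: int = 5) -> list[LiveCall]:
    """Return the `limit` calls that most need an agent's eyes, highest first."""
    return heapq.nlargest(limit, calls, key=attention_score)
```

Capping the list at five reflects the design constraint stated above: one agent cannot watch a hundred calls, so the queue shows only what needs attention now.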
tenant analytics · every bot in one view
Tenant analytics — total calls across all bots, overall resolution rate and active bot health
The tenant pulse — combined call volume, blended resolution rate and bot health, one glance.
bot performance leaderboard
Bot performance leaderboard — all bots ranked by call volume and resolution rate with health badges
Every bot ranked by volume and resolution — warning and critical bots earn attention before callers feel it.
CONTINUES IN → Case 01 · Contact Center Workspace — where the human side of this handoff lives
speak
don't press
( THE TAKEAWAY )

Voice is the oldest interface.
We just made software fluent in it.

The measure of this design isn't how smart the bot sounds; it's how rarely the caller notices they're talking to one, and how gracefully it steps aside when they shouldn't be.

GEN-AI VOICEBOT · EXOTEL DESIGNED BY ANJALI S.