How AI receptionists actually work

A walkthrough of what happens between the moment a customer dials your number and the moment a booking lands on your calendar, with the actual pieces underneath. No marketing fluff.

May 19, 20265 min readEmmanuel De Leon

When someone calls a business that uses an AI receptionist, the experience feels simple. The phone rings once, a voice answers, the customer talks like they would to any human, the appointment ends up on the schedule, and the owner gets a text. From the caller's side, that is the whole story.

Underneath, six things happen in a specific order. Here is what each one is, why it matters, and what separates a good AI receptionist from a bad one.

1. The call routes to the AI line

When the customer dials your number, the call hits a voice-over-IP carrier (Twilio, Telnyx, or similar) instead of a desk phone. The carrier converts the audio into a real-time audio stream and forwards it to the AI infrastructure within roughly 100 milliseconds.

This is the first place a bad AI receptionist fails. If routing adds half a second of latency, every reply feels like a delay, and callers hang up. A good AI receptionist targets sub-200ms end-to-end audio routing.

2. Speech-to-text transcribes what the caller said

The audio stream is sent to a speech-to-text engine, typically Deepgram or OpenAI's Whisper variant, which transcribes the caller's words into text in roughly 80 to 150 milliseconds for short utterances.

The quality of the transcription matters more than people think. If the caller says "I need a tune-up on my HVAC" and the model hears "tonal pop on my Mac," the rest of the conversation is broken. Domain-tuned models that know vertical-specific words (HVAC, capacitor, vent, ductwork) handle this better than off-the-shelf models.

3. A large language model decides what to say next

The transcribed text gets handed to a large language model (Claude, GPT-4o, or similar) along with three things:

The system prompt, which is your business's voice, services, hours, prices, and rules.
The conversation so far, so the model has context.
A list of tools, like "check the calendar," "book an appointment," "log a lead," "transfer to the owner."

The model reads all of this, decides whether to talk, ask a clarifying question, or call a tool. Modern LLMs do this in 400 to 800 milliseconds.

This is where the AI gets smart. The system prompt is the difference between a clumsy script and a real receptionist. A good prompt has the business's actual scripts, the answers to FAQs, the up-sell triggers, and the rules for when to escalate to a human. At Traccion we tune this per business, not per vertical.

4. Tools fire against your real systems

If the LLM decides to book an appointment, it calls the "book_appointment" tool, which is a real API call that hits your calendar (Google Calendar, Square, Housecall Pro, ServiceTitan, whichever you use). The tool returns success or failure to the LLM.

This is the second place bad AI receptionists fail. If the integration is fake (the AI says "you're booked" but no calendar entry exists), the customer shows up to a closed door. The integration has to be real, and the AI has to confirm the booking succeeded before telling the caller it did.

A good AI receptionist has 10 to 15 tools wired up. At Traccion, the receptionist named Manuel has these out of the box: check_availability, book_appointment, qualify_lead, request_callback, send_sms_confirmation, escalate_to_owner, lookup_customer, log_recording, send_quote, set_callback_reminder. Each one hits real infrastructure.

5. Text-to-speech speaks the response back

The LLM's text response is sent to a text-to-speech engine (ElevenLabs, Cartesia, or PlayHT) which produces audio in 200 to 500 milliseconds. This audio is streamed back to the caller through the same VoIP carrier.

A good AI receptionist does this incrementally, sending audio as the LLM is still generating, so the caller hears the start of a sentence before the whole sentence is finished. This is what lets the receptionist sound like a real person and respond within 1.2 seconds end to end.

The voice matters too. A flat synthetic voice gives the system away. A warm, pace-tuned voice in the right accent for the market does not. Voice cloning, where the AI sounds like the business owner, is a Growth-tier feature at Traccion for a reason. It works.

6. Post-call work happens automatically

After the call ends, three more things happen.

The transcript and recording are stored, searchable from the business owner's dashboard.
The booking is confirmed via SMS to the customer with the time, address, and any prep instructions.
The owner gets a text with the caller's name, callback number, what they need, and a link to the recording.

This is where small businesses get the most leverage. Every call becomes searchable. Owners can see at 6pm exactly what calls came in during the day, who booked, who did not, and which calls need a human follow-up. No more "what did Maria want when she called Tuesday." It is in the dashboard.

What this all adds up to

The end-to-end latency target is about 1.2 seconds from when the caller stops speaking to when the AI starts replying. Human receptionists average 2 to 4 seconds for the same gap. A well-tuned AI receptionist is faster than a person, available 24/7, never tired, never short, never forgetting a question.

The cost of all of this is the part people are not expecting. The combined cost of telephony, transcription, LLM inference, text-to-speech, and storage is roughly $0.08 to $0.18 per minute of conversation depending on the LLM and voice. A small business doing 200 minutes a month is paying around $16 to $36 in raw infrastructure. The rest of the monthly price is the engineering team that builds the system prompts, the integrations, the dashboard, and the maintenance.

How to know if you need one

If you answer your own phone, you need one. The math is brutal. A solo HVAC operator who misses two calls a week worth $250 each loses $26,000 a year. An entry AI receptionist runs a couple thousand dollars a year. The break-even is a handful of missed jobs.

The harder question is which one to get. Most AI receptionists on the market are templates: the same script every business gets, with the name swapped. A real receptionist needs to know your services, your hours, your pricing, your service area, your scripts, your follow-up logic. At Traccion, every receptionist is tuned per business.

30 minutes. No deck. Just the work.

We map your operations and hand you a ranked list of AI wins by ROI. Free.

Book a consulting call