
May 22, 2024 - last updated
GenAI For Practitioners

How to build LLM-based phone assistants with OpenAI, Twilio, & Aporia

Alon Gubkin

Alon is the CTO of Aporia.

Apr 04, 2024

From setting reminders and playing music to controlling smart home devices, voice assistants like Siri, Alexa, and Google Assistant have become part of daily life over the last decade, and their popularity, usage, and consumer adoption continue to grow rapidly.

In the US alone, the number of voice assistant users is expected to reach 157.1 million by 2026, up from 142 million in 2022, and global estimates put the number of voice assistant units in use at 8.4 billion in 2024.

With the release of Large Language Models (LLMs), like GPT-4 and Llama, voice assistants have become more capable and accurate at performing complex tasks. Developers can use APIs to build a custom voice assistant experience for almost any use case.

In this article, we will show you how to: 

  • Build a phone assistant with OpenAI's GPT-3.5 Turbo model through the OpenAI API.
  • Handle user-assistant conversations using Twilio.
  • Mitigate AI hallucinations in real time with Aporia.

Code setup & phone assistant architecture overview

You can also follow along with the accompanying YouTube tutorial. To follow this step-by-step guide, you need a basic understanding of the following components:

  • Hono library: Hono is a small, fast web framework for building web APIs, edge applications, and full-stack applications. It is written in TypeScript and runs on several JavaScript runtimes. We will run it on a plain Node.js HTTP server.
  • Twilio phone number: When someone calls your Twilio phone number, Twilio sends an HTTP request to your server. You can buy a phone number in the Twilio Console, then configure the number's incoming-call webhook to point to the server where your application is hosted.
  • Twilio API: Twilio provides APIs for building Programmable Voice applications. We’ll use TwiML, its XML-based instruction set, to define how the assistant responds to incoming calls.
  • OpenAI API: The OpenAI API provides many text processing and generation capabilities. We’ll use its Chat Completions API to generate the assistant's side of the conversation.
  • ngrok: A secure application delivery platform. We will use it to tunnel the local HTTP server to the internet so that Twilio can reach it.
  • Aporia API: Aporia provides APIs to control AI chatbot and assistant performance. Here, we’ll use its Guardrails solution to mitigate LLM hallucinations in real time. 
Overview of a basic phone assistant architecture using Twilio and OpenAI
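The request loop in this architecture can be modeled without any network calls. The sketch below is a hypothetical simulation (the `simulateCall` function and `Step` type are illustrative names, not part of Twilio or Hono): it only records which endpoint Twilio requests at each turn, assuming the caller says each utterance in order and hangs up while the assistant is listening for the next one.

```typescript
// Hypothetical model of the webhook loop: Twilio requests /incoming-call,
// the speech gathered there is posted to /respond, and /respond redirects back.
type Step = { endpoint: "/incoming-call" | "/respond"; speech?: string }

function simulateCall(utterances: string[]): Step[] {
  const steps: Step[] = [{ endpoint: "/incoming-call" }] // the call connects
  for (const speech of utterances) {
    // Transcription finished: Twilio POSTs the SpeechResult to /respond.
    steps.push({ endpoint: "/respond", speech })
    // /respond answers with say() + redirect(), so Twilio returns to /incoming-call.
    steps.push({ endpoint: "/incoming-call" })
  }
  return steps // the caller hangs up while the last gather() is listening
}

const trace = simulateCall(["Do you have a table for two?", "Tomorrow evening, please."])
console.log(trace.map((s) => s.endpoint).join(" -> "))
// /incoming-call -> /respond -> /incoming-call -> /respond -> /incoming-call
```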

Step-by-step guide to build an AI phone assistant

1. Implement a POST endpoint for incoming calls

Set Up a Basic Call Response

When someone calls your Twilio number, the server receives a POST HTTP request for the /incoming-call API.

Let’s implement the /incoming-call API so that it returns a basic Twilio response. In the code snippet below, we create a TwiML VoiceResponse, add a say() verb with the text “Hello, how are you?”, and return the result to Twilio as XML.

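// Assumes `app` is a Hono instance and `twiml` is imported from 'twilio' (see the full listing in step 3).
app.post('/incoming-call', (c) => {
  const voiceResponse = new twiml.VoiceResponse()
  voiceResponse.say("Hello, how are you?")

  // Twilio expects the TwiML document back as XML.
  c.header("Content-Type", "application/xml")
  return c.body(voiceResponse.toString())
})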

Now start your server (exposed through ngrok, with your Twilio number's webhook pointing at its /incoming-call URL) and call the number. You should hear the greeting.
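Under the hood, the handler returns a short XML document. As an illustration only, the hypothetical `sayResponse` helper below builds roughly the same TwiML that `twiml.VoiceResponse` produces for a single say() call, including XML escaping of the spoken text; in the tutorial itself, keep using the twilio library.

```typescript
// Escape the five XML special characters so spoken text cannot break the markup.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;")
}

// Hypothetical helper: a TwiML document containing a single <Say> verb.
function sayResponse(text: string): string {
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    `<Response><Say>${escapeXml(text)}</Say></Response>`
  )
}

console.log(sayResponse("Hello, how are you?"))
// <?xml version="1.0" encoding="UTF-8"?><Response><Say>Hello, how are you?</Say></Response>
```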

Speech detection & transcription – Listen to the user

Next, listen to what the user says using TwiML’s gather() verb. It accepts several options, including:

  • input: the input types to accept, such as speech or dtmf (keypad tones)
  • speechTimeout: how long Twilio waits after a pause in speech before it stops recognition; "auto" lets Twilio detect the pause itself
  • speechModel: the speech recognition model Twilio uses to transcribe the audio to text
  • action: the URL Twilio requests with the transcription once recognition is complete (/respond in this case)

Add the following code snippet to your /incoming-call API after the say() method.

 voiceResponse.gather({
   input: ["speech"],
   speechTimeout: "auto",
   speechModel: 'experimental_conversations',
   enhanced: true,
   action: '/respond',
 })
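Each gather() option ends up as an attribute on a <Gather> element in the returned TwiML. The sketch below renders the same options by hand to make that mapping visible; the `GatherOptions` type and `gatherElement` helper are hypothetical, and joining several input types with a space mirrors Twilio's documented `input="dtmf speech"` form.

```typescript
// Hypothetical typed view of the gather() options used in this tutorial.
interface GatherOptions {
  input: Array<"speech" | "dtmf">
  speechTimeout: string // "auto" or a number of seconds
  speechModel: string
  enhanced: boolean
  action: string // URL Twilio requests once recognition completes
}

// Escape characters that would break a double-quoted XML attribute.
function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;")
}

// Render the options as attributes of an empty <Gather> element.
function gatherElement(opts: GatherOptions): string {
  const attrs: Array<[string, string]> = [
    ["input", opts.input.join(" ")],
    ["speechTimeout", opts.speechTimeout],
    ["speechModel", opts.speechModel],
    ["enhanced", String(opts.enhanced)],
    ["action", opts.action],
  ]
  const rendered = attrs.map(([key, value]) => `${key}="${escapeAttr(value)}"`).join(" ")
  return `<Gather ${rendered}/>`
}

console.log(
  gatherElement({
    input: ["speech"],
    speechTimeout: "auto",
    speechModel: "experimental_conversations",
    enhanced: true,
    action: "/respond",
  })
)
// <Gather input="speech" speechTimeout="auto" speechModel="experimental_conversations" enhanced="true" action="/respond"/>
```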

2. Implement the /respond API endpoint

Twilio sends the result of speech recognition to this endpoint as form data, in the SpeechResult field. First, read that field from the request’s form data. Then use the OpenAI API to generate a reply to what the user said: import the OpenAI library and initialize an OpenAI client with your API key so you can call its Chat Completions API.

When the user says something, the Chat Completions API generates a response using the GPT-3.5 Turbo model. We then speak this response back to the user with Twilio’s say() verb and redirect the call to the /incoming-call API to continue the conversation.

The code snippet below demonstrates this process. 

app.post('/respond', async (c) => {
  // Twilio posts the transcription as form data in the SpeechResult field.
  const formData = await c.req.formData()
  const voiceInput = formData.get("SpeechResult")?.toString()!

  // `openai` is the OpenAI client instance (see the full listing in step 3).
  const chatCompletion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { role: "user", content: voiceInput }
    ],
    temperature: 0,
  })

  const assistantResponse = chatCompletion.choices[0].message.content

  // Speak the reply, then return to /incoming-call to listen for the next sentence.
  const voiceResponse = new twiml.VoiceResponse()
  voiceResponse.say(assistantResponse!)
  voiceResponse.redirect({ method: "POST" }, "/incoming-call")

  c.header("Content-Type", "application/xml")
  return c.body(voiceResponse.toString())
})
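Because Twilio posts webhook parameters as an `application/x-www-form-urlencoded` body, the transcription can also be read with the standard `URLSearchParams` API. The handlers in this tutorial assume `SpeechResult` is always present; a slightly more defensive sketch, using a hypothetical `readSpeechResult` helper and made-up example bodies, treats a missing or blank transcription as `null` so a handler could reprompt the caller instead of calling the model.

```typescript
// Hypothetical helper: extract the transcription from a form-encoded webhook body.
// Returns null when the field is missing or contains only whitespace.
function readSpeechResult(body: string): string | null {
  const params = new URLSearchParams(body) // decodes %XX escapes and "+" as a space
  const speech = params.get("SpeechResult")
  if (speech === null) return null
  const trimmed = speech.trim()
  return trimmed.length > 0 ? trimmed : null
}

// Example bodies shaped like a Twilio webhook payload (values are made up).
const withSpeech = "CallSid=CA123&SpeechResult=I%27d+like+a+table+for+two&Confidence=0.92"
const withoutSpeech = "CallSid=CA123"

console.log(readSpeechResult(withSpeech)) // I'd like a table for two
console.log(readSpeechResult(withoutSpeech)) // null
```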

3. Maintain the conversation state

So far, we have built a basic conversational assistant. However, it cannot keep track of the conversation: each request to /respond sends only the user’s latest sentence to the model, so the assistant forgets everything said earlier. We need to maintain state that preserves the conversation context.

For this purpose, we’ll store the conversation history in an HTTP cookie shared between the /incoming-call and /respond APIs, using the getCookie() and setCookie() helpers from hono/cookie. This works because Twilio keeps cookies across the webhook requests of a single call.

First, in the /incoming-call API, create a cookie called “messages” if one does not already exist, and use setCookie() to store the initial conversation: the system prompt and the greeting.

Next, in the /respond API, use getCookie() to read and parse the messages cookie, append the current user message, and pass the full list to the Chat Completions API. Append the assistant’s reply as well, then call setCookie() again with the updated messages.
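The cookie round trip described above comes down to three small steps: seed the history, append the user's words, and append the reply. Here is a minimal sketch of that logic as plain functions with no Hono or OpenAI dependency; the function names are hypothetical, and the `{ role, content }` shape mirrors the messages the Chat Completions API accepts.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string }

// New call: the system prompt plus the greeting the caller has already heard.
function startConversation(systemPrompt: string, greeting: string): string {
  const messages: ChatMessage[] = [
    { role: "system", content: systemPrompt },
    { role: "assistant", content: greeting },
  ]
  return JSON.stringify(messages)
}

// Before calling the model: restore the history and append the caller's words.
function withUserMessage(cookieValue: string, userText: string): ChatMessage[] {
  const messages = JSON.parse(cookieValue) as ChatMessage[]
  return [...messages, { role: "user", content: userText }]
}

// After the model replies: append the answer and serialize it for setCookie().
function withAssistantReply(messages: ChatMessage[], reply: string): string {
  return JSON.stringify([...messages, { role: "assistant", content: reply }])
}

let cookie = startConversation("You are a helpful phone assistant.", "Hello, how are you?")
const turnMessages = withUserMessage(cookie, "Can I book a table?")
// In the real handler, `turnMessages` is what you pass to the Chat Completions API.
cookie = withAssistantReply(turnMessages, "Sure, for how many people?")
console.log((JSON.parse(cookie) as ChatMessage[]).map((m) => m.role).join(","))
// system,assistant,user,assistant
```

Splitting the user and assistant steps matches the handler's order: the model has to see the user's message, and the reply only exists once the awaited API call returns.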

Here are the complete /incoming-call and /respond API endpoints with state maintained for conversation history.

import { serve } from '@hono/node-server'
import { Hono } from 'hono'
import { logger } from 'hono/logger'
import { twiml } from 'twilio';
import OpenAI from 'openai';
import { getCookie, setCookie } from 'hono/cookie'

// Reads the API key from the OPENAI_API_KEY environment variable.
const openai = new OpenAI()

const app = new Hono()
app.use('*', logger())

const INITIAL_MESSAGE = "Hello, how are you?"

app.post('/incoming-call', (c) => {
  const voiceResponse = new twiml.VoiceResponse()

  if (!getCookie(c, "messages")) {
    // This is a new conversation: greet the caller and seed the history.
    voiceResponse.say(INITIAL_MESSAGE)
    setCookie(c, "messages", JSON.stringify([
      {
        role: "system",
        content: `
          You are a helpful phone assistant for a pizza restaurant.
          The restaurant is open between 10-12 pm.
          You can help the customer reserve a table for the restaurant.
        `
      },
      { role: "assistant", content: INITIAL_MESSAGE }
    ]))
  }

  // Listen for the caller's speech and post the transcription to /respond.
  voiceResponse.gather({
    input: ["speech"],
    speechTimeout: "auto",
    speechModel: 'experimental_conversations',
    enhanced: true,
    action: '/respond',
  })

  c.header("Content-Type", "application/xml")
  return c.body(voiceResponse.toString())
})

app.post('/respond', async (c) => {
  const formData = await c.req.formData()
  const voiceInput = formData.get("SpeechResult")?.toString()!

  // Restore the history from the cookie and add the caller's latest sentence.
  const messages = JSON.parse(getCookie(c, "messages")!)
  messages.push({ role: "user", content: voiceInput })

  const chatCompletion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages,
    temperature: 0,
  })

  const assistantResponse = chatCompletion.choices[0].message.content
  messages.push({ role: "assistant", content: assistantResponse })
  console.log(messages)

  // Persist the updated history for the next turn of the call.
  setCookie(c, "messages", JSON.stringify(messages))

  const voiceResponse = new twiml.VoiceResponse()
  voiceResponse.say(assistantResponse!)
  voiceResponse.redirect({ method: "POST" }, "/incoming-call")

  c.header("Content-Type", "application/xml")
  return c.body(voiceResponse.toString())
})

const port = 3000
console.log(`Server is running on port ${port}`)

serve({
  fetch: app.fetch,
  port
})
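One caveat with keeping the whole conversation in a cookie is size: the history grows every turn, and RFC 6265 only asks user agents to support at least 4096 bytes per cookie. Whether Twilio enforces a similar limit is an assumption you should verify, but trimming old turns is cheap insurance. The sketch below keeps the system prompt and drops the oldest messages until the serialized value fits a budget; it assumes the cookie value is URL-encoded on the wire (our understanding of hono/cookie's default behavior), and the 4000-byte budget and the `trimHistory` and `encodedSize` names are hypothetical.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string }

// Approximate on-the-wire size, assuming the JSON is URL-encoded (ASCII output).
function encodedSize(messages: ChatMessage[]): number {
  return encodeURIComponent(JSON.stringify(messages)).length
}

// Hypothetical helper: keep the system prompt (first message) and as many of the
// newest messages as fit, dropping the oldest ones first. If the system prompt
// alone exceeds the budget, the result can still be over it.
function trimHistory(messages: ChatMessage[], maxBytes = 4000): ChatMessage[] {
  const system = messages[0]
  if (system === undefined) return []
  const kept = messages.slice(1)
  while (kept.length > 0 && encodedSize([system, ...kept]) > maxBytes) {
    kept.shift() // drop the oldest non-system message
  }
  return [system, ...kept]
}

// Usage: a long conversation of 50 made-up turns.
const longHistory: ChatMessage[] = [
  { role: "system", content: "You are a helpful phone assistant." },
]
for (let i = 0; i < 50; i++) {
  longHistory.push({ role: "user", content: `Question ${i} about the menu and opening hours` })
  longHistory.push({ role: "assistant", content: `Answer ${i} with some details for the caller` })
}
const trimmed = trimHistory(longHistory)
console.log(longHistory.length, trimmed.length, encodedSize(trimmed))
```

In the /respond handler, a call like `trimHistory(messages)` just before `setCookie()` would keep the stored value bounded; the trade-off is that the model forgets the oldest turns of long calls.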

4. Implement guardrails to mitigate risks

Now that our phone assistant is capable of understanding and responding to user queries, it’s key to ensure that these interactions are not just intelligent, but also secure and reliable. 

Sitting between your server and the OpenAI API, Aporia Guardrails acts as a safeguard against risks like hallucinations, data leakage, and inappropriate responses that could undermine the assistant’s effectiveness.

To integrate Aporia Guardrails with your codebase, you only need to change how the OpenAI client is created: point its baseURL at your Aporia Guardrails URL and send your Aporia API key in a header.

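// Placeholders: use the Guardrails URL and API key from your Aporia project.
const aporia_guardrails_url = process.env.APORIA_GUARDRAILS_URL
const aporia_guardrails_api_key = process.env.APORIA_GUARDRAILS_API_KEY!
const openai = new OpenAI({
  baseURL: aporia_guardrails_url,
  defaultHeaders: { "X-APORIA-API-KEY": aporia_guardrails_api_key }
})
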
  • Setup: Easily integrate Aporia Guardrails into the development environment. This step ensures every interaction is analyzed for safety and accuracy, aligning with predefined standards.
  • Customization: Tailor the guardrails to suit the assistant’s needs. This could involve setting parameters to guard against specific risks, like prompt injections or unintended data leaks.
  • Continuous detection & mitigation: Aporia not only provides app security but continuously monitors interactions, adapting to new threats and ensuring the assistant’s responses remain within the safety guidelines and aligned with business KPIs.
  • Deployment: With Aporia Guardrails in place, deploy the assistant with confidence, knowing that it’s equipped to handle interactions securely, maintaining user trust and regulatory compliance.
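Because Aporia Guardrails is wired in through the client's `baseURL`, the integration behaves like a proxy: the SDK sends the same Chat Completions request, just to a different host and with one extra header. The sketch below builds that request by hand without sending it, so you can see what changes. The URL and keys are placeholders, the `/chat/completions` suffix reflects how the OpenAI SDK joins request paths onto `baseURL`, and the `buildChatRequest` name is hypothetical.

```typescript
// Hypothetical sketch of the HTTP request the OpenAI SDK would make once
// baseURL points at Aporia. Nothing is sent; we only assemble the pieces.
interface ChatRequest {
  url: string
  headers: Record<string, string>
  body: string
}

function buildChatRequest(
  baseURL: string,
  openaiKey: string,
  aporiaKey: string,
  messages: Array<{ role: string; content: string }>
): ChatRequest {
  return {
    // Same path the SDK uses against api.openai.com, joined onto the new base URL.
    url: `${baseURL.replace(/\/+$/, "")}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${openaiKey}`, // the SDK still sends your OpenAI key
      "X-APORIA-API-KEY": aporiaKey, // the extra header from defaultHeaders
    },
    body: JSON.stringify({ model: "gpt-3.5-turbo", messages, temperature: 0 }),
  }
}

const req = buildChatRequest(
  "https://guardrails.example.com/v1/", // placeholder for your Aporia Guardrails URL
  "sk-placeholder",
  "aporia-placeholder",
  [{ role: "user", content: "Are you open on Sunday?" }]
)
console.log(req.url) // https://guardrails.example.com/v1/chat/completions
```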

What’s Next?

If you have followed along with this tutorial, you have built a working voice assistant that can be customized for many use cases and is safeguarded against hallucinations and other AI risks. You can now connect it to a calendar, your reservation system, or any other application you need.

Learn more about mitigating hallucinations in real time with Aporia Guardrails:

Book a demo today.
