From setting reminders and playing music to controlling smart home devices, voice assistants like Siri, Alexa, and Google Assistant have integrated seamlessly into our daily lives over the last decade. Their popularity, usage, and consumer adoption continue to grow rapidly.
In the US alone, the number of voice assistant users is expected to reach 157.1 million by 2026, up from 142 million in 2022, while the number of voice assistant devices in use globally is estimated to reach 8.4 billion in 2024.
With the release of Large Language Models (LLMs) like GPT-4 and Llama, voice assistants have become more capable and accurate at performing complex tasks. Various open-source LLMs also offer customizable solutions, making them an attractive alternative for developers seeking flexibility in their projects.
In this article, we will show you how to:
- Build a phone-based voice assistant with Twilio and a Hono server
- Generate responses with the OpenAI Chat Completion API
- Maintain conversation history across turns using cookies
- Safeguard the assistant against hallucinations and other AI risks with Aporia Guardrails
You can also follow along with the code setup on the YouTube tutorial here. To follow this step-by-step tutorial, you need a basic understanding of the following code components:
- Node.js and TypeScript
- Hono, a lightweight web framework
- The Twilio Programmable Voice API
- The OpenAI API
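Before diving in, here is a minimal sketch of the Hono server skeleton that the rest of this tutorial fills in. The imports and port number are taken from the complete code listing later in the article; the file layout is an assumption.
import { serve } from '@hono/node-server'
import { Hono } from 'hono'

const app = new Hono()

// The /incoming-call and /respond routes are added in the next steps.

const port = 3000
serve({ fetch: app.fetch, port })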
When someone calls your Twilio number, the server receives a POST HTTP request to the /incoming-call API.
Let’s implement the /incoming-call API to return a basic Twilio response. In the code snippet below, we use the VoiceResponse() method from TwiML to format a simple say() response, “Hello, how are you?”, and return it as XML for the incoming call.
app.post('/incoming-call', (c) => {
  const voiceResponse = new twiml.VoiceResponse()
  voiceResponse.say("Hello, how are you?")
  c.header("Content-Type", "application/xml")
  return c.body(voiceResponse.toString())
})
Now, start your server and call the number. You should hear the response we just set.
Next, start listening to the user’s speech using TwiML’s gather() method. It takes a few arguments, like:
- input: the type of input to capture (speech, in our case)
- speechTimeout: how long to wait after the caller stops speaking
- speechModel and enhanced: which speech recognition model Twilio should use
- action: the endpoint to call (/respond in this case) after the transcription is complete
Add the following code snippet to your /incoming-call API after the say() method.
voiceResponse.gather({
  input: ["speech"],
  speechTimeout: "auto",
  speechModel: 'experimental_conversations',
  enhanced: true,
  action: '/respond',
})
Twilio will pass the result of the speech recognition model to this endpoint. First, collect the speech recognition result from the request’s form data. Then use the OpenAI API to continue the conversation based on the user’s transcribed speech. For that, you need to import the OpenAI library and initialize a new OpenAI instance with your API key to use its Chat Completion API.
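The client initialization is not shown in the snippet below, so here is a minimal sketch. It assumes your key is stored in the OPENAI_API_KEY environment variable, which the SDK also reads by default.
import OpenAI from 'openai'

// Passing the key explicitly; `new OpenAI()` alone also picks up OPENAI_API_KEY.
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })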
When the user says something, the OpenAI Chat Completion API will generate a response using the GPT-3.5 Turbo model. We’ll then speak this response back to the user with the Twilio say() method and redirect the caller to the /incoming-call API to continue the conversation. The code snippet below demonstrates this process.
app.post('/respond', async (c) => {
  // Collect the speech recognition result from Twilio's form data
  const formData = await c.req.formData()
  const voiceInput = formData.get("SpeechResult")?.toString()!
  // Generate a response to the user's message with the Chat Completion API
  const chatCompletion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { role: "user", content: voiceInput }
    ],
    temperature: 0,
  })
  const assistantResponse = chatCompletion.choices[0].message.content
  // Speak the response back to the caller and loop back to /incoming-call
  const voiceResponse = new twiml.VoiceResponse()
  voiceResponse.say(assistantResponse!)
  voiceResponse.redirect({ method: "POST" }, "/incoming-call")
  c.header("Content-Type", "application/xml")
  return c.body(voiceResponse.toString())
})
So far, we have built a basic conversational assistant. The problem is that it cannot keep track of the entire user-assistant conversation, so we need to maintain state to preserve the conversation context across turns.
For this purpose, we’ll use an HTTP cookie from the hono/cookie library to store the conversation history between the /incoming-call and /respond APIs, using the library’s getCookie() and setCookie() methods.
First, create a new cookie (if one does not already exist) called “messages” in the /incoming-call API to store a new conversation. Use the setCookie() method to store the state of the initial conversation.
Now, use the getCookie() method in the /respond API endpoint. Push the current user message into the messages cookie, pass the messages to the OpenAI Chat Completion API, and push the assistant response generated by the Chat Completion API into the messages cookie as well. Finally, set the cookie with the updated messages.
Here are the complete /incoming-call and /respond API endpoints with state maintained for conversation history.
import { serve } from '@hono/node-server'
import { Hono } from 'hono'
import { logger } from 'hono/logger'
import { twiml } from 'twilio'
import OpenAI from 'openai'
import { getCookie, setCookie } from 'hono/cookie'

const openai = new OpenAI()
const app = new Hono()
app.use('*', logger())

const INITIAL_MESSAGE = "Hello, how are you?"

app.post('/incoming-call', (c) => {
  const voiceResponse = new twiml.VoiceResponse()
  if (!getCookie(c, "messages")) {
    // This is a new conversation! Greet the caller and seed the history
    voiceResponse.say(INITIAL_MESSAGE)
    setCookie(c, "messages", JSON.stringify([
      {
        role: "system",
        content: `
          You are a helpful phone assistant for a pizza restaurant.
          The restaurant is open between 10-12 pm.
          You can help the customer reserve a table for the restaurant.
        `
      },
      { role: "assistant", content: INITIAL_MESSAGE }
    ]))
  }
  voiceResponse.gather({
    input: ["speech"],
    speechTimeout: "auto",
    speechModel: 'experimental_conversations',
    enhanced: true,
    action: '/respond',
  })
  c.header("Content-Type", "application/xml")
  return c.body(voiceResponse.toString())
})

app.post('/respond', async (c) => {
  const formData = await c.req.formData()
  const voiceInput = formData.get("SpeechResult")?.toString()!
  // Restore the conversation history from the cookie and append the user's message
  let messages = JSON.parse(getCookie(c, "messages")!)
  messages.push({ role: "user", content: voiceInput })
  const chatCompletion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages,
    temperature: 0,
  })
  const assistantResponse = chatCompletion.choices[0].message.content
  // Append the assistant's reply and save the updated history back to the cookie
  messages.push({ role: "assistant", content: assistantResponse })
  console.log(messages)
  setCookie(c, "messages", JSON.stringify(messages))
  const voiceResponse = new twiml.VoiceResponse()
  voiceResponse.say(assistantResponse!)
  voiceResponse.redirect({ method: "POST" }, "/incoming-call")
  c.header("Content-Type", "application/xml")
  return c.body(voiceResponse.toString())
})

const port = 3000
console.log(`Server is running on port ${port}`)
serve({
  fetch: app.fetch,
  port
})
Now that our phone assistant is capable of understanding and responding to user queries, it’s key to ensure that these interactions are not just intelligent, but also secure and reliable.
Layered between the OpenAI API and Twilio interface, Aporia Guardrails acts as a robust safeguard, preventing risks like hallucinations, data leakage, and inappropriate responses that could undermine the assistant’s effectiveness.
To integrate Aporia Guardrails into your codebase, all that’s needed is a small change to how the OpenAI client is initialized: point it at your Aporia Guardrails base URL and pass your Aporia API key as a default header.
const openai = new OpenAI({
  baseURL: aporia_guardrails_url,
  defaultHeaders: { "X-APORIA-API-KEY": aporia_guardrails_api_key },
})
If you have followed along with this tutorial, you have built a working voice assistant that can be customized for any number of use cases and is safeguarded against hallucinations and other AI risks. You can now connect this assistant to a calendar, your reservation system, or whatever application you need.
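As one possible direction, here is a sketch of how you might wire the assistant to a reservation backend using the OpenAI SDK’s tool-calling support. The reserve_table tool and the createReservation() helper are hypothetical names for illustration, not part of the tutorial’s code; the messages array is the conversation history from the /respond handler.
// Hypothetical extension: let the model request a reservation via tool calling.
const chatCompletion = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages,
  tools: [{
    type: "function",
    function: {
      name: "reserve_table", // hypothetical tool name
      description: "Reserve a table at the restaurant",
      parameters: {
        type: "object",
        properties: {
          name: { type: "string", description: "Customer name" },
          time: { type: "string", description: "Reservation time, e.g. 11:30" },
          partySize: { type: "integer", description: "Number of guests" },
        },
        required: ["name", "time", "partySize"],
      },
    },
  }],
})

const toolCall = chatCompletion.choices[0].message.tool_calls?.[0]
if (toolCall) {
  const args = JSON.parse(toolCall.function.arguments)
  // createReservation() is a placeholder for your calendar or reservation system.
  await createReservation(args.name, args.time, args.partySize)
}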
Learn more about mitigating hallucinations in real time with Aporia Guardrails.