Build an AI Voice Assistant with Twilio, OpenAI, and Aporia Guardrails
From setting reminders and playing music to controlling smart home devices, voice assistants like Siri, Alexa, and Google Assistant have integrated seamlessly into our daily lives over the last decade. Their popularity and consumer adoption continue to grow rapidly.
In the US alone, the number of voice assistant users is expected to reach 157.1 million by 2026, up from 142 million in 2022, while the number of voice-assistant devices in use worldwide is estimated to reach 8.4 billion in 2024.
With the release of Large Language Models (LLMs) like GPT-4 and Llama, voice assistants have become more capable and accurate at performing complex tasks. Additionally, various open-source LLMs offer customizable solutions, making them an attractive alternative for developers seeking flexibility in their projects.
In this article, we will show you how to build an AI voice assistant that answers phone calls with Twilio, generates replies with the OpenAI Chat Completion API, keeps conversation state, and is safeguarded with Aporia Guardrails.
You can also follow along with the code setup on the YouTube tutorial here. To follow this step-by-step tutorial, you need a basic understanding of the following code components: Node.js with TypeScript, the Hono web framework, the Twilio voice API, and the OpenAI API.
When someone calls your Twilio number, the server receives an HTTP POST request for the /incoming-call API.
Let's implement the /incoming-call API to return a basic Twilio response. In the code snippet below, we create a VoiceResponse() object from Twilio's TwiML helper library, format a simple say() response, "Hello, how are you?", and return it as XML for the incoming call.
app.post('/incoming-call', (c) => {
  // Build a TwiML response that greets the caller
  const voiceResponse = new twiml.VoiceResponse()
  voiceResponse.say("Hello, how are you?")
  // Twilio expects an XML (TwiML) response
  c.header("Content-Type", "application/xml")
  return c.body(voiceResponse.toString())
})
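For reference, the TwiML that voiceResponse.toString() returns for this handler looks roughly like this:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>Hello, how are you?</Say>
</Response>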
Now, start your server and call the number. You will hear the response we just set.
Next, start listening to the user's speech using TwiML's gather() method. It takes a few arguments, such as the input type, the speech timeout, the speech recognition model, and an action: the API endpoint Twilio calls with the transcription (/respond in this case) once it is complete. Add the following code snippet to your /incoming-call API after the say() method.
voiceResponse.gather({
  input: ["speech"],                         // capture speech rather than keypad digits
  speechTimeout: "auto",                     // let Twilio detect when the caller stops talking
  speechModel: 'experimental_conversations', // model tuned for conversational speech
  enhanced: true,
  action: '/respond',                        // POST the transcription to this endpoint
})
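Combined with the say() call, the handler now returns TwiML along these lines:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>Hello, how are you?</Say>
  <Gather input="speech" speechTimeout="auto" speechModel="experimental_conversations" enhanced="true" action="/respond"/>
</Response>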
Twilio will POST the result of the speech recognition model to the /respond endpoint. First, collect the speech recognition result from the request's form data. Then use the OpenAI API to continue the conversation based on the user's transcribed speech. For that, you need to import the OpenAI library and initialize a new OpenAI instance with your API key to use its Chat Completion API.
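A minimal initialization sketch, assuming your key is stored in the OPENAI_API_KEY environment variable (which the SDK also reads by default):

import OpenAI from 'openai'

// Passing the key explicitly just makes the assumption visible;
// new OpenAI() alone would read OPENAI_API_KEY from the environment.
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })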
When the user says something, the OpenAI Chat Completion API will generate a response using the GPT-3.5 Turbo model. We'll then read this response back to the user with Twilio's say() method and redirect the call to the /incoming-call API to continue the conversation. The code snippet below demonstrates this process.
app.post('/respond', async (c) => {
  // Twilio sends the transcription as form data
  const formData = await c.req.formData()
  const voiceInput = formData.get("SpeechResult")?.toString()!

  // Generate a reply for the user's transcribed speech
  const chatCompletion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { role: "user", content: voiceInput }
    ],
    temperature: 0,
  })
  const assistantResponse = chatCompletion.choices[0].message.content

  // Read the reply to the caller, then loop back to keep listening
  const voiceResponse = new twiml.VoiceResponse()
  voiceResponse.say(assistantResponse!)
  voiceResponse.redirect({ method: "POST" }, "/incoming-call")
  c.header("Content-Type", "application/xml")
  return c.body(voiceResponse.toString())
})
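For context, the form body Twilio posts to /respond carries the transcription alongside other call metadata. An abbreviated, illustrative payload (field values are hypothetical) might look like:

CallSid=CA...
SpeechResult=I+would+like+to+reserve+a+table
Confidence=0.91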
So far, we have built a basic conversational assistant. The problem is that it cannot keep track of the user-assistant conversation across turns, so we need to maintain state to save the conversation context.
For this purpose, we'll use an HTTP cookie to store the conversation history between the /incoming-call and /respond APIs, using the getCookie() and setCookie() methods from the hono/cookie library.
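As a quick sketch of those two calls inside any Hono handler (c is the request context):

import { getCookie, setCookie } from 'hono/cookie'

// Returns the cookie value as a string, or undefined if it is not set yet
const raw = getCookie(c, 'messages')

// Stores (or overwrites) the serialized conversation history
setCookie(c, 'messages', JSON.stringify([]))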
First, create a new cookie (if one does not already exist) called “messages” in the /incoming-call API to store a new conversation. Use the setCookie() method to store the state of the initial conversation.
Now, use the getCookie() method in the /respond API endpoint. Push the current user message into the messages array parsed from the cookie, and pass that array to the OpenAI Chat Completion API. Push the assistant response generated by the Chat Completion API into the array as well. Finally, set the cookie with the updated messages.
Here are the complete /incoming-call and /respond API endpoints with state maintained for the conversation history.
import { serve } from '@hono/node-server'
import { Hono } from 'hono'
import { logger } from 'hono/logger'
import { twiml } from 'twilio'
import OpenAI from 'openai'
import { getCookie, setCookie } from 'hono/cookie'

const openai = new OpenAI()
const app = new Hono()
app.use('*', logger())

const INITIAL_MESSAGE = "Hello, how are you?"

app.post('/incoming-call', (c) => {
  const voiceResponse = new twiml.VoiceResponse()
  if (!getCookie(c, "messages")) {
    // This is a new conversation!
    voiceResponse.say(INITIAL_MESSAGE)
    setCookie(c, "messages", JSON.stringify([
      {
        role: "system",
        content: `
          You are a helpful phone assistant for a pizza restaurant.
          The restaurant is open between 10-12 pm.
          You can help the customer reserve a table for the restaurant.
        `
      },
      { role: "assistant", content: INITIAL_MESSAGE }
    ]))
  }
  voiceResponse.gather({
    input: ["speech"],
    speechTimeout: "auto",
    speechModel: 'experimental_conversations',
    enhanced: true,
    action: '/respond',
  })
  c.header("Content-Type", "application/xml")
  return c.body(voiceResponse.toString())
})

app.post('/respond', async (c) => {
  const formData = await c.req.formData()
  const voiceInput = formData.get("SpeechResult")?.toString()!

  // Restore the conversation history from the cookie and append the user's turn
  let messages = JSON.parse(getCookie(c, "messages")!)
  messages.push({ role: "user", content: voiceInput })

  const chatCompletion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages,
    temperature: 0,
  })
  const assistantResponse = chatCompletion.choices[0].message.content

  // Persist the assistant's turn as well
  messages.push({ role: "assistant", content: assistantResponse })
  console.log(messages)
  setCookie(c, "messages", JSON.stringify(messages))

  const voiceResponse = new twiml.VoiceResponse()
  voiceResponse.say(assistantResponse!)
  voiceResponse.redirect({ method: "POST" }, "/incoming-call")
  c.header("Content-Type", "application/xml")
  return c.body(voiceResponse.toString())
})

const port = 3000
console.log(`Server is running on port ${port}`)
serve({
  fetch: app.fetch,
  port
})
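With the server running on port 3000, Twilio still needs a publicly reachable URL: expose the port with a tunneling tool of your choice and set your Twilio phone number's voice webhook to send an HTTP POST to /incoming-call.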
Now that our phone assistant is capable of understanding and responding to user queries, it’s key to ensure that these interactions are not just intelligent, but also secure and reliable.
Layered between the OpenAI API and Twilio interface, Aporia Guardrails acts as a robust safeguard, preventing risks like hallucinations, data leakage, and inappropriate responses that could undermine the assistant’s effectiveness.
To integrate Aporia Guardrails with your codebase, a one-line change to the OpenAI client initialization is all that's needed:
const openai = new OpenAI({
  // Replace both values with the URL and API key from your Aporia project
  baseURL: aporia_guardrails_url,
  defaultHeaders: { "X-APORIA-API-KEY": aporia_guardrails_api_key },
})
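Because the OpenAI SDK lets you override its baseURL, every subsequent chat.completions.create() call is routed through Aporia's guardrails without any further changes to the handlers.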
If you have followed along with this tutorial, you have built a working voice assistant that can be customized for any number of use cases and is safeguarded against hallucinations and other AI risks. You can now connect this assistant to a calendar, your reservation system, or whatever application you need.
Learn more about mitigating hallucinations in real time with Aporia Guardrails.