How to Build an AI-Powered Customer Support Agent with OpenAI and FAISS

In this tutorial, I'll guide you step-by-step in creating an intelligent AI Customer Support Agent using OpenAI embeddings and the FAISS vector store. The project was developed on Google Colab, with Google Drive serving as the storage solution. The provided code snippets are illustrative—you're welcome to customize them with your own datasets, alternative models, or different backend services.

Tools & Tech Stack

Folder Structure

Let's walk through each step in detail, breaking down the process clearly and methodically to ensure a smooth implementation.

Step 1: Unzipping Project Files in Google Colab

This command extracts the project files in Google Colab when working with data from Google Drive.
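If your Colab session hasn't mounted Google Drive yet, do that first so the archive's path is accessible:

from google.colab import drive

# Mount Google Drive at /content/drive so the zip file becomes visible
drive.mount('/content/drive')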

!unzip /content/drive/MyDrive/projectpro/cs_agent/cs_agent.zip

Code Explanation:

Expected Output:

Archive:  /content/drive/MyDrive/projectpro/cs_agent/cs_agent.zip
  inflating: cs_agent/main.py        
  inflating: cs_agent/utils.py       
  inflating: cs_agent/config.json    
  inflating: cs_agent/README.md

Important Notes:

Step 2: Install Required Libraries

This command installs all the dependencies listed in the requirements.txt file.

!pip install -r requirements.txt
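The exact pinned versions ship with the project download; as an illustration only (not the project's actual file), a requirements.txt for this stack typically lists packages like:

faiss-cpu
openai
pandas
numpy
matplotlib
tabulate    # required by DataFrame.to_markdown() in Step 14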

Code Explanation:

Step 3: Import Core Modules

Here, we import all essential libraries and functions needed for the project.

import faiss
import numpy as np
import pandas as pd
from collections import defaultdict

from src.helper import (
    create_embeddings,
    create_index,
    semantic_similarity,
    call_llm,
)

Code Explanation:

Step 4: Load Dataset

This code snippet reads the CSV file into a pandas DataFrame; the final line displays the first few rows so you can verify the data loaded successfully.

# Read CSV file into DataFrame
df = pd.read_csv('cs_dataset/cs_dataset.csv')
# Display first 5 rows
df.head()

Code Explanation:

Expected Output:

  flags                                         instruction category        intent                                           response
0     B    question about cancelling order {{Order Number}}    ORDER  cancel_order  I've understood you have a question regarding...
1   BQZ   i have a question about cancelling oorder {{Or...    ORDER  cancel_order  I've been informed that you have a question ab...
2  BLQZ     i need help cancelling puchase {{Order Number}}    ORDER  cancel_order  I can sense that you're seeking assistance wit...
3    BL          I need to cancel purchase {{Order Number}}    ORDER  cancel_order  I understood that you need assistance with can...
4  BCELN  I cannot afford this order, cancel purchase {{...    ORDER  cancel_order  I'm sensitive to the fact that you're facing f...

Step 5: Plot Category Distribution

The following code generates a visualization of customer question categories from our dataset. Since this is purely for demonstration purposes, you may choose to skip this section.

df['category'].value_counts().plot(kind='bar')

Code Explanation:
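Because pandas plotting is built on matplotlib, you can optionally label the chart for readability; a small variation on the one-liner above:

import matplotlib.pyplot as plt

# Same data, with axis labels and a title for a presentable chart
ax = df['category'].value_counts().plot(kind='bar')
ax.set_xlabel('Category')
ax.set_ylabel('Number of tickets')
ax.set_title('Distribution of customer support ticket categories')
plt.tight_layout()
plt.show()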

Output:

Fig. 1 - Bar chart showing the distribution of customer support ticket categories

Step 6: Building a Category-Intent Mapping Dictionary

The following code efficiently creates a mapping between support ticket categories and their associated intents, revealing the relationship between broad issue types and specific customer needs.

from collections import defaultdict

# Create dictionary to map categories to sets of intents
category_intent_dict = defaultdict(set)

# Populate the dictionary
for category, intent in zip(df['category'], df['intent']):
    category_intent_dict[category].add(intent)

# Convert sets to lists for final output
category_intent_dict = {k: list(v) for k, v in category_intent_dict.items()}

Code Explanation:

Output:

{'ORDER': ['track_order', 'place_order', 'change_order', 'cancel_order'],
 'SHIPPING': ['change_shipping_address', 'set_up_shipping_address'],
 'CANCEL': ['check_cancellation_fee'],
 'INVOICE': ['get_invoice', 'check_invoice'],
 'PAYMENT': ['check_payment_methods', 'payment_issue'],
 'REFUND': ['check_refund_policy', 'track_refund', 'get_refund'],
 'FEEDBACK': ['complaint', 'review'],
 'CONTACT': ['contact_customer_service', 'contact_human_agent'],
 'ACCOUNT': ['recover_password',
  'edit_account',
  'registration_problems',
  'delete_account',
  'switch_account',
  'create_account'],
 'DELIVERY': ['delivery_period', 'delivery_options'],
 'SUBSCRIPTION': ['newsletter_subscription']}

Step 7: Analyzing Text Length Patterns in Customer Support Conversations

The following code analyzes the average length of both customer instructions and agent responses, revealing key communication patterns in support interactions. This diagnostic step is optional and can be skipped if needed.

# Calculate average instruction length (in tokens)
avg_instruction_tokens = df['instruction'].apply(lambda x: len(x.split())).mean()

# Calculate average response length (in tokens)
avg_response_tokens = df['response'].apply(lambda x: len(x.split())).mean()

# Print results
print(f"Avg. token count for instructions: {avg_instruction_tokens}")
print(f"Avg. token count for responses: {avg_response_tokens}")

Code Explanation:

Example Output:

Avg. token count for instructions: 8.690979458172075
Avg. token count for responses: 104.78903691574874
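Note that len(x.split()) counts whitespace-separated words rather than true model tokens. If you want counts that match OpenAI's tokenizer, a short sketch using the tiktoken package (an extra dependency, not part of the project code) looks like this:

import tiktoken

# cl100k_base is the encoding used by OpenAI's text-embedding-3 models
encoding = tiktoken.get_encoding("cl100k_base")

avg_instruction_tokens = df['instruction'].apply(lambda x: len(encoding.encode(x))).mean()
print(f"Avg. tiktoken count for instructions: {avg_instruction_tokens}")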

Step 8: Generating Text Embeddings for Customer Support Analysis

The following code is the core component of this project: it creates numerical vector embeddings of the customer support instructions using OpenAI's embedding models. It uses the create_embeddings() function from the helper module in the src folder, which calls the OpenAI API to generate the vector representations. To run this code, you must first sign in to the OpenAI Platform and create an API key, since the API is what produces the embeddings from your customer training data.

Note: The execution time for this code varies depending on your dataset size. Processing larger training datasets will require more time to complete.

vectors = create_embeddings(df, column_name='instruction', model='text-embedding-3-small')

Code Explanation:
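The real implementation lives in src/helper.py and isn't reproduced here. As a minimal sketch of what create_embeddings() might do, assuming the official openai Python client, an OPENAI_API_KEY environment variable, and an illustrative batch size:

import os
import numpy as np
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def create_embeddings(df, column_name, model, batch_size=512):
    """Embed a DataFrame column with the OpenAI API and return a float32 array."""
    texts = df[column_name].astype(str).tolist()
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        response = client.embeddings.create(model=model, input=batch)
        vectors.extend(item.embedding for item in response.data)
    return np.array(vectors, dtype='float32')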

Step 9: Validating Embedding Dimensions with vectors.shape

This code validates the dimensionality of the generated embeddings, confirming alignment with expected specifications prior to downstream processing.

# Check the shape of the vectors
vectors.shape

Understanding the Output:
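For text-embedding-3-small, each embedding is 1536-dimensional by default, so the output should be a tuple of the form (number of rows in df, 1536). A quick sanity check:

# Expect one 1536-dimensional vector per row of the DataFrame
assert vectors.shape == (len(df), 1536)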

Step 10: Creating Efficient Vector Search Indexes with FAISS

The following instruction builds a FAISS vector index from the embeddings generated in Step 8 and saves it to disk.

index = create_index(vectors, index_file_path='vector/faiss.index')

Code Explanation:
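As with Step 8, the actual helper is defined in src/helper.py; a plausible minimal version, assuming a flat L2 index (exact nearest-neighbour search) rather than an approximate one:

import faiss

def create_index(vectors, index_file_path):
    """Build a flat L2 FAISS index from the embeddings and persist it to disk."""
    index = faiss.IndexFlatL2(vectors.shape[1])  # dimensionality = embedding width
    index.add(vectors)                           # add all vectors to the index
    faiss.write_index(index, index_file_path)    # save for later reuse
    return index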

Step 11: Loading the Pre-Built Vector Database for Efficient Search

The following instruction loads the vector database created in Step 10.

index = faiss.read_index('vector/faiss.index')

Code Explanation:
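A quick way to confirm the index loaded correctly is to compare its vector count against the dataset:

# The index should hold one vector per row of the dataset
print(index.ntotal, len(df))  # the two counts should match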

Step 12: Sample Query

This is the sample query for which we want to retrieve answers from our vector database.

query = "how can I change my order? My order number is 501"

Step 13: Performing Semantic Similarity Searches with Embeddings

This code generates an embedding for the query text, then searches the vector database (loaded in Step 11) to retrieve the most relevant matches along with their distance scores.

distances, indices = semantic_similarity(query, index, model='text-embedding-3-small')
top_similar = df.iloc[indices[0]].reset_index(drop=True)
top_similar['distance'] = distances[0]

Code Explanation:
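Under the hood, the helper embeds the query with the same embedding model and runs a k-nearest-neighbour search on the FAISS index. A minimal sketch, assuming the openai client as in Step 8 (the k parameter and exact signature are assumptions, not the project's confirmed API):

import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def semantic_similarity(query, index, model, k=5):
    """Embed the query and return the k nearest neighbours from the FAISS index."""
    response = client.embeddings.create(model=model, input=[query])
    query_vector = np.array([response.data[0].embedding], dtype='float32')
    distances, indices = index.search(query_vector, k)  # smaller distance = closer match
    return distances, indices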

Step 14: Processing and Enhancing Semantic Search Results

This code extracts the responses from the top-matching records, prints them as a formatted table, and then passes them, together with the original query, to an LLM that composes the final customer-facing answer.

# Extract responses from top matches
responses = top_similar['response'].to_list()

# Display formatted results
print(top_similar[['instruction', 'intent', 'response']].to_markdown(index=False))

# Generate enhanced LLM response
print(call_llm(query, responses))

Code Explanation:
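call_llm() is also defined in src/helper.py. One way such a function could be written, assuming the Chat Completions API and an illustrative model choice (neither is confirmed by the project code):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def call_llm(query, responses, model='gpt-4o-mini'):
    """Ask an LLM to compose a final answer grounded in the retrieved responses."""
    context = "\n\n".join(responses)
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You are a customer support agent. Answer the user's "
                        "question using only the reference responses provided."},
            {"role": "user",
             "content": f"Question: {query}\n\nReference responses:\n{context}"},
        ],
    )
    return completion.choices[0].message.content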