Overview

These are my notes from building a RAG (retrieval-augmented generation) chatbot with Azure OpenAI, LlamaIndex, and Gradio.

Azure OpenAI

Create an Azure OpenAI resource.

Next, click "Endpoint: Click here to view endpoint" and note down the endpoint and key.

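Then, open the Azure OpenAI Service, go to "Model catalog", and deploy "gpt-4o" and "text-embedding-3-small". The scripts below pass these names as deployment_name, so if you give the deployments different names, change the code to match.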

The result is displayed as follows.

Downloading the Text

In this example, the target text is "The Tale of Genji" as published on Aozora Bunko (a free digital library of Japanese literature).

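Download the texts in bulk with the following script. It fetches info.json from genji.dl.itc.u-tokyo.ac.jp, collects the Aozora Bunko URLs listed in its metadata, and saves the main text of each page under data/text/.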

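import os

import requests
from bs4 import BeautifulSoup

# Metadata that lists the Aozora Bunko URL(s) for each part of the work
url = "https://genji.dl.itc.u-tokyo.ac.jp/data/info.json"

info_response = requests.get(url, timeout=30)
info_response.raise_for_status()
info = info_response.json()

for selection in info["selections"]:
    for member in selection["members"]:
        # Collect the Aozora Bunko URLs recorded in this member's metadata
        aozora_urls = []
        for metadata in member["metadata"]:
            if metadata["label"] == "aozora":
                aozora_urls = metadata["value"].split(", ")

        for aozora_url in aozora_urls:
            # Use the HTML file name (without extension) as the output name
            filename = aozora_url.split("/")[-1].split(".")[0]
            opath = f"data/text/{filename}.txt"

            # Skip files that were already downloaded
            if os.path.exists(opath):
                continue

            response = requests.get(aozora_url, timeout=30)
            response.raise_for_status()
            # Aozora Bunko pages are typically Shift_JIS; let requests detect the encoding
            response.encoding = response.apparent_encoding

            soup = BeautifulSoup(response.text, "html.parser")

            # The body of the work is inside <div class="main_text">
            div = soup.find("div", class_="main_text")
            if div is None:
                print(f"main_text not found, skipping: {aozora_url}")
                continue
            txt = div.get_text().strip()

            os.makedirs(os.path.dirname(opath), exist_ok=True)

            # Write UTF-8 explicitly so the Japanese text is saved correctly on every OS
            # (SimpleDirectoryReader reads files as UTF-8 by default)
            with open(opath, "w", encoding="utf-8") as f:
                f.write(txt)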
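The nested loops above only walk selections, then members, then metadata, and keep the values labelled "aozora". Pulled out of the network code, that traversal can be checked offline. This is a minimal sketch: collect_aozora_urls and output_path are hypothetical helper names, and the sample dict below is made-up data shaped like the structure the script assumes, not the real contents of info.json.

```python
def collect_aozora_urls(info: dict) -> list[str]:
    """Walk selections -> members -> metadata and return every URL labelled "aozora"."""
    urls = []
    for selection in info.get("selections", []):
        for member in selection.get("members", []):
            for metadata in member.get("metadata", []):
                if metadata.get("label") == "aozora":
                    # Several URLs may share one value, separated by commas
                    urls.extend(u.strip() for u in metadata["value"].split(",") if u.strip())
    return urls


def output_path(aozora_url: str, out_dir: str = "data/text") -> str:
    """Same naming rule as the script: file name without extension, plus ".txt"."""
    stem = aozora_url.split("/")[-1].split(".")[0]
    return f"{out_dir}/{stem}.txt"


# Hypothetical sample data; the URLs are placeholders, not real Aozora Bunko pages
sample_info = {
    "selections": [
        {
            "members": [
                {"metadata": [{
                    "label": "aozora",
                    "value": "https://example.com/cards/000000/files/1111_2222.html, "
                             "https://example.com/cards/000000/files/3333_4444.html",
                }]},
                {"metadata": [{"label": "title", "value": "hypothetical"}]},
            ]
        }
    ]
}

for u in collect_aozora_urls(sample_info):
    print(output_path(u))
```

Splitting on "," and stripping whitespace is slightly more tolerant than the script's split(", "), and collecting into one list also keeps URLs if a member happened to have more than one "aozora" entry.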

Creating the Index

Set the following environment variables, replacing xxxx with the endpoint and key noted earlier.

AZURE_OPENAI_ENDPOINT=xxxx
AZURE_OPENAI_API_KEY=xxxx
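Both scripts read these values with os.getenv, which returns None when a variable is unset, so a missing or misspelled variable tends to surface later as a less obvious error from the client. A small fail-fast check could look like this; load_azure_config is a hypothetical helper, not part of LlamaIndex.

```python
import os

REQUIRED_VARS = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY")


def load_azure_config() -> dict:
    """Read the Azure OpenAI settings, raising a clear error if any are missing."""
    missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
        "api_key": os.environ["AZURE_OPENAI_API_KEY"],
        # Same API version the scripts in this post hard-code
        "api_version": "2024-05-01-preview",
    }
```

The returned keys match the keyword arguments the scripts pass to AzureOpenAI and AzureOpenAIEmbedding, so the dict could be unpacked with **load_azure_config() alongside model and deployment_name.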

Then, create the index and save it to ./data/index with the following script.

import os
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.core import SimpleDirectoryReader, Settings, VectorStoreIndex

# Environment variables
api_key = os.getenv("AZURE_OPENAI_API_KEY")
api_version = "2024-05-01-preview"
azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")

# LLM
llm = AzureOpenAI(
    model="gpt-4o",
    deployment_name="gpt-4o",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)

# Embedding
embed_model = AzureOpenAIEmbedding(
    model="text-embedding-3-small",
    deployment_name="text-embedding-3-small",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)

Settings.llm = llm
Settings.embed_model = embed_model

# Load every text file in ./data/text as a Document
documents = SimpleDirectoryReader(
    input_dir="./data/text"
).load_data()

# Build the vector index (this calls the embedding model for every chunk) and save it to disk
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir="./data/index")
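VectorStoreIndex.from_documents does more than store the files: it splits each Document into smaller chunks (nodes), embeds each chunk with Settings.embed_model, and keeps the vectors so that queries can be matched against them. persist then writes all of this to ./data/index, which is why the Gradio app can reload the index without embedding the texts again. LlamaIndex's default splitter is sentence- and token-aware; the character-window function below is only a simplified stand-in (split_into_chunks is a hypothetical name) to show how overlapping chunks are formed.

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_into_chunks("abcdefghij", chunk_size=4, overlap=2))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice. Counting characters rather than words also works for Japanese, which has no spaces between words.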

Gradio

Finally, build a chat app with Gradio that loads the saved index and answers questions with it.

import os
import gradio as gr
from llama_index.core import StorageContext, load_index_from_storage, Settings
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding

api_key = os.getenv("AZURE_OPENAI_API_KEY")
api_version = "2024-05-01-preview"
azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")

llm = AzureOpenAI(
    model="gpt-4o",
    deployment_name="gpt-4o",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)

# You need to deploy your own embedding model as well as your own chat completion model
embed_model = AzureOpenAIEmbedding(
    model="text-embedding-3-small",
    deployment_name="text-embedding-3-small",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)

Settings.llm = llm
Settings.embed_model = embed_model

# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="./data/index")

# load index
index = load_index_from_storage(storage_context)

query_engine = index.as_query_engine(similarity_top_k=10)

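# Turn Gradio's chat history into plain "User:/Bot:" lines.
# Handles both history formats: (user, bot) pairs ("tuples")
# and {"role": ..., "content": ...} dicts ("messages").
def format_history(history):
    lines = []
    for item in history:
        if isinstance(item, dict):
            speaker = "User" if item["role"] == "user" else "Bot"
            lines.append(f"{speaker}: {item['content']}")
        else:
            user_msg, bot_msg = item
            lines.append(f"User: {user_msg}\nBot: {bot_msg}")
    return "\n".join(lines)

# Called by gr.ChatInterface for each new message.
# Gradio keeps the history itself, so this only returns the reply text.
def echo(message, history):
    context = format_history(history)
    full_context = f"{context}\nUser: {message}" if context else f"User: {message}"
    return query_engine.query(full_context).response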

demo = gr.ChatInterface(
    fn=echo,
    examples=[
        "What kind of person is Hikaru Genji?",
        "What kind of person is Yugao?"
    ],
    title="Llama Index Chatbot",
)
demo.launch()
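similarity_top_k=10 tells the query engine to retrieve the ten chunks whose embeddings are closest to the embedding of the query string and pass them to gpt-4o as context. Because the chat function sends the whole conversation as the query, earlier turns also influence which chunks are retrieved. Here is a toy sketch of that ranking step, assuming cosine similarity (which, as far as I know, is what LlamaIndex's default in-memory vector store uses) and hand-made 3-dimensional vectors; cosine_similarity and top_k are hypothetical helpers, not LlamaIndex functions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Score every chunk against the query and keep the k highest-scoring ones."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; text-embedding-3-small vectors have 1536 dimensions
chunks = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.8, 0.6, 0.0],
    "chunk-c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.1, 0.0]
print(top_k(query, chunks, k=2))
```

A larger top-k gives the model more passages to draw on but also more irrelevant text and a longer, more expensive prompt, so 10 is a tunable choice rather than a fixed rule.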

The chatbot was successfully created as shown below.

Summary

There may be some misunderstandings on my part, but I hope this serves as a helpful reference.