Introduction

Wouldn’t it be great if you could query your SQL database using natural language instead of writing complex SQL queries? Thanks to open-source technologies like LlamaIndex and Groq, you can now build an AI-powered application that allows users to chat with their SQL database effortlessly.

In a previous article, I covered the fundamentals of RAG and how it generates responses grounded in your own data. In this guide, we’ll walk through how to set up a natural language SQL (NL-SQL) query engine using LlamaIndex, Groq, and SQLAlchemy. We’ll also leverage Hugging Face embeddings to improve query performance. By the end of this tutorial, you’ll be able to ask questions about your SQL database in plain English and get accurate responses.


Prerequisites

Before we dive in, ensure you have Python installed and set up your environment by installing the required dependencies.

Install Dependencies

Create a requirements.txt file and add the following:

llama-index
llama-index-llms-groq
llama-index-embeddings-huggingface
sqlalchemy
python-dotenv

Then, install the dependencies using pip:

pip install -r requirements.txt

Additionally, you’ll need a free API key from Groq to access its LLM capabilities. Store this key in a .env file as follows:

groqApi=your_api_key_here

Step 1: Set Up Your Database

For this tutorial, we’ll use a SQLite database (football.sqlite) you can find HERE with a table named matchs that stores football match data. If you’re using a different database, update the SQLAlchemy connection string accordingly.

from sqlalchemy import create_engine

engine = create_engine("sqlite:///football.sqlite")

This establishes a connection to our SQL database.
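If you don’t want to download the file right away, you can follow along with a minimal stand-in database. The column names below are assumptions for illustration only; the real football.sqlite table has more columns.

```python
import sqlite3

# Hypothetical, minimal stand-in for football.sqlite's "matchs" table.
# Swap ":memory:" for "football.sqlite" to create an actual file on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE IF NOT EXISTS matchs (
           Date     TEXT,
           HomeTeam TEXT,
           AwayTeam TEXT,
           FTHG     INTEGER,  -- full-time home goals
           FTAG     INTEGER   -- full-time away goals
       )"""
)
conn.execute("INSERT INTO matchs VALUES ('2023-08-12', 'Waregem', 'Genk', 1, 2)")
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM matchs").fetchone()[0])
```

The rest of the tutorial works the same either way, since SQLAlchemy only needs a valid SQLite file (or connection) behind its connection string.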


Step 2: Load Environment Variables

To keep our API keys secure, we’ll use dotenv to load environment variables from the .env file.

from dotenv import load_dotenv
import os

# Load the .env file
load_dotenv()

# Access the API key which is stored in the .env file
api_key = os.getenv('groqApi')

This ensures that our API key remains confidential while making it accessible in our code.
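One small refinement worth considering: os.getenv returns None when a variable is missing, which surfaces later as a confusing authentication error from Groq. A tiny helper (my own addition, not part of the tutorial’s required code) fails fast with a clear message instead:

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail fast with a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set - add it to your .env file")
    return value

# Demo only: set a dummy value so the lookup succeeds in this snippet.
os.environ["groqApi"] = "dummy-key-for-demo"
api_key = require_env("groqApi")
```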


Step 3: Set Up the Embedding Model

We’ll use Hugging Face embeddings to improve how queries are processed and matched to database content; otherwise, LlamaIndex falls back to OpenAI embeddings by default (which require an OpenAI API key).

from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

# Set the embedding model
embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Add the embedding model to llama-index as a default model
Settings.embed_model = embed_model

Embeddings help transform text into numerical representations that allow the model to understand semantic similarities.
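To make that concrete, here is a toy illustration of the idea (pure Python, not the actual model): each text becomes a vector, and cosine similarity between vectors approximates semantic similarity. The 3-dimensional vectors below are made up for the example; a real model like all-MiniLM-L6-v2 outputs 384 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: ~1.0 means very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-d "embeddings": similar meanings map to nearby vectors.
football_match = [0.9, 0.1, 0.2]
soccer_game    = [0.8, 0.2, 0.3]
tax_return     = [0.1, 0.9, 0.7]

print(cosine_similarity(football_match, soccer_game))  # high similarity
print(cosine_similarity(football_match, tax_return))   # low similarity
```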


Step 4: Initialize the Groq LLM

You can choose from the long list of models supported by Groq. We’ll use the llama3-70b-8192 model to interpret and process natural language queries.

from llama_index.llms.groq import Groq

# Initialize Groq LLM
llm = Groq(model="llama3-70b-8192", api_key=api_key)

This LLM will be responsible for converting user queries into SQL statements.


Step 5: Define the SQL Database and Query Engine

Now, we’ll configure LlamaIndex to interact with our database. We specify the table (matchs) that we want to include in queries.

from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine

# Define the SQLDatabase
sql_database = SQLDatabase(engine, include_tables=["matchs"])

# Create a query engine
query_engine = NLSQLTableQueryEngine(sql_database=sql_database, tables=["matchs"], llm=llm)

The NLSQLTableQueryEngine acts as an interface between the LLM and the database, allowing us to execute natural language queries.
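Conceptually, the engine describes the table schema to the LLM, asks it to write SQL, runs that SQL, and returns the result. The sketch below shows that loop with a stubbed-in LLM function (fake_llm is a stand-in for the Groq call; the real engine’s prompts and internals differ):

```python
import sqlite3

def fake_llm(prompt: str) -> str:
    # Stand-in for the Groq call; a real LLM would generate SQL from the prompt.
    return "SELECT HomeTeam, AwayTeam, Date FROM matchs ORDER BY Date DESC LIMIT 1"

def nl_to_sql_query(conn, table: str, question: str) -> list:
    # 1. Describe the schema so the model knows which columns exist.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    prompt = (
        f"Table '{table}' has columns: {', '.join(columns)}.\n"
        f"Write a SQL query answering: {question}"
    )
    # 2. Ask the "LLM" for SQL, 3. execute it, 4. return the rows.
    sql = fake_llm(prompt)
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matchs (Date TEXT, HomeTeam TEXT, AwayTeam TEXT)")
conn.execute("INSERT INTO matchs VALUES ('2023-08-12', 'Waregem', 'Genk')")
rows = nl_to_sql_query(conn, "matchs", "show me the last time Waregem and Genk played")
print(rows)  # [('Waregem', 'Genk', '2023-08-12')]
```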


Step 6: Query the Database Using Natural Language

Now that our setup is complete, let’s try querying our database using plain English:

# Query the engine
response = query_engine.query("show me the last time Waregem and Genk played")

print(response)

Instead of writing SQL manually, the AI converts the natural language query into an SQL statement, executes it, and returns the result.
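For the question above, the generated SQL might look something like the query below. The column names are assumptions about the football.sqlite schema, shown here against a toy in-memory table rather than the real file:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matchs (Date TEXT, HomeTeam TEXT, AwayTeam TEXT)")
conn.executemany(
    "INSERT INTO matchs VALUES (?, ?, ?)",
    [
        ("2022-11-05", "Genk", "Waregem"),
        ("2023-08-12", "Waregem", "Genk"),
        ("2023-09-01", "Genk", "Anderlecht"),
    ],
)

# Roughly the kind of SQL an LLM produces for
# "show me the last time Waregem and Genk played":
sql = """
    SELECT Date, HomeTeam, AwayTeam FROM matchs
    WHERE (HomeTeam = 'Waregem' AND AwayTeam = 'Genk')
       OR (HomeTeam = 'Genk' AND AwayTeam = 'Waregem')
    ORDER BY Date DESC
    LIMIT 1
"""
last = conn.execute(sql).fetchone()
print(last)  # ('2023-08-12', 'Waregem', 'Genk')
```

Note that the WHERE clause checks both home/away orderings, which is exactly the kind of detail the LLM handles for you.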


Conclusion

With just a few steps, we’ve built an AI-powered SQL query engine that allows users to interact with their database using natural language. By leveraging LlamaIndex, Groq, Hugging Face embeddings, and SQLAlchemy, we’ve created a powerful tool for querying structured data without writing SQL queries manually.

Key Takeaways:

✅ Use LlamaIndex to connect natural language queries to SQL databases.

✅ Leverage Groq’s LLM to generate SQL queries dynamically.

✅ Improve query understanding with Hugging Face embeddings.

✅ Securely store API keys using dotenv.

This approach can be applied to various databases and business domains, making data access more intuitive and efficient. Try it out with your own dataset and let us know your experience!

