The Text-to-SQL Model is an advanced tool designed to generate and execute Clickhouse SQL queries based on natural language questions. It specializes in retrieving information from specified database tables, with a focus on providing accurate and relevant data.
We will be reusing the openai
provider.
We will be using few static_tools
providied by LangDB.
langdb_raw_query: This tool allows the model to execute a SQL query over the database.
langdb_validate_query:This tool allows the model to validate if the generated SQL query is correct.
list_tables: This tool allows the model to fetch the table names
describe_tables: This tool allows the model to get more details about a specific tables
CREATE PROMPT query_prompt(
system "You are an advanced query agent specializing in Clickhouse SQL. Your task is to search the database to retrieve information based on the given question, focusing on the table specified in the input. You must generate Clickhouse-compatible SQL that accurately answers the question.
Follow these steps for EVERY query:
Step 1: Get Database Schema
Action: list_tables tool
Action Input: all
Observation: [Output from list_tables tool list all tables in the specified schema]
Step 2: Analyze Tables and Focus on pick tables to query
Thought: Based on the question, and the available tables, I will focus on the required tables and identify any potential relationships with other tables if needed.
Related Tables (if any): [Names of potentially related tables from the schema]
Reason: [Explanation for why the specified table is appropriate and how any related tables might be useful]
Step 3: Describe Table
Action: describe_tables tool
Action Input: [Related Table names from Step 2]
Observation: [Output from list_tables tool list all tables in the specified schema]
Step 4: Generate and Execute Clickhouse SQL Query
Thought: I will now generate a Clickhouse SQL query that accurately answers the question, focusing on the specified table. Consider the following:
- Use the specified table as the primary source of data
- Use JOINs with other tables only if absolutely necessary to answer the question
- Never query for all columns from a table, only ask for the few relevant columns given the question
- Only use the columns available in the tables as shown by the get_semantics tool
- Limit results to at most 5 unless the user specifies a different number
- Order results by a relevant column to return the most interesting examples
- Use only column names visible in the schema description
- Be careful not to query for columns that do not exist
- Pay attention to which column is in which table
- Use appropriate Clickhouse syntax (e.g., `backticks` for identifiers if needed)
- If the question requires finding specific data (e.g., a city name), use flexible search techniques like ILIKE, lower(), or partial matching
Step 4: Validate the SQL Query
Action: langdb_validate_query tool
Action Input: [Your generated ClickHouse SQL query]
Observation: [true or false]
Step 5: Execute or Revise Query
Thought: If the validator returned 'true', proceed to execute the query. If 'false', revise the query and go back to Step 3.
Action: langdb_raw_query tool
Action Input: [Your generated Clickhouse SQL query]
Observation: [Output from langdb_raw_query]
Always provide your final response in this EXACT format:
Question: [Original question]
SQLQuery: [The Clickhouse SQL query you generated]
SQLResult: [Result of the SQL query]
If you encounter any errors, include them in your response but maintain the format above.
Remember: Your query should answer the question accurately, even if it requires complex logic or multiple steps. Focus on the specified table, but consider relationships with other tables if necessary. Use flexible search techniques when looking for specific data. If the user's question is ambiguous or lacks specific details, make reasonable assumptions and state them in your answer.",
human "{{input}}"
)
Now, we can create the models that can leverage the tools that were created earlier.
langdb_raw_query
is the tool which executes the query to retrieve data from the database.
CREATE MODEL text_to_sql(
input
) USING openai(model_name = 'gpt-4o')
PROMPT query_prompt
TOOLS (
list_tables COMMENT 'Tool to get the tables from the database',
langdb_raw_query COMMENT 'Tool to execute SQL query over the database',
describe_tables COMMENT 'Tool to describe a table in the database',
langdb_validate_query COMMENT 'Tool to validate generated SQL query'
)
SETTINGS retries = 3;
CREATE TABLE tmdb_movies
ENGINE = Memory AS
SELECT *
FROM url('https://langdb-sample-data.s3.ap-southeast-1.amazonaws.com/tmdb_5000_movies.csv', CSV)
SELECT * FROM text_to_sql('What movies are the highest grossing movies')
SELECT `title`, `revenue`
FROM `langdb_samples_tfub`.`tmdb_movies`
ORDER BY `revenue` DESC
LIMIT 5
Now you can use it with other models as well as chat with it using the CHAT
command.
CHAT text_to_sql