LangDB aims to bridge the gap between data and AI world by running them alongside. You can create an entire LangDB experience just using SQL commands all the way from extraction to creating a chat agent.
This sample demonstrates the following:
extract_text
command.View
model
that connects to OpenAI
APIs.CHAT
command.--Create table for storing pdf chunks
CREATE TABLE pdf_chunks (
id UUID DEFAULT generateUUIDv4(),
content `String`
)
ENGINE = MergeTree
ORDER BY (id, content);
INSERT INTO pdf_chunks(content)
select text from chunk(
(
SELECT content FROM
extract_text( path => 'https://d18rn0p25nwr6d.cloudfront.net/CIK-0000320193/faab4555-c69b-438a-aaf7-e09305f87ca3.pdf',
type => 'pdf')
),
chunk_size => 200,
type => 'Word'
)
SELECT count(*) FROM pdf_chunks;
SELECT * FROM pdf_chunks LIMIT 2;
LangDB offers a convenient method to generate embeddings using the custom embedding type model function for development and testing purposes.
CREATE EMBEDDING MODEL generate_embeddings
USING openai(model='text-embedding-ada-002', encoding_format='float')
--Creating a table for storing embeddings
CREATE TABLE pdf_embeddings (
id UUID,
content `String`,
embeddings `Array`(`Float32`)
)
ENGINE = MergeTree
ORDER BY id;
Use generate_embed
model to generate embeddings for each chunk and store them into pdf_embeddings
table.
INSERT INTO pdf_embeddings
SELECT id, content, embedding FROM generate_embeddings((
SELECT p.id, content
FROM pdf_chunks AS p
LEFT JOIN pdf_embeddings AS pe ON p.id = pe.id
WHERE p.id != pe.id
ORDER BY p.id
), input=>content)
-- check embeddings
SELECT * FROM pdf_embeddings LIMIT 2;
View
let you create access endpoints that can be used both for API consumption as well as RAG inputs for LLMs.
Here we are creating a view named similar()
to perform a vector search against similar chunks to feed into our LLM model.
CREATE VIEW similar(query String "Query to search similar sections in pdf documents") AS
WITH query AS (
SELECT embedding::Array(Float32) AS query FROM generate_embeddings($query)
)
SELECT
p.id,
p.content,
cosineDistance(p.embeddings, query) AS similarity
FROM
pdf_embeddings p
CROSS JOIN
query
ORDER BY
similarity ASC
LIMIT 5
Sample vector search using similar()
function:
--returns similar sections in pdf documents matching the input string
SELECT * FROM similar('Apple liabilities') LIMIT 2
SELECT * FROM similar('PLEASE NOTE: THERE IS NO PROOF OF CLAIM FORM FOR') LIMIT 2
You can dynamically create models and prompts on the fly using CREATE
commands that can leverage tools that you have created.
similar
function as a tool.CREATE MODEL IF NOT EXISTS search(
input
) USING openai(model_name= 'gpt-4o-mini')
PROMPT (system "You are a helpful assistant. You have been provided with similar tool to search over the SEC fillings.
Your task is also to help users with any queries they might have about the SEC filings.
Go through all the content and then respond.",
human "{{input}}")
TOOLS (
similar COMMENT 'View to search embeddings of the PDF'
)
SETTINGS retries = 1;
Using the created model, we can perform search()
query that leverages similar()
tool and responds.
This can immediately be plugged into the frontends to also provide a chat interface.
SELECT * FROM search('Earnings of Apple in 2022')