Is the workflow free to run?

n8n itself can be self-hosted for free, but you still pay for Mistral OCR, OpenAI embeddings, Gemini calls, and Qdrant hosting.

What credentials do I need?

API keys for Mistral, OpenAI, and Google Gemini; connection details for your Qdrant instance; and Google Drive OAuth credentials.

How do I import this into n8n?

Download the workflow JSON file, import it through the n8n editor, add the required credentials, and then activate the workflow.
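
Before importing, it can help to check which node types the file contains, since each external service needs its own credential. The snippet below is a minimal sketch, assuming the export is a JSON object with a top-level `nodes` array whose entries carry `name` and `type` fields. The sample data and the `summarize_workflow` helper are illustrative, not part of the template.

```python
import json
from collections import Counter


def summarize_workflow(workflow_json: str) -> Counter:
    """Count node types in an exported n8n workflow given as a JSON string."""
    workflow = json.loads(workflow_json)
    # Assumption: nodes live in a top-level "nodes" list, each with a "type".
    return Counter(node["type"] for node in workflow.get("nodes", []))


# Tiny stand-in for the downloaded file; a real export has many more fields.
sample = json.dumps({
    "name": "PDF RAG (sample)",
    "nodes": [
        {"name": "OCR call", "type": "n8n-nodes-base.httpRequest"},
        {"name": "Drive", "type": "n8n-nodes-base.googleDrive"},
        {"name": "Qdrant", "type": "@n8n/n8n-nodes-langchain.vectorStoreQdrant"},
    ],
})

node_types = summarize_workflow(sample)
for node_type, count in sorted(node_types.items()):
    print(f"{count} x {node_type}")
```

To inspect the real file, read it from disk and pass its contents to `summarize_workflow` in place of `sample`.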

Build a PDF Document RAG System with Mistral OCR, Qdrant and Gemini AI

Automates PDF ingestion, OCR extraction, vector storage, and Gemini-powered RAG queries.

n8n | AI & LLM | Advanced | 20K views

Updated 2026-06-16

What this workflow does

This workflow builds an automated pipeline that ingests PDFs, extracts and vectorizes text, stores embeddings in Qdrant, and supports retrieval-augmented queries with Gemini.
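
The retrieval step can be pictured with a toy example. The sketch below ranks stored chunks by cosine similarity to a query vector and returns the best matches, which would then be passed to the chat model as context. The three-dimensional vectors and the sample texts are invented for illustration; in the workflow itself, OpenAI produces the embeddings and Qdrant performs the search.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=2):
    """Return the texts of the k stored chunks most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item["vector"]),
                    reverse=True)
    return [item["text"] for item in ranked[:k]]


# Made-up chunks and vectors standing in for a vector store collection.
store = [
    {"text": "Renewal date: 1 March", "vector": [0.9, 0.1, 0.0]},
    {"text": "Payment terms: net 30", "vector": [0.1, 0.9, 0.0]},
    {"text": "Governing law clause", "vector": [0.0, 0.2, 0.9]},
]

hits = top_k([1.0, 0.0, 0.0], store, k=1)
prompt = "Answer using only this context:\n" + "\n".join(hits)
```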

It targets developers and teams implementing document-based AI applications that require scalable ingestion and accurate question answering.

Who is this for?

AI engineers, data teams, and knowledge-management groups in legal, research, or enterprise settings who need to turn large PDF collections into queryable knowledge bases.

What problem it solves

Manually extracting text from scanned PDFs and building searchable indexes is slow and error-prone; teams struggle to get accurate answers from document archives without heavy custom coding.

What it automates

Legal contract search

Upload client contracts from Google Drive; OCR extracts clauses, vectors are stored in Qdrant, and Gemini answers questions about obligations or renewal dates.

Research paper QA

Ingest academic PDFs, split and embed sections, then let researchers ask natural-language questions across hundreds of papers without reading each one.

Invoice archive lookup

Process batches of scanned invoices; the workflow stores line-item data so finance teams can query totals, vendors, or dates instantly.

How the workflow works

The 11 nodes in this automation, in order.

1. HTTP Request (httpRequest)
2. Google Drive (googleDrive)
3. Code (code)
4. Summarization Chain (@n8n/n8n-nodes-langchain.chainSummarization)
5. Question and Answer Chain (@n8n/n8n-nodes-langchain.chainRetrievalQa)
6. Embeddings OpenAI (@n8n/n8n-nodes-langchain.embeddingsOpenAi)
7. Vector Store Retriever (@n8n/n8n-nodes-langchain.retrieverVectorStore)
8. Token Splitter (@n8n/n8n-nodes-langchain.textSplitterTokenSplitter)
9. Default Data Loader (@n8n/n8n-nodes-langchain.documentDefaultDataLoader)
10. Qdrant Vector Store (@n8n/n8n-nodes-langchain.vectorStoreQdrant)
11. Google Gemini Chat Model (@n8n/n8n-nodes-langchain.lmChatGoogleGemini)
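
The Token Splitter in the list above cuts the extracted text into overlapping chunks before embedding. The sketch below shows the general idea only: it treats whitespace-separated words as a stand-in for real model tokens, and the `chunk_size` and `overlap` values are arbitrary examples rather than the template's settings.

```python
def split_tokens(text, chunk_size=5, overlap=2):
    """Split text into chunks of chunk_size words sharing `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    tokens = text.split()  # simplification: words, not tokenizer tokens
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the text
    return chunks


chunks = split_tokens("a b c d e f g h", chunk_size=5, overlap=2)
for chunk in chunks:
    print(chunk)
```

The overlap repeats the tail of one chunk at the head of the next, so a sentence that straddles a boundary still appears intact in at least one chunk.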

Apps & integrations used

HTTP Request, Google Drive, Summarization Chain, Question and Answer Chain, Embeddings OpenAI, Vector Store Retriever, Token Splitter, Default Data Loader, Qdrant Vector Store, Google Gemini Chat Model

How to set up this workflow

1. Add a Google Drive node and select the folder containing the PDFs as the trigger source.
2. Insert an HTTP Request node configured for the Mistral OCR API to extract text from each file.
3. Connect the Token Splitter and Default Data Loader to chunk the extracted text.
4. Use the Embeddings OpenAI node followed by the Qdrant Vector Store node to index the chunks.
5. Add the Vector Store Retriever, Summarization Chain, and Question and Answer Chain nodes.
6. Attach the Google Gemini Chat Model to the Q&A chain and wire up the final output.
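
For the OCR step, the HTTP Request node needs a URL, headers, and a JSON body. The sketch below builds such a request and flattens a response into plain text without sending anything over the network. The endpoint, the `mistral-ocr-latest` model name, and the `pages`/`markdown`/`index` response fields are assumptions based on Mistral's public API as commonly documented, so verify them against the current Mistral reference before use. The helper names are hypothetical.

```python
import json

# Assumed endpoint; confirm against Mistral's API reference.
OCR_URL = "https://api.mistral.ai/v1/ocr"


def build_ocr_request(api_key, document_url, model="mistral-ocr-latest"):
    """Return the pieces an HTTP client (or HTTP Request node) would need."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,  # assumed model identifier
        "document": {"type": "document_url", "document_url": document_url},
    }
    return {"method": "POST", "url": OCR_URL, "headers": headers,
            "body": json.dumps(body)}


def pages_to_text(ocr_response):
    """Join per-page markdown in page order (assumed response shape)."""
    pages = sorted(ocr_response.get("pages", []),
                   key=lambda page: page.get("index", 0))
    return "\n\n".join(page.get("markdown", "") for page in pages)


request = build_ocr_request("YOUR_API_KEY", "https://example.com/sample.pdf")
text = pages_to_text({"pages": [{"index": 1, "markdown": "Page two"},
                                {"index": 0, "markdown": "Page one"}]})
```

The `document_url` here is a placeholder: a file picked up from Google Drive would first need to be reachable by the OCR service, for example through an upload or a signed link.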

How to customize this workflow

→ Replace OpenAI embeddings with another provider supported by n8n.
→ Change the trigger from Google Drive to S3, Dropbox, or a webhook.
→ Insert an extra Summarization Chain before vector storage for long documents.
→ Add a filter step after OCR to skip files below a confidence threshold.
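
The confidence filter mentioned above could look like the following standalone sketch. It assumes each OCR result carries a numeric `confidence` field between 0 and 1. That field name and the 0.8 threshold are hypothetical, since the source does not say how confidence is reported, and the logic would need adapting to whatever the OCR step actually returns.

```python
def keep_confident(items, threshold=0.8):
    """Split OCR results into (kept, skipped) by a confidence threshold."""
    kept, skipped = [], []
    for item in items:
        score = item.get("confidence")  # hypothetical field name
        if score is not None and score >= threshold:
            kept.append(item)
        else:
            # Items without a score are skipped rather than trusted.
            skipped.append(item)
    return kept, skipped


results = [
    {"file": "contract_a.pdf", "confidence": 0.95},
    {"file": "scan_blurry.pdf", "confidence": 0.42},
    {"file": "no_score.pdf"},
]
kept, skipped = keep_confident(results, threshold=0.8)
```

Returning the skipped items as well makes it easy to route them to a manual review branch instead of dropping them silently.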