Build a PDF Document RAG System with Mistral OCR, Qdrant and Gemini AI

Verified

Automates PDF ingestion, OCR extraction, vector storage, and Gemini-powered RAG queries.

n8n | AI & LLM | Advanced | 20K views
Updated 2026-06-16

What this workflow does

This workflow ingests PDFs, extracts their text with Mistral OCR, splits and embeds that text with OpenAI embeddings, stores the vectors in Qdrant, and answers retrieval-augmented queries with Gemini.

It targets developers and teams implementing document-based AI applications that require scalable ingestion and accurate question answering.
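
To make the retrieval-augmented query step concrete, here is a minimal sketch in plain Python. It is not the workflow's implementation: an in-memory list stands in for Qdrant, the two-dimensional vectors are toy values rather than real embeddings, and the names `cosine`, `top_k`, and `build_prompt` are hypothetical helpers introduced only for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=2):
    """Return the text of the k entries whose vectors are closest to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item["vector"]), reverse=True)
    return [item["text"] for item in ranked[:k]]

def build_prompt(question, chunks):
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Toy index: an in-memory stand-in for a vector store (assumption, not Qdrant itself).
index = [
    {"text": "The contract renews on 1 March.", "vector": [0.9, 0.1]},
    {"text": "Payment is due within 30 days.", "vector": [0.1, 0.9]},
    {"text": "Renewal requires 60 days notice.", "vector": [0.8, 0.3]},
]

chunks = top_k([1.0, 0.0], index, k=2)
prompt = build_prompt("When does the contract renew?", chunks)
print(prompt)
```

In the workflow itself, the Vector Store Retriever and Qdrant Vector Store nodes would handle the similarity search, and the chat model would receive the assembled prompt.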

Who is this for?

AI engineers, data teams, and knowledge-management groups in legal, research, or enterprise settings who need to turn large PDF collections into queryable knowledge bases.

What problem it solves

Manually extracting text from scanned PDFs and building searchable indexes is slow and error-prone; teams struggle to get accurate answers from document archives without heavy custom coding.


What it automates

Legal contract search

Upload client contracts from Google Drive; OCR extracts clauses, vectors are stored in Qdrant, and Gemini answers questions about obligations or renewal dates.

Research paper QA

Ingest academic PDFs, split and embed sections, then let researchers ask natural-language questions across hundreds of papers without reading each one.

Invoice archive lookup

Process batches of scanned invoices; the workflow stores line-item data so finance teams can query totals, vendors, or dates instantly.

How the workflow works

The 11 nodes in this automation, in order.

  1. HTTP Request (httpRequest)
  2. Google Drive (googleDrive)
  3. Code (code)
  4. Summarization Chain (@n8n/n8n-nodes-langchain.chainSummarization)
  5. Question and Answer Chain (@n8n/n8n-nodes-langchain.chainRetrievalQa)
  6. Embeddings OpenAI (@n8n/n8n-nodes-langchain.embeddingsOpenAi)
  7. Vector Store Retriever (@n8n/n8n-nodes-langchain.retrieverVectorStore)
  8. Token Splitter (@n8n/n8n-nodes-langchain.textSplitterTokenSplitter)
  9. Default Data Loader (@n8n/n8n-nodes-langchain.documentDefaultDataLoader)
  10. Qdrant Vector Store (@n8n/n8n-nodes-langchain.vectorStoreQdrant)
  11. Google Gemini Chat Model (@n8n/n8n-nodes-langchain.lmChatGoogleGemini)
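
The Token Splitter node in this list chunks text before embedding. The sketch below illustrates the general idea of fixed-size chunks with overlap. It is an approximation under stated assumptions: it counts whitespace-separated words rather than model tokens, and `split_tokens`, `chunk_size`, and `overlap` are hypothetical names, not the node's actual parameters.

```python
def split_tokens(text, chunk_size=5, overlap=2):
    """Split text into overlapping chunks of whitespace-separated tokens."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(tokens):
            break
    return chunks

chunks = split_tokens("a b c d e f g h", chunk_size=5, overlap=2)
print(chunks)
```

Overlap repeats a few tokens between neighbouring chunks so that a sentence cut at a chunk boundary still appears whole in at least one chunk.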

Apps & integrations used

HTTP Request, Google Drive, Summarization Chain, Question and Answer Chain, Embeddings OpenAI, Vector Store Retriever, Token Splitter, Default Data Loader, Qdrant Vector Store, Google Gemini Chat Model

How to set up Build a PDF Document RAG System with Mistral OCR, Qdrant and Gemini AI

  1. Add a Google Drive node and select the folder containing the PDFs as the trigger.
  2. Insert an HTTP Request node configured for the Mistral OCR API to extract text from each file.
  3. Connect the Token Splitter and Default Data Loader to chunk the extracted text.
  4. Use the Embeddings OpenAI node followed by the Qdrant Vector Store to index the chunks.
  5. Add the Vector Store Retriever, Summarization Chain, and Question and Answer Chain nodes.
  6. Attach the Google Gemini Chat Model to the Q&A chain and wire the final output.
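
As a rough illustration of the OCR step above, the sketch below builds the kind of request an HTTP Request node might send and flattens a response into plain text. Everything specific here is an assumption to verify against Mistral's API documentation: the endpoint URL, the `mistral-ocr-latest` model name, the `document` body shape, and the `pages`/`markdown` response fields. The helper names and the sample response are hypothetical, and no network call is made.

```python
import json

# Assumed endpoint; confirm against Mistral's API documentation before use.
OCR_URL = "https://api.mistral.ai/v1/ocr"

def build_ocr_request(document_url, api_key, model="mistral-ocr-latest"):
    """Return method, URL, headers, and JSON body for an OCR request (assumed shape)."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "document": {"type": "document_url", "document_url": document_url},
    }
    return {"method": "POST", "url": OCR_URL, "headers": headers, "body": json.dumps(body)}

def pages_to_text(response):
    """Join per-page markdown into one string, ordered by page index (assumed fields)."""
    pages = sorted(response.get("pages", []), key=lambda p: p.get("index", 0))
    return "\n\n".join(p.get("markdown", "") for p in pages)

request = build_ocr_request("https://example.com/contract.pdf", api_key="YOUR_API_KEY")

# Hand-written sample in the assumed response shape, not real API output.
sample_response = {
    "pages": [
        {"index": 1, "markdown": "Page two"},
        {"index": 0, "markdown": "Page one"},
    ]
}
text = pages_to_text(sample_response)
print(text)
```

The joined text is what would then be passed to the Token Splitter and Default Data Loader for chunking.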

How to customize this workflow

  • Replace OpenAI embeddings with another provider supported by n8n.
  • Change trigger from Google Drive to S3, Dropbox, or a webhook.
  • Insert an extra Summarization Chain before vector storage for long documents.
  • Add a filter step after OCR to skip files below a confidence threshold.
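
The confidence filter in the last bullet could be sketched as below. The listing does not say where a confidence score would come from, so the `confidence` field, the `0.8` threshold, and the `filter_by_confidence` helper are all hypothetical. In n8n, similar logic could live in a Code or Filter node.

```python
def filter_by_confidence(records, threshold=0.8):
    """Split OCR records into (kept, skipped) lists by a confidence score."""
    kept, skipped = [], []
    for record in records:
        # "confidence" is a hypothetical field; records without it are skipped.
        score = record.get("confidence")
        if score is not None and score >= threshold:
            kept.append(record)
        else:
            skipped.append(record)
    return kept, skipped

records = [
    {"file": "a.pdf", "confidence": 0.95},
    {"file": "b.pdf", "confidence": 0.40},
    {"file": "c.pdf"},
]
kept, skipped = filter_by_confidence(records, threshold=0.8)
print([r["file"] for r in kept], [r["file"] for r in skipped])
```

Returning the skipped records as well, rather than dropping them, makes it possible to route low-confidence files to manual review.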

Build a PDF Document RAG System with Mistral OCR, Qdrant and Gemini AI: pros & cons

Pros

  • End-to-end automation from upload to query
  • Combines specialized OCR, embeddings, and chat models
  • Modular design with subflows for easy scaling
  • Uses production-grade Qdrant for vector storage

Cons

  • Requires paid API keys for Mistral, OpenAI, and Gemini
  • OCR quality varies with document scan quality
  • Advanced setup needs familiarity with vector DB configuration

Frequently asked questions

It ingests PDFs via Google Drive, runs Mistral OCR, embeds the text with OpenAI, stores vectors in Qdrant, and answers questions using Gemini.
