kreuzberg-dev/kreuzberg

Fast extraction of text, metadata, and code insights from many formats.

Skills · Agent Skills · 8.5k · Open source
Updated 2026-06-16

What is kreuzberg-dev/kreuzberg?

Kreuzberg is an open-source tool that extracts text, metadata, and code intelligence from a wide range of documents and source files. It processes content at native speeds across numerous languages and formats while remaining lightweight.

The system relies on tree-sitter for semantic analysis of code, returning structured results such as functions, classes, and imports. Plugins allow extension for specialized tasks like OCR backends or custom validators.
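
To make "structured results such as functions, classes, and imports" concrete, here is a minimal sketch of that kind of output. It does not use Kreuzberg or tree-sitter: it swaps in Python's standard-library `ast` module, so it only handles Python source, and the `summarize_source` name and the dictionary layout are illustrative assumptions, not Kreuzberg's actual result type.

```python
import ast


def summarize_source(source: str) -> dict[str, list[str]]:
    """Collect function, class, and import names from Python source.

    Stand-in for tree-sitter based analysis; the result shape is hypothetical.
    """
    tree = ast.parse(source)
    summary: dict[str, list[str]] = {"functions": [], "classes": [], "imports": []}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            summary["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            summary["classes"].append(node.name)
        elif isinstance(node, ast.Import):
            summary["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Keep leading dots so relative imports stay distinguishable.
            summary["imports"].append("." * node.level + (node.module or ""))
    # ast.walk is breadth-first; sort so the output order is predictable.
    return {key: sorted(names) for key, names in summary.items()}


SAMPLE = (
    "import os\n"
    "from pathlib import Path\n"
    "\n"
    "class Reader:\n"
    "    def read(self):\n"
    "        return 1\n"
    "\n"
    "def main():\n"
    "    pass\n"
)

summary = summarize_source(SAMPLE)
```

A tree-sitter based extractor would produce comparable records for many languages from one code path, which is the main reason to prefer it over a per-language parser like this one.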

Developers and data teams use it when they need reliable parsing in production pipelines or research workflows that span multiple programming ecosystems.

Capabilities

Extract text from documents
Parse tables from files
Retrieve metadata
Handle 62+ formats

What you can build with kreuzberg-dev/kreuzberg

Codebase Analysis

Scan large repositories to extract symbols and docstrings for documentation or refactoring tools.

Document Processing Pipelines

Convert mixed-format files into clean text and metadata for downstream search or indexing systems.

Multi-Language Tooling

Build cross-language utilities that need consistent extraction results from Rust, Python, JavaScript, and more.

Install kreuzberg-dev/kreuzberg

Install
pip install kreuzberg
Quick start
/plugin marketplace add kreuzberg-dev/plugins
/plugin install kreuzberg@kreuzberg
  1. Choose the binding for your language from the project repository.
  2. Install via the package manager shown for that binding, such as pip or cargo.
  3. Import the library in your code and point it at a target file or directory.
  4. Call the extraction function and inspect the returned metadata and code intelligence.
  5. Extend behavior by registering custom plugins if additional processing is required.
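
Steps 3 and 4 above can be sketched for the Python binding roughly as follows. The `extract_tree` helper and its return shape are hypothetical and written for illustration. The extraction function is passed in as a parameter so the sketch runs without Kreuzberg installed. The `extract_file_sync` import in `load_kreuzberg_extractor` is an assumption about the Python package; check the project documentation for the exact name and result fields before relying on it.

```python
from pathlib import Path
from typing import Any, Callable


def extract_tree(root: Path, extract: Callable[[Path], Any]) -> dict[str, Any]:
    """Apply `extract` to every regular file under `root`.

    Returns a mapping from POSIX-style relative path to whatever `extract`
    returned. Hypothetical helper, not part of Kreuzberg.
    """
    results: dict[str, Any] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            results[path.relative_to(root).as_posix()] = extract(path)
    return results


def load_kreuzberg_extractor() -> Callable[[Path], Any]:
    """Return Kreuzberg's synchronous file extractor.

    Assumption: the Python package exposes `extract_file_sync`.
    Imported lazily so this module loads even if Kreuzberg is absent.
    """
    from kreuzberg import extract_file_sync

    return extract_file_sync
```

With Kreuzberg installed, a call such as `extract_tree(Path("docs"), load_kreuzberg_extractor())` would hand each file to the library. Any callable that accepts a path works in its place, which keeps the traversal logic testable on its own.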

kreuzberg-dev/kreuzberg: pros & cons

Pros

  • Broad coverage of 96 formats and 306 languages in one library
  • Native performance without GPU hardware
  • Consistent API across many language bindings
  • Plugin system for adding OCR or validation logic

Cons

  • Elastic license may restrict certain commercial uses
  • Initial setup varies by language binding chosen
  • Full feature set requires installing language-specific dependencies
Frequently asked questions

Does Kreuzberg require a GPU?

No, it runs at native speeds on standard CPU hardware.
