The workflow template is free; you only pay for the connected services like OpenAI, OpenRouter, and Qdrant.

What credentials do I need?

You need a Google Sheets OAuth2 credential, an OpenRouter or OpenAI credential for the chat model and embeddings, a Qdrant credential, and any tool-specific API keys.

How do I import it into n8n?

Download the JSON template and import it directly into your n8n instance via the workflow menu.

Evaluate tool usage accuracy in multi-agent AI workflows using Evaluation nodes

Verified

Evaluate AI agent tool usage accuracy using n8n Evaluation nodes.

n8n | AI & LLM | Intermediate | 3.1K views

Updated 2026-06-15

What this workflow does

This workflow uses the Evaluation Trigger and Evaluation nodes to test whether an AI Agent invokes the expected tools, such as Calculator and Call n8n Workflow Tool. It combines the OpenRouter Chat Model, Embeddings OpenAI, and Qdrant Vector Store nodes to run multi-agent scenarios, then assigns a binary (pass/fail) metric for tool accuracy.

It is designed for AI developers building autonomous agents in n8n who require quantitative verification of tool selection against predefined expectations.

Who is this for?

AI developers and teams building multi-agent systems in n8n who need to quantitatively evaluate tool usage behavior against ground truth.

What problem it solves

Tool calls made by autonomous agents often go unverified; this workflow measures whether the expected tools were actually called during execution.

What it automates

Dataset-driven agent testing

Run batches of test queries from Google Sheets to check if an agent calls the correct tools like Calculator or Qdrant search.
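
As a rough illustration of what such a dataset could look like, the sketch below parses a CSV export of a test sheet into test cases. The column names `question` and `expected_tools`, the `;` separator, and the tool names are hypothetical; the template's actual sheet layout may differ.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class AgentTestCase:
    question: str
    expected_tools: list[str]


def load_test_cases(csv_text: str) -> list[AgentTestCase]:
    """Parse a CSV export of the test sheet into AgentTestCase objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cases = []
    for row in reader:
        # Hypothetical convention: several tools in one cell are separated by ";".
        tools = [t.strip() for t in row["expected_tools"].split(";") if t.strip()]
        cases.append(AgentTestCase(question=row["question"].strip(), expected_tools=tools))
    return cases


# Made-up sample rows, for illustration only.
SAMPLE_SHEET = """question,expected_tools
What is 17 * 23?,calculator
Find notes about onboarding,qdrant_vector_store
"""

if __name__ == "__main__":
    for case in load_test_cases(SAMPLE_SHEET):
        print(case.question, "->", case.expected_tools)
```

Keeping one row per query with the expected tools alongside it means the same sheet can drive the run and serve as the ground truth for scoring.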

Debugging multi-tool agents

Compare logged intermediate steps against expected tools to identify when an agent skips or misuses available functions.
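
A minimal sketch of that comparison, assuming each logged step exposes the tool name at `action.tool`. That shape and the tool names are assumptions and should be checked against the agent's real output before relying on them.

```python
def called_tools(intermediate_steps):
    """Collect tool names, in call order, from the agent's logged steps."""
    return [step["action"]["tool"] for step in intermediate_steps]


def diff_tools(intermediate_steps, expected_tools):
    """Report expected tools that were never called and calls that were not expected."""
    called = set(called_tools(intermediate_steps))
    expected = set(expected_tools)
    return {
        "missing": sorted(expected - called),      # agent skipped these tools
        "unexpected": sorted(called - expected),   # agent used tools outside the expectation
    }


# Hypothetical logged steps for a single test query.
steps = [
    {"action": {"tool": "calculator", "toolInput": {"input": "17 * 23"}}, "observation": "391"},
]

report = diff_tools(steps, ["calculator", "qdrant_vector_store"])
print(report)  # {'missing': ['qdrant_vector_store'], 'unexpected': []}
```

Splitting the result into missing and unexpected tools separates "the agent skipped a function" from "the agent reached for the wrong one", which usually call for different prompt or tool-description fixes.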

Performance metric tracking

Assign pass/fail scores for tool_called accuracy and store results for ongoing monitoring of agent reliability.
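
One way to read the binary `tool_called` metric is sketched below: a test case scores 1 when every expected tool appears among the tools the agent called, and 0 otherwise. The exact rule used inside the template may differ, and the helper names and sample data here are hypothetical.

```python
def tool_called_score(called, expected):
    """Return 1 if every expected tool was called at least once, else 0."""
    return int(set(expected).issubset(called))


def pass_rate(runs):
    """Average the binary scores over (called, expected) pairs; 0.0 for no runs."""
    if not runs:
        return 0.0
    scores = [tool_called_score(called, expected) for called, expected in runs]
    return sum(scores) / len(scores)


# Hypothetical results from two evaluation rows.
runs = [
    (["calculator"], ["calculator"]),   # expected tool was used: pass
    ([], ["qdrant_vector_store"]),      # agent answered without the tool: fail
]

print(pass_rate(runs))  # 0.5
```

Tracking the averaged score over time gives a single number to watch when prompts, models, or tool descriptions change.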

How the workflow works

The 7 nodes in this automation, in order.

1. AI Agent (@n8n/n8n-nodes-langchain.agent)
2. Embeddings OpenAI (@n8n/n8n-nodes-langchain.embeddingsOpenAi)
3. Calculator (@n8n/n8n-nodes-langchain.toolCalculator)
4. Call n8n Workflow Tool (@n8n/n8n-nodes-langchain.toolWorkflow)
5. Qdrant Vector Store (@n8n/n8n-nodes-langchain.vectorStoreQdrant)
6. OpenRouter Chat Model (@n8n/n8n-nodes-langchain.lmChatOpenRouter)
7. Evaluation (evaluation)

Apps & integrations used

AI Agent, Embeddings OpenAI, Calculator, Call n8n Workflow Tool, Qdrant Vector Store, OpenRouter Chat Model, Evaluation

How to set up "Evaluate tool usage accuracy in multi-agent AI workflows using Evaluation nodes"

1. Connect Google Sheets OAuth2 credential and link your test dataset document
2. Configure OpenRouter or OpenAI credentials for the chat model and embeddings
3. Set up Qdrant Vector Store with sample queries and results
4. Define agent tools including Calculator, web search, and summarizer
5. Choose trigger: chat input or Evaluation Trigger node
6. Run workflow and review Evaluation node output for tool match results

How to customize this workflow

→ Swap OpenRouter Chat Model for another supported LLM
→ Change trigger from Evaluation Trigger to a scheduled workflow
→ Add extra tools via Call n8n Workflow Tool node
→ Store evaluation results in a different database instead of Google Sheets
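
For the last customization, the sketch below shows one possible shape for a results table, using SQLite from the Python standard library as a stand-in for whichever database is chosen. The table and column names are hypothetical, and inside n8n the write would normally go through the matching database node rather than a script.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a results database with a single evaluation table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS evaluation_results (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            question TEXT NOT NULL,
            expected_tool TEXT NOT NULL,
            tool_called INTEGER NOT NULL CHECK (tool_called IN (0, 1)),
            recorded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    return conn


def save_result(conn, question, expected_tool, tool_called):
    """Insert one pass/fail result; parameters are bound to avoid SQL injection."""
    conn.execute(
        "INSERT INTO evaluation_results (question, expected_tool, tool_called) "
        "VALUES (?, ?, ?)",
        (question, expected_tool, tool_called),
    )
    conn.commit()


if __name__ == "__main__":
    conn = open_store()
    save_result(conn, "What is 17 * 23?", "calculator", 1)
    print(conn.execute("SELECT question, tool_called FROM evaluation_results").fetchall())
```

Storing one row per test case with a timestamp keeps the pass/fail history queryable, which a spreadsheet makes harder as the number of runs grows.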