UI-TARS 7B

Verified

ByteDance multimodal model for integrated image and text processing.

ByteDance · Multimodal · Closed · Vision
Updated 2026-06-15

About UI-TARS 7B

UI-TARS 7B uses a multimodal architecture that processes images and text in a single forward pass. Its 128,000-token context window lets it handle long documents paired with visual elements. The model is closed-source and distributed under ByteDance's control.

Strengths include unified understanding of visual scenes and accompanying text without requiring separate encoders. This design reduces pipeline complexity for developers working on image-text workflows. Typical usage covers document analysis, visual question answering, and content moderation pipelines.

Users integrate the model through ByteDance APIs for production applications that need synchronized image and text reasoning. Its closed nature ensures consistent updates while limiting direct fine-tuning by external parties.

Capabilities

Multimodal image-text understanding
User interface and screenshot analysis
GUI element recognition and interaction
Long-context multimodal reasoning
Visual task planning for agents
Text generation grounded in visual inputs

How UI-TARS 7B compares

UI-TARS 7B compared with other multimodal models on intelligence, speed and price.

Price

USD per 1M output tokens · Lower is better · UI-TARS 7B ranks #11 of 139

  • Qwen3.5-9B: $0.15
  • Gemma 3 12B: $0.15
  • Gemma 3 27B: $0.16
  • Llama Guard 4 12B: $0.18
  • Ministral 3 14B 2512: $0.20
  • Mistral Small 3.2 24B: $0.20
  • UI-TARS 7B: $0.20
  • Qwen3.5-Flash: $0.26
  • MiMo-V2.5: $0.28
  • Llama 4 Scout: $0.30
  • Seed 1.6 Flash: $0.30
  • Voxtral Small 24B 2507: $0.30
  • Gemma 4 26B A4B: $0.33

Sources: Artificial Analysis (intelligence, speed) · OpenRouter (price).

Best for

GUI Automation Scripting

Analyzes screenshots to identify interface elements and generates grounded interaction steps for building reliable automation scripts in desktop or web applications.

Visual Agent Development

Performs long-context multimodal reasoning to create step-by-step task plans that let agents navigate and operate software interfaces from visual input alone.
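The planning workflow described above can be pictured as an observe, plan, act loop. The sketch below is illustrative only: `runAgent`, `captureScreenshot`, `planNextStep`, `performAction`, and the action objects (`click`, `type`, `done`) are hypothetical names invented for this example, not part of any UI-TARS or ByteDance API. In a real agent, `planNextStep` would be the place to call the model with the current screenshot.

```javascript
// Hypothetical observe -> plan -> act loop. Every callback is supplied by the
// caller; none of these names come from a UI-TARS SDK.
async function runAgent({ goal, captureScreenshot, planNextStep, performAction, maxSteps = 10 }) {
  const history = [];
  for (let step = 0; step < maxSteps; step++) {
    const screenshot = await captureScreenshot();          // observe
    const action = await planNextStep({ goal, screenshot, history }); // plan
    history.push(action);
    if (action.type === "done") break;                     // planner signals completion
    await performAction(action);                           // act
  }
  return history;
}

// Stubbed usage: a scripted planner stands in for the model call.
const scripted = [
  { type: "click", x: 120, y: 340 },
  { type: "type", text: "hello" },
  { type: "done" },
];
const performed = [];
const demo = runAgent({
  goal: "Fill in the greeting field",
  captureScreenshot: async () => "base64-screenshot-placeholder",
  planNextStep: async ({ history }) => scripted[history.length],
  performAction: async (action) => { performed.push(action.type); },
});

demo.then((history) => {
  console.log(history.map((a) => a.type).join(" -> ")); // click -> type -> done
});
```

The `maxSteps` cap is a design choice that keeps a misbehaving planner from looping forever; the right limit depends on the task.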

UI Screenshot Analysis

Provides detailed text descriptions and element recognition for user interface layouts, supporting design reviews or accessibility audits directly from images.

Strengths & limitations

Strengths

  • Specialized for UI/GUI tasks
  • Efficient 7B scale with practical deployment
  • Strong handling of extended 128K context
  • Native support for image + text inputs

Limitations

  • Narrow specialization may limit general-purpose use
  • Smaller model size constrains complex reasoning depth
  • Performance tied to UI-style visual domains

Cost calculator

Estimate what UI-TARS 7B would cost for your usage.

$0.00020 per request · $2 estimated / month

Based on UI-TARS 7B's $0.10/1M input · $0.20/1M output. Estimate only — actual cost varies by provider and caching.
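The arithmetic behind an estimate like this is simple to reproduce. The sketch below uses the listed prices ($0.10 per 1M input tokens, $0.20 per 1M output tokens). The workload itself is an assumption: this page does not state which token counts or request volume the calculator uses, so 1,000 input tokens, 500 output tokens, and 10,000 requests per month are example values that happen to give the same figures.

```javascript
// Listed prices from this page, in USD per 1M tokens.
const PRICE_PER_MILLION = { input: 0.10, output: 0.20 };

// Cost in USD of one request with the given token counts.
function costPerRequest(inputTokens, outputTokens) {
  const weighted =
    inputTokens * PRICE_PER_MILLION.input +
    outputTokens * PRICE_PER_MILLION.output;
  return weighted / 1e6;
}

// Cost in USD of a month of identical requests.
function monthlyCost(inputTokens, outputTokens, requestsPerMonth) {
  return costPerRequest(inputTokens, outputTokens) * requestsPerMonth;
}

// Assumed workload (not stated on the page): 1,000 in, 500 out, 10,000 requests.
const perRequest = costPerRequest(1000, 500);
const perMonth = monthlyCost(1000, 500, 10000);

console.log(perRequest.toFixed(5)); // 0.00020
console.log(perMonth.toFixed(2));   // 2.00
```

Screenshots are billed as input tokens too, and the token cost of an image varies by provider, so image-heavy workloads may land well above a text-only estimate.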

Quick start

OpenRouter's API is OpenAI-compatible, so most SDKs work once you swap the base URL. Only the model slug changes between models.

JavaScript · openai
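import OpenAI from "openai";

// Point the OpenAI SDK at OpenRouter's OpenAI-compatible endpoint.
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY, // read the key from the environment
});

// Send a single text-only user message to UI-TARS 7B.
const completion = await client.chat.completions.create({
  model: "bytedance/ui-tars-1.5-7b",
  messages: [{ role: "user", content: "Hello!" }],
});

// Print the text of the first returned choice.
console.log(completion.choices[0].message.content);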

Model slug: bytedance/ui-tars-1.5-7b
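Because the model's main use is screenshot understanding, a text-only request shows only part of the picture. The sketch below builds a message that pairs a screenshot with an instruction, using the OpenAI-style `image_url` content part with a base64 data URL. It assumes the provider serving this slug accepts that format for image input, which is worth checking against the provider's documentation. `buildScreenshotMessages` is a hypothetical helper name, and the screenshot bytes are a placeholder rather than a real PNG.

```javascript
// Build an OpenAI-style chat payload pairing an instruction with a screenshot.
// Assumes the provider accepts `image_url` content parts with data URLs.
function buildScreenshotMessages(pngBase64, instruction) {
  return [
    {
      role: "user",
      content: [
        { type: "text", text: instruction },
        {
          type: "image_url",
          image_url: { url: `data:image/png;base64,${pngBase64}` },
        },
      ],
    },
  ];
}

// Placeholder bytes standing in for a real PNG screenshot.
const fakeScreenshot = Buffer.from("not a real png").toString("base64");
const messages = buildScreenshotMessages(
  fakeScreenshot,
  "List the clickable elements in this screenshot."
);

// The array can then be passed as `messages` to
// client.chat.completions.create({ model: "bytedance/ui-tars-1.5-7b", messages }).
console.log(messages[0].content.map((part) => part.type).join(", ")); // text, image_url
```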

Editor's verdict

Our take on UI-TARS 7B

UI-TARS 7B is ByteDance's proprietary multimodal model with a 128K-token context window.

At $0.20 per 1M output tokens, it is very cost-efficient for its class.

It is available through ByteDance's API and aggregators like OpenRouter.

It is best suited to UI/GUI tasks that benefit from an efficient 7B-scale model with practical deployment.

Frequently asked questions

What is the context window of UI-TARS 7B?

The model handles up to 128,000 tokens of context for processing extended multimodal sequences.
