AppAgent

Multimodal agent that controls Android apps through taps and swipes.

Autonomous Agents · Automation · 6.8k · Open source
Updated 2026-06-15

What is AppAgent?

AppAgent is an open-source multimodal framework that enables large language models to interact with Android applications using a simplified action space of taps and swipes.
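
To illustrate how small this action space is, the sketch below models a tap and a swipe as plain Python data and converts each into an `adb shell input` argument list. The class and function names (`Tap`, `Swipe`, `to_adb_args`) are hypothetical and are not taken from the AppAgent codebase; only the `adb shell input tap` and `adb shell input swipe` commands are standard Android tooling.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Tap:
    """A single tap at pixel coordinates (x, y)."""
    x: int
    y: int


@dataclass(frozen=True)
class Swipe:
    """A swipe from (x1, y1) to (x2, y2) lasting duration_ms milliseconds."""
    x1: int
    y1: int
    x2: int
    y2: int
    duration_ms: int = 300


Action = Union[Tap, Swipe]


def to_adb_args(action: Action) -> List[str]:
    """Translate an action into an argument list suitable for subprocess.run."""
    if isinstance(action, Tap):
        return ["adb", "shell", "input", "tap", str(action.x), str(action.y)]
    if isinstance(action, Swipe):
        return [
            "adb", "shell", "input", "swipe",
            str(action.x1), str(action.y1),
            str(action.x2), str(action.y2),
            str(action.duration_ms),
        ]
    raise TypeError(f"unsupported action: {action!r}")
```

Passing the returned list to `subprocess.run` would send the gesture to a connected device; building the list separately keeps the translation step easy to inspect without a device attached.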

The system learns app-specific behaviors either through independent exploration or by watching human demonstrations, storing the resulting knowledge for later use on complex tasks.
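
One way to picture the stored knowledge is a per-app file of notes about what each UI element does. The sketch below is an assumption for illustration only: the JSON layout, the function names `save_element_note` and `load_notes`, and the element identifiers are hypothetical and do not describe AppAgent's actual storage format.

```python
import json
from pathlib import Path
from typing import Dict


def load_notes(store_dir: Path, app: str) -> Dict[str, str]:
    """Load the element notes recorded for an app, or an empty dict if none exist."""
    path = store_dir / f"{app}.json"
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_element_note(store_dir: Path, app: str, element_id: str, note: str) -> Path:
    """Record (or overwrite) what was learned about one UI element of an app."""
    store_dir.mkdir(parents=True, exist_ok=True)
    notes = load_notes(store_dir, app)
    notes[element_id] = note
    path = store_dir / f"{app}.json"
    path.write_text(json.dumps(notes, indent=2), encoding="utf-8")
    return path
```

During a later task, the notes returned by `load_notes` could be included in the model prompt so the agent does not have to rediscover what each element does.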

It targets developers and researchers who need a permission-light way to automate or test mobile apps on real devices or emulators.

Capabilities

operate smartphone apps
multimodal app interaction
execute mobile tasks
GUI navigation

What you can build with AppAgent

Social media automation

Follow users on X by navigating the app interface step by step after learning the required actions.

CAPTCHA solving

Demonstrate the ability to pass visual challenges that require precise screen interactions.

Unlabeled UI navigation

Activate a grid overlay to locate and tap elements that lack numeric tags or clear identifiers.
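
The grid idea can be sketched as a mapping from a numbered cell to the pixel at its center, which is then used as the tap target. This is a minimal sketch under assumptions: the function name `grid_cell_center`, the 1-based row-major numbering, and the example screen size are illustrative and may differ from how AppAgent lays out its grid.

```python
from typing import Tuple


def grid_cell_center(cell: int, rows: int, cols: int,
                     width: int, height: int) -> Tuple[int, int]:
    """Return the pixel center of a grid cell.

    Cells are numbered from 1, left to right and then top to bottom
    (an assumed convention for this sketch).
    """
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive")
    if not 1 <= cell <= rows * cols:
        raise ValueError(f"cell {cell} is outside a {rows}x{cols} grid")
    row, col = divmod(cell - 1, cols)
    cell_w = width / cols
    cell_h = height / rows
    return int((col + 0.5) * cell_w), int((row + 0.5) * cell_h)
```

For a 1080x1920 screen split into 4 rows and 3 columns, `grid_cell_center(5, 4, 3, 1080, 1920)` returns `(540, 720)`, the middle of the second row's center cell.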

Install AppAgent

Quick start
cd AppAgent
pip install -r requirements.txt
  1. Install Android Debug Bridge on your computer.
  2. Enable USB debugging in the Android device's developer settings.
  3. Connect the device to the PC with a USB cable.
  4. Clone the AppAgent repository and configure the chosen vision-language model.
  5. Launch the agent and provide a task description for execution on the device.
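
Before launching the agent, it can help to confirm that the first three steps worked and the device is visible over ADB. The sketch below parses the output of the standard `adb devices` command; the helper names `parse_adb_devices` and `connected_devices` are hypothetical and not part of AppAgent.

```python
import subprocess
from typing import List


def parse_adb_devices(output: str) -> List[str]:
    """Extract serials of devices in the ready 'device' state from `adb devices` output."""
    serials = []
    for line in output.splitlines():
        parts = line.split()
        # Device rows have exactly two columns: serial and state.
        # Header and daemon status lines do not match this shape.
        if len(parts) == 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def connected_devices() -> List[str]:
    """Run `adb devices` and return the serials that are ready for use."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)
```

A device listed as `unauthorized` usually means the USB debugging prompt on the phone has not been accepted yet, so it is excluded from the result.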

AppAgent: pros & cons

Pros

  • Works without any system-level backend permissions
  • Builds reusable knowledge from exploration or demos
  • Supports alternative vision models including free options
  • Includes optional grid overlay for precise control

Cons

  • Requires a physical Android device or emulator setup
  • Depends on the quality of the underlying vision-language model
  • Currently focused on Android only

Frequently asked questions

AppAgent supports GPT-4V by default and offers qwen-vl-max as a free alternative with lower performance.
