
📱 Android Use

The AI Agent That Works Where Laptops Can't

Open-source library for AI agents to control native Android apps

🚛 Built for field workers • 📦 Logistics • 🚗 Gig economy • 🏦 Mobile-first industries




🎥 Watch the Demo That Got 700K Views

► See it automate a logistics workflow in 60 seconds →

Driver texts a photo → Agent opens WhatsApp → uses scanner app → opens banking app → submits invoice


⭐ Star this repo • 🚀 Try it now • 💬 Join waitlist


🚛 The Problem: You Can't Fit a Laptop in a Truck Cab

Browser agents only work on websites. Computer Use requires a desktop.

But real work happens on mobile devices in places where laptops don't fit:

  • 🚛 Truck drivers submit invoices from the cab using factoring apps
  • 📦 Delivery drivers scan packages on handheld devices
  • 🚗 Gig workers accept orders on phones between rides
  • 🏗️ Field technicians log work orders on tablets
  • 🏦 Mobile banking happens on phones, not web browsers

There are 3 billion Android devices and, until now, zero AI agent access.


🎬 Real Example: Instant Logistics Payday

Watch Android Use automate an entire logistics workflow:

Before: Manual (10+ minutes)

1. Driver takes photo of Bill of Lading
2. Opens WhatsApp, sends to back office
3. Back office downloads image
4. Opens banking app, fills invoice form
5. Uploads documents
6. Submits for payment

After: Automated (30 seconds)

from kernel import run_agent

# Driver just texts the photo. Agent does the rest.
run_agent("""
1. Get latest image from WhatsApp
2. Open native scanner app and process it
3. Switch to RTS Pro factoring app
4. Fill invoice form with extracted data
5. Upload PDF and submit for payment
""")

✅ Result: Driver gets paid faster, with no back-office work and no laptop needed.


💡 Why This Works (The Secret Sauce)

🚫 Computer Use (Anthropic)

  • Requires a desktop or laptop
  • Takes a screenshot at every step
  • Sends images to a vision model
  • $0.15 per action
  • 3-5 second latency
  • Doesn't work on phones

✅ Android Use (This Library)

  • Works on handheld devices
  • Reads accessibility tree (XML)
  • Structured data → LLM
  • $0.01 per action (~93% cheaper)
  • <1 second latency
  • Native mobile app control

The breakthrough: Android's accessibility API provides structured UI data (buttons, text, coordinates) without expensive vision models.

Real impact: ~93% cost savings + 5x faster + works where laptops can't.


🔥 Traction

Launched 24 hours ago with the logistics demo:

  • 🚀 700,000+ views on X/Twitter (watch demo)
  • 💬 Flooded with DMs from logistics companies, gig platforms, field service providers
  • 🏗️ Built in 48 hours to validate demand
  • 🎯 Beta pilots testing with trucking companies and delivery fleets
  • 🏦 Factoring companies asking about integration

👉 If you're in logistics, gig economy, or field services, star this repo to follow development!


📊 The Market: Mobile-First Industries

| Industry | Why They Need This | Market Size | Current State |
|---|---|---|---|
| 🚛 Logistics | Drivers use factoring apps (RTS Pro, OTR Capital) in truck cabs | $10.5T | Manual, no laptop access |
| 🚗 Gig Economy | Uber/Lyft/DoorDash drivers optimize between apps on phones | $455B | Tap manually, lose 20% earnings |
| 📦 Last-Mile | Amazon Flex, UPS drivers scan packages on handhelds | $500B+ | Proprietary apps, no APIs |
| 🏗️ Field Services | Techs log work orders on tablets on-site | $200B+ | Mobile-only workflows |
| 🏦 Mobile Banking | Treasury ops, reconciliation on native banking apps | $28T | 2FA + biometric locks |

Total: $40+ trillion in combined market size across these mobile-first workflows

Browser agents can't reach these. Desktop agents don't fit. Android Use is built for exactly this gap.


🚀 Quick Start (60 Seconds)

Prerequisites

  • Python 3.10+
  • Android device or emulator (USB debugging enabled)
  • ADB (Android Debug Bridge)
  • OpenAI API key

Installation

# 1. Clone the repo
git clone https://github.com/actionstatelabs/android-action-kernel.git
cd android-action-kernel

# 2. Install dependencies
pip install -r requirements.txt

# 3. Setup ADB
brew install android-platform-tools  # macOS
# sudo apt-get install adb           # Linux

# 4. Connect device & verify
adb devices

# 5. Set your OpenAI API key:
export OPENAI_API_KEY="sk-..."

# 6. Run your first agent
python kernel.py
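Step 4 only passes if `adb devices` lists your phone in the `device` state. If you want the same check from Python before starting an agent, here is a minimal sketch. It assumes `adb` is on your `PATH`, and `parse_adb_devices` / `connected_devices` are hypothetical helpers, not part of `kernel.py`:

```python
import subprocess


def parse_adb_devices(output: str) -> list[str]:
    """Return the serials that `adb devices` reports as ready ("device" state).

    Device lines look like "<serial><tab><state>". The "List of devices
    attached" header and any "* daemon ..." start-up messages never have
    "device" as their second word, so they are skipped naturally. States such
    as "unauthorized" or "offline" mean the phone is attached but not ready.
    """
    serials = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def connected_devices() -> list[str]:
    """Run `adb devices` (adb must be on PATH) and return the ready serials."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)
```

An empty list usually means USB debugging is off, or the phone is still showing the "Allow USB debugging?" prompt (reported by adb as `unauthorized`).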

Try It: Logistics Example

from kernel import run_agent

# Automate the workflow from the viral demo
run_agent("""
Open WhatsApp, get the latest image, 
then open the invoice app and fill out the form
""")

Other examples:

  • "Accept the next DoorDash delivery and navigate to restaurant"
  • "Scan all packages and mark them delivered in the driver app"
  • "Check Chase mobile for today's transactions"

💼 Use Cases Beyond Logistics

🚗 Gig Economy Multi-Apping

Problem: Drivers lose 20%+ of their earnings manually switching between DoorDash, Uber Eats, and Instacart.

run_agent("Monitor all delivery apps, accept the highest paying order")

Impact: Instant acceptance, maximize earnings, reduce downtime.


📦 Package Scanning Automation

Problem: Drivers manually scan 200+ packages/day in proprietary apps.

run_agent("Scan all packages in photo and mark as loaded in Amazon Flex")

Impact: Bulk scanning, eliminate manual entry, speed up loading.


๐Ÿฆ Mobile Banking Operations

Problem: Treasury teams reconcile transactions across multiple mobile banking apps.

run_agent("Log into Chase mobile and export today's wire transfers")

Impact: Automate reconciliation, fraud detection, compliance.


๐Ÿฅ Healthcare Mobile Workflows

Problem: Staff extract patient data from HIPAA-locked mobile portals.

run_agent("Open Epic MyChart and download lab results for patient 12345")

Impact: Data extraction, appointment booking, records management.


🧪 Mobile App QA Testing

Problem: Manual testing of Android apps is slow and expensive.

run_agent("Create account, complete onboarding, make test purchase")

Impact: Automated E2E testing, regression tests, CI/CD integration.


🛠️ How It Works (Technical Deep Dive)

The 3-Step Loop

┌────────────────────────────────────────────────────┐
│  Goal: "Get image from WhatsApp, submit invoice"   │
└────────────────────────────────────────────────────┘
                        ↓
       ┌────────────────────────────────────────┐
       │  1. 👀 PERCEPTION                      │
       │  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  │
       │  $ adb shell uiautomator dump          │
       │                                        │
       │  Accessibility Tree (XML):             │
       │  <node class="android.widget.Button"   │
       │        text="Download Image"           │
       │        bounds="[100,500][300,600]"     │
       │        clickable="true" />             │
       │                                        │
       │  Parsed to JSON:                       │
       │  {"text": "Download Image",            │
       │   "center": [200, 550],                │
       │   "clickable": true}                   │
       └────────────────────────────────────────┘
                        ↓
       ┌────────────────────────────────────────┐
       │  2. 🧠 REASONING (GPT-4)               │
       │  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  │
       │  Prompt: "Goal: Get WhatsApp image"    │
       │  "Screen: [Download Image button]"     │
       │                                        │
       │  GPT-4 Response:                       │
       │  {                                     │
       │    "action": "tap",                    │
       │    "coordinates": [200, 550],          │
       │    "reason": "Download the image"      │
       │  }                                     │
       └────────────────────────────────────────┘
                        ↓
       ┌────────────────────────────────────────┐
       │  3. 🤖 ACTION (ADB)                    │
       │  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  │
       │  $ adb shell input tap 200 550         │
       │                                        │
       │  ✅ Image downloaded!                  │
       └────────────────────────────────────────┘
                        ↓
                  Repeat until done
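Stripped of the ADB and GPT-4 details, the diagram is a bounded loop. The sketch below illustrates that control flow only; it is not the code in `kernel.py`. The callables `perceive`, `decide` and `act` are hypothetical stand-ins for `get_screen_state()`, `get_llm_decision()` and `execute_action()`, and `max_steps` mirrors the `run_agent` parameter:

```python
from typing import Any, Callable

Action = dict[str, Any]
Elements = list[dict[str, Any]]


def agent_loop(
    goal: str,
    perceive: Callable[[], Elements],
    decide: Callable[[str, Elements], Action],
    act: Callable[[Action], None],
    max_steps: int = 10,
) -> list[Action]:
    """Repeat perception -> reasoning -> action until "done" or max_steps.

    Returns every action the model chose, in order, including the final
    "done" if one was produced.
    """
    history: list[Action] = []
    for _ in range(max_steps):
        screen = perceive()            # 1. dump + parse the accessibility tree
        action = decide(goal, screen)  # 2. ask the LLM for one next action
        history.append(action)
        if action.get("action") == "done":
            break                      # the model reports the goal as reached
        act(action)                    # 3. execute the action over ADB
    return history
```

Taking the three steps as arguments keeps the loop testable without a device: fake screens and a scripted `decide` are enough to exercise the stopping logic.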

Why Accessibility Tree > Screenshots

| Approach | Cost | Speed | Accuracy | Works on Device |
|---|---|---|---|---|
| Screenshots (Computer Use) | $0.15/action | 3-5s | 70-80% | ❌ Desktop only |
| Accessibility Tree (Android Use) | $0.01/action | <1s | 99%+ | ✅ Handheld devices |

Technical advantage: Accessibility tree provides structured data (text, coordinates, hierarchy) without image encoding/OCR.
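A real `uiautomator dump` file nests `<node>` elements under a `<hierarchy>` root, each carrying attributes such as `text`, `content-desc`, `clickable` and `bounds="[x1,y1][x2,y2]"`. The sketch below shows one way to flatten that into the compact `{"text", "center", "clickable"}` records used in this README. It is an illustration under those assumptions, not the contents of `sanitizer.py`, and `parse_ui_dump` is a hypothetical name:

```python
import re
import xml.etree.ElementTree as ET

# Matches uiautomator bounds strings such as "[100,500][300,600]".
BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


def parse_ui_dump(xml_text: str) -> list[dict]:
    """Flatten a uiautomator hierarchy into {"text", "center", "clickable"} dicts."""
    root = ET.fromstring(xml_text)
    elements = []
    for node in root.iter("node"):  # every <node>, at any depth
        # Icon-only buttons usually have empty text but a content-desc label.
        label = node.get("text") or node.get("content-desc") or ""
        clickable = node.get("clickable") == "true"
        if not label and not clickable:
            continue  # layout containers: nothing to read and nothing to tap
        match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
        if match is None:
            continue  # no usable on-screen rectangle
        x1, y1, x2, y2 = map(int, match.groups())
        elements.append({
            "text": label,
            "center": [(x1 + x2) // 2, (y1 + y2) // 2],  # tap target
            "clickable": clickable,
        })
    return elements
```

The tap target is the midpoint of the bounds, so `[100,500][300,600]` becomes `[200, 550]`, matching the diagram. Dropping unlabeled, non-clickable containers is what keeps the text sent to the LLM short.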


๐Ÿ—๏ธ Code Architecture

kernel.py (131 lines)
โ”œโ”€โ”€ get_screen_state()     # Dump & parse accessibility tree
โ”‚   โ””โ”€โ”€ sanitizer.py       # XML โ†’ JSON (54 lines)
โ”œโ”€โ”€ get_llm_decision()     # GPT-4 reasoning
โ””โ”€โ”€ execute_action()       # ADB commands
    โ”œโ”€โ”€ tap (x, y)
    โ”œโ”€โ”€ type "text"
    โ”œโ”€โ”€ home / back
    โ””โ”€โ”€ done

Total core logic: <200 lines. Simple, hackable, extensible.
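Each `execute_action()` branch corresponds to a standard `adb shell input` command. A sketch of that mapping, assuming the action dicts listed in the API reference; `build_adb_command` is a hypothetical helper, and it only builds the argument list so the result can be inspected in tests or passed to `subprocess.run`:

```python
from typing import Optional


def build_adb_command(action: dict) -> Optional[list[str]]:
    """Map one action dict to an `adb shell input ...` argument list.

    Returns None for actions that need no device command ("wait", "done").
    """
    kind = action["action"]
    if kind == "tap":
        x, y = action["coordinates"]
        return ["adb", "shell", "input", "tap", str(x), str(y)]
    if kind == "type":
        # adb joins these arguments into one device shell command line, so a
        # raw space would split the text; `input text` turns "%s" into a space.
        return ["adb", "shell", "input", "text", action["text"].replace(" ", "%s")]
    if kind == "home":
        return ["adb", "shell", "input", "keyevent", "KEYCODE_HOME"]
    if kind == "back":
        return ["adb", "shell", "input", "keyevent", "KEYCODE_BACK"]
    if kind in ("wait", "done"):
        return None
    raise ValueError(f"unknown action: {kind!r}")
```

To execute, call `subprocess.run(cmd, check=True)` whenever `cmd` is not `None`. Characters the device shell interprets (quotes, `&`, `;`, parentheses) would need extra escaping that this sketch leaves out.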

📖 API Reference

Run an Agent

from kernel import run_agent

run_agent(
    goal="Open WhatsApp and download the latest image",
    max_steps=10  # Max actions before timeout
)

Available Actions

# Tap coordinates
{"action": "tap", "coordinates": [540, 1200]}

# Type text
{"action": "type", "text": "Invoice #12345"}

# Navigate
{"action": "home"}  # Home screen
{"action": "back"}  # Previous screen

# Wait/Complete
{"action": "wait"}  # Wait for loading
{"action": "done"}  # Goal achieved
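Because the model's reply is free text, it is worth checking it against the action list above before touching the device. A minimal sketch, assuming the model is asked to answer with a single JSON object; `parse_action` and `REQUIRED_KEYS` are hypothetical names, not `kernel.py` API:

```python
import json

# Keys each action type must carry, mirroring the action list above.
REQUIRED_KEYS = {
    "tap": {"coordinates"},
    "type": {"text"},
    "home": set(),
    "back": set(),
    "wait": set(),
    "done": set(),
}


def parse_action(reply: str) -> dict:
    """Parse a model reply into an action dict, rejecting malformed actions."""
    action = json.loads(reply)  # raises json.JSONDecodeError on non-JSON text
    if not isinstance(action, dict):
        raise ValueError("reply must be a single JSON object")
    kind = action.get("action")
    if kind not in REQUIRED_KEYS:
        raise ValueError(f"unknown action: {kind!r}")
    missing = REQUIRED_KEYS[kind] - set(action)
    if missing:
        raise ValueError(f"{kind!r} action is missing {sorted(missing)}")
    if kind == "tap":
        coords = action["coordinates"]
        if not (isinstance(coords, list) and len(coords) == 2
                and all(isinstance(c, int) for c in coords)):
            raise ValueError("tap coordinates must be two integers [x, y]")
    return action
```

`json.JSONDecodeError` is a subclass of `ValueError`, so a single `except ValueError` around `parse_action` catches both unparsable and malformed replies, which is a natural place to ask the model again.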

Get Screen State

from kernel import get_screen_state

screen_json = get_screen_state()
# Returns: [{"text": "Submit", "center": [540, 1200], ...}]
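This list is what keeps each step cheap: the LLM receives a few short lines of text instead of an image. A sketch of turning it into a prompt, assuming elements shaped like the example above; `build_prompt`, `max_elements` and the prompt wording are illustrative assumptions, not the prompt `kernel.py` sends:

```python
import json


def build_prompt(goal: str, elements: list[dict], max_elements: int = 50) -> str:
    """Render the goal plus on-screen elements as a compact text prompt."""
    lines = [f"Goal: {goal}", "Screen elements (text, center, clickable):"]
    # Cap the element count so very busy screens still give a small prompt.
    for el in elements[:max_elements]:
        record = {
            "text": el.get("text", ""),
            "center": el.get("center"),
            "clickable": el.get("clickable", False),
        }
        # separators=(",", ":") drops optional whitespace to save tokens.
        lines.append(json.dumps(record, separators=(",", ":")))
    lines.append(
        'Reply with one JSON action, e.g. {"action": "tap", "coordinates": [x, y]}.'
    )
    return "\n".join(lines)
```

One element per line makes it easy for the model to copy a `center` straight into a `tap` action.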

🗺️ Roadmap

✅ Now (MVP - 48 hours)

  • Core agent loop (perception โ†’ reasoning โ†’ action)
  • Accessibility tree parsing
  • GPT-4 integration
  • Basic actions (tap, type, navigate)

🚧 Next 2 Weeks

  • PyPI package: pip install android-use
  • Multi-LLM support: Claude, Gemini, Llama
  • WhatsApp integration: Pre-built actions for messaging
  • Error recovery: Retry logic, fallback strategies

🔮 Next 3 Months

  • App-specific agents: Pre-trained for RTS Pro, OTR Capital, and other factoring apps
  • Cloud device farms: Run at scale on AWS Device Farm, BrowserStack
  • Vision augmentation: Screenshot fallback when the accessibility tree is insufficient
  • Multi-step memory: Remember context across sessions

🚀 Long-term Vision

  • Hosted Cloud API: No-code agent execution (waitlist below)
  • Agent marketplace: Buy/sell vertical-specific automations
  • Enterprise platform: SOC2, audit logs, PII redaction, fleet management
  • Industry partnerships: Direct integration with factoring companies, gig platforms

☁️ Cloud API Waitlist

Don't want to host it yourself? Join the waitlist for our managed Cloud API.

What you get:

  • ✅ No device setup required
  • ✅ Scale to 1000s of simultaneous agents
  • ✅ Pre-built integrations (WhatsApp, factoring apps, etc.)
  • ✅ Enterprise features (audit logs, compliance, SLAs)

→ Join the waitlist (Coming Q1 2026)


🤝 Contributing

Want to help build the future of mobile AI agents?

🔥 High Priority

  • Logistics app templates: RTS Pro, OTR Capital, Axle, TriumPay integrations
  • WhatsApp automation: Message parsing, image extraction
  • Error handling: Robustness for unreliable connections (truck cabs!)
  • Documentation: Tutorials, video walkthroughs
  • Testing: E2E tests for common workflows

How to Contribute

  1. ⭐ Star this repo (most important!)
  2. 🍴 Fork it
  3. 🌿 Create branch: git checkout -b feature/factoring-app-support
  4. ✍️ Commit: git commit -m 'Add RTS Pro integration'
  5. 📤 Push: git push origin feature/factoring-app-support
  6. 🎉 Open PR

Special focus: If you work in logistics, gig economy, or field services, your domain expertise is invaluable!


🌟 Show Your Support

⭐ Help Us Reach 1,000 Stars ⭐

We got 700K views but only 12 stars. Help us fix that!

Click here to star this repo →


โญ Star on GitHub

Support the project

Star now โ†’

๐Ÿฆ Share on X

Help logistics companies find this

Tweet โ†’

๐Ÿ’ฌ Join Waitlist

Get early Cloud API access

Sign up โ†’


Progress: ████░░░░░░░░░░░░░░░░░░░░░░ 12 / 1,000 stars

Help us reach 1,000 stars to validate demand for the Cloud API!


🧑‍💻 About the Creator

Built by Ethan Lim - AI engineer, YC W26 applicant

The Origin Story

I was interviewing truck drivers for a logistics automation project. One driver showed me his phone and said:

"I have to manually type invoice data from this Bill of Lading photo into the RTS Pro app. Takes 10 minutes every delivery. I can't use a laptop because it doesn't fit in the cab."

That's when it clicked: AI agents exist for web and desktop, but the real economy runs on handheld devices.

I looked at existing solutions:

  • Browser Use: Only works on websites ❌
  • Computer Use: Requires a laptop ($0.15/action, vision model) ❌

Neither solved the truck cab problem. So I built Android Use in 48 hours using Android's accessibility API.

The result:

  • ~93% cheaper (accessibility tree vs vision)
  • 5x faster (<1s latency)
  • Works on handheld devices ✅

I posted the demo showing a logistics workflow. It got 700K views in 24 hours, which showed the need is real.

What's Next

This started as a library for developers. But based on demand, we're building:

  1. Open-source core (this repo) - Foundation for everyone
  2. App-specific templates - RTS Pro, factoring apps, gig platforms
  3. Cloud API - Hosted solution for non-technical users
  4. Enterprise platform - SOC2, SLAs, fleet management

Vision: Make AI agents accessible to the 3 billion people who work on mobile devices.


📬 Contact

For logistics companies: If you want to pilot this with your drivers, reach out directly.


📊 By the Numbers

Since launch (24 hours ago):

  • 👀 700,000+ views on X
  • ⭐ 12 GitHub stars (help us get to 1,000!)
  • 💬 150+ DMs from companies
  • 🚛 5 logistics company pilots
  • 🏦 3 factoring company partnership discussions

Market data:

  • 🚛 3.5M truck drivers in the US alone
  • 📦 60M gig economy workers globally
  • 💰 $40T+ in mobile-first GDP

📄 License

MIT License - see LICENSE

Free for personal and commercial use. Build on it, sell services with it, integrate it into your logistics platform.

Why MIT? We want this to become the standard for mobile AI agents. Open source = faster adoption.


🙏 Acknowledgments

Built on:

  • Browser Use - Web agent inspiration
  • Anthropic Computer Use - Proved UI control works
  • Android Accessibility API - The enabling technology
  • The 700K people who watched and validated this need

Special thanks to:

  • Truck drivers who showed me the real problem
  • Early beta testers in logistics
  • Everyone sharing and supporting this project


🚛 Built for Workers Who Can't Fit a Laptop in Their Workspace

Whether you're in a truck cab, on a delivery route, or in the field...

AI agents should work where you do.


โญ STAR THIS REPO โ€ข ๐ŸŽฅ WATCH DEMO โ€ข ๐Ÿฆ SHARE IT


git clone https://github.com/actionstatelabs/android-action-kernel.git
cd android-action-kernel
pip install -r requirements.txt
python kernel.py

Join the 700K+ people who believe AI agents deserve to work on mobile devices.



Made with ❤️ for field workers • Star • Follow @ethanjlim • Share
