Hanzi Browse Documentation

Hanzi Browse gives AI agents a real browser. Use it locally with your own model, or embed it in your product via the API.

Use Hanzi Browse now

Install the extension + MCP server. Bring your own model. One command to get started.

One command sets up everything: detects your browsers, installs the Chrome extension, finds AI agents on your machine, and configures MCP.

npx hanzi-browse setup

Supports Claude Code, Cursor, Windsurf, Claude Desktop, and Codex.

What setup does

- Checks for the Chrome extension and opens the install page if it is missing
- Scans for supported AI agents on your machine
- Adds Hanzi Browse as an MCP server to each agent's config
- Imports credentials (Claude Code OAuth, Codex, or API key)

Supported credentials

Source	How
Claude Code	Auto-detected from `claude login`
Codex	Auto-detected from `codex login`
API key	Set `ANTHROPIC_API_KEY` env var or enter during setup
Custom endpoint	Any OpenAI-compatible API (Ollama, LM Studio, etc.)

Manual setup

If you prefer to configure manually:

# Claude Code
claude mcp add browser -- npx -y hanzi-browse

# Cursor / Windsurf (mcp.json)
{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["-y", "hanzi-browse"]
    }
  }
}
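
If your mcp.json already lists other servers, the entry has to be merged in rather than pasted over the file. A minimal sketch of that merge, assuming the file has already been parsed into an object (the `McpConfig` type and the `addHanziBrowse` name are illustrative, not part of Hanzi Browse):

```typescript
// Shape of an mcp.json file, reduced to the part this sketch touches.
interface McpServer {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers?: Record<string, McpServer>;
  [key: string]: unknown;
}

// Returns a copy of `config` with the "browser" entry added or replaced.
// Other servers and unrelated top-level keys are left untouched.
function addHanziBrowse(config: McpConfig): McpConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      browser: { command: "npx", args: ["-y", "hanzi-browse"] },
    },
  };
}
```

Write the returned object back with `JSON.stringify(result, null, 2)` and restart the agent so it picks up the change.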

Test it

After setup, ask your agent something that needs a browser:

"Go to Hacker News and tell me the top 3 stories right now"

Build with Hanzi Browse

Embed browser automation in your product. Your app calls the Hanzi Browse API, a real browser executes the task, and you get the result back.

Quick start: let your AI agent build it

Copy this prompt into Claude Code, Cursor, or any AI coding agent. It has everything the agent needs to integrate Hanzi Browse into your project.

Add browser automation to this project using the Hanzi Browse API. Read the codebase first, then ask me:

1. What browser task should Hanzi Browse automate? (e.g. "read patient chart", "fill out a form", "extract data from a web portal")
2. Where in the UI should the browser pairing flow go? (e.g. settings page, onboarding, a dedicated page)
3. Where should task results appear? (e.g. inline in the app, a chat interface, a dashboard)

Then build the integration using this API reference:

## Hanzi Browse API (base URL: https://api.hanzilla.co)

Auth: `Authorization: Bearer hic_live_...` header on all requests.

### Core flow
1. Create pairing token → show user a link → they connect their browser
2. Run tasks against their connected browser → poll for results
3. Show the answer in your app

### Endpoints

POST /v1/browser-sessions/pair
  Body: {"label": "User Name", "external_user_id": "your_user_id"}
  Returns: {"pairing_token": "hic_pair_...", "expires_in_seconds": 300}
  → Build a link: https://api.hanzilla.co/pair/{pairing_token}
  → User clicks it, their Chrome auto-pairs. Token expires in 5 min.

GET /v1/browser-sessions
  Returns: {"sessions": [{"id": "...", "status": "connected", "label": "..."}]}

POST /v1/tasks
  Body: {"task": "description", "browser_session_id": "...", "url": "optional", "context": "optional"}
  Returns: {"id": "task_id", "status": "running"}
  → task: what to do (max 10K chars). Be specific.
  → url: starting page (optional). If set, agent navigates there first.
  → context: extra info like form data, preferences (max 50K chars).

GET /v1/tasks/:id
  Returns: {"status": "running|complete|error", "answer": "...", "steps": 4}
  → Poll every 2s until status != "running". Typical task takes 10-60s.

POST /v1/tasks/:id/cancel
  → Stops a running task.

GET /v1/tasks/:id/steps
  Returns: {"steps": [{"step": 1, "status": "tool_use", "toolName": "navigate", ...}]}
  → Full execution log for debugging.

GET /v1/tasks/:id/screenshots/:step
  Returns: {"screenshot": "/9j/4AAQSkZJRg..."}
  → Base64 JPEG screenshot at a specific step. Prefix with data:image/jpeg;base64, to display.

GET /v1/billing/credits
  Returns: {"free_remaining": 20, "credit_balance": 0, "free_tasks_per_month": 20}

### Key details
- 20 free tasks/month, then $0.05 per completed task. Errors are free.
- Tasks time out after 30 min. Use cancel to stop early.
- Browser sessions last 30 days and auto-reconnect.
- Two key types: secret (hic_live_) for server-side, publishable (hic_pub_) for client-side embed widget.
- POST /v1/tasks accepts optional webhook_url — Hanzi Browse POSTs the result to your URL on completion.
- Embed widget: <script src="https://browse.hanzilla.co/embed.js"></script> + HanziConnect.mount()
- The user needs the Hanzi Browse Chrome extension installed.
  Install link: https://chromewebstore.google.com/detail/iklpkemlmbhemkiojndpbhoakgikpmcd

### Example: Express + SDK (full integration)
  See: https://github.com/hanzili/hanzi-browse/tree/main/examples/partner-quickstart

Read the codebase to understand the stack and project structure, then ask me the 3 questions above. After I answer, build the full integration.

Or follow the steps manually

1. Install the Chrome extension from the Chrome Web Store. Your users need it too: pairing fails silently without it.
2. Sign in to your developer console (Google or email).
3. Create an API key from the console, or via POST /v1/api-keys.
4. Pair a browser: generate a pairing token and send your user a link (/pair/{token}). The token expires in 5 minutes.
5. Run a task: POST /v1/tasks with a task and a browser session ID.

Sample app: See examples/partner-quickstart for a full working integration (Express + SDK + embed widget).

TypeScript SDK

The SDK wraps the REST API with typed methods, automatic polling, and error handling.

npm install @hanzi-browse/sdk

import { HanziClient } from '@hanzi-browse/sdk';

const client = new HanziClient({ apiKey: 'hic_live_...' });

// 1. Create a pairing token (give the URL to your user)
const { pairingToken } = await client.createPairingToken({
  label: 'Dr. Smith',
  externalUserId: 'user_123',
});
// Send user to: https://api.hanzilla.co/pair/{pairingToken}

// 2. Check for a connected session
const sessions = await client.listSessions();
const connected = sessions.find(s => s.status === 'connected');
if (!connected) throw new Error('No connected browser session; pair a browser first');

// 3. Run a task (polls until complete, 5 min timeout)
const result = await client.runTask({
  browserSessionId: connected.id,
  task: 'Go to example.com and read the page title',
});
console.log(result.answer);
console.log(result.status); // 'complete' | 'error' | 'cancelled'

All methods: createPairingToken, listSessions, deleteSession, createTask, getTask, runTask, cancelTask, listTasks, getTaskSteps, getScreenshot, createApiKey, listApiKeys, deleteApiKey, getUsage, getCredits, health.

Errors throw HanziError with .status (HTTP code) and .data (response body). The SDK retries transient polling errors in runTask() automatically.

Authentication

All API endpoints (except /v1/health) require authentication. Two methods are supported:

Method	Use case	How
API key	Server-to-server, SDK	`Authorization: Bearer hic_live_...`
Session cookie	Developer console, browser	Set automatically after sign-in via Better Auth

curl https://api.hanzilla.co/v1/api-keys \
  -H "Authorization: Bearer hic_live_your_key_here"

API keys are scoped to a workspace. Each key can access all sessions, tasks, and usage within its workspace. Keys are hashed at rest — the plaintext is shown once on creation.

API Keys

POST /v1/api-keys

Create a new API key for your workspace.

# Request
curl -X POST https://api.hanzilla.co/v1/api-keys \
  -H "Authorization: Bearer hic_live_..." \
  -H "Content-Type: application/json" \
  -d '{"name": "production"}'

# Response (201)
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "key": "hic_live_a1b2c3d4e5f6...",
  "name": "production",
  "workspace_id": "...",
  "_warning": "Save this key now. It will not be shown again."
}

GET /v1/api-keys

List all API keys. Returns prefixes only, not full keys.

DELETE /v1/api-keys/:id

Delete an API key. Integrations using this key will immediately stop working.

Key types

Type	Prefix	Use case	Permissions
Secret	`hic_live_`	Server-side, SDK	All endpoints
Publishable	`hic_pub_`	Client-side, embed widget	Pair browsers, list sessions only

# Create a publishable key
curl -X POST https://api.hanzilla.co/v1/api-keys \
  -H "Authorization: Bearer hic_live_..." \
  -H "Content-Type: application/json" \
  -d '{"name": "frontend-widget", "type": "publishable"}'

Security: Never expose secret keys (hic_live_) in client-side code. Use publishable keys (hic_pub_) for the embed widget. Publishable keys can only pair browsers and list sessions; they cannot create tasks or access billing.
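
One way to enforce this rule in your own code is to check the key prefix before a key is handed to the frontend. The sketch below relies only on the two documented prefixes; the function names are hypothetical helpers, not SDK methods.

```typescript
type HanziKeyType = "secret" | "publishable";

// Infers the key type from its documented prefix.
function keyTypeOf(apiKey: string): HanziKeyType {
  if (apiKey.startsWith("hic_live_")) return "secret";
  if (apiKey.startsWith("hic_pub_")) return "publishable";
  throw new Error("Unrecognized Hanzi Browse key prefix");
}

// Returns the key only if it is safe to embed in client-side code.
function keyForClient(apiKey: string): string {
  if (keyTypeOf(apiKey) !== "publishable") {
    throw new Error("Only publishable (hic_pub_) keys may be sent to client-side code");
  }
  return apiKey;
}
```

Calling `keyForClient` at the point where your server renders the widget configuration turns an accidental secret-key leak into an immediate error.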

Browser Sessions

POST /v1/browser-sessions/pair

Create a pairing token (5-minute expiry). Send the user a link containing the token (https://api.hanzilla.co/pair/{pairing_token}); opening it pairs their Chrome extension automatically.

# Request
curl -X POST https://api.hanzilla.co/v1/browser-sessions/pair \
  -H "Authorization: Bearer hic_live_..." \
  -H "Content-Type: application/json" \
  -d '{"label": "Dr. Smith", "external_user_id": "user_123"}'

# Response (201)
{
  "pairing_token": "hic_pair_a1b2c3...",
  "expires_at": 1710000000000,
  "expires_in_seconds": 300
}

POST /v1/browser-sessions/register

Exchange a pairing token for a session credential. Called by the extension, not your app.

GET /v1/browser-sessions

List all browser sessions with status, label, and external_user_id.

# Response (200)
{
  "sessions": [
    {
      "id": "550e8400-...",
      "status": "connected",
      "label": "Dr. Smith",
      "external_user_id": "user_123",
      "connected_at": 1710000000000,
      "last_heartbeat": 1710000060000
    }
  ]
}

DELETE /v1/browser-sessions/:id

Delete a browser session. The user will need to re-pair.

curl -X DELETE https://api.hanzilla.co/v1/browser-sessions/550e8400-... \
  -H "Authorization: Bearer hic_live_..."

Tasks

POST /v1/tasks

Start a browser automation task. Requires a connected browser session.

# Request
curl -X POST https://api.hanzilla.co/v1/tasks \
  -H "Authorization: Bearer hic_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Read the patient chart on the current page",
    "browser_session_id": "550e8400-...",
    "url": "https://example.com/chart",
    "context": "Extract: name, medications, allergies"
  }'

# Response (201)
{
  "id": "task_abc123",
  "status": "running",
  "task": "Read the patient chart on the current page",
  "browser_session_id": "550e8400-..."
}

Field	Required	Description
`task`	Yes	What to do (max 10,000 chars)
`browser_session_id`	Yes	Connected session to run against
`url`	No	Starting URL (max 2,048 chars)
`context`	No	Extra context for the agent (max 50,000 chars)
`webhook_url`	No	URL to POST results to on completion (max 2,048 chars)
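
Checking these limits before sending the request avoids a round trip that would end in a 400. The sketch below is a hypothetical client-side check, not part of the SDK. It assumes the limits count characters the way JavaScript's `length` does (UTF-16 code units), which the server may not match exactly for non-ASCII text.

```typescript
interface CreateTaskRequest {
  task: string;
  browser_session_id: string;
  url?: string;
  context?: string;
  webhook_url?: string;
}

// Limits copied from the field table above.
const MAX_LENGTH = { task: 10_000, url: 2_048, context: 50_000, webhook_url: 2_048 } as const;

// Returns a list of problems; an empty list means the request looks valid.
function validateTaskRequest(req: CreateTaskRequest): string[] {
  const problems: string[] = [];
  if (req.task.length === 0) problems.push("task is required");
  if (req.browser_session_id.length === 0) problems.push("browser_session_id is required");
  for (const field of ["task", "url", "context", "webhook_url"] as const) {
    const value = req[field];
    if (value !== undefined && value.length > MAX_LENGTH[field]) {
      problems.push(`${field} exceeds ${MAX_LENGTH[field]} characters`);
    }
  }
  return problems;
}
```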

GET /v1/tasks/:id

Get task status, answer, steps, and usage.

# Response (200) — completed task
{
  "id": "task_abc123",
  "status": "complete",
  "task": "Read the patient chart on the current page",
  "answer": "Patient: Jane Doe. Medications: Lisinopril 10mg...",
  "steps": 4,
  "usage": { "inputTokens": 12000, "outputTokens": 800, "apiCalls": 5 },
  "browser_session_id": "550e8400-...",
  "created_at": 1710000000000,
  "completed_at": 1710000120000
}
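
Without the SDK, a client has to poll this endpoint until the task leaves the running state. The sketch below shows one way to do that with the 2-second interval and 30-minute task timeout mentioned elsewhere in these docs. `waitForTask` and `TaskSnapshot` are illustrative names, and the fetcher is passed in so the loop carries no assumptions about your HTTP client.

```typescript
interface TaskSnapshot {
  id: string;
  status: string; // "running" until the task finishes
  answer?: string;
}

// Polls `getTask` until the task is no longer "running" or `timeoutMs` elapses.
async function waitForTask(
  getTask: () => Promise<TaskSnapshot>,
  intervalMs = 2_000,
  timeoutMs = 30 * 60 * 1_000,
): Promise<TaskSnapshot> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const task = await getTask();
    if (task.status !== "running") return task;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Task ${task.id} still running after ${timeoutMs} ms`);
    }
    // Wait before the next poll.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage (needs a real key and task ID):
// const task = await waitForTask(() =>
//   fetch("https://api.hanzilla.co/v1/tasks/task_abc123", {
//     headers: { Authorization: "Bearer hic_live_..." },
//   }).then((r) => r.json()));
```

Any status other than "running" is treated as final, so the caller still needs to check whether the result is a success or an error.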

POST /v1/tasks/:id/cancel

Cancel a running task.

GET /v1/tasks/:id/steps

Get the full execution log for a task, including each tool call and result.

# Response (200)
{
  "steps": [
    {"step": 1, "status": "tool_use", "toolName": "navigate", "url": "https://example.com"},
    {"step": 2, "status": "tool_use", "toolName": "read_page"}
  ]
}

GET /v1/tasks/:id/screenshots/:step

Get the screenshot captured at a specific step. Returns base64 JPEG.

# Response (200)
{
  "screenshot": "/9j/4AAQSkZJRg..."
}

The screenshot field contains raw base64-encoded image data (JPEG). To display it, prefix with data:image/jpeg;base64,.
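
As a small illustration, the two helpers below build the endpoint URL for a given step and turn the returned field into a value you can assign to an image's `src`. The helper names are hypothetical.

```typescript
// URL of the screenshot endpoint for one step of a task.
function screenshotUrl(taskId: string, step: number): string {
  return `https://api.hanzilla.co/v1/tasks/${encodeURIComponent(taskId)}/screenshots/${step}`;
}

// Turns the `screenshot` field into a data URI usable as <img src="...">.
function screenshotToDataUri(screenshot: string): string {
  const prefix = "data:image/jpeg;base64,";
  // Tolerate input that already carries a data URI prefix.
  return screenshot.startsWith("data:") ? screenshot : prefix + screenshot;
}
```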

Webhooks

Instead of polling, pass a webhook_url when creating a task. Hanzi Browse will POST the result to your URL when the task finishes:

# Create a task with webhook
curl -X POST https://api.hanzilla.co/v1/tasks \
  -H "Authorization: Bearer hic_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Read the current page title",
    "browser_session_id": "550e8400-...",
    "webhook_url": "https://yourapp.com/api/hanzi-callback"
  }'

# Hanzi Browse POSTs to your URL on completion:
{
  "event": "task.completed",
  "task": {
    "id": "task_abc123",
    "status": "complete",
    "answer": "The page title is...",
    "steps": 3,
    "usage": { "inputTokens": 8000, "outputTokens": 500, "apiCalls": 4 },
    "created_at": 1710000000000,
    "completed_at": 1710000030000
  }
}

Webhook delivery is fire-and-forget with a 10-second timeout. If your endpoint is down, the result is still available via GET /v1/tasks/:id.
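
Because delivery is attempted with a short timeout, a handler should validate the body quickly and return. Below is a minimal, framework-agnostic sketch of that validation. It covers only the `task.completed` payload shown above; whether other event names exist is not documented here, so anything else is rejected. The names `TaskCompletedEvent` and `parseWebhookBody` are illustrative.

```typescript
interface TaskCompletedEvent {
  event: "task.completed";
  task: { id: string; status: string; answer?: string; steps?: number };
}

// Returns the typed event, or null if the body does not match the documented shape.
function parseWebhookBody(body: unknown): TaskCompletedEvent | null {
  if (typeof body !== "object" || body === null) return null;
  const candidate = body as { event?: unknown; task?: unknown };
  if (candidate.event !== "task.completed") return null;
  const task = candidate.task as { id?: unknown; status?: unknown } | null | undefined;
  if (!task || typeof task.id !== "string" || typeof task.status !== "string") return null;
  return body as TaskCompletedEvent;
}
```

In a route handler, pass the parsed JSON body to `parseWebhookBody`, respond immediately, and do any slow work afterwards. If a delivery is missed, fetch the result from GET /v1/tasks/:id.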

GET /v1/tasks

List recent tasks for your workspace.

Usage

GET /v1/usage

Usage summary for your workspace.

# Response (200)
{
  "totalInputTokens": 150000,
  "totalOutputTokens": 12000,
  "totalApiCalls": 45,
  "totalCostUsd": 0.082,
  "taskCount": 8
}

Browser Pairing

Pairing connects a user's Chrome browser to your workspace. Users pair once — the session lasts 30 days and auto-reconnects on browser restart.

How it works

1. Your backend calls POST /v1/browser-sessions/pair to get a pairing token
2. Show your user a link: https://api.hanzilla.co/pair/{token}
3. User clicks the link → their browser auto-pairs → done

# Your backend generates the link:
curl -X POST https://api.hanzilla.co/v1/browser-sessions/pair \
  -H "Authorization: Bearer hic_live_..." \
  -H "Content-Type: application/json" \
  -d '{"label": "Dr. Smith", "external_user_id": "user_123"}'

# Response:
# { "pairing_token": "hic_pair_abc123...", "expires_in_seconds": 300 }

# Give your user this link:
# https://api.hanzilla.co/pair/hic_pair_abc123...

The pairing page detects the Hanzi Browse extension and pairs automatically. If the extension isn't installed, the user sees an "Install" button.
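
In application code, the link and its expiry can be derived from the pairing response like this. This is a sketch: `buildPairingLink` is a hypothetical helper, and the optional `now` parameter exists only to make the expiry calculation testable.

```typescript
interface PairingResponse {
  pairing_token: string;
  expires_in_seconds: number;
}

// Builds the user-facing pairing link and the time after which it stops working.
function buildPairingLink(res: PairingResponse, now: number = Date.now()) {
  return {
    url: `https://api.hanzilla.co/pair/${encodeURIComponent(res.pairing_token)}`,
    expiresAt: new Date(now + res.expires_in_seconds * 1000),
  };
}
```

Showing `expiresAt` next to the link lets users know when they need to request a fresh one.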

Sessions auto-reconnect on browser restart — no re-pairing needed. Use label and external_user_id to track which session belongs to which user.

Embed widget (recommended)

Drop-in UI component that handles extension detection, pairing, and connection status — like Stripe's checkout widget.

<script src="https://browse.hanzilla.co/embed.js"></script>
<div id="hanzi-connect"></div>
<script>
  HanziConnect.mount('#hanzi-connect', {
    apiKey: 'hic_pub_...',  // publishable key — safe for client-side
    purpose: 'read your EHR on your behalf',
    onConnected: (sessionId) => {
      // Send sessionId to your backend to run tasks
      fetch('/api/set-session', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ sessionId }),
      });
    },
    onDisconnected: () => {
      // Browser disconnected — show re-pair UI
    },
  });
</script>

Security pattern: Use a publishable key (hic_pub_) in the widget. Send the sessionId to your backend. Your backend uses the secret key (hic_live_) to create tasks. Never put secret keys in client-side code.
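
On the backend side of this pattern, the handler that receives the sessionId builds the task request with the secret key. A minimal sketch, assuming you send it with `fetch` or a similar client (`createTaskRequest` and `HttpRequest` are illustrative names):

```typescript
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds a POST /v1/tasks request for a session reported by the widget.
function createTaskRequest(secretKey: string, sessionId: string, task: string): HttpRequest {
  if (!secretKey.startsWith("hic_live_")) {
    throw new Error("Creating tasks requires a secret (hic_live_) key");
  }
  return {
    url: "https://api.hanzilla.co/v1/tasks",
    method: "POST",
    headers: {
      Authorization: `Bearer ${secretKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ task, browser_session_id: sessionId }),
  };
}
```

Send it with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Before doing so, confirm that the sessionId really belongs to the signed-in user, since the value arrives from the browser.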

Session Metadata

When creating a pairing token, attach a label and external_user_id to map Hanzi Browse sessions to your users:

POST /v1/browser-sessions/pair
{
  "label": "Dr. Smith's browser",
  "external_user_id": "user_abc123"
}

Both fields are inherited by the browser session and returned in GET /v1/browser-sessions. Use them to identify whose browser is whose in your system.
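
For example, given the `sessions` array from GET /v1/browser-sessions, a lookup by your own user ID might look like the following. `connectedSessionFor` is a hypothetical helper, and the interface lists only the fields used here.

```typescript
interface BrowserSession {
  id: string;
  status: string;
  label?: string;
  external_user_id?: string;
}

// Finds a connected session that was paired for the given user, if any.
function connectedSessionFor(
  sessions: BrowserSession[],
  externalUserId: string,
): BrowserSession | undefined {
  return sessions.find(
    (s) => s.external_user_id === externalUserId && s.status === "connected",
  );
}
```

An `undefined` result means the user has no usable browser right now, which is the point at which to show the pairing flow again.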

Troubleshooting

Extension not detected

Make sure the Chrome extension is installed and enabled. Reload at chrome://extensions if needed.

Agent can't find Hanzi Browse

Restart your AI agent after running setup. MCP config is written to disk, but agents need a restart to load it.

Session disconnected

The browser was closed or lost its network connection. Sessions auto-reconnect when the browser reopens. Check GET /v1/browser-sessions for status before creating tasks.

Task fails or times out

Check that the session is connected. Verify credentials are valid. Tasks have a 30-minute timeout. If the page requires login, make sure the user is signed in.

Pairing token expired

Tokens are valid for 5 minutes. Generate a new one via the developer console or POST /v1/browser-sessions/pair.

API key not working

Secret keys start with hic_live_ and publishable keys with hic_pub_. Check that you're using the full key (shown once on creation) and that it belongs to the correct workspace. Publishable keys cannot create tasks.

Error Codes

Status	Meaning	Common cause
`400`	Bad Request	Missing required field, input too long, invalid URL
`401`	Unauthorized	Missing or invalid API key / session cookie
`402`	Payment Required	Plan upgrade needed (when billing is active)
`403`	Forbidden	Session belongs to a different workspace
`404`	Not Found	Resource doesn't exist, or belongs to another workspace
`409`	Conflict	Browser session not connected or expired
`429`	Too Many Requests	Rate limit exceeded (10 tasks/min, 5 concurrent)
`500`	Server Error	Internal error — check `request_id` in response for support
`503`	Service Unavailable	Billing not configured, or server degraded

All error responses include a request_id in the response body and an X-Request-Id response header for tracing.

# Error response format
{
  "error": "Browser session is not connected. The extension must be running and registered.",
  "request_id": "a1b2c3d4"
}
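
A client can use the status code to decide what to do next. The mapping below is one reading of the table above, not behavior the API prescribes. The action names are invented for this sketch, and treating 503 as retryable is a judgment call.

```typescript
type SuggestedAction =
  | "fix_request"
  | "check_auth"
  | "check_billing"
  | "reconnect_session"
  | "retry_later"
  | "report_with_request_id";

// Maps an HTTP status from the error table to a suggested next step.
function actionForStatus(status: number): SuggestedAction {
  switch (status) {
    case 400:
    case 404:
      return "fix_request";
    case 401:
    case 403:
      return "check_auth";
    case 402:
      return "check_billing";
    case 409:
      return "reconnect_session";
    case 429:
    case 503:
      return "retry_later";
    default:
      // 500 and anything unlisted: keep the request_id for support.
      return "report_with_request_id";
  }
}
```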

Security

Mechanism	Details
API keys	SHA-256 hashed at rest. Plaintext shown once on creation. Prefix stored for display.
Pairing tokens	SHA-256 hashed. 5-minute expiry. Single use — cannot be replayed.
Session tokens	30-day expiry. Auto-rotated by the relay. Revocable.
Workspace isolation	All resources scoped to workspace. Cross-workspace access returns 404.
BYOM privacy	Data leaves your machine only for your chosen model provider, which is the sole recipient of screenshots.

Full privacy policy: PRIVACY.md