Browser Automation Examples

This guide shows how to use AIO Sandbox for browser automation, web scraping, and UI testing.

Overview

AIO Sandbox provides multiple ways to interact with browsers:

VNC Access: Visual browser interaction through remote desktop
Chrome DevTools Protocol (CDP): Programmatic browser control
Browser MCP Server: High-level browser automation tools

VNC Browser Access

Basic VNC Interaction

Access the browser visually through VNC:

# Open VNC interface
open http://localhost:8080/vnc/index.html?autoconnect=true

The VNC interface provides:

Full desktop environment with browser
Visual interaction capabilities
Screenshot and screen recording
Mouse and keyboard input

VNC in Automation Scripts

Use VNC for scenarios requiring visual verification:

import requests

# Take screenshot via VNC API
response = requests.get("http://localhost:8080/vnc/screenshot")
response.raise_for_status()  # fail early instead of saving an error page as an image
with open("screenshot.png", "wb") as f:
    f.write(response.content)

# Send keyboard input
requests.post("http://localhost:8080/vnc/keyboard", 
              json={"keys": "Hello World"})
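
Before a screenshot is used for visual verification, it can help to confirm that the response body really is an image and not an error page. The following is a minimal sketch, assuming the screenshot endpoint returns PNG data (the example above saves it as `screenshot.png`, but the format is not stated). `is_png` and `save_if_png` are hypothetical helper names.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte header at the start of every PNG file


def is_png(data: bytes) -> bool:
    """Return True if the bytes start with the PNG file signature."""
    return data.startswith(PNG_SIGNATURE)


def save_if_png(data: bytes, path: str) -> bool:
    """Write data to path only when it looks like a PNG; report whether it was saved."""
    if not is_png(data):
        return False
    with open(path, "wb") as f:
        f.write(data)
    return True
```

In the script above, `save_if_png(response.content, "screenshot.png")` could stand in for the plain file write.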

Chrome DevTools Protocol (CDP)

CDP Connection

Connect to the browser using CDP:

const CDP = require('chrome-remote-interface');

async function example() {
    const client = await CDP();
    const {Network, Page, Runtime} = client;
    
    // Enable necessary domains
    await Network.enable();
    await Page.enable();
    
    // Navigate to a page
    await Page.navigate({url: 'https://example.com'});
    await Page.loadEventFired();
    
    // Execute JavaScript
    const result = await Runtime.evaluate({
        expression: 'document.title'
    });
    
    console.log('Page title:', result.result.value);
    
    await client.close();
}

example().catch(console.error);
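
Under the hood, CDP clients exchange JSON messages over a WebSocket: each command carries an `id`, a `method` such as `Page.navigate`, and optional `params`, and the browser echoes the `id` in its reply. The sketch below only builds and matches such messages and does not open a connection. `CommandBuilder` and `is_reply_to` are hypothetical names used for illustration, not part of any library.

```python
import itertools
import json


class CommandBuilder:
    """Builds CDP command messages with unique, increasing ids."""

    def __init__(self):
        self._ids = itertools.count(1)

    def build(self, method: str, **params) -> str:
        message = {"id": next(self._ids), "method": method, "params": params}
        return json.dumps(message)


def is_reply_to(raw_reply: str, command: str) -> bool:
    """A reply belongs to a command when both carry the same id."""
    return json.loads(raw_reply).get("id") == json.loads(command)["id"]


builder = CommandBuilder()
navigate = builder.build("Page.navigate", url="https://example.com")
evaluate = builder.build("Runtime.evaluate", expression="document.title")
print(navigate)
```

Events such as `Page.loadEventFired` arrive without an `id`, which is how a client can tell them apart from command replies.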

Python CDP Example

Using Python with pychrome:

import pychrome

# Connect to browser
browser = pychrome.Browser(url="http://localhost:8080")
tab = browser.new_tab()

# Start the tab's message loop; pychrome requires this before any domain call
tab.start()

# Enable page domain
tab.Page.enable()

# Navigate to page
tab.Page.navigate(url="https://httpbin.org/get")
tab.wait(1)

# Get page content
result = tab.Runtime.evaluate(expression="document.body.innerText")
print("Page content:", result['result']['value'])

# Clean up
tab.stop()
browser.close_tab(tab)

Browser MCP Server API Reference

The @agent-infra/mcp-server-browser package provides the following tools:

Navigation & Basic Actions

browser_navigate - Navigate to a URL
browser_go_back - Go back to previous page
browser_go_forward - Go forward to next page
browser_close - Close the browser

Element Interaction

browser_get_clickable_elements - Get clickable/hoverable/selectable elements
browser_click - Click an element (use with element index)
browser_hover - Hover over an element
browser_select - Select an element
browser_form_input_fill - Fill input fields

Content Retrieval

browser_get_text - Get page text content
browser_get_markdown - Get page as markdown
browser_read_links - Get all page links
browser_screenshot - Take screenshots

Advanced Features

browser_scroll - Scroll the page
browser_evaluate - Execute JavaScript
browser_new_tab - Open new tab
browser_switch_tab - Switch between tabs
browser_tab_list - List all tabs
browser_get_download_list - Get downloaded files

Vision Mode (Optional)

browser_vision_screen_capture - Screenshot for vision mode
browser_vision_screen_click - Vision-based clicking
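
A misspelled tool name only fails once the call reaches the server, so a script may want to check names locally first. The following sketch copies the tool names listed above into a small registry. The category keys and the `check_tool_name` helper are hypothetical, and the list would need updating if the package adds or renames tools.

```python
BROWSER_TOOLS = {
    "navigation": [
        "browser_navigate", "browser_go_back", "browser_go_forward", "browser_close",
    ],
    "interaction": [
        "browser_get_clickable_elements", "browser_click", "browser_hover",
        "browser_select", "browser_form_input_fill",
    ],
    "content": [
        "browser_get_text", "browser_get_markdown", "browser_read_links",
        "browser_screenshot",
    ],
    "advanced": [
        "browser_scroll", "browser_evaluate", "browser_new_tab",
        "browser_switch_tab", "browser_tab_list", "browser_get_download_list",
    ],
    "vision": [
        "browser_vision_screen_capture", "browser_vision_screen_click",
    ],
}

# Flat set of every tool name, for quick membership checks
ALL_TOOLS = {name for names in BROWSER_TOOLS.values() for name in names}


def check_tool_name(name: str) -> str:
    """Return the name unchanged, or raise ValueError for tools not listed above."""
    if name not in ALL_TOOLS:
        raise ValueError(f"Unknown browser tool: {name!r}")
    return name
```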

Browser MCP Server Integration

Python with MCP Client

Use the Browser MCP server for high-level automation:

import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def browser_automation():
    # Connect to MCP server (assumes the /mcp endpoint speaks Streamable HTTP)
    async with streamablehttp_client("http://localhost:8080/mcp") as (read, write, _):
        async with ClientSession(read, write) as client:
            await client.initialize()

            # Navigate to page
            result = await client.call_tool("browser_navigate", {
                "url": "https://example.com"
            })

            # Take screenshot
            screenshot = await client.call_tool("browser_screenshot", {
                "path": "/tmp/screenshot.png"
            })

            # Get page text content
            content = await client.call_tool("browser_get_text")

            # Get clickable elements
            elements = await client.call_tool("browser_get_clickable_elements")

            print("Page content:", content.content[0].text)
            print("Clickable elements:", elements.content)

asyncio.run(browser_automation())
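
The examples in this guide read tool output as `result.content[0].text`, which raises an error if a tool returns an empty content list or a non-text item first. A small defensive accessor is sketched below. `first_text` is a hypothetical helper, and `FakeText` and `FakeResult` are stand-ins used only to demonstrate it without a running server.

```python
from dataclasses import dataclass, field


@dataclass
class FakeText:
    """Stand-in for a text content item; used only to demonstrate the helper."""
    text: str
    type: str = "text"


@dataclass
class FakeResult:
    """Stand-in for a tool result that exposes a `content` list."""
    content: list = field(default_factory=list)


def first_text(result, default: str = "") -> str:
    """Return the text of the first item in result.content that has text, or default."""
    for item in getattr(result, "content", None) or []:
        text = getattr(item, "text", None)
        if isinstance(text, str):
            return text
    return default


print(first_text(FakeResult([FakeText("Example Domain")])))
```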

JavaScript MCP Integration

const { Client } = require('@modelcontextprotocol/sdk/client/index.js');
const { StreamableHTTPClientTransport } = require('@modelcontextprotocol/sdk/client/streamableHttp.js');

async function browserAutomation() {
    // Assumes the /mcp endpoint speaks Streamable HTTP
    const client = new Client({ name: "browser-automation-example", version: "1.0.0" });
    const transport = new StreamableHTTPClientTransport(new URL("http://localhost:8080/mcp"));
    
    try {
        await client.connect(transport);
        
        // Navigate to page
        await client.callTool({
            name: "browser_navigate",
            arguments: { url: "https://news.ycombinator.com" }
        });
        
        // Wait for page load
        await new Promise(resolve => setTimeout(resolve, 2000));
        
        // Get clickable elements (like article titles)
        const elements = await client.callTool({ name: "browser_get_clickable_elements" });
        
        // Get page markdown content
        const markdown = await client.callTool({ name: "browser_get_markdown" });
        
        console.log("Clickable elements:", elements.content);
        console.log("Page content:", markdown.content);
        
    } finally {
        await client.close();
    }
}

browserAutomation().catch(console.error);

Web Scraping Examples

E-commerce Price Monitoring

import asyncio
import json
from datetime import datetime

async def monitor_prices():
    # Product URLs to monitor
    products = [
        {"name": "Laptop", "url": "https://example-store.com/laptop"},
        {"name": "Phone", "url": "https://example-store.com/phone"}
    ]
    
    results = []
    
    for product in products:
        # Use browser to navigate and extract price
        navigation_result = await client.call_tool("browser_navigate", {
            "url": product["url"]
        })
        
        # Get clickable elements and extract price
        elements = await client.call_tool("browser_get_clickable_elements")
        price_text = await client.call_tool("browser_get_text")
        
        results.append({
            "product": product["name"],
            "price": price_text.content[0].text,
            "timestamp": datetime.now().isoformat(),
            "url": product["url"]
        })
    
    # Save results to file via File API
    await client.call_tool("file_write", {
        "path": "/tmp/price_data.json",
        "content": json.dumps(results, indent=2)
    })
    
    return results

# Run price monitoring
prices = asyncio.run(monitor_prices())
print("Price monitoring complete:", prices)
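
The example above stores the full page text in the `price` field. To track an actual number, the text has to be parsed. The sketch below is one hedged way to do that, assuming prices appear as a currency symbol followed by digits, such as `$1,299.99`. `extract_price` is a hypothetical helper, and real store pages may need a site-specific pattern.

```python
import re
from decimal import Decimal
from typing import Optional

# A currency symbol, then digits with optional thousands separators and cents
PRICE_PATTERN = re.compile(r"[$€£]\s?(\d[\d,]*(?:\.\d{2})?)")


def extract_price(page_text: str) -> Optional[Decimal]:
    """Return the first price found in the text, or None if there is none."""
    match = PRICE_PATTERN.search(page_text)
    if match is None:
        return None
    # Drop thousands separators before converting to an exact decimal
    return Decimal(match.group(1).replace(",", ""))


print(extract_price("Laptop Pro\nNow only $1,299.99 (was $1,499.00)"))
```

`Decimal` is used instead of `float` so that stored prices compare exactly between monitoring runs.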

Social Media Content Extraction

async def extract_social_posts():
    # Navigate to social media page
    await client.call_tool("browser_navigate", {
        "url": "https://example-social.com/trending"
    })
    
    # Wait for dynamic content to load
    await asyncio.sleep(3)
    
    # Get page content and clickable elements
    content = await client.call_tool("browser_get_text")
    elements = await client.call_tool("browser_get_clickable_elements")
    
    # Process extracted data
    return {
        "page_content": content.content[0].text,
        "clickable_elements": elements.content
    }
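
A fixed sleep either wastes time or is too short when dynamic content loads slowly. A common alternative is to poll until a condition holds. The sketch below shows a generic polling helper. `wait_until` is a hypothetical name, and in the sandbox the `check` callable might fetch the page text and look for an expected marker.

```python
import asyncio
import time


async def wait_until(check, timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Poll the async callable `check` until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        if await check():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval)


async def demo():
    calls = 0

    async def content_ready():
        # Stand-in for a real check; becomes true on the third poll
        nonlocal calls
        calls += 1
        return calls >= 3

    return await wait_until(content_ready, timeout=2.0, interval=0.01)


print(asyncio.run(demo()))
```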

Browser Testing Examples

Form Automation Testing

async def test_contact_form():
    # Navigate to form page
    await client.call_tool("browser_navigate", {
        "url": "https://example.com/contact"
    })
    
    # Get form elements first
    elements = await client.call_tool("browser_get_clickable_elements")
    
    # Fill out form fields using form_input_fill
    await client.call_tool("browser_form_input_fill", {
        "selector": "input[name='name']",
        "value": "Test User"
    })
    
    await client.call_tool("browser_form_input_fill", {
        "selector": "input[name='email']",  
        "value": "test@example.com"
    })
    
    await client.call_tool("browser_form_input_fill", {
        "selector": "textarea[name='message']",
        "value": "This is a test message"
    })
    
    # Submit form by clicking submit button
    await client.call_tool("browser_click", {
        "index": 0  # Use index of submit button from elements
    })
    
    # Wait for response
    await asyncio.sleep(2)
    
    # Verify success message by getting page text
    page_text = await client.call_tool("browser_get_text")
    
    assert "Thank you" in page_text.content[0].text
    print("Form submission test passed!")
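
The test above clicks index `0`, which is only correct if the submit button happens to be the first clickable element. A more robust approach is to look the index up in the element listing. The sketch below assumes, without confirmation from this guide, that the listing is text with one element per line in the form `[index]<tag ...>text</tag>`. `find_element_index` is a hypothetical helper, and the pattern would need adjusting to the actual output.

```python
import re
from typing import Optional

# Assumed line format: "[<index>]<tag ...>visible text</tag>"
ELEMENT_LINE = re.compile(r"^\[(\d+)\]\s*<(\w+)[^>]*>(.*?)</\2>\s*$")


def find_element_index(listing: str, keyword: str, tag: Optional[str] = None) -> Optional[int]:
    """Return the index of the first listed element whose text contains keyword."""
    for line in listing.splitlines():
        match = ELEMENT_LINE.match(line.strip())
        if match is None:
            continue
        index, element_tag, text = match.groups()
        if tag is not None and element_tag != tag:
            continue
        if keyword.lower() in text.lower():
            return int(index)
    return None


sample = '[0]<a>Home</a>\n[1]<input name="email"></input>\n[2]<button type="submit">Send message</button>'
print(find_element_index(sample, "send", tag="button"))
```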

Performance Testing

import asyncio
import time

async def measure_page_performance():
    start_time = time.time()
    
    # Navigate to page
    await client.call_tool("browser_navigate", {
        "url": "https://example-app.com"
    })
    
    # Wait for page to load (use sleep or evaluate JS)
    await asyncio.sleep(3)
    
    load_time = time.time() - start_time
    
    # Get performance metrics using JavaScript evaluation
    metrics = await client.call_tool("browser_evaluate", {
        "expression": "JSON.stringify({loadTime: performance.timing.loadEventEnd - performance.timing.navigationStart, domContentLoaded: performance.timing.domContentLoadedEventEnd - performance.timing.navigationStart})"
    })
    
    return {
        "total_load_time": load_time,
        # Tool results expose their output through `content`, as in the other examples
        "performance_metrics": metrics.content[0].text
    }
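
The JavaScript expression above returns a JSON string with millisecond values, so the caller still has to decode it. The sketch below converts it to seconds. `parse_timing_metrics` is a hypothetical helper, and it assumes the tool hands back the JSON string unchanged.

```python
import json


def parse_timing_metrics(raw: str) -> dict:
    """Convert the JSON string produced by the expression above into seconds."""
    timings_ms = json.loads(raw)
    metrics = {}
    for key in ("loadTime", "domContentLoaded"):
        value = timings_ms.get(key)
        # A negative value means the event had not fired yet when the
        # expression ran, because its timestamp was still 0.
        if isinstance(value, (int, float)) and value >= 0:
            metrics[key] = value / 1000
        else:
            metrics[key] = None
    return metrics


print(parse_timing_metrics('{"loadTime": 1250, "domContentLoaded": 800}'))
```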

Playwright Integration

For advanced browser automation, integrate Playwright:

from playwright.async_api import async_playwright
import asyncio

async def playwright_example():
    async with async_playwright() as p:
        # Connect to existing browser in AIO Sandbox
        browser = await p.chromium.connect_over_cdp(
            "ws://localhost:8080/cdp"
        )
        
        page = await browser.new_page()
        
        # Navigate and interact
        await page.goto("https://example.com")
        await page.fill("input[name='search']", "test query")
        await page.click("button[type='submit']")
        
        # Wait for results
        await page.wait_for_selector(".search-results")
        
        # Extract data
        results = await page.query_selector_all(".result-item")
        for result in results:
            title = await result.inner_text()
            print(f"Result: {title}")
        
        await browser.close()

asyncio.run(playwright_example())

Selenium Integration

Use Selenium WebDriver with AIO Sandbox:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

def selenium_example():
    # Configure Chrome options for remote connection
    chrome_options = Options()
    chrome_options.add_experimental_option("debuggerAddress", "localhost:9222")
    
    # Connect to browser
    driver = webdriver.Chrome(options=chrome_options)
    
    try:
        # Navigate and interact
        driver.get("https://example.com")
        
        # Find and interact with elements
        search_box = driver.find_element(By.NAME, "search")
        search_box.send_keys("test query")
        
        submit_button = driver.find_element(By.CSS_SELECTOR, "button[type='submit']")
        submit_button.click()
        
        # Wait and extract results
        driver.implicitly_wait(10)
        results = driver.find_elements(By.CSS_SELECTOR, ".result-item")
        
        for result in results:
            print(f"Result: {result.text}")
            
    finally:
        driver.quit()

selenium_example()

Best Practices

Resource Management

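from contextlib import asynccontextmanager

# The decorator turns the generator into a context manager usable with `async with`
@asynccontextmanager
async def managed_browser_session():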
    try:
        # Navigate to starting page
        await client.call_tool("browser_navigate", {
            "url": "https://example.com"
        })
        
        # Perform browser operations
        yield client
        
    finally:
        # Close browser when done
        await client.call_tool("browser_close")

Error Handling

async def robust_browser_automation():
    max_retries = 3
    
    for attempt in range(1, max_retries + 1):
        try:
            # Attempt browser operation
            result = await client.call_tool("browser_navigate", {
                "url": "https://example.com"
            })
            
            # MCP tool results flag failures with `isError`
            if not getattr(result, "isError", False):
                return result
            error = RuntimeError("browser_navigate reported an error")
            
        except Exception as e:
            error = e
        
        # Count failed results as well as exceptions, so the loop always ends
        if attempt == max_retries:
            raise RuntimeError(f"Browser operation failed after {max_retries} retries") from error
        
        # Wait before retry (exponential backoff)
        await asyncio.sleep(2 ** attempt)

Performance Optimization

async def optimized_scraping():
    # Open each URL in its own tab; pages are still visited one after another
    urls = ["url1", "url2", "url3"]
    results = []
    
    for i, url in enumerate(urls):
        if i > 0:
            # Open new tab for additional URLs
            await client.call_tool("browser_new_tab")
            await client.call_tool("browser_switch_tab", {"index": i})
        
        # Navigate to URL
        await client.call_tool("browser_navigate", {"url": url})
        
        # Extract content
        content = await client.call_tool("browser_get_text")
        results.append(content)
    
    return results
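
The loop above keeps one tab per URL but still processes pages sequentially. Where work can genuinely overlap, a semaphore keeps the number of simultaneous operations bounded. The sketch below is generic asyncio code. `gather_limited` is a hypothetical helper, and it assumes each task uses its own independent session, since switching tabs within one shared session is stateful.

```python
import asyncio


async def gather_limited(tasks, limit: int = 3):
    """Run zero-argument async callables concurrently, at most `limit` at a time.

    Results are returned in the same order as `tasks`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run(task):
        async with semaphore:
            return await task()

    return await asyncio.gather(*(run(task) for task in tasks))


def make_fake_scrape(url: str):
    """Build a stand-in scraping task; a real one would drive a browser session."""
    async def scrape():
        await asyncio.sleep(0.01)
        return f"content of {url}"
    return scrape


pages = asyncio.run(gather_limited([make_fake_scrape(u) for u in ["url1", "url2", "url3"]], limit=2))
print(pages)
```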

Debugging and Monitoring

Screenshot-based Debugging

async def debug_with_screenshots():
    # Take screenshot before action
    await client.call_tool("browser_screenshot", {
        "path": "/tmp/before_action.png"
    })
    
    # Perform action (browser_click takes an element index from browser_get_clickable_elements)
    await client.call_tool("browser_click", {
        "index": 0
    })
    
    # Take screenshot after action
    await client.call_tool("browser_screenshot", {
        "path": "/tmp/after_action.png"
    })
    
    # Compare or analyze screenshots as needed
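
The simplest comparison is to check whether the two screenshots are byte-for-byte identical, which shows whether the action changed anything on screen. The sketch below does this with a hash. `image_digest` and `screenshots_differ` are hypothetical helpers, and the approach reports any difference at all, including ones a person would not notice, so it suits "did anything change" checks better than visual similarity.

```python
import hashlib


def image_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest of the image bytes."""
    return hashlib.sha256(data).hexdigest()


def screenshots_differ(before: bytes, after: bytes) -> bool:
    """Return True when the two images are not byte-for-byte identical."""
    return image_digest(before) != image_digest(after)


def files_differ(before_path: str, after_path: str) -> bool:
    """Compare two screenshot files on disk."""
    with open(before_path, "rb") as before, open(after_path, "rb") as after:
        return screenshots_differ(before.read(), after.read())
```

With the paths used above, the call would be `files_differ("/tmp/before_action.png", "/tmp/after_action.png")`, assuming the files are readable from where the script runs.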

Console Log Monitoring

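async def monitor_console_logs():
    # Navigate to page
    await client.call_tool("browser_navigate", {
        "url": "https://example.com"
    })
    
    # Install a hook that records console.error calls made from now on.
    # Errors logged before the hook is installed are not captured.
    await client.call_tool("browser_evaluate", {
        "expression": """
        (() => {
            if (window.__capturedErrors) return;
            window.__capturedErrors = [];
            const originalError = console.error;
            console.error = (...args) => {
                window.__capturedErrors.push({level: 'error', message: args.join(' ')});
                originalError.apply(console, args);
            };
        })()
        """
    })
    
    # Give the page time to run, then read back what was captured
    await asyncio.sleep(2)
    console_check = await client.call_tool("browser_evaluate", {
        "expression": "JSON.stringify(window.__capturedErrors || [])"
    })
    
    print("Console monitoring result:", console_check.content)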

Integration with Other AIO Sandbox Components

Browser + File Operations

async def browser_to_file_workflow():
    # Scrape data with browser
    await client.call_tool("browser_navigate", {
        "url": "https://api-docs.example.com"
    })
    
    # Extract API documentation
    docs = await client.call_tool("browser_get_text")
    
    # Save to file
    await client.call_tool("file_write", {
        "path": "/tmp/api_docs.txt",
        "content": docs.content[0].text
    })
    
    # Process with shell command
    await client.call_tool("shell_exec", {
        "command": "grep -E 'POST|GET|PUT|DELETE' /tmp/api_docs.txt > /tmp/endpoints.txt"
    })
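
The same filtering can be done in Python when structured output is more useful than matching lines. The sketch below is stricter than the `grep` command above: it keeps only occurrences where an HTTP method is followed by a path starting with `/`. `extract_endpoints` is a hypothetical helper, and the assumed `METHOD /path` layout may not match every documentation page.

```python
import re

# An HTTP method as a whole word, followed by whitespace and a path starting with "/"
METHOD_PATTERN = re.compile(r"\b(GET|POST|PUT|DELETE)\b\s+(/\S*)")


def extract_endpoints(text: str) -> list:
    """Return (method, path) pairs in the order they appear in the text."""
    return [(m.group(1), m.group(2)) for m in METHOD_PATTERN.finditer(text)]


docs_text = "Create a user\nPOST /users\nFetch one: GET /users/{id}\nNothing to GET here"
for method, path in extract_endpoints(docs_text):
    print(method, path)
```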

Browser + Code Execution

async def browser_driven_analysis():
    # Scrape data
    await client.call_tool("browser_navigate", {
        "url": "https://data-source.com"
    })
    
    # Extract JSON data using JavaScript evaluation
    data = await client.call_tool("browser_evaluate", {
        "expression": "document.querySelector('pre.json-data').textContent"
    })
    
    # Process with Python execution; repr() embeds the scraped text as a safe string literal
    raw_json = data.content[0].text
    analysis_code = f"""
import json
import pandas as pd

# Parse the scraped data
data = json.loads({raw_json!r})

# Perform analysis
df = pd.DataFrame(data)
summary = df.describe()

print("Data Analysis Summary:")
print(summary.to_string())
"""
    
    result = await client.call_tool("jupyter_execute", {
        "code": analysis_code
    })
    
    print("Analysis Result:", result.content)

This guide covers the main approaches to browser automation with AIO Sandbox. Choose the method that best fits your use case:

VNC for visual interaction and debugging
CDP for low-level browser control
MCP Browser Server for high-level automation
Playwright/Selenium for familiar frameworks

For more advanced scenarios, combine browser automation with other AIO Sandbox components like file operations, shell commands, and code execution.
