GUI Automation Skill for QwenClaw

Overview

This skill adds browser-based GUI automation to QwenClaw using Playwright. It can control web browsers, interact with page elements, capture screenshots, and automate multi-step web workflows.

Version: 1.0.0
Category: Automation
Dependencies: Playwright


Features

🌐 Web Browser Automation

  • Launch Chromium, Firefox, or WebKit
  • Navigate to any URL
  • Handle multiple tabs/windows
  • Custom viewport and user agent

🖱️ Element Interaction

  • Click buttons and links
  • Type text into inputs
  • Fill forms
  • Select dropdown options
  • Check/uncheck checkboxes
  • Hover over elements

📸 Screenshot & Capture

  • Full page screenshots
  • Element-specific screenshots
  • Save to configurable directory
  • Automatic timestamps
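
As an illustration of the automatic timestamps feature, here is a minimal sketch of how a timestamped file name could be built. The helper `timestampedName` is hypothetical and not part of the documented API; the naming scheme the skill actually uses may differ.

```typescript
// Hypothetical helper: not part of the GUIAutomation API described in this document.
function timestampedName(baseName: string, now: Date = new Date()): string {
  // ISO 8601 timestamp with ":" and "." replaced, since they are awkward in file names.
  const stamp = now.toISOString().replace(/[:.]/g, '-');
  const dot = baseName.lastIndexOf('.');
  // Keep the extension if there is one; fall back to .png otherwise.
  const stem = dot > 0 ? baseName.slice(0, dot) : baseName;
  const ext = dot > 0 ? baseName.slice(dot) : '.png';
  return `${stem}-${stamp}${ext}`;
}
```

The result could then be passed to a screenshot call, for example `automation.screenshot(timestampedName('home.png'))`.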

📊 Data Extraction

  • Extract text from elements
  • Get attributes (href, src, etc.)
  • Scrape tables and lists
  • Export to JSON/CSV
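
The CSV export can be sketched as follows. This is an assumption about one reasonable way to serialize scraped rows, not the skill's actual exporter; `Row`, `escapeCsv`, and `toCsv` are hypothetical names.

```typescript
// Hypothetical row shape for scraped data; not taken from the skill's source.
type Row = Record<string, string | number | null>;

// Quote fields that contain commas, quotes, or line breaks, and double any
// embedded quotes (the convention described in RFC 4180).
function escapeCsv(value: string | number | null): string {
  const text = value === null ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Serialize rows to CSV, using the keys of the first row as the header line.
function toCsv(rows: Row[]): string {
  const first = rows[0];
  if (first === undefined) return '';
  const headers = Object.keys(first);
  const lines = [headers.map(escapeCsv).join(',')];
  for (const row of rows) {
    lines.push(headers.map((h) => escapeCsv(row[h] ?? null)).join(','));
  }
  return lines.join('\n');
}
```

For JSON output, `JSON.stringify(rows, null, 2)` on the same array is usually enough.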

⌨️ Keyboard & Navigation

  • Press keyboard shortcuts
  • Wait for elements
  • Wait for navigation
  • Execute JavaScript

📥 File Operations

  • Download files
  • Upload files
  • Handle file dialogs

Installation

1. Install Playwright

cd qwenclaw
bun add playwright

2. Install Browser Binaries

# Install all browsers
npx playwright install

# Or specific browsers
npx playwright install chromium
npx playwright install firefox
npx playwright install webkit

3. Enable Skill

Copy this skill to the QwenClaw skills directory:

cp -r skills/gui-automation ~/.qwen/qwenclaw/skills/gui-automation

Usage

From Qwen Code Chat

Use gui-automation to take a screenshot of https://example.com

Use gui-automation to navigate to GitHub and click the sign in button

Use gui-automation to fill the login form with username "test" and password "pass123"

Use gui-automation to extract all article titles from the page

From Terminal CLI

# Take screenshot
qwenclaw gui screenshot https://example.com

# Navigate to page
qwenclaw gui navigate https://github.com

# Click element
qwenclaw gui click "#login-button"

# Type text
qwenclaw gui type "search query" "#search-input"

# Extract text
qwenclaw gui extract ".article-title"

# Get page title
qwenclaw gui title

Programmatic Usage

import { GUIAutomation } from './tools/gui-automation';

const automation = new GUIAutomation({
  browserType: 'chromium',
  headless: true,
});

await automation.launch();
await automation.goto('https://example.com');
await automation.screenshot('example.png');
await automation.close();

API Reference

GUIAutomation Class

Constructor

new GUIAutomation(config?: Partial<GUIAutomationConfig>)

Config:

interface GUIAutomationConfig {
  browserType: 'chromium' | 'firefox' | 'webkit';
  headless: boolean;
  viewport?: { width: number; height: number };
  userAgent?: string;
}
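
Because the constructor accepts a `Partial<GUIAutomationConfig>`, omitted fields have to be filled in somewhere. A minimal sketch of that merge, assuming the defaults match the environment variable defaults listed in this document (`chromium`, headless); `DEFAULT_CONFIG` and `resolveConfig` are hypothetical names:

```typescript
interface GUIAutomationConfig {
  browserType: 'chromium' | 'firefox' | 'webkit';
  headless: boolean;
  viewport?: { width: number; height: number };
  userAgent?: string;
}

// Assumed defaults; the class's real defaults are not spelled out in this document.
const DEFAULT_CONFIG: GUIAutomationConfig = {
  browserType: 'chromium',
  headless: true,
};

// Later spreads win, so any field present in `overrides` replaces the default.
function resolveConfig(
  overrides: Partial<GUIAutomationConfig> = {},
): GUIAutomationConfig {
  return { ...DEFAULT_CONFIG, ...overrides };
}
```

Note that an override explicitly set to `undefined` would also replace the default, so callers should omit fields rather than pass `undefined`.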

Methods

| Method | Description | Example |
| --- | --- | --- |
| launch() | Launch browser | await automation.launch() |
| close() | Close browser | await automation.close() |
| goto(url) | Navigate to URL | await automation.goto('https://...') |
| screenshot(name?) | Take screenshot | await automation.screenshot('page.png') |
| click(selector) | Click element | await automation.click('#btn') |
| type(selector, text) | Type text | await automation.type('#input', 'hello') |
| fill(selector, value) | Fill field | await automation.fill('#email', 'test@test.com') |
| select(selector, value) | Select option | await automation.select('#country', 'US') |
| check(selector) | Check checkbox | await automation.check('#agree') |
| getText(selector) | Get element text | const text = await automation.getText('#title') |
| getAttribute(sel, attr) | Get attribute | const href = await automation.getAttribute('a', 'href') |
| waitFor(selector) | Wait for element | await automation.waitFor('.loaded') |
| evaluate(script) | Execute JS | const result = await automation.evaluate('2+2') |
| scrollTo(selector) | Scroll to element | await automation.scrollTo('#footer') |
| hover(selector) | Hover element | await automation.hover('#menu') |
| press(key) | Press key | await automation.press('Enter') |
| getTitle() | Get page title | const title = await automation.getTitle() |
| getUrl() | Get page URL | const url = await automation.getUrl() |
| getHTML() | Get page HTML | const html = await automation.getHTML() |

Examples

Example 1: Take Screenshot

import { GUIAutomation } from './tools/gui-automation';

const automation = new GUIAutomation();
await automation.launch();
await automation.goto('https://example.com');
await automation.screenshot('example-homepage.png');
await automation.close();

Example 2: Fill and Submit Form

const automation = new GUIAutomation();
await automation.launch();

await automation.goto('https://example.com/login');
await automation.fill('#email', 'user@example.com');
await automation.fill('#password', 'secret123');
await automation.check('#remember-me');
await automation.click('#submit-button');

await automation.screenshot('logged-in.png');
await automation.close();

Example 3: Web Scraping

const automation = new GUIAutomation();
await automation.launch();

await automation.goto('https://news.ycombinator.com');

// Extract all article titles
const titles = await automation.findAll('.titleline > a');
console.log('Articles:', titles.map(t => t.text));

// Extract specific data
const data = await automation.extractData({
  topStory: '.titleline > a',
  score: '.score',
  comments: '.subtext a:last-child',
});

await automation.close();

Example 4: Wait for Dynamic Content

const automation = new GUIAutomation();
await automation.launch();

await automation.goto('https://example.com');

// Wait for element to appear
await automation.waitFor('.loaded-content', { timeout: 10000 });

// Start waiting before the click so a fast navigation is not missed
await Promise.all([
  automation.waitForNavigation({ waitUntil: 'networkidle' }),
  automation.click('#load-more'),
]);

await automation.screenshot('loaded.png');
await automation.close();

Example 5: Execute JavaScript

const automation = new GUIAutomation();
await automation.launch();
await automation.goto('https://example.com');

// Get window size
const size = await automation.evaluate(() => ({
  width: window.innerWidth,
  height: window.innerHeight,
}));

// Modify page
await automation.evaluate(() => {
  document.body.style.background = 'red';
});

await automation.screenshot('modified.png');
await automation.close();

Common Selectors

| Selector Type | Example |
| --- | --- |
| ID | #login-button |
| Class | .article-title |
| Tag | input, button, a |
| Attribute | [type="email"], [href*="github"] |
| CSS | div > p.text, ul li:first-child |
| XPath | //button[text()="Submit"] |

Configuration

gui-config.json

Create ~/.qwen/qwenclaw/gui-config.json:

{
  "browserType": "chromium",
  "headless": true,
  "viewport": {
    "width": 1920,
    "height": 1080
  },
  "userAgent": "Mozilla/5.0...",
  "screenshotsDir": "~/.qwen/qwenclaw/screenshots",
  "timeout": 30000,
  "waitForNetworkIdle": true
}
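
A path such as `~/.qwen/qwenclaw/screenshots` inside a JSON file never passes through a shell, so the leading `~` is not expanded automatically. Whether QwenClaw expands it internally is not stated here; if it does not, a small helper along these lines could do so. `expandHome` is a hypothetical name, and the home directory is passed in explicitly (for example from `os.homedir()` in Node) to keep the sketch free of platform imports.

```typescript
// Hypothetical helper: expands a leading "~" using the supplied home directory.
function expandHome(path: string, homeDir: string): string {
  if (path === '~') return homeDir;
  if (path.startsWith('~/')) {
    // Drop trailing slashes from homeDir so the result has exactly one separator.
    return homeDir.replace(/\/+$/, '') + path.slice(1);
  }
  // Absolute and relative paths without a leading "~" are returned unchanged.
  return path;
}
```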

Environment Variables

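| Variable | Description | Default |
| --- | --- | --- |
| PLAYWRIGHT_BROWSERS_PATH | Browser binaries location | ~/.cache/ms-playwright (Linux; macOS and Windows use different platform cache directories) |
| QWENCLAW_GUI_HEADLESS | Run headless | true |
| QWENCLAW_GUI_BROWSER | Default browser | chromium |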
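
How the two `QWENCLAW_GUI_*` variables are parsed is not specified here. The sketch below shows one plausible interpretation, under the assumption that only `false` or `0` disables headless mode and that unknown browser names are rejected; `configFromEnv` is a hypothetical function.

```typescript
type BrowserType = 'chromium' | 'firefox' | 'webkit';
type Env = Record<string, string | undefined>;

function isBrowserType(value: string): value is BrowserType {
  return value === 'chromium' || value === 'firefox' || value === 'webkit';
}

// Hypothetical parser; the defaults mirror the table above.
function configFromEnv(env: Env): { browserType: BrowserType; headless: boolean } {
  const rawBrowser = (env['QWENCLAW_GUI_BROWSER'] ?? 'chromium').toLowerCase();
  if (!isBrowserType(rawBrowser)) {
    throw new Error(`Unsupported QWENCLAW_GUI_BROWSER: ${rawBrowser}`);
  }
  // Assumption: any value other than "false" or "0" keeps headless mode on.
  const rawHeadless = (env['QWENCLAW_GUI_HEADLESS'] ?? 'true').toLowerCase();
  const headless = !(rawHeadless === 'false' || rawHeadless === '0');
  return { browserType: rawBrowser, headless };
}
```

In Node, this would be called as `configFromEnv(process.env)`.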

Integration with QwenClaw Daemon

Add to rig-service

Update rig-service/src/agent.rs:

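// Add GUI automation capability
// Assumes a Rust-side GUIAutomation binding that exposes launch/execute/close.
async fn automate_gui(task: &str) -> Result<String> {
    let automation = GUIAutomation::default();
    automation.launch().await?;

    // Parse and execute the task; defer the error check until after cleanup
    let result = automation.execute(task).await;

    // Close the browser even if the task failed
    automation.close().await?;
    result
}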

Add to QwenClaw skills

The skill becomes available automatically once it is installed in ~/.qwen/qwenclaw/skills/gui-automation/.


Troubleshooting

Issue: "Browser not found"

Solution:

# Install browser binaries
npx playwright install chromium

Issue: "Timeout waiting for element"

Solution:

// Increase timeout
await automation.waitFor('.element', { timeout: 30000 });

// Or wait for specific event
await automation.waitForNavigation({ waitUntil: 'networkidle' });

Issue: "Screenshot not saved"

Solution:

# Ensure directory exists
mkdir -p ~/.qwen/qwenclaw/screenshots

# Check permissions
chmod 755 ~/.qwen/qwenclaw/screenshots

Best Practices

1. Always Close Browser

try {
  await automation.launch();
  // ... do work
} finally {
  await automation.close(); // Always close
}
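
The same try/finally pattern can be wrapped in a reusable helper so that individual tasks cannot forget the cleanup. This is a sketch only: `withAutomation` and the `Closable` interface are hypothetical, and `Closable` assumes just the `launch()` and `close()` methods listed in the API reference.

```typescript
// Minimal shape assumed from the methods table; the real class has more methods.
interface Closable {
  launch(): Promise<void>;
  close(): Promise<void>;
}

// Launches the automation, runs the task, and always closes afterwards,
// whether the task resolves or throws.
async function withAutomation<A extends Closable, T>(
  automation: A,
  task: (automation: A) => Promise<T>,
): Promise<T> {
  try {
    await automation.launch();
    return await task(automation);
  } finally {
    await automation.close();
  }
}
```

Usage would look like `await withAutomation(new GUIAutomation(), (a) => a.goto('https://example.com'))`.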

2. Use Headless for Automation

const automation = new GUIAutomation({ headless: true });

3. Wait for Elements

// Don't assume element exists
await automation.waitFor('#dynamic-content');
const text = await automation.getText('#dynamic-content');

4. Handle Errors

try {
  await automation.click('#button');
} catch (err) {
  console.error('Click failed:', err);
  // Fallback or retry
}
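
For the "fallback or retry" case, a generic retry wrapper with exponential backoff is one option. The sketch below is not part of the skill; `withRetry` and `backoffDelay` are hypothetical helpers, and the attempt count and delays are arbitrary defaults.

```typescript
// Delay before the next try; attempt is 1-based, so delays grow as
// baseMs, 2 * baseMs, 4 * baseMs, and so on.
function backoffDelay(attempt: number, baseMs: number): number {
  return baseMs * 2 ** (attempt - 1);
}

// Runs `action` up to `maxAttempts` times and rethrows the last error
// if every attempt fails.
async function withRetry<T>(
  action: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 500,
): Promise<T> {
  if (maxAttempts < 1) throw new RangeError('maxAttempts must be at least 1');
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Pause before retrying; no pause after the final failure.
        await new Promise<void>((resolve) =>
          setTimeout(resolve, backoffDelay(attempt, baseMs)),
        );
      }
    }
  }
  throw lastError;
}
```

A flaky click could then be written as `await withRetry(() => automation.click('#button'))`. Retrying is only appropriate for actions that are safe to repeat.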

5. Use Descriptive Selectors

// Good
await automation.click('[data-testid="submit-button"]');

// Bad (brittle)
await automation.click('div > div:nth-child(3) > button');

Security Considerations

1. Don't Automate Sensitive Sites

Avoid automating:

  • Banking websites
  • Password managers
  • Private data entry

2. Use Headless Mode

{
  "headless": true
}

3. Clear Cookies After Session

await automation.close(); // Closing the browser discards session data, provided no persistent profile is used

Changelog

v1.0.0 (2026-02-26)

  • Initial release
  • Full browser automation
  • Screenshot capture
  • Element interaction
  • Data extraction
  • JavaScript execution

License

MIT License - See LICENSE file for details.


GUI Automation skill ready for QwenClaw! 🖥️🤖