GUI Automation Skill for QwenClaw
Overview
This skill gives QwenClaw web GUI automation through Playwright: it drives a browser, interacts with page elements, captures screenshots, and automates multi-step web workflows.
Version: 1.0.0
Category: Automation
Dependencies: Playwright
Features
🌐 Web Browser Automation
- Launch Chromium, Firefox, or WebKit
- Navigate to any URL
- Handle multiple tabs/windows
- Custom viewport and user agent
🖱️ Element Interaction
- Click buttons and links
- Type text into inputs
- Fill forms
- Select dropdown options
- Check/uncheck checkboxes
- Hover over elements
📸 Screenshot & Capture
- Full page screenshots
- Element-specific screenshots
- Save to configurable directory
- Automatic timestamps
📊 Data Extraction
- Extract text from elements
- Get attributes (href, src, etc.)
- Scrape tables and lists
- Export to JSON/CSV
⌨️ Keyboard & Navigation
- Press keyboard shortcuts
- Wait for elements
- Wait for navigation
- Execute JavaScript
📥 File Operations
- Download files
- Upload files
- Handle file dialogs
Installation
1. Install Playwright
cd qwenclaw
bun add playwright
2. Install Browser Binaries
# Install all browsers
npx playwright install
# Or specific browsers
npx playwright install chromium
npx playwright install firefox
npx playwright install webkit
3. Enable Skill
Copy this skill to the QwenClaw skills directory:
cp -r skills/gui-automation ~/.qwen/qwenclaw/skills/gui-automation
Usage
From Qwen Code Chat
Use gui-automation to take a screenshot of https://example.com
Use gui-automation to navigate to GitHub and click the sign in button
Use gui-automation to fill the login form with username "test" and password "pass123"
Use gui-automation to extract all article titles from the page
From Terminal CLI
# Take screenshot
qwenclaw gui screenshot https://example.com
# Navigate to page
qwenclaw gui navigate https://github.com
# Click element
qwenclaw gui click "#login-button"
# Type text
qwenclaw gui type "search query" "#search-input"
# Extract text
qwenclaw gui extract ".article-title"
# Get page title
qwenclaw gui title
Programmatic Usage
import { GUIAutomation } from './tools/gui-automation';
const automation = new GUIAutomation({
  browserType: 'chromium',
  headless: true,
});
await automation.launch();
await automation.goto('https://example.com');
await automation.screenshot('example.png');
await automation.close();
API Reference
GUIAutomation Class
Constructor
new GUIAutomation(config?: Partial<GUIAutomationConfig>)
Config:
interface GUIAutomationConfig {
  browserType: 'chromium' | 'firefox' | 'webkit';
  headless: boolean;
  viewport?: { width: number; height: number };
  userAgent?: string;
}
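Because the constructor takes a `Partial<GUIAutomationConfig>`, omitted fields have to fall back to defaults somewhere. The sketch below shows one way that could work. `resolveConfig` and `DEFAULT_CONFIG` are hypothetical names, and the default values are assumptions borrowed from the Environment Variables table, not confirmed behavior of the skill.

```typescript
// Mirrors the GUIAutomationConfig interface documented in this section.
interface GUIAutomationConfig {
  browserType: 'chromium' | 'firefox' | 'webkit';
  headless: boolean;
  viewport?: { width: number; height: number };
  userAgent?: string;
}

// Assumed defaults (chromium, headless), taken from the environment-variable table.
const DEFAULT_CONFIG: GUIAutomationConfig = {
  browserType: 'chromium',
  headless: true,
};

// Hypothetical helper: use each caller-supplied field, else the default.
// `??` keeps an explicit `false` for headless but replaces undefined.
function resolveConfig(partial: Partial<GUIAutomationConfig> = {}): GUIAutomationConfig {
  return {
    browserType: partial.browserType ?? DEFAULT_CONFIG.browserType,
    headless: partial.headless ?? DEFAULT_CONFIG.headless,
    viewport: partial.viewport ?? DEFAULT_CONFIG.viewport,
    userAgent: partial.userAgent ?? DEFAULT_CONFIG.userAgent,
  };
}

const resolved = resolveConfig({ browserType: 'firefox' });
console.log(resolved.browserType, resolved.headless);
```

Using `??` instead of object spread means a field passed explicitly as `undefined` does not erase the default.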
Methods
| Method | Description | Example |
|---|---|---|
| launch() | Launch browser | await automation.launch() |
| close() | Close browser | await automation.close() |
| goto(url) | Navigate to URL | await automation.goto('https://...') |
| screenshot(name?) | Take screenshot | await automation.screenshot('page.png') |
| click(selector) | Click element | await automation.click('#btn') |
| type(selector, text) | Type text | await automation.type('#input', 'hello') |
| fill(selector, value) | Fill field | await automation.fill('#email', 'test@test.com') |
| select(selector, value) | Select option | await automation.select('#country', 'US') |
| check(selector) | Check checkbox | await automation.check('#agree') |
| getText(selector) | Get element text | const text = await automation.getText('#title') |
| getAttribute(sel, attr) | Get attribute | const href = await automation.getAttribute('a', 'href') |
| waitFor(selector) | Wait for element | await automation.waitFor('.loaded') |
| evaluate(script) | Execute JS | const result = await automation.evaluate('2+2') |
| scrollTo(selector) | Scroll to element | await automation.scrollTo('#footer') |
| hover(selector) | Hover element | await automation.hover('#menu') |
| press(key) | Press key | await automation.press('Enter') |
| getTitle() | Get page title | const title = await automation.getTitle() |
| getUrl() | Get page URL | const url = await automation.getUrl() |
| getHTML() | Get page HTML | const html = await automation.getHTML() |
Examples
Example 1: Take Screenshot
import { GUIAutomation } from './tools/gui-automation';
const automation = new GUIAutomation();
await automation.launch();
await automation.goto('https://example.com');
await automation.screenshot('example-homepage.png');
await automation.close();
Example 2: Fill and Submit Form
const automation = new GUIAutomation();
await automation.launch();
await automation.goto('https://example.com/login');
await automation.fill('#email', 'user@example.com');
await automation.fill('#password', 'secret123');
await automation.check('#remember-me');
await automation.click('#submit-button');
await automation.screenshot('logged-in.png');
await automation.close();
Example 3: Web Scraping
const automation = new GUIAutomation();
await automation.launch();
await automation.goto('https://news.ycombinator.com');
// Extract all article titles
const titles = await automation.findAll('.titleline > a');
console.log('Articles:', titles.map(t => t.text));
// Extract specific data
const data = await automation.extractData({
  topStory: '.titleline > a',
  score: '.score',
  comments: '.subtext a:last-child',
});
await automation.close();
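The feature list mentions CSV export, but no export call is documented here. As a minimal sketch, scraped strings such as the `text` values from `findAll` could be serialized with a small helper like the one below. `csvField` and `toCsv` are hypothetical names and not part of the skill's API. Quoting follows the usual CSV convention of wrapping fields that contain commas, quotes, or line breaks and doubling embedded quotes.

```typescript
// Hypothetical helper: quote a single CSV field only when it needs quoting.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Hypothetical helper: join a header row and data rows into CSV text.
function toCsv(header: string[], rows: string[][]): string {
  return [header, ...rows]
    .map((row) => row.map(csvField).join(','))
    .join('\r\n');
}

// Example input shaped like scraped titles (made-up sample data).
const sampleTitles = ['Show HN: A tiny parser', 'Rust, Go, and "zero-cost" claims'];
const csv = toCsv(['rank', 'title'], sampleTitles.map((title, i) => [String(i + 1), title]));
console.log(csv);
```

The resulting string can be written to disk with whatever file API the runtime provides.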
Example 4: Wait for Dynamic Content
const automation = new GUIAutomation();
await automation.launch();
await automation.goto('https://example.com');
// Wait for element to appear
await automation.waitFor('.loaded-content', { timeout: 10000 });
// Start waiting for the navigation before clicking, so a fast navigation is not missed
await Promise.all([
  automation.waitForNavigation({ waitUntil: 'networkidle' }),
  automation.click('#load-more'),
]);
await automation.screenshot('loaded.png');
await automation.close();
Example 5: Execute JavaScript
const automation = new GUIAutomation();
await automation.launch();
await automation.goto('https://example.com');
// Get window size
const size = await automation.evaluate(() => ({
  width: window.innerWidth,
  height: window.innerHeight,
}));
// Modify page
await automation.evaluate(() => {
  document.body.style.background = 'red';
});
await automation.screenshot('modified.png');
await automation.close();
Common Selectors
| Selector Type | Example |
|---|---|
| ID | #login-button |
| Class | .article-title |
| Tag | input, button, a |
| Attribute | [type="email"], [href*="github"] |
| CSS | div > p.text, ul li:first-child |
| XPath | //button[text()="Submit"] |
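The table mixes CSS and XPath selectors. Playwright treats a selector that starts with `//` or `..` as XPath and otherwise falls back to CSS, unless an explicit engine prefix such as `xpath=` or `css=` is given. Assuming the skill passes selectors straight through to Playwright (not confirmed here), the sketch below approximates that rule for the selector types in the table. `selectorKind` and `withExplicitEngine` are hypothetical helpers, and the check is a simplification of Playwright's own parsing.

```typescript
type SelectorKind = 'xpath' | 'css';

// Hypothetical helper: approximate how a selector from the table is interpreted.
// Only covers the CSS and XPath forms listed above, not text= or role= engines.
function selectorKind(selector: string): SelectorKind {
  return /^(xpath=|\/\/|\.\.)/.test(selector.trim()) ? 'xpath' : 'css';
}

// Hypothetical helper: add an explicit engine prefix so the intent is unambiguous.
function withExplicitEngine(selector: string): string {
  const trimmed = selector.trim();
  if (/^(xpath|css)=/.test(trimmed)) return trimmed;
  return `${selectorKind(trimmed)}=${trimmed}`;
}

console.log(withExplicitEngine('//button[text()="Submit"]'));
console.log(withExplicitEngine('#login-button'));
```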
Configuration
config.json
Create ~/.qwen/qwenclaw/gui-config.json:
{
  "browserType": "chromium",
  "headless": true,
  "viewport": {
    "width": 1920,
    "height": 1080
  },
  "userAgent": "Mozilla/5.0...",
  "screenshotsDir": "~/.qwen/qwenclaw/screenshots",
  "timeout": 30000,
  "waitForNetworkIdle": true
}
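JSON has no notion of `~`, so a path like the `screenshotsDir` value above must be expanded by whatever reads the file. The sketch below shows one way to parse and validate a subset of these fields. `parseGuiConfig` and `expandHome` are hypothetical names, the fallback values are assumptions copied from the sample file, and `timeout` is assumed to be in milliseconds. How the skill itself loads this file is not documented here.

```typescript
const BROWSERS = ['chromium', 'firefox', 'webkit'] as const;
type BrowserType = (typeof BROWSERS)[number];

interface GuiFileConfig {
  browserType: BrowserType;
  headless: boolean;
  screenshotsDir: string;
  timeout: number; // assumed to be milliseconds
}

function isBrowserType(value: unknown): value is BrowserType {
  return typeof value === 'string' && (BROWSERS as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical helper: replace a leading "~" with the given home directory.
function expandHome(path: string, home: string): string {
  return path.replace(/^~(?=$|\/)/, () => home);
}

// Hypothetical helper: parse config text, apply assumed fallbacks, validate types.
function parseGuiConfig(jsonText: string, home: string): GuiFileConfig {
  const raw = JSON.parse(jsonText) as Record<string, unknown>;
  const browserType = raw.browserType ?? 'chromium';
  const headless = raw.headless ?? true;
  const screenshotsDir = raw.screenshotsDir ?? '~/.qwen/qwenclaw/screenshots';
  const timeout = raw.timeout ?? 30000;

  if (!isBrowserType(browserType)) throw new Error(`Unsupported browserType: ${String(browserType)}`);
  if (typeof headless !== 'boolean') throw new Error('headless must be true or false');
  if (typeof screenshotsDir !== 'string') throw new Error('screenshotsDir must be a string');
  if (typeof timeout !== 'number' || timeout <= 0) throw new Error('timeout must be a positive number');

  return { browserType, headless, screenshotsDir: expandHome(screenshotsDir, home), timeout };
}

const parsed = parseGuiConfig('{"browserType": "firefox", "screenshotsDir": "~/shots"}', '/home/alice');
console.log(parsed.screenshotsDir);
```

Passing the home directory as a parameter keeps the helper free of runtime-specific calls; in Node it would typically come from `os.homedir()`.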
Environment Variables
| Variable | Description | Default |
|---|---|---|
| PLAYWRIGHT_BROWSERS_PATH | Browser binaries location | ~/.cache/ms-playwright (Linux) |
| QWENCLAW_GUI_HEADLESS | Run headless | true |
| QWENCLAW_GUI_BROWSER | Default browser | chromium |
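As an illustration of how the two `QWENCLAW_GUI_*` variables could map onto configuration values, the sketch below reads them from a plain key/value object. `settingsFromEnv` is a hypothetical helper, and the parsing rules (case-insensitive, only `false` or `0` disable headless mode, unknown browsers fall back to chromium) are assumptions, not documented behavior of the skill.

```typescript
const BROWSERS = ['chromium', 'firefox', 'webkit'] as const;
type BrowserType = (typeof BROWSERS)[number];

interface EnvSettings {
  browserType: BrowserType;
  headless: boolean;
}

function isBrowserType(value: string): value is BrowserType {
  return (BROWSERS as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical helper: derive settings from environment-style key/value pairs.
function settingsFromEnv(env: Record<string, string | undefined>): EnvSettings {
  const browser = (env.QWENCLAW_GUI_BROWSER ?? 'chromium').trim().toLowerCase();
  const headless = (env.QWENCLAW_GUI_HEADLESS ?? 'true').trim().toLowerCase();
  return {
    // Assumption: an unrecognized browser name falls back to the documented default.
    browserType: isBrowserType(browser) ? browser : 'chromium',
    // Assumption: only an explicit "false" or "0" turns headless mode off.
    headless: headless !== 'false' && headless !== '0',
  };
}

console.log(settingsFromEnv({ QWENCLAW_GUI_BROWSER: 'webkit', QWENCLAW_GUI_HEADLESS: 'false' }));
```

In Node or Bun the argument would usually be `process.env`.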
Integration with QwenClaw Daemon
Add to rig-service
Update rig-service/src/agent.rs:
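// Add GUI automation capability.
// Sketch only: assumes a Rust-side GUIAutomation wrapper around the TypeScript skill.
async fn automate_gui(task: &str) -> Result<String> {
    let automation = GUIAutomation::default();
    automation.launch().await?;
    // Parse and execute the task; keep the outcome so the browser is closed even on failure
    let result = automation.execute(task).await;
    automation.close().await?;
    result
}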
Add to QwenClaw skills
The skill is automatically available when installed in skills/gui-automation/.
Troubleshooting
Issue: "Browser not found"
Solution:
# Install browser binaries
npx playwright install chromium
Issue: "Timeout waiting for element"
Solution:
// Increase timeout
await automation.waitFor('.element', { timeout: 30000 });
// Or wait for specific event
await automation.waitForNavigation({ waitUntil: 'networkidle' });
Issue: "Screenshot not saved"
Solution:
# Ensure directory exists
mkdir -p ~/.qwen/qwenclaw/screenshots
# Check permissions
chmod 755 ~/.qwen/qwenclaw/screenshots
Best Practices
1. Always Close Browser
try {
  await automation.launch();
  // ... do work
} finally {
  await automation.close(); // Always close
}
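The try/finally pattern can be wrapped once and reused. The sketch below is one possible shape. `withSession` and the `BrowserSession` interface are hypothetical and only assume an object with `launch()` and `close()` methods, which `GUIAutomation` has according to the API table. A small stub stands in for the real class so the example runs without a browser.

```typescript
// Minimal shape needed by the helper; GUIAutomation is assumed to satisfy it.
interface BrowserSession {
  launch(): Promise<void>;
  close(): Promise<void>;
}

// Hypothetical helper: launch, run the work, and always close afterwards.
async function withSession<S extends BrowserSession, T>(
  session: S,
  work: (session: S) => Promise<T>,
): Promise<T> {
  // Launch happens outside the try block: if it fails, close() is not called.
  await session.launch();
  try {
    return await work(session);
  } finally {
    await session.close();
  }
}

// Stub session used only for demonstration.
const events: string[] = [];
const stub: BrowserSession = {
  launch: async () => { events.push('launch'); },
  close: async () => { events.push('close'); },
};

withSession(stub, async () => 'done').then((value) => {
  console.log(value, events);
});
```

With the real class this would read `withSession(new GUIAutomation(), async (automation) => { ... })`.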
2. Use Headless for Automation
const automation = new GUIAutomation({ headless: true });
3. Wait for Elements
// Don't assume element exists
await automation.waitFor('#dynamic-content');
const text = await automation.getText('#dynamic-content');
4. Handle Errors
try {
  await automation.click('#button');
} catch (err) {
  console.error('Click failed:', err);
  // Fallback or retry
}
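For the "retry" branch, a generic wrapper keeps the retry logic out of each call site. This is a minimal sketch: `retry` is a hypothetical helper, and the attempt count and fixed delay are arbitrary illustrative defaults, not values used by the skill.

```typescript
// Hypothetical helper: run an async action, retrying on failure with a fixed delay.
async function retry<T>(action: () => Promise<T>, attempts = 3, delayMs = 500): Promise<T> {
  if (attempts < 1) throw new RangeError('attempts must be at least 1');
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (attempt < attempts) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the most recent error.
  throw lastError;
}

// Demonstration with a stand-in action that fails once before succeeding.
let tries = 0;
retry(async () => {
  tries += 1;
  if (tries < 2) throw new Error('element not ready');
  return 'clicked';
}, 3, 10).then((result) => console.log(result, 'after', tries, 'tries'));
```

With the skill, the action would be something like `() => automation.click('#button')`.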
5. Use Descriptive Selectors
// Good
await automation.click('[data-testid="submit-button"]');
// Bad (brittle)
await automation.click('div > div:nth-child(3) > button');
Security Considerations
1. Don't Automate Sensitive Sites
Avoid automating:
- Banking websites
- Password managers
- Private data entry
2. Use Headless Mode
{
  "headless": true
}
3. Clear Cookies After Session
await automation.close(); // Discards cookies and storage, assuming a non-persistent browser context
Resources
- Playwright Docs: https://playwright.dev/
- Playwright Selectors: https://playwright.dev/docs/selectors
- Playwright API: https://playwright.dev/docs/api/class-playwright
Changelog
v1.0.0 (2026-02-26)
- Initial release
- Full browser automation
- Screenshot capture
- Element interaction
- Data extraction
- JavaScript execution
License
MIT License - See LICENSE file for details.
GUI Automation skill ready for QwenClaw! 🖥️🤖