Drive the screen: keys, matching, OCR — vmlab

3. Drive the screen: keys, matching, OCR

Automate what has no agent — installers and GUIs — with keystrokes and vision.

After this lesson you can

- Send keystrokes and key chords to a VM
- Wait for screen content with image matching and OCR
- Combine them into an installer-style loop

Before you start: Your first provision script

During an OS install there is no guest agent, only the console. The VM API covers that side: type_text and send_keys inject input (including chords such as ctrl-alt-del), screenshot grabs the console, and wait_for_image and wait_for_text block the script until a reference image, or a regex match in the OCR'd screen text, appears.

This is exactly how vmlab's own Windows and vintage-DOS templates are built: wait for the dialog, press the key, wait for the next screen. The wscript_matching concept in the reference covers thresholds and match regions.
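The wait-then-press rhythm can be modelled outside vmlab to show what a blocking wait amounts to. The sketch below is plain Python, not wscript, and is not vmlab's implementation: FakeConsole, run_steps, the screen strings, and the polling interval are all hypothetical, and the local wait_for_text only borrows the name of the API call. It assumes a wait is a poll-until-regex-matches loop with a deadline.

```python
import re
import time


def wait_for_text(read_screen, pattern, timeout, interval=0.01):
    """Poll read_screen() until the regex `pattern` matches or `timeout` seconds pass.

    Returns the matched text, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        match = re.search(pattern, read_screen())
        if match:
            return match.group(0)
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


class FakeConsole:
    """Hypothetical stand-in for a VM console: one screen per step, advancing on input."""

    def __init__(self, screens):
        self.screens = list(screens)
        self.index = 0
        self.typed = []

    def read_screen(self):
        # In a real setup this would be the OCR'd text of a screenshot.
        return self.screens[self.index]

    def type_text(self, text):
        # Record the input and move to the next screen, if there is one.
        self.typed.append(text)
        if self.index < len(self.screens) - 1:
            self.index += 1


def run_steps(console, steps, timeout=1.0):
    """For each (pattern, keys) step: wait for the screen, then type the keys."""
    for pattern, keys in steps:
        if wait_for_text(console.read_screen, pattern, timeout) is None:
            raise TimeoutError(f"screen never showed {pattern!r}")
        console.type_text(keys)


console = FakeConsole([
    "Welcome to setup. Press Enter",
    "Choose disk [sda]:",
    "Install complete",
])
run_steps(console, [(r"Press Enter", "\n"), (r"Choose disk", "sda\n")])
final = wait_for_text(console.read_screen, r"Install complete", 1.0)
```

Keeping the steps as data (pattern, keys) makes the installer flow easy to read and reorder, and a timeout on any step fails loudly at the screen that never appeared.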

§ 1 Exercise: Log in without the agent

At the Alpine console, wait for the login: prompt via OCR, then type the login sequence.

wscript

use vmlab

fn main(lab: Lab) {
    let alp = lab.vm("alp").expect("no vm alp")
    alp.wait_for_text("login:", 300).expect("no login prompt on screen")
    alp.type_text("root\n")
    alp.wait_for_text("#", 60).expect("no shell prompt")
    lab.log("logged in at the console, no agent involved")
}

Expected result

The script proceeds past both waits and logs the final line — the console session is live.

Hint

OCR needs a legible console. If the match times out, run vmlab console alp and look: the guest may still be booting, or the prompt text may differ from what you're waiting for.