3. Drive the screen: keys, matching, OCR
Automate what has no agent — installers and GUIs — with keystrokes and vision.
After this lesson you can
- Send keystrokes and key chords to a VM
- Wait for screen content with image matching and OCR
- Combine them into an installer-style loop
Before you start: Your first provision script
During an OS install there is no guest agent, only the console. The VM API works at that level: type_text and send_keys inject input (including chords such as ctrl-alt-del), screenshot grabs the console, and wait_for_image and wait_for_text block the script until a reference image, or text matching a regex after OCR, appears on screen.
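The blocking waits are easiest to picture as a polling loop. This is a minimal sketch, not vmlab's implementation: it assumes a screenshot function (grab) and an OCR function (ocr) are passed in as plain callables, and that a timeout raises a hypothetical MatchTimeout. Searching with a regex rather than comparing whole strings tolerates the stray characters OCR output often contains.

```python
import re
import time
from typing import Callable


class MatchTimeout(Exception):
    """Hypothetical error raised when the pattern never appears in time."""


def wait_for_text(
    grab: Callable[[], bytes],
    ocr: Callable[[bytes], str],
    pattern: str,
    timeout: float = 60.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> "re.Match[str]":
    """Poll the console until the OCR'd text matches pattern.

    grab stands in for a screenshot call and ocr for a text recognizer;
    both are assumed hooks, not vmlab's API.
    """
    regex = re.compile(pattern)
    deadline = clock() + timeout
    while True:
        text = ocr(grab())            # screenshot, then text
        match = regex.search(text)    # regex search, not exact equality
        if match:
            return match
        if clock() >= deadline:
            raise MatchTimeout(f"{pattern!r} not seen within {timeout}s; last OCR: {text!r}")
        sleep(interval)


# Usage with fake frames: the prompt shows up on the third screenshot.
frames = iter([b"booting...", b"OpenRC starting", b"alpine login:"])
m = wait_for_text(lambda: next(frames), lambda img: img.decode(), r"login:\s*$",
                  timeout=5, interval=0, sleep=lambda s: None)
print(m.group(0))  # prints "login:"
```

Injecting clock and sleep keeps the loop testable without real waiting. An image wait would plausibly follow the same shape, with a similarity score compared against a threshold in place of the regex.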
This is exactly how vmlab's own Windows and vintage-DOS templates are built: wait for the dialog, press the key, wait for the next screen. The wscript_matching concept in the reference covers thresholds and match regions.
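The wait, press, wait rhythm maps naturally onto a table of steps. A minimal sketch, assuming a console object with wait_for_text(pattern, timeout) and send_keys(keys) methods; the Console protocol, the screen texts and the key names are illustrative and not taken from vmlab or from any real installer.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple


class Console(Protocol):
    """Hypothetical stand-in for the VM console; not vmlab's actual API."""
    def wait_for_text(self, pattern: str, timeout: float) -> None: ...
    def send_keys(self, keys: List[str]) -> None: ...


# Each step: the screen that must appear first, then the keys that answer it.
# Patterns and key names are illustrative only.
Step = Tuple[str, List[str]]

STEPS: List[Step] = [
    (r"Select language", ["ret"]),
    (r"Accept the license", ["alt-a"]),
    (r"Install now", ["ret"]),
    (r"Installation complete", ["ctrl-alt-del"]),
]


def run_installer(console: Console, steps: List[Step], timeout: float = 300.0) -> None:
    """Wait for each dialog, then press its keys; never type blind."""
    for pattern, keys in steps:
        console.wait_for_text(pattern, timeout)
        console.send_keys(keys)


@dataclass
class FakeConsole:
    """Records calls so the loop can be checked without a VM."""
    log: List[str]

    def wait_for_text(self, pattern: str, timeout: float) -> None:
        self.log.append(f"wait {pattern}")

    def send_keys(self, keys: List[str]) -> None:
        self.log.append("keys " + " ".join(keys))


fake = FakeConsole(log=[])
run_installer(fake, STEPS)
print(fake.log[:2])  # ['wait Select language', 'keys ret']
```

Keeping the steps as data means a new installer screen is one more tuple, and a failed wait points at the exact step that stalled.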
§ 1 Exercise: Log in without the agent
At the Alpine console, use OCR to wait for the "login:" prompt, then type the login sequence.
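One possible shape for the exercise, assuming the image lets root log in without a password (as Alpine's live ISO does) and that the root shell prompt ends in "#". FakeConsole is hypothetical so the sketch runs without a VM; against a real VM, the same two waits would go through the VM handle, using the method signatures given in the reference.

```python
import re
from typing import List


class FakeConsole:
    """Hypothetical stand-in for the VM handle so the sketch runs offline.

    type_text and wait_for_text mirror the lesson's names, but these
    signatures are assumptions; check the vmlab reference for the real ones.
    """

    def __init__(self, screens: List[str]) -> None:
        self.screens = screens
        self.index = 0
        self.typed: List[str] = []

    def type_text(self, text: str) -> None:
        # Typing advances the simulated console to its next screen.
        self.typed.append(text)
        self.index = min(self.index + 1, len(self.screens) - 1)

    def wait_for_text(self, pattern: str, timeout: float = 60.0) -> "re.Match[str]":
        # A real wait would poll screenshots until timeout; the fake checks once.
        match = re.search(pattern, self.screens[self.index], re.MULTILINE)
        if match is None:
            raise TimeoutError(f"{pattern!r} not seen within {timeout}s")
        return match


def login_without_agent(console: FakeConsole, user: str = "root") -> "re.Match[str]":
    console.wait_for_text(r"login:\s*$", timeout=120)     # wait 1: getty prompt
    console.type_text(user + "\n")                        # passwordless root (assumed)
    prompt = console.wait_for_text(r"#\s*$", timeout=30)  # wait 2: root shell prompt
    print("console session is live")
    return prompt


console = FakeConsole(["Welcome to Alpine Linux\nlocalhost login: ", "localhost:~# "])
result = login_without_agent(console)  # prints "console session is live"
```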
Expected result
The script proceeds past both waits and logs the final line — the console session is live.
Hint
OCR needs a legible console. If the match times out, run vmlab console alp and look: the guest may still be booting, or the prompt text may differ from the pattern you are waiting for.