
How Avalw Shield works: from camera frame to screen lock in 200ms

11 min read · April 2026 · By Avalw Team

Shield is not a webcam app. It is not recording video. It is not streaming to a server. It captures a single still frame every 500 milliseconds, runs on-device machine learning inference, and makes a decision in under 200ms. Here is exactly how every step works.

The camera capture pipeline

The first thing to understand is that Shield does not record video. Video is a continuous stream of frames, typically 30 or 60 per second, which is computationally expensive and completely unnecessary for presence detection.

Instead, Shield captures a single still frame every 500 milliseconds. That is two frames per second. This is enough to detect whether someone is present, whether they are looking at the screen, and whether anyone else is nearby. But it uses a fraction of the resources that video would require.

The capture cycle

At no point is the camera frame saved to disk, transmitted over a network, or retained in memory beyond the processing cycle. Each frame exists for approximately 40 milliseconds before being destroyed.
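The cycle above can be sketched as a simple loop. This is an illustrative sketch, not Shield's actual source: `capture_frame` and `detect_faces` stand in for the platform camera and ML APIs, and the interval is a parameter so the example runs instantly.

```python
import time

def capture_frame():
    """Stand-in for the platform camera API: returns a raw frame buffer."""
    return bytearray(640 * 480)  # placeholder pixel buffer

def detect_faces(frame):
    """Stand-in for on-device ML inference (Vision / Windows ML)."""
    return []  # no faces detected in this stub

def run_capture_cycle(cycles, interval_s=0.5):
    """Capture one still frame per interval, infer, then discard the frame.

    No frame is written to disk, sent anywhere, or kept past its cycle:
    the buffer is dropped before the next capture begins.
    """
    decisions = []
    for _ in range(cycles):
        frame = capture_frame()       # frame exists only inside this iteration
        faces = detect_faces(frame)   # ~40ms of processing in the real app
        decisions.append(len(faces))  # keep only the decision, never the pixels
        del frame                     # explicitly discard the frame buffer
        time.sleep(interval_s)        # idle until the next 500ms tick
    return decisions

print(run_capture_cycle(3, interval_s=0.01))  # → [0, 0, 0]
```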

Why 500ms intervals?

Human movement is slow relative to computer processing. A person leaving their desk, turning their head, or approaching from behind takes hundreds of milliseconds at minimum. Sampling at 500ms intervals captures all meaningful movement changes while using 15 to 30 times less CPU than continuous video processing.

Face detection: on-device machine learning

Shield uses the operating system's built-in machine learning frameworks for face detection. On macOS, this is Apple's Vision framework. On Windows, this is Windows ML with DirectML acceleration.

These frameworks provide hardware-accelerated face detection that runs on the device's Neural Engine (Apple Silicon) or GPU (Windows). The key point is that the ML model is embedded in the operating system itself. Shield does not download a model, does not connect to a cloud AI service, and does not send frames anywhere for processing.

What the ML model returns

For each frame, the face detection model returns a structured result containing:

- A bounding box for each detected face, giving its position and size within the frame
- A confidence score for each detection
- Facial landmarks (eye, nose, and mouth positions), used later for attention analysis

Shield uses a confidence threshold to filter out false positives. A reflection in a window, a face on a poster, or a pattern on a shirt that vaguely resembles a face will be detected with low confidence and ignored.
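That filtering step can be sketched in a few lines. The 0.8 threshold and the detection fields here are illustrative, not Shield's actual values:

```python
def filter_detections(detections, threshold=0.8):
    """Keep only detections confident enough to be a real, live face.

    Low-confidence hits (reflections, posters, face-like patterns on
    clothing) fall below the threshold and are dropped.
    """
    return [d for d in detections if d["confidence"] >= threshold]

detections = [
    {"label": "user",       "confidence": 0.97},  # real face: kept
    {"label": "poster",     "confidence": 0.34},  # face on a poster: dropped
    {"label": "reflection", "confidence": 0.21},  # window reflection: dropped
]
print(filter_detections(detections))  # → only the 0.97 detection survives
```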

Presence tracking: are you there?

The simplest feature in Shield is also the most useful. Presence tracking answers one question: is the authorized user sitting in front of the screen?

How it works

When Shield starts, it begins capturing frames and detecting faces. If exactly one face is detected with high confidence, the user is considered present. If zero faces are detected across multiple consecutive frames, the user is considered away.

The state machine

Present vs. Away detection logic

Shield uses a simple state machine with hysteresis to prevent flickering. The transition from "present" to "away" requires multiple consecutive frames with no face detected (configurable via the lock delay setting). The transition from "away" to "present" is immediate upon face detection. This prevents the screen from locking every time you glance sideways or reach for your coffee.

The lock delay is user-configurable. You can set it anywhere from instant (lock after the first frame with no face) to 60 seconds (lock only after a sustained absence). Most users find that 3 to 5 seconds works well: fast enough to protect the screen during a quick break, but slow enough to avoid false triggers during normal fidgeting.
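The hysteresis described above can be sketched as a small state machine. The class name and the frame arithmetic are illustrative (Shield's real implementation is not public), but the behavior matches the description: away requires a sustained run of empty frames, return is immediate.

```python
class PresenceTracker:
    """Present/away state machine with hysteresis.

    Going away requires `lock_delay_s` of consecutive no-face frames
    (at 2 frames per second); coming back is immediate.
    """
    FRAMES_PER_SECOND = 2  # one frame every 500ms

    def __init__(self, lock_delay_s=3.0):
        # Consecutive empty frames needed before declaring "away".
        self.frames_required = max(1, round(lock_delay_s * self.FRAMES_PER_SECOND))
        self.empty_streak = 0
        self.state = "present"

    def observe(self, face_count):
        if face_count > 0:
            self.empty_streak = 0
            self.state = "present"   # return is immediate
        else:
            self.empty_streak += 1
            if self.empty_streak >= self.frames_required:
                self.state = "away"  # sustained absence: lock the screen
        return self.state

tracker = PresenceTracker(lock_delay_s=1.0)  # 1s delay → 2 empty frames
print(tracker.observe(0))  # → "present": one empty frame is just a glance away
print(tracker.observe(0))  # → "away": second consecutive empty frame
print(tracker.observe(1))  # → "present": recovery is immediate
```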

Face recognition: learning your face

Shield goes beyond simple presence detection. It can learn to recognize your specific face, so it knows the difference between you sitting down and someone else sitting down at your computer.

The enrollment process

When you first enable face recognition, Shield captures several frames of your face from slightly different angles. From these frames, it computes a face embedding: a compact mathematical representation of your facial features. This embedding is stored locally on your device and never leaves it.
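A toy sketch of enrollment and matching, using 3-dimensional embeddings for readability (real face embeddings have hundreds of dimensions, and the similarity threshold here is invented):

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def enroll(frame_embeddings):
    """Average the embeddings from several enrollment frames into one
    unit-length vector representing the user's face."""
    dim = len(frame_embeddings[0])
    mean = [sum(e[i] for e in frame_embeddings) / len(frame_embeddings)
            for i in range(dim)]
    return normalize(mean)

def matches(stored, candidate, threshold=0.9):
    """Cosine similarity between unit vectors is just their dot product."""
    score = sum(a * b for a, b in zip(stored, normalize(candidate)))
    return score >= threshold

# Toy enrollment: three frames from slightly different angles.
stored = enroll([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [1.0, 0.0, 0.0]])
print(matches(stored, [0.95, 0.05, 0.05]))  # → True (same person)
print(matches(stored, [0.0, 1.0, 0.2]))     # → False (someone else)
```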

Continuous improvement

Over time, Shield refines its model of your face. It learns how you look under different lighting conditions, with and without glasses, at slightly different angles. Each recognized frame slightly updates the stored embedding, making recognition more accurate over days and weeks of use.
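The "each recognized frame slightly updates the stored embedding" step resembles an exponential moving average. The learning rate below is made up for illustration:

```python
def update_embedding(stored, observed, alpha=0.02):
    """Nudge the stored embedding a small step toward the newly
    recognized frame's embedding, so lighting, glasses, and angle
    variations are absorbed gradually over days of use."""
    return [(1 - alpha) * s + alpha * o for s, o in zip(stored, observed)]

stored = [1.0, 0.0]
observed = [0.8, 0.2]  # today's lighting looks slightly different
stored = update_embedding(stored, observed)
print(stored)  # → approximately [0.996, 0.004]: a 2% step toward the new frame
```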

This is particularly useful for shared computers. If your colleague sits down at your workstation, Shield will detect a face but will not recognize it as yours. The screen remains locked until you authenticate through traditional means.

Privacy note on face recognition

The face embedding stored by Shield is a compact numerical vector, not an image. It cannot be reverse-engineered into a photograph of your face. It is stored in the application's sandboxed container on your device and is inaccessible to other applications. If you uninstall Shield, the embedding is deleted with it.

Attention analysis: are you looking?

Shield's attention analysis goes one step further than presence detection. It determines not just whether you are at the screen, but whether you are actually looking at it.

Eye tracking

Using the facial landmarks returned by the ML model, Shield can determine eye position and gaze direction. It analyzes whether your eyes are open and whether they are directed toward the screen. If you fall asleep at your desk, turn to talk to a colleague for an extended period, or simply zone out while looking away, Shield can detect this.
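One common way to decide "eyes open vs. closed" from facial landmarks is the eye aspect ratio: the eye's vertical opening divided by its width. Shield's actual gaze logic is not public; this is a generic sketch with made-up landmark coordinates and threshold:

```python
import math

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def eye_aspect_ratio(corner_l, corner_r, lid_top, lid_bottom):
    """Vertical lid opening divided by horizontal eye width.
    Drops toward 0 as the eye closes."""
    return distance(lid_top, lid_bottom) / distance(corner_l, corner_r)

def eyes_open(landmarks, threshold=0.2):
    return eye_aspect_ratio(*landmarks) > threshold

# Landmarks: left corner, right corner, top lid, bottom lid.
open_eye   = [(0, 0), (30, 0), (15, 9), (15, -9)]  # ratio = 18/30 = 0.6
closed_eye = [(0, 0), (30, 0), (15, 1), (15, -1)]  # ratio = 2/30 ≈ 0.07
print(eyes_open(open_eye))    # → True
print(eyes_open(closed_eye))  # → False
```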

Use cases for attention analysis

Shoulder Guard: the core innovation

Shoulder Guard is the feature that makes Shield fundamentally different from a screen saver with a timeout. Instead of only detecting whether you are present, Shoulder Guard detects whether anyone else is looking at your screen.

The algorithm

Shoulder Guard works by counting faces. In normal operation, the expected face count is one: yours. When the face count exceeds one, it means someone else is within the camera's field of view and potentially looking at your screen.

Algorithm · Step by step

Shoulder Guard detection flow

1. A frame is captured. The ML model detects two faces.
2. Face #1 matches the stored embedding: that is you.
3. Face #2 is unrecognized, and its bounding box position indicates someone behind or beside you, not directly in front of the screen.
4. Shoulder Guard triggers. Screen content is blurred within 200ms.
5. When Face #2 exits the frame, content is restored.
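In code, the face-counting decision might look like this. The `recognized` flag stands in for the embedding match described above; the field name is illustrative:

```python
def shoulder_guard(faces):
    """Decide whether to blur the screen for this frame.

    `faces` is a list of dicts with a `recognized` flag (does this face
    match the stored embedding?). An unrecognized face alongside the
    user means a potential onlooker.
    """
    unrecognized = [f for f in faces if not f["recognized"]]
    return len(faces) > 1 and len(unrecognized) > 0

just_me  = [{"recognized": True}]
onlooker = [{"recognized": True}, {"recognized": False}]
print(shoulder_guard(just_me))   # → False: normal operation
print(shoulder_guard(onlooker))  # → True: blur within 200ms
```

Note that a single unrecognized face does not trigger the guard: that is the shared-computer case, which presence tracking and authentication handle instead.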

The algorithm considers several factors before triggering:

- Whether each detected face matches the stored embedding of the authorized user
- The detection confidence of each additional face, to rule out reflections and posters
- The position of the additional face's bounding box, which indicates whether the person is behind or beside you rather than directly in front of the screen

The 200ms response time

When Shoulder Guard determines that an unauthorized viewer is present, approximately 200 milliseconds elapse from the moment the face appears in the camera frame to the moment the screen content is fully obscured.

Two hundred milliseconds is faster than a human can read text on a screen from a standing position while walking past. By the time an observer's eyes have focused on the screen content, the blur is already in place.

The human visual system needs approximately 300 to 500 milliseconds to focus on and begin reading unfamiliar text. Shield's 200ms response time means the screen is blurred before the observer can process what they are seeing.

Performance: designed for all-day use

Shield is designed to run continuously from the moment you log in to the moment you shut down. That means performance must be exceptional.

CPU usage: approximately 2%

On a modern Mac with Apple Silicon, Shield uses approximately 2% of CPU capacity. Most of this is the ML inference, which runs on the dedicated Neural Engine rather than the main CPU cores. On Intel Macs and Windows machines with discrete GPUs, ML inference is offloaded to the GPU, with similar overall impact.

Memory: approximately 50MB

Shield's memory footprint is stable at around 50MB. This does not grow over time because frames are processed and immediately discarded. There is no buffer accumulation, no cache growth, no memory leak pattern. The 50MB consists of the application binary, the ML model weights (loaded once at startup), and the working buffers for frame processing.

Battery impact: minimal

On a MacBook Pro running on battery, Shield reduces battery life by approximately 15 to 20 minutes over a full workday. This is roughly equivalent to having one additional browser tab open. The camera hardware itself draws minimal power, and the Neural Engine is extraordinarily energy-efficient because it was specifically designed for exactly this type of inference workload.

Performance comparison

For context, a typical video conferencing app uses 15 to 30% CPU, 200 to 400MB RAM, and significantly impacts battery life because it processes 30 frames per second and encodes/transmits video. Shield processes 2 frames per second and transmits nothing. The difference in resource usage is roughly 15x.

Lock delay configuration

Shield provides granular control over when the screen locks after you leave. The lock delay setting determines how many seconds of no-face-detected frames are required before the screen is locked.

Shoulder Guard, by contrast, has no configurable delay. When an unauthorized face is detected, the response is always immediate. There is no legitimate reason to delay protection against shoulder surfing.

Putting it all together

Here is the complete flow of what happens every 500 milliseconds while Shield is running:

1. A single still frame is captured from the camera.
2. The on-device ML model detects any faces in the frame, returning bounding boxes, confidence scores, and landmarks.
3. Detected faces are compared against the stored embedding to determine whether the authorized user is present.
4. The presence state machine updates: sustained absence counts toward the lock delay, and a recognized face resets it.
5. If more than one face is detected and any face is unrecognized, Shoulder Guard blurs the screen.
6. The frame is discarded from memory.

All of this happens in under 50 milliseconds of actual processing time. For the remaining 450 milliseconds, Shield is idle and consuming near-zero resources.
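A single 500ms tick can be sketched end to end as follows. All names are illustrative, and the components stand in for the presence and Shoulder Guard logic described above:

```python
def process_tick(faces, prev_state, empty_streak, frames_required=6):
    """One 500ms cycle: presence update plus Shoulder Guard check.

    `faces` is this frame's list of {"recognized": bool} detections.
    Returns (new_state, new_streak, action), where action is one of
    "none", "lock", or "blur".
    """
    recognized = [f for f in faces if f["recognized"]]
    # Presence with hysteresis: away needs a sustained run of empty frames.
    if recognized:
        state, streak = "present", 0
    else:
        streak = empty_streak + 1
        state = "away" if streak >= frames_required else prev_state
    # Shoulder Guard: any extra, unrecognized face blurs immediately.
    if len(faces) > 1 and len(faces) > len(recognized):
        return state, streak, "blur"
    if state == "away" and prev_state == "present":
        return state, streak, "lock"
    return state, streak, "none"

me = {"recognized": True}
stranger = {"recognized": False}
print(process_tick([me], "present", 0))            # → ("present", 0, "none")
print(process_tick([me, stranger], "present", 0))  # → ("present", 0, "blur")
print(process_tick([], "present", 5))              # → ("away", 6, "lock")
```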

Shield does not watch you. It glances at you, twice per second, for 40 milliseconds at a time. That is enough to keep your screen protected without meaningful impact on your system.

Try Avalw Shield