How Avalw Shield works: from camera frame to screen lock in 200ms
Shield is not a webcam app. It is not recording video. It is not streaming to a server. It captures a single still frame every 500 milliseconds, runs on-device machine learning inference, and makes a decision in under 200ms. Here is exactly how every step works.
The camera capture pipeline
The first thing to understand is that Shield does not record video. Video is a continuous stream of frames, typically 30 or 60 per second, which is computationally expensive and completely unnecessary for presence detection.
Instead, Shield captures a single still frame every 500 milliseconds. That is two frames per second. This is enough to detect whether someone is present, whether they are looking at the screen, and whether anyone else is nearby. But it uses a fraction of the resources that video would require.
The capture cycle
- T+0ms: Shield requests a single frame from the camera hardware via the operating system's camera API (AVFoundation on macOS, Windows.Media.Capture on Windows).
- T+5ms: The camera delivers a raw pixel buffer. This is a grid of RGB color values, not a compressed image file. No JPEG, no PNG, no file is created.
- T+10ms: The pixel buffer is passed directly to the ML inference engine. The buffer remains in memory only.
- T+30ms: The ML model returns its results: face count, face positions, and confidence scores.
- T+35ms: The application logic evaluates the results and decides what action to take.
- T+40ms: The pixel buffer is zeroed out and released from memory.
- T+500ms: The next frame is captured. The cycle repeats.
At no point is the camera frame saved to disk, transmitted over a network, or retained in memory beyond the processing cycle. Each frame exists for approximately 40 milliseconds before being destroyed.
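The cycle above can be sketched in a few lines of Python. Here `capture_frame`, `run_inference`, and `act_on` are hypothetical stand-ins for the platform camera API, the OS ML framework, and Shield's decision logic; the point of the sketch is the lifecycle of the pixel buffer, which is zeroed before release:

```python
FRAME_INTERVAL_S = 0.5  # one still frame every 500 ms, not a video stream

def process_cycle(capture_frame, run_inference, act_on):
    """One capture cycle: grab a frame, infer, act, then destroy the buffer.

    capture_frame() -> bytearray of raw RGB pixels (hypothetical camera API)
    run_inference(buf) -> detection results (face count, boxes, scores)
    act_on(results) -> protection decision (lock, blur, no-op)
    """
    buf = capture_frame()        # raw pixel buffer; no file is ever created
    results = run_inference(buf) # on-device ML inference
    act_on(results)              # evaluate and act
    for i in range(len(buf)):    # zero the buffer before releasing it
        buf[i] = 0
    return buf
```

The buffer lives for one cycle only; nothing in the sketch writes to disk or touches a network.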
Why 500ms intervals?
Human movement is slow relative to computer processing. A person leaving their desk, turning their head, or approaching from behind takes hundreds of milliseconds at minimum. Sampling at 500ms intervals captures every meaningful change in movement while using a fifteenth to a thirtieth of the CPU that continuous video processing at 30 to 60 frames per second would require.
Face detection: on-device machine learning
Shield uses the operating system's built-in machine learning frameworks for face detection. On macOS, this is Apple's Vision framework. On Windows, this is Windows ML with DirectML acceleration.
These frameworks provide hardware-accelerated face detection that runs on the device's Neural Engine (Apple Silicon) or GPU (Windows). The key point is that the ML model is embedded in the operating system itself. Shield does not download a model, does not connect to a cloud AI service, and does not send frames anywhere for processing.
What the ML model returns
For each frame, the face detection model returns a structured result containing:
- Face count: An integer. How many faces are visible in the frame.
- Bounding boxes: The rectangular coordinates of each detected face within the frame.
- Confidence scores: A value between 0 and 1 indicating how confident the model is that each detection is actually a face.
- Facial landmarks: Key points such as eye positions, nose, and mouth, used for attention analysis.
Shield uses a confidence threshold to filter out false positives. A reflection in a window, a face on a poster, or a pattern on a shirt that vaguely resembles a face will be detected with low confidence and ignored.
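The result structure and the threshold filter can be sketched as follows; the field names and the 0.8 cutoff are illustrative, not Shield's actual values:

```python
from dataclasses import dataclass, field

@dataclass
class Detection:
    box: tuple            # (x, y, width, height) in normalized frame coordinates
    confidence: float     # 0.0 to 1.0: how likely this is actually a face
    landmarks: dict = field(default_factory=dict)  # e.g. {"left_eye": (x, y)}

CONFIDENCE_THRESHOLD = 0.8  # illustrative cutoff for filtering false positives

def filter_detections(detections):
    """Drop low-confidence detections: reflections, posters, shirt patterns."""
    return [d for d in detections if d.confidence >= CONFIDENCE_THRESHOLD]
```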
Presence tracking: are you there?
The simplest feature in Shield is also the most useful. Presence tracking answers one question: is the authorized user sitting in front of the screen?
How it works
When Shield starts, it begins capturing frames and detecting faces. If exactly one face is detected with high confidence, the user is considered present. If zero faces are detected across multiple consecutive frames, the user is considered away.
Present vs. Away detection logic
Shield uses a simple state machine with hysteresis to prevent flickering. The transition from "present" to "away" requires multiple consecutive frames with no face detected (configurable via the lock delay setting). The transition from "away" to "present" is immediate upon face detection. This prevents the screen from locking every time you glance sideways or reach for your coffee.
The lock delay is user-configurable. You can set it anywhere from instant (lock after the first frame with no face) to 60 seconds (lock only after a sustained absence). Most users find that 3 to 5 seconds works well: fast enough to protect the screen during a quick break, but slow enough to avoid false triggers during normal fidgeting.
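The hysteresis described above fits in a small state machine. This is a minimal sketch under the assumptions the text states: going away takes sustained absence, coming back is instant:

```python
class PresenceTracker:
    """Present/away state machine with hysteresis.

    Going away requires `away_frames` consecutive no-face frames
    (derived from the lock delay setting); returning to present is
    immediate on any face detection, so a sideways glance never locks.
    """
    def __init__(self, lock_delay_s=3.0, frame_interval_s=0.5):
        self.away_frames = max(1, round(lock_delay_s / frame_interval_s))
        self.misses = 0
        self.present = True

    def update(self, face_count):
        if face_count > 0:
            self.misses = 0
            self.present = True       # recovery is immediate
        else:
            self.misses += 1
            if self.misses >= self.away_frames:
                self.present = False  # sustained absence: lock the screen
        return self.present
```

With the default 3-second delay at 500ms intervals, six consecutive empty frames are needed before the state flips to away.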
Face recognition: learning your face
Shield goes beyond simple presence detection. It can learn to recognize your specific face, so it knows the difference between you sitting down and someone else sitting down at your computer.
The enrollment process
When you first enable face recognition, Shield captures several frames of your face from slightly different angles. From these frames, it computes a face embedding: a compact mathematical representation of your facial features. This embedding is stored locally on your device and never leaves it.
Continuous improvement
Over time, Shield refines its model of your face. It learns how you look under different lighting conditions, with and without glasses, at slightly different angles. Each recognized frame slightly updates the stored embedding, making recognition more accurate over days and weeks of use.
This is particularly useful for shared computers. If your colleague sits down at your workstation, Shield will detect a face but will not recognize it as yours. The screen remains locked until you authenticate through traditional means.
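A minimal sketch of this match-and-refine step, assuming a cosine-similarity comparison against the stored embedding and a small exponential-moving-average update; the threshold and learning rate are illustrative, not Shield's actual values:

```python
import math

MATCH_THRESHOLD = 0.9   # illustrative similarity cutoff
LEARNING_RATE = 0.02    # small steps: the embedding adapts over days, not minutes

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def recognize_and_refine(stored, observed):
    """Compare an observed embedding to the stored one. On a match,
    nudge the stored embedding toward the observation (EMA update)."""
    if cosine_similarity(stored, observed) < MATCH_THRESHOLD:
        return stored, False          # unrecognized face: embedding untouched
    updated = [(1 - LEARNING_RATE) * s + LEARNING_RATE * o
               for s, o in zip(stored, observed)]
    return updated, True
```

An unrecognized face never updates the embedding, which is what keeps a colleague's face from drifting into your profile on a shared machine.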
Privacy note on face recognition
The face embedding stored by Shield is a compact numerical vector, not an image. It cannot be reverse-engineered into a photograph of your face. It is stored in the application's sandboxed container on your device and is inaccessible to other applications. If you uninstall Shield, the embedding is deleted with it.
Attention analysis: are you looking?
Shield's attention analysis goes one step further than presence detection. It determines not just whether you are at the screen, but whether you are actually looking at it.
Eye tracking
Using the facial landmarks returned by the ML model, Shield can determine eye position and gaze direction. It analyzes whether your eyes are open and whether they are directed toward the screen. If you fall asleep at your desk, turn to talk to a colleague for an extended period, or simply zone out while looking away, Shield can detect this.
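One common heuristic for eye openness — not necessarily the one Shield uses — is the ratio of the eye's vertical opening to its width, computed from the eye landmarks the model returns. A sketch, with an illustrative threshold:

```python
import math

EYE_OPEN_RATIO = 0.2  # illustrative: below this, the eye is treated as closed

def _dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def eye_is_open(top, bottom, left, right):
    """Heuristic openness check from four eye landmarks:
    vertical opening divided by horizontal width."""
    width = _dist(left, right)
    if width == 0:
        return False
    return _dist(top, bottom) / width >= EYE_OPEN_RATIO
```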
Use cases for attention analysis
- Auto-dim: If you are looking away for a sustained period, Shield can dim or lock the screen, protecting content from anyone who might walk by while you are distracted.
- Focus tracking: Some users use attention data to understand their own focus patterns: how often they look away, and how long their focused sessions last.
- Presentation mode: During presentations, attention analysis is paused so the screen stays active even when you turn to face your audience.
Shoulder Guard: the core innovation
Shoulder Guard is the feature that makes Shield fundamentally different from a screen saver with a timeout. Instead of only detecting whether you are present, Shoulder Guard detects whether anyone else is looking at your screen.
The algorithm
Shoulder Guard works by counting faces. In normal operation, the expected face count is one: yours. When the face count exceeds one, it means someone else is within the camera's field of view and potentially looking at your screen.
Shoulder Guard detection flow
- A frame is captured and the ML model detects 2 faces.
- Face #1 matches the stored embedding: that is you.
- Face #2 is unrecognized, and its bounding box position indicates someone behind or beside you, not directly in front of the screen.
- Shoulder Guard triggers, and screen content is blurred within 200ms.
- When Face #2 exits the frame, content is restored.
The algorithm considers several factors before triggering:
- Face count: More than one face triggers further analysis.
- Face identity: If face recognition is enabled, known authorized faces do not trigger alerts.
- Position: A face at the edge of the frame (peripheral) is treated differently than a face directly behind the user.
- Persistence: A face that appears for a single frame might be a false positive. Shoulder Guard requires confirmation across 2 to 3 consecutive frames before triggering; the sub-200ms response time is measured from the capture of the confirming frame.
- Confidence: Low-confidence detections (posters, reflections, patterns) are filtered out.
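The factors above combine into a per-frame decision. A sketch, assuming detections have already passed the confidence filter; `is_authorized` is a hypothetical callback wrapping the embedding match:

```python
def shoulder_guard_step(detections, is_authorized, streak, required_streak=2):
    """Evaluate one frame's high-confidence detections for intruders.

    detections: this frame's filtered face detections
    is_authorized(d) -> bool: embedding match against enrolled faces
    streak: consecutive prior frames that already showed an intruder
    Returns (trigger, new_streak).
    """
    intruders = [d for d in detections if not is_authorized(d)]
    if not intruders:
        return False, 0                  # clean frame: streak resets
    streak += 1                          # persistence check across frames
    return streak >= required_streak, streak
```

The caller keeps the streak between cycles and blurs the screen the moment `trigger` comes back true.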
The 200ms response time
When Shoulder Guard determines that an unauthorized viewer is present, it takes approximately 200 milliseconds from the moment the face appears in the camera frame to the moment the screen content is fully obscured. Here is the breakdown:
- 0-5ms: Frame capture from camera hardware.
- 5-30ms: ML inference on the Neural Engine or GPU.
- 30-50ms: Algorithm evaluation: face count, identity check, position analysis.
- 50-80ms: Decision confirmed. Blur command issued.
- 80-200ms: Blur animation renders over screen content. Content is fully obscured.
Two hundred milliseconds is less time than a passerby needs to read text on a screen while walking past. By the time an observer's eyes have focused on the screen content, the blur is already in place.
The human visual system needs approximately 300 to 500 milliseconds to focus on and begin reading unfamiliar text. Shield's 200ms response time means the screen is blurred before the observer can process what they are seeing.
Performance: designed for all-day use
Shield is designed to run continuously from the moment you log in to the moment you shut down. That means performance must be exceptional.
CPU usage: approximately 2%
On a modern Mac with Apple Silicon, Shield uses approximately 2% of CPU capacity. Most of this is the ML inference, which runs on the dedicated Neural Engine rather than the main CPU cores. On Intel Macs and Windows machines with discrete GPUs, ML inference is offloaded to the GPU, with similar overall impact.
Memory: approximately 50MB
Shield's memory footprint is stable at around 50MB. This does not grow over time because frames are processed and immediately discarded. There is no buffer accumulation, no cache growth, no memory leak pattern. The 50MB consists of the application binary, the ML model weights (loaded once at startup), and the working buffers for frame processing.
Battery impact: minimal
On a MacBook Pro running on battery, Shield reduces battery life by approximately 15 to 20 minutes over a full workday. This is roughly equivalent to having one additional browser tab open. The camera hardware itself draws minimal power, and the Neural Engine is extraordinarily energy-efficient because it was specifically designed for exactly this type of inference workload.
Performance comparison
For context, a typical video conferencing app uses 15 to 30% CPU, 200 to 400MB RAM, and significantly impacts battery life because it processes 30 frames per second and encodes/transmits video. Shield processes 2 frames per second and transmits nothing. The difference in resource usage is roughly 15x.
Lock delay configuration
Shield provides granular control over when the screen locks after you leave. The lock delay setting determines how many seconds of no-face-detected frames are required before the screen is locked.
- Instant (0 seconds): The screen locks as soon as no face is detected. Best for high-security environments where even a 3-second exposure is unacceptable.
- 3 seconds: The default. Catches brief absences like bathroom breaks while avoiding false triggers when you lean sideways or reach for something.
- 10 seconds: Good for users who frequently look away from the screen or move around their desk while working.
- 30 to 60 seconds: Maximum tolerance. Suitable for home offices where the risk of unauthorized viewing is low but you still want automatic locking for longer absences.
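At 500ms capture intervals, each delay setting maps to a count of consecutive empty frames. A sketch of that conversion (the rounding behavior is an assumption; "instant" still requires one confirming empty frame):

```python
import math

FRAME_INTERVAL_S = 0.5  # one frame every 500 ms

def frames_for_lock_delay(delay_s):
    """Convert a lock-delay setting (seconds) into the number of
    consecutive no-face frames required before the screen locks."""
    return max(1, math.ceil(delay_s / FRAME_INTERVAL_S))
```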
Shoulder Guard, by contrast, has no configurable delay. When an unauthorized face is detected, the response is always immediate. There is no legitimate reason to delay protection against shoulder surfing.
Putting it all together
Here is the complete flow of what happens every 500 milliseconds while Shield is running:
- Camera captures a single frame into a memory buffer.
- The frame is passed to the on-device ML model for face detection.
- The model returns face count, positions, landmarks, and confidence scores.
- If face recognition is enabled, detected faces are compared against stored embeddings.
- The presence tracker updates its state: present or away, with hysteresis.
- The attention analyzer evaluates eye position and gaze direction.
- Shoulder Guard counts faces and checks for unauthorized viewers.
- If any protection action is needed (lock, blur, dim), it is triggered immediately.
- The camera frame buffer is zeroed and released from memory.
- The cycle waits until the next 500ms interval and repeats.
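The steps above can be wired together in one per-cycle function. Every collaborator here is a hypothetical stand-in for one of Shield's components; the sketch shows only the order of operations and the frame's end-of-cycle destruction:

```python
def shield_cycle(camera, model, recognizer, presence, attention, guard):
    """One full 500 ms cycle: capture, detect, recognize, track, decide."""
    buf = camera.capture()                       # raw pixel buffer
    detections = model.detect(buf)               # boxes, landmarks, scores
    authorized = [d for d in detections if recognizer.matches(d)]
    present = presence.update(len(detections))   # hysteresis state machine
    attention.update(detections)                 # gaze / eye-openness analysis
    action = guard.evaluate(detections, authorized)  # e.g. "blur" or "none"
    if not present:
        action = "lock"                          # sustained absence wins
    for i in range(len(buf)):                    # destroy the frame
        buf[i] = 0
    return action
```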
All of this happens in under 50 milliseconds of actual processing time. For the remaining 450 milliseconds, Shield is idle and consuming near-zero resources.
Shield does not watch you. It glances at you, twice per second, for 40 milliseconds at a time. That is enough to keep your screen protected without meaningful impact on your system.