Vector baselines: spotting when a host starts to behave differently¶
Most security monitoring focuses on what happened: a failed login, a new admin, a suspicious process. But some of the strongest early warnings don't come from a single event—they come from a change in how a host normally behaves. When that "normal" shifts, you want to know.
Vector baselines are a way to learn each host's usual "shape" of activity and flag when that shape changes. No fixed thresholds, no long lists of event codes to maintain. Just: this hour looks nothing like what this machine usually does.
The idea in one sentence¶
Instead of watching individual events, we learn what "normal" looks like for each host (per hour, per type of day), then raise a flag when the current hour is statistically far from that normal—even if we can't point to one "bad" event.
Why "shape" matters¶
Every Windows machine produces a stream of event codes: logons, logoffs, process creations, network activity, and so on. Over time, each host settles into a pattern: so many logons per hour, so many process creations, so much of this and that. That pattern is like a fingerprint. It's stable enough that "Monday 10 AM on this workstation" usually looks like previous Mondays at 10 AM—and very different from "Sunday 3 AM" on the same box.
If that pattern suddenly changes—same host, same kind of day, same hour, but a completely different mix and volume of events—something may have changed: malware, abuse, lateral movement, or a misconfiguration. Vector baselines turn that idea into a number: how far is this hour from the learned "normal" for this host and this context?
Vectors, not just counts¶
A single "event count per hour" is too crude: it can't tell the difference between "lots of logons" and "lots of process creations." So we don't use one number. We use a vector: one dimension per event code, and the value in each dimension is how often that code appeared in the hour. That vector is the "shape" of the hour. Different mixes of events produce different shapes.
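The counting step above can be sketched in a few lines. This is a minimal illustration, not the product's implementation; the event codes and the fixed dimension order are hypothetical examples:

```python
from collections import Counter

# Hypothetical sample: Windows event codes seen on one host in one hour.
events_this_hour = [4624, 4624, 4688, 4688, 4688, 4634, 5156]

# A fixed dimension order makes every hour's vector comparable.
known_codes = [4624, 4634, 4688, 5156, 7045]

counts = Counter(events_this_hour)
vector = [counts.get(code, 0) for code in known_codes]
print(vector)  # → [2, 1, 3, 1, 0]
```

Two hours with the same total event count but a different mix (say, mostly logons vs. mostly process creations) produce clearly different vectors, which is exactly the distinction a single count loses.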
The system then reduces that shape to a single summary (the vector's "size," or norm) and learns, for each host and each time slot (e.g. workday 10–11, weekend night, etc.), what that size usually is and how much it typically varies. When the current hour's size is many standard deviations away from that baseline—for example, five or more—we treat it as an anomaly: this hour doesn't match this host's normal shape.
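The reduction to a single "size" and the deviation check can be sketched as follows. Assumptions in this sketch: the Euclidean norm as the size measure, a population standard deviation over the history, and an illustrative history of hourly norms:

```python
import math

def vector_norm(vector):
    """Euclidean norm: reduce the hour's event-code vector to one size."""
    return math.sqrt(sum(v * v for v in vector))

def sigma(current_norm, baseline_norms):
    """Standard deviations between the current hour and the learned baseline."""
    n = len(baseline_norms)
    mean = sum(baseline_norms) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in baseline_norms) / n)
    return abs(current_norm - mean) / std if std > 0 else 0.0

# Hypothetical history of hourly norms for one host and time slot:
history = [10.2, 9.8, 11.0, 10.5, 9.9]

print(sigma(10.4, history))   # close to the baseline: normal
print(sigma(95.0, history))   # many sigmas away: flag as anomalous
```

Note that the comparison is always against the history of the same host and the same time slot, so a "big" norm is only anomalous relative to what that context usually produces.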
Workdays, weekends, holidays¶
"Normal" isn't one number. Activity on a workday morning looks different from activity on a weekend night or a holiday. So we learn separate baselines for workdays, weekends, and holidays. Low activity on Sunday is normal; the same low activity on Wednesday might not be. That way we don't flood ourselves with false positives when the building is empty.
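In practice this means baselines are looked up by context, not just by host. A minimal sketch of such a key, assuming a hypothetical holiday calendar and the three day types described above:

```python
import datetime

# Hypothetical holiday set; in practice this comes from a calendar source.
HOLIDAYS = {datetime.date(2024, 12, 25)}

def day_type(d: datetime.date) -> str:
    """Classify a date so each host gets separate baselines per context."""
    if d in HOLIDAYS:
        return "holiday"
    if d.weekday() >= 5:  # Saturday=5, Sunday=6
        return "weekend"
    return "workday"

def baseline_key(host: str, ts: datetime.datetime):
    """A baseline is stored per host, day type, and hour of day."""
    return (host, day_type(ts.date()), ts.hour)

print(baseline_key("WS-042", datetime.datetime(2024, 12, 25, 3)))
# → ('WS-042', 'holiday', 3)
```

With this keying, Sunday 3 AM on a workstation is compared only against previous weekend nights on that same workstation, never against its workday mornings.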
What you get in practice¶
- Per-host, per-context baselines – Each machine gets its own notion of "normal" for each relevant time window and day type.
- No manual thresholds – The system learns from data; you don't maintain long lists of "normal" counts per event code.
- One number: "how weird is this hour?" – Usually expressed as sigma (Z-score): how many standard deviations the current hour is from the baseline. High sigma ⇒ worth a closer look.
- A signal, not a verdict – Vector baseline anomalies are best used as an input to correlation and other rules (e.g. "this host is behaving oddly" plus "unusual logons" = higher confidence), not as the only trigger for an alert.
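The "signal, not a verdict" point can be made concrete: instead of alerting directly, the check emits a structured signal that correlation rules can combine with other evidence. The threshold value and field names here are hypothetical:

```python
SIGMA_THRESHOLD = 5.0  # hypothetical cutoff; tune for your environment

def evaluate_hour(host, slot_sigma):
    """Emit a signal for downstream correlation rather than an alert."""
    if slot_sigma >= SIGMA_THRESHOLD:
        return {"host": host, "signal": "vector_baseline_anomaly",
                "sigma": round(slot_sigma, 1)}
    return None  # nothing to report: this hour matches the baseline

print(evaluate_hour("WS-042", 7.3))
# → {'host': 'WS-042', 'signal': 'vector_baseline_anomaly', 'sigma': 7.3}
print(evaluate_hour("WS-042", 1.2))  # → None
```

A correlation rule can then require this signal plus, say, an unusual-logon signal before raising an actual alert, which keeps a single noisy hour from paging anyone on its own.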
Where it shines¶
- Compromised accounts or hosts – Unusual mix of logons, process creations, or network-related events.
- Lateral movement – A usually quiet host suddenly shows a different "shape" of activity.
- Malware or misuse – Service creation, scheduled tasks, or other patterns that change the host's typical vector.
- Drift – A host's role or usage changes; its activity drifts away from the old baseline, so you can detect the shift and then adapt to the new normal.
Takeaway¶
Vector baselines answer a simple question: does this hour look like what this host usually does at this time? By learning that "usual" per host and per context (workday/weekend/holiday), we get a lightweight, adaptive way to spot when a machine starts to behave differently—often before a single event alone would have raised the alarm.
For full technical detail—configuration, weights, sigma thresholds, and how to plug this into your detection pipeline—see Detecting user activity anomalies using event-code vector baselines.