Concepts
Once windows are aligned, comparators can reason about nested containment, coverage with exclusions, transition drift, replay-safe audits, and live provisional rows without hand-written interval joins.
Scenario
Real comparisons rarely produce a single clean overlap. A primary and a backup see different parts of the same outage. A maintenance window partly excludes the outage from the SLA. The incident escalates mid-range, so measurements before and after the boundary must be reported separately. The backup briefly drops out. And the dashboard still has to serve a live number while the window is still open.
Pipeline
Hand-written interval joins tend to blend filtering, normalization, alignment, scoring, and rollup into one query — which is why the answers are hard to audit. Spanfold stages them, so each step is explicit and re-runnable.
1. Scope
Choose what counts as in-scope before asking any analytic question. Narrow by window name, temporal axis, segment values, and tag filters.
2. Normalize
Apply known-at filtering for replay safety, clip open windows to a live horizon, exclude ranges that should not count, and require closed windows for historical runs.
3. Align
Match the target-side and against-side windows on key, source, partition, and temporal axis. Cohorts collapse many members into one derived lane at this stage.
4. Score
Overlap, residual, missing, coverage, gap, containment, lead/lag, and as-of rows come out of this stage. Each row preserves its originating window ids and range.
5. Project
Group rows by segment, tag, source, or bucket. Compute ratios, histograms, and rollups from the evidence rows — not from counters.
6. Publish
Emit JSON, Markdown, debug HTML, or agent context. Final and provisional rows are kept distinct so a live dashboard cannot pretend to be history.
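Laid end to end, the stages compose into a single fluent plan. The following is a minimal sketch assuming the same Compare API shown in the examples further down; the projection and publish helpers (GroupBySegment, ToMarkdown) are hypothetical names used to illustrate the shape, not confirmed library surface.

```
// 1. Scope and 3. Align: pick the windows and the two lanes to compare.
var result = pipeline.History
    .Compare("Outage coverage sketch")
    .Target("primary", s => s.Source("primary"))
    .Against("backup", s => s.Source("backup"))
    .Within(scope => scope
        .Window("PaymentOutage")
        .Axis(TemporalAxis.EventTime))
    // 2. Normalize: a historical run, so require closed windows.
    .Normalize(n => n.RequireClosedWindows())
    // 4. Score: emit coverage evidence rows.
    .Using(c => c.Coverage())
    .Run();

// 5. Project and 6. Publish: hypothetical helpers, shown for shape only.
var report = result.CoverageRows
    .GroupBySegment("region")
    .ToMarkdown();
```

Each stage is a separate call, so any stage can be changed and the plan re-run without touching the others.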
Row families
Coverage: Covered magnitude over eligible target magnitude, with exclusions applied at the normalize stage. Ratios stay drill-downable because every ratio preserves its contributing rows.
Gap: Uncovered spans surfaced as their own rows. Useful with a minimum-magnitude threshold so that micro-flaps below (say) 30 seconds do not trigger alerts but still appear in exports.
Containment: One window fully encloses another. Use it for maintenance-inside-outage exclusions, parent-contains-child explanation, or release-window envelopes. Partial containment surfaces as its own row kind.
Lead/lag: Signed transition deltas with a tolerance band. Group the rows into buckets to build histograms, or filter by direction (target-leads / target-lags / within-tolerance) for SLA assertions.
As-of: Point-in-time match against the previous or next qualifying window. Combined with known-at filtering, as-of rows let audits replay the evidence that was visible at a specific decision moment.
Finality: Rows derived from clipped open windows are labelled provisional and carry the horizon metadata. Final rows from closed history stay stable when the same comparison is replayed later.
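The drill-down property is concrete in code: a rollup is computed from rows that still carry their window ids and ranges, so the same rows feed the audit trail. A minimal sketch, where `result` stands for the value a Compare plan's `.Run()` returns, and the row properties (`TargetWindowId`, `Range`, `CoveredMagnitude`) are assumptions consistent with the examples in this document:

```
// Roll the ratio up from the evidence rows, not from a counter.
var ratio = result.CoverageRows.TotalCoveredMagnitude()
          / result.CoverageRows.TotalTargetMagnitude();

// The exact rows the ratio came from, ready for an audit trail.
foreach (var row in result.CoverageRows)
{
    Console.WriteLine(
        $"{row.TargetWindowId} [{row.Range}] covered={row.CoveredMagnitude}");
}
```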
Hard examples
Each example is a single Compare plan that layers normalization, exclusion, segment filters, and comparator rows. Read them as templates — the shape is the contribution, not the provider names.
Payments outages must be covered by backup at 99.5%, but scheduled maintenance is excluded — unless the incident has already escalated, in which case the exclusion no longer applies.
// Maintenance windows whose ranges may be excluded from the SLA.
var maintenance = pipeline.History
    .Query()
    .Window("MaintenanceWindow")
    .Tag("tier", "business-hours")
    .Windows();

var sla = pipeline.History
    .Compare("Payment coverage SLA")
    .Target("primary", s => s.Source("primary"))
    .Against("backup", s => s.Source("backup"))
    .Within(scope => scope
        .Window("PaymentOutage")
        .Axis(TemporalAxis.EventTime)
        .Segment("lifecycle", "Incident", "Escalated"))
    .Normalize(n => n
        .RequireClosedWindows()
        // The maintenance exclusion stops applying once the incident escalates.
        .ExcludeRanges(maintenance,
            when: ctx => !ctx.Target.Segment("lifecycle").Equals("Escalated")))
    .Using(c => c
        .Coverage()
        .Gap(minimumMagnitude: TimeSpan.FromSeconds(30))
        .Containment(ContainmentPolicy.ExcludeFromDenominator))
    .Run();

// The 99.5% target: covered magnitude over eligible magnitude.
var eligible = sla.CoverageRows.TotalTargetMagnitude();
var covered = sla.CoverageRows.TotalCoveredMagnitude();
var ratio = eligible == TimeSpan.Zero
    ? 1d
    : covered / eligible;

// Gaps longer than five minutes are reportable breaches.
var breaches = sla.GapRows
    .Where(g => g.Range.Duration > TimeSpan.FromMinutes(5))
    .OrderByDescending(g => g.Range.Duration)
    .ToArray();
Two providers should open their outage windows within 500 ms of each other. Bucket the signed deltas into 100 ms buckets and flag any bucket that holds more than 5% of the population.
var drift = pipeline.History
    .Compare("Provider start-drift")
    .Target("primary", s => s.Source("primary"))
    .Against("secondary", s => s.Source("secondary"))
    .Within(scope => scope
        .Window("DeviceOffline")
        .Axis(TemporalAxis.EventTime)
        .Since(DateTimeOffset.UtcNow.AddDays(-30)))
    .Using(c => c.LeadLag(
        LeadLagTransition.Start,
        TemporalAxis.EventTime,
        toleranceMagnitude: TimeSpan.FromMilliseconds(500)))
    .Run();

// Round each signed delta to the nearest 100 ms to form histogram buckets.
var buckets = drift.LeadLagRows
    .GroupBy(row => (long)Math.Round(row.Delta.TotalMilliseconds / 100d) * 100)
    .Select(g => new
    {
        BucketMs = g.Key,
        Count = g.Count(),
        OutOfTolerance = g.Count(r => r.Direction != LeadLagDirection.WithinTolerance)
    })
    .OrderBy(b => b.BucketMs)
    .ToArray();

// Flag buckets holding more than 5% of the out-of-tolerance population.
var total = drift.LeadLagRows.Count;
var hot = buckets.Where(b => b.OutOfTolerance > total * 0.05).ToArray();

// p95 of absolute drift.
var p95Drift = drift.LeadLagRows
    .OrderBy(r => Math.Abs(r.Delta.TotalMilliseconds))
    .ElementAt((int)(total * 0.95))
    .Delta;
Did the risk picture change between two decision points because of late-arriving events? Regenerate the view at both known-at horizons and diff window identities and ranges.
// The same comparison, replayed at two known-at horizons.
var earlier = pipeline.History
    .Compare("Risk view @ 12,000")
    .Target("risk", s => s.Source("risk-service"))
    .Against("market", s => s.Source("market-feed"))
    .Within(scope => scope.Window("HighRisk"))
    .Normalize(n => n.KnownAtPosition(12_000))
    .Using(c => c.Overlap().Residual().Missing())
    .Run();

var later = pipeline.History
    .Compare("Risk view @ 12,847")
    .Target("risk", s => s.Source("risk-service"))
    .Against("market", s => s.Source("market-feed"))
    .Within(scope => scope.Window("HighRisk"))
    .Normalize(n => n.KnownAtPosition(12_847))
    .Using(c => c.Overlap().Residual().Missing())
    .Run();

// Windows visible at the later horizon but not the earlier one: late arrivals.
var retroactive = later.AllWindows()
    .Except(earlier.AllWindows(), WindowIdentityComparer.Default)
    .ToArray();

// Windows present at both horizons whose ranges differ: restatements.
var restated = later.AllWindows()
    .Join(earlier.AllWindows(),
        nw => nw.WindowId, ow => ow.WindowId,
        (nw, ow) => new { nw.WindowId, Before = ow.Range, After = nw.Range })
    .Where(x => x.Before != x.After)
    .ToArray();
The live coverage number is what the dashboard shows; the final number is what survives replay. Keep both, and compute how much of the live number is still provisional.
var horizon = TemporalPoint.ForPosition(currentPosition);

// Clip still-open windows to the live horizon; rows derived from
// the clipped portion come back marked provisional.
var live = pipeline.History
    .Compare("Live coverage")
    .Target("primary", s => s.Source("primary"))
    .Against("backup", s => s.Source("backup"))
    .Within(scope => scope.Window("PaymentOutage"))
    .Normalize(n => n.ClipOpenWindowsTo(horizon))
    .Using(c => c.Coverage())
    .RunLive(horizon);

static double Ratio(IEnumerable<CoverageRow> rows)
{
    var covered = rows.Sum(r => r.CoveredMagnitude.Value);
    var target = rows.Sum(r => r.TargetMagnitude.Value);
    return target == 0 ? 1d : covered / target;
}

var final = live.CoverageRows.Where(r => r.Finality == RowFinality.Final);
var provisional = live.CoverageRows.Where(r => r.Finality == RowFinality.Provisional);

var finalRatio = Ratio(final);
var liveRatio = Ratio(final.Concat(provisional));

// How much of the live number is still provisional.
var provisionalPp = liveRatio - finalRatio;

dashboard.Render(
    finalRatio: finalRatio,
    liveRatio: liveRatio,
    provisionalBanding: provisionalPp,
    horizon: horizon);
Reading order
A comparison result is most useful when ratios come out of the evidence rows instead of replacing them. That is what keeps audits explainable: every summary number can be walked back to the ranges, window ids, segments, and sources that produced it.
The same evidence rows feed live dashboards, incident reports, regulator-facing audits, and retrospective root-cause reviews — with finality preserved, so a provisional live number can never be mistaken for a settled historical one.