The Year We Stopped Trusting Green KPIs

5G NSA · LTE · KPI Methodology · 6 min read

In earlier generations, a healthy KPI dashboard usually meant a healthy network. By 2021, that assumption quietly stopped being true.

As 5G NSA deployments scaled and traffic patterns shifted, networks that were technically compliant became operationally fragile. KPIs stayed green. Users experienced delays, retries, and intermittent failures that were difficult to reproduce in controlled tests. The problem was not missing counters. It was how existing counters were being interpreted.

What the old KPI model was designed for

Most KPIs in operational use were designed to answer single-layer, binary questions. Did the procedure complete? Was the threshold crossed? These questions made sense when network behavior was relatively sequential and device activity was steady.

| KPI | What it measured | What it could not see |
| --- | --- | --- |
| RRC setup success | Procedure completed without a failure code | Setup latency variance; repeated setups by the same device in a short window |
| HO success rate | Handover procedure completed | Execution delay on GBR bearers; UE sync time at target; post-HO stall duration |
| Throughput above threshold | Average PRB throughput met target | Burst availability for bursty applications; scheduling gap under competing load |
| Drop rate | Abnormal release rate within target | Silent session timeouts; retries that recovered technically but degraded experience |

Each KPI was technically accurate. Each described a slice of network behavior. None described how the slices connected under real device behavior in 2021.
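
To make those blind spots concrete, here is a minimal sketch in Python. The event log, field names, and values are hypothetical, not a real OSS schema; the point is that the binary KPI and the behavior it hides come from the same records:

```python
from statistics import mean, pstdev

# Hypothetical per-procedure log: (device_id, outcome, setup_latency_ms).
rrc_events = [
    ("ue-01", "success", 38),
    ("ue-02", "success", 41),
    ("ue-03", "success", 212),  # slow setup, still counted as a success
    ("ue-03", "success", 198),  # same device setting up again moments later
    ("ue-04", "failure", None),
    ("ue-05", "success", 45),
]

# The binary KPI: did the procedure complete without a failure code?
successes = [e for e in rrc_events if e[1] == "success"]
print(f"RRC setup success rate: {len(successes) / len(rrc_events):.1%}")

# What the KPI cannot see: latency variance and per-device repetition.
latencies = [lat for _, _, lat in successes]
print(f"setup latency: mean {mean(latencies):.0f} ms, stdev {pstdev(latencies):.0f} ms")

repeats = {}
for device_id, _, _ in successes:
    repeats[device_id] = repeats.get(device_id, 0) + 1
print("repeated setups in window:", {d: n for d, n in repeats.items() if n > 1})
```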

What had changed about user behavior

By 2021, device activity in a 5G NSA network looked fundamentally different from the steady sessions these KPIs were designed around. Control plane and user plane were split across radio layers (LTE anchor, NR data path). Applications were bursty. Power-saving behavior made devices enter and exit connected mode more aggressively than the KPI framework anticipated.

5G NSA device behavior vs single-KPI assumptions:

- Control plane on the LTE anchor (eNB); user plane on an NR secondary cell (gNB) where available. Result: RRC success on LTE does not confirm NR user-plane availability.
- Application behavior: bursty, short sessions, background sync. Result: a throughput average masks the scheduling gap during the burst window. The UE wakes and requests resources while the scheduler is not immediately ready; the resulting 100-200 ms gap is invisible in an hourly average but visible as app latency (see the sketch below).
- Power saving (C-DRX, I-DRX, PSM). Result: a device in a power-save state misses paging, and the registration delay on wake is counted as "idle mode", not "failure". The aggregate paging success rate is unaffected; the individual device sees delayed reachability, and the user perceives a slow response.
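
The scheduling-gap point is easy to make concrete, because PRB throughput is sampled over scheduled time only. A minimal arithmetic sketch in Python, with all numbers illustrative rather than measured:

```python
# Illustrative numbers, not from any specific network.
grant_gap_ms = 150        # UE wakes, requests resources; scheduler busy with competing load
transfer_ms = 40          # actual data transfer once PRBs are granted
burst_payload_mbit = 2.0  # small background-sync style burst

# PRB throughput is sampled over scheduled time only, so it looks excellent:
prb_throughput_mbps = burst_payload_mbit / (transfer_ms / 1000)
print(f"PRB throughput while scheduled: {prb_throughput_mbps:.0f} Mbps")  # 50 Mbps, green

# The application measures wall-clock time from request to completion:
app_latency_ms = grant_gap_ms + transfer_ms
print(f"burst completion as the app sees it: {app_latency_ms} ms "
      f"({grant_gap_ms / app_latency_ms:.0%} of it spent waiting for the first grant)")
```

The KPI and the user are both "right"; they are simply measuring different windows.
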
What sequence thinking exposed

What started working better was analyzing sequences rather than individual procedure outcomes. Looking at what happened before and after a KPI increment exposed the behavior that the increment alone concealed.

Sequence analysis — RRC transition correlated with scheduler behavior.

Single-KPI view:
- RRC setup success rate: 98.4% (green)
- Throughput above threshold: yes (green)
- No action indicated

Sequence view (same cells, same time window):
- RRC setup: success
- Time to first UL grant: 180 ms (target: 40 ms)
- Scheduler: high competing load at the time of the grant request
- NR SCell addition: delayed 340 ms post-RRC
- Application layer: first packet delayed 520 ms total
- User perception: "slow" despite a "successful" connection

Second sequence:
- RRC setup: success
- Device entered C-DRX immediately post-setup (low activity)
- Paging during DRX cycle: missed
- Re-registration: 800 ms
- KPI: paging success rate unaffected (device recovered)
- User: notification delayed; call setup attempt failed silently

Fig 1 — Single-KPI view vs sequence view: same network, same time
[Timeline: RRC setup → UL grant +180 ms → NR SCell +340 ms → first packet +520 ms. Single-KPI view: RRC success = green, no further action. Sequence view: 520 ms to first packet.]

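A minimal version of that correlation can be sketched in Python. The per-device event streams and their field names are hypothetical — a real pipeline would read them from cell trace or OSS counter exports — but the grouping-and-checking logic is the technique itself:

```python
from collections import defaultdict

# Hypothetical per-device events: (device_id, event_name, ms_since_window_start).
events = [
    ("ue-07", "rrc_setup_complete", 0),
    ("ue-07", "first_ul_grant", 180),   # target: 40 ms after setup
    ("ue-07", "nr_scell_added", 340),
    ("ue-07", "first_packet", 520),
    ("ue-09", "rrc_setup_complete", 0),
    ("ue-09", "first_ul_grant", 35),
    ("ue-09", "nr_scell_added", 60),
    ("ue-09", "first_packet", 90),
]

UL_GRANT_TARGET_MS = 40
FIRST_PACKET_TARGET_MS = 200

# Group events into one ordered sequence per device.
sequences = defaultdict(dict)
for device, event, t_ms in events:
    sequences[device][event] = t_ms

for device, seq in sequences.items():
    setup = seq["rrc_setup_complete"]  # the single KPI stops here: "success"
    grant_gap = seq["first_ul_grant"] - setup
    first_packet = seq["first_packet"] - setup
    if grant_gap > UL_GRANT_TARGET_MS or first_packet > FIRST_PACKET_TARGET_MS:
        print(f"{device}: setup OK, but grant +{grant_gap} ms, "
              f"first packet +{first_packet} ms -> degraded sequence")
    else:
        print(f"{device}: healthy sequence (first packet +{first_packet} ms)")
```

Both devices count as successes in the RRC KPI; only the sequence view separates them.
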
What changed in optimization decisions

Sequence thinking changed what parameters were tuned and what outcomes were targeted. Instead of tuning one parameter to improve a binary KPI, the focus shifted to stability across sequences — even if it meant slightly worse headline numbers in some cases.

Example: C-DRX cycle tuning.

Before sequence analysis:
- C-DRX long cycle: 320 ms
- Power-saving counter: excellent; device battery favorable
- RRC setup success: unaffected
- Decision: leave as-is

After sequence analysis:
- C-DRX 320 ms cycle: paging miss rate of 4.2% for latency-sensitive apps
- First-packet delay for apps waking the device: avg 680 ms
- Neither metric is in the standard KPI set
- Decision: reduce the C-DRX cycle to 160 ms for mixed-traffic sectors
- Result: paging miss rate -71%, first-packet delay -38%
- Power-saving KPI: minor regression (acceptable tradeoff, documented)
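
Stated as code, the decision rule behind that tuning is small. A sketch in Python — the thresholds, sector labels, and metric names are invented for illustration, not the values actually used at the time:

```python
# Illustrative per-sector sequence-level measurements (not standard KPI counters).
sectors = {
    "sector-A": {"paging_miss_pct": 4.2, "first_packet_ms": 680, "traffic": "mixed"},
    "sector-B": {"paging_miss_pct": 0.6, "first_packet_ms": 120, "traffic": "mbb"},
}

# Assumed rule: shorten the long C-DRX cycle only where latency-sensitive
# traffic is present AND a sequence-level metric breaches its bound.
PAGING_MISS_BOUND_PCT = 2.0
FIRST_PACKET_BOUND_MS = 300

def recommend_cdrx_cycle(m, current_ms=320, reduced_ms=160):
    latency_sensitive = m["traffic"] == "mixed"
    breached = (m["paging_miss_pct"] > PAGING_MISS_BOUND_PCT
                or m["first_packet_ms"] > FIRST_PACKET_BOUND_MS)
    return reduced_ms if latency_sensitive and breached else current_ms

for name, metrics in sectors.items():
    print(f"{name}: C-DRX long cycle -> {recommend_cdrx_cycle(metrics)} ms")
```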

The most valuable insight from that period was not how many KPIs were tracked. It was which ones we stopped trusting blindly. A counter that describes a procedure outcome without describing the sequence it belongs to is not wrong — it is incomplete. Incomplete at the scale of 5G NSA device behavior is operationally indistinguishable from wrong.

The shift from single-KPI monitoring to sequence-aware analysis did not require new counters. It required correlating existing ones differently — RRC transitions with scheduler behavior, mobility events with user-plane stalls, retry patterns with device power states. That correlation discipline became the foundation for how anomaly detection and performance intelligence were built in the years that followed.

5G NSA  ·  LTE  ·  KPI Methodology  ·  RAN Optimization  ·  Performance Engineering  ·  OSS Analytics  ·  Telecommunications
