Two decades on radio networks leave you with a particular skepticism: toward device behavior, toward vendor benchmarks, toward KPIs that stay green while customers notice something is wrong. The perspective here is always from the network floor up. Posts cover anomaly detection, 5G SA/NSA, automation, and building observability into live national networks. The signal was always there. Getting to the insight is the harder part.
Carrier Aggregation Looked Enabled — Until We Looked Per User
LTE · 5G · Carrier Aggregation · User Analytics · 7 min read
Carrier Aggregation is often treated as a checkbox feature. If counters show 2CC, 3CC, or 4CC usage, the assumption is that users are benefiting. Cell-level metrics hide an uncomfortable truth: not all users experience CA the way the network thinks they do.
The limitation is visibility. Traditional KPIs show how often CA is configured, not whether it is effective. They cannot show when a device is technically aggregated but practically constrained — by capability mismatches, scheduling behavior, or radio conditions that suppress throughput on the secondary carriers.
What cell-level CA metrics actually measure
Cell-level CA counters — what they capture and what they don't:
CA utilization rate: % of TTIs where CA was configured
2CC session ratio: % of sessions with 2 component carriers active
3CC/4CC session ratio: % of sessions with 3 or 4 component carriers active
What these confirm:
CA feature is active
Network attempted to configure CA for eligible devices
CA was not immediately rejected
What these do not confirm:
Whether secondary carrier(s) added any usable throughput
Whether device capability matched the configured band combination
Whether scheduler allocated resources on secondary carrier
Whether radio conditions on SCC were sufficient for throughput gain
Whether the user in full-buffer state received their CA potential
A cell showing 78% 2CC utilization may have a substantial fraction of those sessions where the secondary carrier contributed near-zero throughput. The utilization counter increments regardless. The user experience does not.
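To make that gap concrete, here is a minimal sketch of the counter view versus the per-user view. The column names (pcc_mbytes, scc_mbytes) and the 10% effectiveness floor are assumptions for illustration, not a vendor counter or EDR schema.

```python
import pandas as pd

# Illustrative per-session records. Column names are assumptions
# for this sketch, not an actual EDR schema.
sessions = pd.DataFrame({
    "session_id":    [1, 2, 3, 4],
    "ca_configured": [True, True, True, True],
    "pcc_mbytes":    [120.0, 95.0, 210.0, 60.0],
    "scc_mbytes":    [110.0, 2.0, 1.5, 55.0],
})

# Counter view: every configured session increments CA utilization.
ca_utilization = sessions["ca_configured"].mean()

# Per-user view: share of session volume the secondary carrier carried.
sessions["scc_share"] = sessions["scc_mbytes"] / (
    sessions["pcc_mbytes"] + sessions["scc_mbytes"]
)
effective = sessions["scc_share"] >= 0.10  # the 10% floor is a judgment call

print(f"CA utilization (counter view): {ca_utilization:.0%}")    # 100%
print(f"CA effective (per-user view):  {effective.mean():.0%}")  # 50%
```

Same four sessions, two very different answers: the counter says CA is fully utilized, while half the sessions got essentially nothing from the secondary carrier.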
Three data sources — what each contributed
Network KPIs
CA utilization rate per cell
SCC addition success rate
Aggregate throughput per sector
Showed: CA is working
Missed: whether it helped each user
EDRs (Engineering Data Records)
Per-session throughput with CA active
Full-buffer state detection
Per-user SCC utilization during session
Showed: throughput gap for aggregated users
Revealed: CA configured ≠ CA effective
LSRs (Location Session Records)
Device category and UE capability
Band combination support per device
Software version, radio environment
Showed: why — device and context constraints behind the gap
What the correlation exposed
Combining these three datasets in a cloud analytics pipeline surfaced patterns that no single source revealed independently. Users in full-buffer states — where CA impact should have been highest — showed a wide distribution of actual throughput gains. For a significant subset, CA provided little to no benefit despite being configured.
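A minimal sketch of the correlation step, assuming the three sources reduce to joinable frames keyed on subscriber, session, and cell. All keys and columns here are illustrative stand-ins, not the actual schemas.

```python
import pandas as pd

# Stand-ins for the three sources. Keys (imsi, cell_id, session_id)
# and columns are assumptions for this sketch.
kpis = pd.DataFrame({"cell_id": ["A1"], "ca_util_pct": [78.0]})

edrs = pd.DataFrame({
    "session_id":  [10, 11],
    "cell_id":     ["A1", "A1"],
    "imsi":        ["u1", "u2"],
    "full_buffer": [True, True],
    "tput_mbps":   [42.0, 18.0],
})

lsrs = pd.DataFrame({
    "imsi":             ["u1", "u2"],
    "ue_category":      [16, 11],
    "supported_combos": [["B2+B4+B66"], ["B2+B4"]],
})

# Session detail joined to device context, then to cell-level KPIs:
# one row per session carrying all three perspectives at once.
joined = (
    edrs.merge(lsrs, on="imsi", how="left")
        .merge(kpis, on="cell_id", how="left")
)
print(joined[["session_id", "imsi", "ue_category", "tput_mbps", "ca_util_pct"]])
```

The join direction matters: starting from EDR sessions keeps the analysis per-user, with cell-level KPIs attached as context rather than serving as the unit of analysis.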
Pattern 1: device capability mismatch
Cell configured: 3CC (B2 + B4 + B66)
Device category reported via LSR: Cat 11 (2CC max, B2+B4 only)
SCC addition for B66: attempted, rejected at device
Counter logged: SCC addition attempt
Counter NOT logged: rejection reason
User experience: single-carrier throughput despite 3CC cell config
Scale: 14% of CA-configured sessions in one market affected by capability mismatch of this class
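Pattern 1 reduces to a set-membership check between what the cell configures and what the device reports supporting. A sketch under assumed field names, with combo strings following the pattern above:

```python
# Cell -> configured CA band combination (illustrative values).
CELL_CONFIG = {"A1": "B2+B4+B66"}

def capability_mismatch(session: dict) -> bool:
    """True when the cell configures a combo the device cannot use."""
    configured = CELL_CONFIG[session["cell_id"]]
    return configured not in session["supported_combos"]

# The Pattern 1 device: Cat 11, 2CC max, B2+B4 only (per LSR).
session = {"cell_id": "A1", "supported_combos": ["B2+B4"]}
print(capability_mismatch(session))  # True: 3CC config meets a 2CC device
```

A per-session check of this shape, run against LSR capability data, is how a number like the 14% above can be derived; the counters alone cannot produce it, because the rejection reason is never logged.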
Pattern 2: SCC radio suppression under load
SCC (B66): added successfully
SCC SINR during session: -2 to 2 dB (marginal)
Scheduler behavior: SCC allocated <5% of session TTIs
SCC throughput contribution: <3% of session volume
CA utilization counter: session counted as 2CC
User throughput: functionally single-carrier
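The thresholds in Pattern 2 suggest a simple per-session classifier. A sketch; the floors mirror the observed values above (<5% of TTIs, <3% of volume) and would be tuned per network.

```python
def functionally_single_carrier(scc_tti_share: float,
                                scc_volume_share: float,
                                tti_floor: float = 0.05,
                                volume_floor: float = 0.03) -> bool:
    """Flag sessions counted as 2CC whose SCC carried almost nothing.

    Floors echo the observed pattern above; treat them as assumptions
    to calibrate, not fixed standards.
    """
    return scc_tti_share < tti_floor and scc_volume_share < volume_floor

# The Pattern 2 session: counted as 2CC, effectively single-carrier.
print(functionally_single_carrier(scc_tti_share=0.04,
                                  scc_volume_share=0.025))  # True
```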
Pattern 3: scheduling asymmetry
PCC heavily loaded (82% PRB), SCC lightly loaded (24% PRB)
Expected: scheduler offloads to SCC
Observed: PCC scheduler continued allocating to PCC first
SCC utilization: 11% of eligible TTIs
Cause: cross-carrier scheduling threshold set conservatively at rollout, never revisited as traffic grew
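The fix implied by Pattern 3 is a load-aware trigger rather than a static one. A sketch of the decision logic, assuming per-carrier PRB utilization is available; the 65% default anticipates the revised threshold discussed in the optimization section below.

```python
def should_offload_to_scc(pcc_prb_util: float,
                          scc_prb_util: float,
                          trigger: float = 0.65) -> bool:
    """Offload when the PCC is loaded past the trigger and the SCC has
    headroom. The 0.65 default is illustrative; the rollout-era value
    (~0.80) left the SCC idle under exactly this load shape."""
    return pcc_prb_util >= trigger and scc_prb_util < pcc_prb_util

# The Pattern 3 scenario: 82% PCC PRB, 24% SCC PRB.
print(should_offload_to_scc(0.82, 0.24))  # True: offload is warranted
```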
Fig 1 — CA throughput distribution: per-cell average vs per-user reality
The distribution was telling. 38% of CA-configured users saw less than 10% throughput gain from the additional carrier. Cell-level average masked this entirely — the cell reported healthy CA utilization and reasonable aggregate throughput. The per-user distribution showed that a third of users were receiving minimal benefit from a feature the network considered active.
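The shape of that distribution is easy to demonstrate on synthetic data, which is all the sketch below uses: a near-zero-benefit cluster plus a genuinely benefiting one. The numbers are invented to echo the pattern, not taken from the measured dataset.

```python
import numpy as np

rng = np.random.default_rng(7)
# Two synthetic populations of per-user CA throughput gain.
gains = np.concatenate([
    rng.normal(0.05, 0.04, 380),   # near-zero-benefit cluster
    rng.normal(0.60, 0.20, 620),   # users who genuinely benefit
]).clip(min=0)

cell_average = gains.mean()          # the only number a cell-level view reports
low_benefit = (gains < 0.10).mean()  # the fraction that view hides

print(f"Average gain:         {cell_average:.0%}")
print(f"Users under 10% gain: {low_benefit:.0%}")
```

The average looks healthy while roughly a third of users sit under the 10% line, which is exactly why the per-user distribution, not the mean, is the metric that matters.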
What optimization looked like from this perspective
Issue identified: capability mismatch
Cell-level fix: none (not visible at cell level)
Per-user fix: LSR-based CA eligibility filtering (stop configuring unsupported band combos per device category)

Issue identified: SCC radio suppression
Cell-level fix: SCC addition threshold adjustment (applies to all users)
Per-user fix: SINR-conditional SCC activation (only add the SCC when per-session SINR is above a useful threshold)

Issue identified: scheduling asymmetry
Cell-level fix: cross-carrier scheduling threshold revisit
Per-user fix: load-aware threshold (trigger at 65% PCC PRB rather than 80%, sized per actual traffic profile)
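The SINR-conditional activation fix reduces to a per-session gate. A minimal sketch; the 3 dB floor is an illustrative assumption, not the deployed value.

```python
def allow_scc_activation(scc_sinr_db: float,
                         min_sinr_db: float = 3.0) -> bool:
    """Only add the SCC when its per-session SINR clears a usefulness
    floor; below it, the carrier gets configured but contributes
    nothing (Pattern 2). The 3 dB floor is an assumption to calibrate."""
    return scc_sinr_db >= min_sinr_db

# Marginal-SINR sessions from Pattern 2 (-2 to 2 dB) get gated out.
print(allow_scc_activation(0.0))  # False: skip the SCC, save the signaling
print(allow_scc_activation(8.5))  # True: SCC likely to add throughput
```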
CA effectiveness is a per-user problem, not a per-cell problem. Once that perspective is adopted, CA optimization becomes a data science challenge as much as a radio one — and that is where the meaningful gains are found. Cell-level counters tell you what the network tried. User-level data tells you what worked.
The framework built here — correlating network KPIs with EDR session detail and LSR device context — became a template for how other features were evaluated afterward. Every feature that looks healthy at the cell level but delivers inconsistently at the user level has the same root cause: the monitoring layer was designed around the network's perspective, not the user's. Closing that gap requires data at a different granularity and the infrastructure to combine it at scale.
LTE · 5G · Carrier Aggregation · EDR Analytics · RAN Optimization · User Experience · Performance Engineering · Telecommunications