Two decades on radio networks leave you with a particular skepticism: toward device behavior, toward vendor benchmarks, toward dashboards that stay green while customers notice something is wrong. The perspective is always from the network floor up. Posts cover anomaly detection, 5G SA/NSA, automation, and building observability into live national networks. The signal was always there. Getting to the insight is the harder part.
From Two Networks to One: What Large-Scale RAN Integration Really Breaks First
LTE · 5G NSA · RAN Integration · 8 min read
Large network integrations don't fail where people expect them to. Capacity is rarely the first problem. Coverage isn't either.
What breaks first is assumption alignment.
The biggest technical challenge was not spectrum reuse or site consolidation. It was reconciling how two nationwide RANs interpreted the same user behavior differently. On paper, both networks were healthy. KPIs looked reasonable in isolation. Once traffic began shifting at scale, the mismatches surfaced quickly.
Where the mismatches appeared
None of these were red alarms. They showed up as soft degradation — retries, increased setup times, edge failures. The kind of problems customers feel before dashboards turn red.
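Catching that kind of slow drift is less about thresholds than about baselines. As a rough illustration (not the production detector; the window size, MAD multiplier, and run length here are arbitrary assumptions), a robust rolling baseline flags sustained deviation long before a fixed red/green limit trips:

```python
import statistics

def drifting(series, window=24, k=3.0, run=6):
    """Flag sustained upward drift in a KPI series (e.g. setup time per bin).

    Fires when `run` consecutive points sit above a robust baseline
    (median + k * MAD over the preceding `window` points), which is the
    kind of soft degradation a fixed red/green threshold never trips on.
    """
    bad_run = 0
    for i in range(window, len(series)):
        base = series[i - window:i]
        med = statistics.median(base)
        mad = statistics.median(abs(x - med) for x in base) or 1e-9
        bad_run = bad_run + 1 if series[i] > med + k * mad else 0
        if bad_run >= run:
            return True
    return False
```

Run per cell, per KPI, over the migration window; anything that flags here but not on the dashboard is exactly the class of problem described above.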
First failure classes to surface at scale
Mobility at inter-network boundaries:
- Handover decisions were tuned within each RAN independently
- At the inter-network boundary, neither RAN's assumptions held
- Measurement reporting thresholds were misaligned across the boundary
- Result: late or failed handovers at transition zones, visible in neither network's standalone KPIs (a worked example follows this list)

UE behavior under mixed timer sets:
- T3412 / T3324 timer values differed between networks
- Devices switching between networks encountered inconsistent idle-mode behavior: paging gaps, registration delays
- Aggregate impact: elevated setup times, not attributed to the integration

NSA anchoring under load:
- LTE anchor selection logic differed between vendor implementations
- Under load, anchor cell selection produced inconsistent NR availability
- Devices on "good coverage" cells still failed NR establishment because scheduler assumptions for the NSA split bearer differed
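To make the first failure class concrete, here is a back-of-the-envelope sketch of how two A3 (neighbour-better-than-serving) baselines translate into metres of late handover at a boundary crossing. Every number in it is an illustrative assumption, not the actual audited parameters, but the asymmetry is the point:

```python
# Hypothetical A3 baselines for the two RANs -- values illustrative only.
NET_A = {"a3_offset_db": 3.0, "hysteresis_db": 2.0, "time_to_trigger_ms": 320}
NET_B = {"a3_offset_db": 1.0, "hysteresis_db": 0.5, "time_to_trigger_ms": 40}

SLOPE_DB_PER_M = 0.5   # assumed RSRP falloff near the cell edge
UE_SPEED_MPS = 14.0    # ~50 km/h through the transition zone

for direction, p in (("A -> B", NET_A), ("B -> A", NET_B)):
    # A3 fires once the neighbour is offset + hysteresis dB better ...
    margin_db = p["a3_offset_db"] + p["hysteresis_db"]
    # ... which the UE reaches this far past the equal-RSRP point ...
    dist_m = margin_db / SLOPE_DB_PER_M
    # ... plus however far it travels while time-to-trigger runs down.
    dist_m += UE_SPEED_MPS * p["time_to_trigger_ms"] / 1000.0
    print(f"{direction}: handover fires ~{dist_m:.0f} m past equal RSRP")
```

Each value is sane within its own network; only the pair is broken. Ten metres of asymmetry sounds small until it sits somewhere the signal collapses faster than the assumed slope.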
Why site-level readiness checks missed this
Integration validation at this point was largely site-centric: is the site commissioned, are the parameters set, does the cell pass acceptance checks. Each site passed. The interactions between sites at the network boundary were never in scope.
| Check performed | What it confirmed | What it missed |
| --- | --- | --- |
| Site acceptance | Cell reachable, KPIs within target at low load | Behavior at the boundary under sustained traffic from migrating devices |
| Parameter audit | Parameters match template per network | Interaction between adjacent cells from different networks with different baselines |
| Standalone KPI review | Each network individually within target | Cross-network mobility path failure rate, not visible in either KPI set alone |
| Lab / controlled test | Feature functions under modeled device behavior | Real device population behavior differing from the lab model at scale |
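The third row is the crux. As a hedged sketch of what closing that gap looks like, assuming the trace pipeline can emit one record per handover attempt tagged with source and target network (field names here are invented for illustration):

```python
from collections import defaultdict

def boundary_failure_rates(handovers):
    """Failure rate per (source network, target network) handover pair.

    `handovers` is an iterable of dicts like
    {"src_net": "A", "dst_net": "B", "ok": False} -- a stand-in for
    whatever the trace/EDR pipeline actually emits. Each network's
    standalone KPIs only ever see their own half of the A<->B rows,
    which is why this rate is invisible in either KPI set alone.
    """
    attempts = defaultdict(int)
    failures = defaultdict(int)
    for ho in handovers:
        key = (ho["src_net"], ho["dst_net"])
        attempts[key] += 1
        failures[key] += not ho["ok"]
    return {pair: failures[pair] / attempts[pair] for pair in attempts}
```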
What data sources exposed the patterns
The shift was from site-level checks to flow-level analysis. Engineering data records, call traces, and user-plane telemetry, correlated across both networks, were the only way to see what was actually happening to devices as they moved across the two RANs.
Data sources and what each contributed:
Engineering Data Records (EDRs):
- Per-device session history across both networks
- Anchor transitions, bearer events, mobility sequences
- Identified which device types and mobility patterns produced failures at inter-network boundaries

Call traces (Uu / X2 / S1):
- Timer interactions at boundary handovers
- RAN-core signaling timing mismatches
- VoLTE bearer re-establishment sequences post-handover
- Confirmed RAN-core interaction failures not visible in radio KPIs alone

User-plane telemetry:
- Throughput and latency per session across network segments
- Identified "good coverage" cells where scheduler assumption mismatches degraded data sessions despite strong RF

Combined view (a correlation sketch follows this list):
- Certain mobility failures occurred only at specific anchor/secondary transitions
- VoLTE issues traced to RAN-core timing, not RF conditions
- Scheduler divergence quantified per vendor pair
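As a sketch of how the combined view comes together (pandas here, with invented column names; the real pipeline schemas and keys differed), stitching both EDR feeds into one per-device timeline is mostly a sort and a shift:

```python
import pandas as pd

def stitch_device_timeline(edrs_a: pd.DataFrame, edrs_b: pd.DataFrame) -> pd.DataFrame:
    """Merge per-device session records from both networks into one timeline.

    Assumes both EDR feeds carry a common device key (here `imsi`) and a
    UTC `timestamp` -- column names are illustrative. The point is the
    ordering: failures that look random per network line up once events
    are sorted per device across both networks.
    """
    a = edrs_a.assign(network="A")
    b = edrs_b.assign(network="B")
    timeline = pd.concat([a, b], ignore_index=True)
    timeline = timeline.sort_values(["imsi", "timestamp"])
    # Flag boundary transitions: consecutive events on different networks.
    timeline["boundary_crossing"] = (
        timeline["network"] != timeline.groupby("imsi")["network"].shift()
    ) & timeline.groupby("imsi").cumcount().gt(0)
    return timeline
```

Once a boundary-crossing flag exists per event, the failure classes above stop being anecdotes: group by device model, anchor pair, or crossing direction, and count.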
Fig 1 — Integration validation: site view vs flow view
Integration is not a radio problem. It is a system-behavior problem. The only way to see it is through data that follows the user end-to-end — not data that describes the network site by site.
The question that reframed the validation approach: not "is the site ready" but "does this configuration behave predictably when millions of devices behave differently than the lab models assumed." That is a harder question. It requires data at a different layer. It also prevents the class of post-launch problems that site-level checks structurally cannot see.
That mindset carried forward into 5G Standalone readiness, nationwide KPI normalization, and analytics platforms built to track behavior rather than counters. Whenever the scope of a network change was large enough that no individual team could hold the full system context simultaneously, the answer was always the same: follow the device, correlate the layers, and trust the data over the model.