Two decades on radio networks leave you with a particular skepticism: toward device behavior, and toward vendor benchmarks that stay green while customers notice something is wrong. Perspective is always from the network floor up. Posts cover anomaly detection, 5G SA/NSA, automation, and building observability into live national networks. The signal was always there. Getting to the insight is the harder part.
When Network Change Became a Data Problem (Not a Process One)
RAN · Change Management · Cloud Analytics · 7 min read
Network change management used to fail quietly. Not because engineers didn't know what they were doing — but because the system around them wasn't designed for scale. Email threads, spreadsheets, and manual verification worked when change velocity was low. Once cloud-native cores, multi-vendor RANs, and frequent parameter tuning became normal, that model collapsed under its own weight.
The problem wasn't the number of changes. It was the lack of context. Engineers knew what changed. Not always why, what else it touched, or whether the outcome matched the intent.
What the old model actually broke
The failure mode was not dramatic. Changes executed correctly. Parameters landed where they were supposed to. The breakdown was in what came after: no queryable record of what state the network was in before, no automatic comparison of what changed vs what was expected to change, and no structured way to distinguish a planned delta from an unintended side effect.
Post-change investigation, pre-data-platform:

  Symptom:   HO failure rate increase in region, 3 days post-change
  Question:  which of the 847 parameter changes in the prior week is responsible?

  Available information:
    Change ticket:           "Regional parameter alignment — approved"
    Specific cells changed:  not listed individually
    Pre-change values:       not captured systematically
    Execution log:           confirms changes pushed, no per-cell detail
    Post-change KPI:         degraded, cause unknown

  Time to attribution:  4-6 days of manual reconstruction
  Outcome:              parameter partially reverted based on expert judgment, not evidence
  Same failure class:   recurred in different region 6 weeks later
The issue was not process compliance. The approvals were followed. The issue was that the change left no structured evidence trail — and without one, every investigation started from scratch.
Treating change as a data integrity problem
The shift came from recognizing that every change is a state transition. Pre-change state, post-change state, and the delta between them are data. If that data is captured automatically and made queryable, validation becomes objective. If it isn't, verification becomes subjective — dependent on memory, documentation quality, and whoever is available to reconstruct it.
Fig 1 — Change as state transition: before and after as queryable data
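As a minimal sketch of that framing, the snippet below records a change as a state transition: the pre-change snapshot, the post-change snapshot, and a computed per-cell delta stored together so it can be queried later. The snapshot shape and names like cell_id and a3_offset_db are illustrative assumptions, not the platform's actual schema.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ChangeRecord:
    """One change captured as a state transition, not just an execution-log entry."""
    change_id: str
    executed_at: datetime
    pre_state: dict    # {cell_id: {parameter: value}} before the change
    post_state: dict   # {cell_id: {parameter: value}} after the change
    delta: dict = field(default_factory=dict)

def compute_delta(pre: dict, post: dict) -> dict:
    """Per-cell, per-parameter diff between the two snapshots."""
    delta = {}
    for cell_id in pre.keys() | post.keys():
        before, after = pre.get(cell_id, {}), post.get(cell_id, {})
        changed = {
            p: {"before": before.get(p), "after": after.get(p)}
            for p in before.keys() | after.keys()
            if before.get(p) != after.get(p)
        }
        if changed:
            delta[cell_id] = changed
    return delta

# Illustrative values only: a handover offset retuned on one cell.
pre  = {"cell_0142": {"a3_offset_db": 2, "time_to_trigger_ms": 320}}
post = {"cell_0142": {"a3_offset_db": 4, "time_to_trigger_ms": 320}}
record = ChangeRecord("CHG-1234", datetime.now(timezone.utc), pre, post)
record.delta = compute_delta(pre, post)
# -> {"cell_0142": {"a3_offset_db": {"before": 2, "after": 4}}}

Once a record like this exists for every change, "what was the network's state before" stops being a reconstruction exercise and becomes a lookup.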
What the data platform enabled
Integrating a centralized change platform with a cloud data layer made the network's before and after states queryable. Pre-change parameters, post-change outcomes, and execution context lived in one place. Validation shifted from "did someone check this?" to "what does the data say?"
Verification method
  Before (manual model): engineer confirms change executed per ticket
  After (data-anchored model): automated diff of expected vs actual parameter state

Regression detection
  Before (manual model): KPI review 24-48 hours post-change, manual
  After (data-anchored model): anomaly flag triggered when post-change KPI diverges from baseline

Parameter history
  Before (manual model): not captured systematically
  After (data-anchored model): full parameter history per cell, timestamped, queryable
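The automated diff is the piece that removes subjectivity from verification. A minimal sketch of that check, assuming the planned change is expressed in the same {cell_id: {parameter: value}} shape as the observed delta from the earlier snippet; the function name and result categories are illustrative, not the platform's API.

def verify_change(expected: dict, observed_delta: dict) -> dict:
    """Compare planned parameter changes against the observed delta and split
    the result into: applied as planned, planned but not applied, and deltas
    that were never planned (unintended side effects)."""
    as_planned, not_applied, side_effects = {}, {}, {}
    for cell, params in expected.items():
        for param, target in params.items():
            observed = observed_delta.get(cell, {}).get(param)
            if observed is None or observed["after"] != target:
                not_applied.setdefault(cell, {})[param] = target
            else:
                as_planned.setdefault(cell, {})[param] = observed
    for cell, params in observed_delta.items():
        for param, observed in params.items():
            if param not in expected.get(cell, {}):
                side_effects.setdefault(cell, {})[param] = observed
    return {"as_planned": as_planned,
            "not_applied": not_applied,
            "side_effects": side_effects}

The side_effects bucket is exactly the distinction the old model could not make: a planned delta versus something that moved when it was not supposed to.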
What automation unlocked downstream
Once baselines and deltas were structured data, downstream logic became possible. The same infrastructure that enabled objective post-change validation also made early pattern detection feasible.
Downstream capabilities enabled by structured change data:

Anomaly detection: the post-change KPI delta is compared against the distribution of outcomes from historical changes of the same type; changes whose outcome falls outside the expected range are flagged before they escalate to customer-visible degradation (a minimal sketch follows this list).

Regression detection: change combinations that historically co-occurred with performance regressions are identified in the record; new change combinations matching those patterns are flagged for additional pre-execution review.

Early ML-assisted risk scoring: change metadata (type, scope, region, load conditions) is correlated with historical outcome distributions; high-risk combinations are identified before execution, not after.
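The anomaly-detection item reduces to a simple statistical check once the data exists. A minimal sketch, assuming post-change KPI deltas for historical changes of the same type are already queryable; the z-score threshold and the handover-failure example values are assumptions for illustration, not production tuning.

from statistics import mean, stdev

def flag_post_change_anomaly(kpi_delta: float,
                             historical_deltas: list,
                             z_threshold: float = 3.0) -> bool:
    """True when this change's KPI delta is an outlier relative to the
    distribution of outcomes from comparable historical changes."""
    if len(historical_deltas) < 2:
        return False  # not enough history to form a baseline
    mu, sigma = mean(historical_deltas), stdev(historical_deltas)
    if sigma == 0:
        return kpi_delta != mu
    return abs(kpi_delta - mu) / sigma > z_threshold

# Illustrative: handover failure rate deltas (percentage points) observed
# after previous changes of the same type, then the delta after this one.
history = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 0.1]
flag_post_change_anomaly(1.8, history)   # True -> route for review before it escalates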
At scale, disciplined change management is not bureaucracy. It is what allows velocity without fragility. Change stopped being an operational burden and became an observable system behavior — just like traffic, mobility, or latency. That is when change management finally scaled with the network.
The biggest gain was not automation for its own sake. It was predictability. Engineers could focus on engineering decisions instead of reconciliation tasks. Audits became traceable. Rollbacks became evidence-based. That infrastructure carried forward directly into how analytics platforms, 5G readiness validation, and network integration programs were instrumented in the years that followed.