Intermittent connectivity issues
Updates
Following a short period of network instability on 16th September, engineers have monitored the stability of the network throughout today. The observed issue appeared to relate to a rare circumstance in which NetFlow (a protocol for gathering information about network traffic) causes 100% CPU load. All other metrics were normal, which strongly suggests this was not a denial-of-service attack, although this has not been proven conclusively.
As part of the diagnosis, engineers disabled NetFlow on the BGP Cluster and the platform has since been stable. Further routine software updates have also been applied.
The case will now be closed. The switch-side monitoring will remain in place in case the issue recurs.
BGP traffic has been stable since the last update and engineers are continuing to monitor. As no root cause of the excess CPU load has been identified, services are still classified as at risk, although steps have been taken to mitigate long-term impact should the issue recur tomorrow.
Additional focussed monitoring has been introduced on specific interfaces to log as much in-depth data as possible should the issue recur, which will help to identify the root cause.
Since the reboot, the affected device has remained stable and engineers are continuing to investigate the root cause of the CPU load.
Engineers have undertaken an emergency reboot of a core routing device and are currently waiting for connectivity to settle. Further updates to follow.
We apologise for the interruption caused today.
We have seen further drops in connectivity. Engineers have identified high CPU load on a core routing device as the reason for the packet loss and are still investigating to find the cause.
We are currently investigating intermittent connectivity drops on our transit, which are affecting Broadband4 customers. More information to follow.