by Guest Writer Terry Slattery, CCIE #1026, Principal Architect, NetCraftsmen
Networks contain a lot of technical debt that is difficult to identify and remove. Automation is the ideal tool to identify and eliminate it.
What is Network Technical Debt?
The metaphor of technical debt originated from Ward Cunningham, who used the term in relation to software development. See Introduction to the Technical Debt Concept. We can apply the same concept to networks.
Network technical debt is the accumulation of aging devices, old operating systems, unnecessary or partial configurations, and variances in deployments. This technical debt increases the cost of maintaining and operating the network. In some particularly bad cases, the debt decreases productivity across the entire organization, which can be invisible and incredibly expensive. The cost to the organization is like a tax or interest paid on the technical debt.
The debt is usually created for reasons that seemed valid at the time, but it must be repaid or it will continue to cost the organization. Let’s examine the sources of network technical debt and the interest that is paid on that debt.
Zombie Devices
When upgrading old network equipment, organizations that operate in fire-fighting mode may skip the removal of old devices due to a lack of time. As soon as new equipment is operational, the network team quickly moves on to the next project. The resulting zombie devices often perform no function and simply consume power, space, and air conditioning. These devices may also continue to be covered by an active maintenance contract that grows more expensive as the products age.
Abandoned projects create a similar type of technical debt. Projects can be left incomplete because of changing priorities, resulting in devices that remain in the network with partial configurations. These remnants may create security holes and confusion about the network’s correctness. Missing operating system and application updates are a leading cause of security vulnerabilities in networks, making this technical debt exceptionally expensive should a bad actor break into the network.
Aging Technology
A similar form of network technical debt arises when old products, with their old technology, continue to be used in networks when they should be replaced with more modern products. A great example comes from a consulting job in which we discovered an old interface card that was overloaded by four attached servers. The subsequent packet loss caused significant application slowness for the organization’s main business function. The loss of productivity was, in effect, an expensive tax or interest on the technical debt of using an old interface card.
Older products may also lack the functionality or performance that modern network designs require. Out-of-date operating systems are another hazard because they often contain security vulnerabilities that have been eliminated in newer versions. The time spent implementing work-arounds is the interest paid on this technical debt.
Configuration Complexity
Another form of technical debt is due to out-of-date and unnecessary configurations. A simple example is a configuration that references a former NTP server or DNS server. A typical scenario is that a new server is installed and the configuration is revised to prefer the new server. The old server may still exist as a form of legacy technical debt. The old configuration statements are never removed because removing them never becomes a priority project. Incorrect configurations become a source of confusion at a later time. Someone will eventually have to determine whether the configuration difference is due to a valid exception or is simply technical debt. Another example is firewall rules that no one is willing to remove because they aren’t understood. A further common source is features that are enabled by default but never used.
Addressing Technical Debt
It is safest to assume that your network contains technical debt, even if you think otherwise. For example, do the DNS, NTP, SNMP, AAA, and logging configurations of your network equipment conform to your configuration standards? How about interface configurations? If you’re using spanning tree, is the root bridge congruent with the default router for the subnet? Are all the access control list entries necessary and documented? Are all operating systems up to date and security vulnerabilities mitigated?
The most smoothly operating networks have a policy for every part of the network configuration and use automation to verify whether the network devices adhere to all policies. This may seem like a big task, but it can be done a bit at a time.
Finding and Fixing Technical Debt
Automated tools are necessary for finding and eliminating technical debt. Manual processes aren’t suitable for networks that are larger than a few devices. The following functions provide the basic functionality that’s needed.
Inventory – Identify what’s in the network, how old it is, whether the devices contain known security vulnerabilities, and whether the operating system is up to date. Manual processes or automation can be used to update operating systems.
Look for legacy equipment that should be removed from critical parts of the network, perhaps retiring it from service or moving it to a network location that can function with older equipment. Remember to check that the retired devices are removed from maintenance agreements.
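As a rough illustration of the inventory step, the sketch below groups devices by model and reports any model that runs more than one operating system version. The device names, model names, and version strings are made up for the example, and the record layout is an assumption; real data would come from whatever discovery tool you use.

```python
from collections import defaultdict

# Hypothetical inventory records; real data would come from a discovery tool.
INVENTORY = [
    {"name": "br1-sw1", "model": "switch-model-a", "os_version": "1.2.3"},
    {"name": "br2-sw1", "model": "switch-model-a", "os_version": "1.2.3"},
    {"name": "br3-sw1", "model": "switch-model-a", "os_version": "1.0.9"},
    {"name": "hq-rtr1", "model": "router-model-b", "os_version": "4.5.1"},
]


def find_version_variance(devices):
    """Return {model: {os_version: [device names]}} for models with mixed versions."""
    by_model = defaultdict(lambda: defaultdict(list))
    for device in devices:
        by_model[device["model"]][device["os_version"]].append(device["name"])
    # Keep only the models that run more than one OS version.
    return {
        model: {version: sorted(names) for version, names in versions.items()}
        for model, versions in by_model.items()
        if len(versions) > 1
    }


if __name__ == "__main__":
    for model, versions in find_version_variance(INVENTORY).items():
        print(model, versions)
```

A report like this will not tell you which version is the right one, but it gives you a short list of devices to investigate.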
Configuration audit – Create configuration standards and identify configurations that deviate from those standards. Don’t try to tackle the entire configuration at once. Build snippets for simple, low-risk elements like NTP, syslog, and SNMP. Then gain experience with automated configuration control by remediating these elements. Migrate to more complex snippets and configuration updates as you learn.
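A snippet-based audit can start very small. The sketch below assumes an IOS-style text configuration and a hypothetical standard of two NTP servers. It reports required lines that are missing and lines with the same prefix that are not in the standard (for example, a reference to a retired server). The addresses and the function name `audit_snippet` are illustrative only.

```python
# Hypothetical configuration standard: the only NTP servers a device should reference.
NTP_STANDARD = ["ntp server 10.0.0.1", "ntp server 10.0.0.2"]

# Illustrative device configuration in an IOS-like syntax (an assumption).
SAMPLE_CONFIG = """\
hostname br1-rtr1
ntp server 10.0.0.1
ntp server 192.0.2.10
logging host 10.0.0.50
"""


def audit_snippet(config_text, prefix, required_lines):
    """Compare the config lines that start with `prefix` against the standard."""
    present = {
        line.strip()
        for line in config_text.splitlines()
        if line.strip().startswith(prefix)
    }
    required = set(required_lines)
    return {
        "missing": sorted(required - present),      # in the standard, not on the device
        "unexpected": sorted(present - required),   # on the device, not in the standard
    }


if __name__ == "__main__":
    print(audit_snippet(SAMPLE_CONFIG, "ntp server", NTP_STANDARD))
```

The same function can be pointed at syslog or SNMP lines by changing the prefix and the list of required lines, which matches the advice to start with low-risk elements and expand from there.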
Configuration drift – Monitor configuration changes as they occur, regardless of whether they are manual or automated. This tracks changes across the entire configuration, including elements that configuration audit doesn’t yet cover.
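At its simplest, drift detection is a comparison of two saved copies of a configuration. The sketch below uses Python's standard `difflib` module to produce a unified diff between a previous and a current snapshot. How the snapshots are collected and stored is left out, and the device name and configuration lines are invented for the example.

```python
import difflib


def config_drift(previous, current, device):
    """Return a unified diff (list of lines) between two config snapshots."""
    return list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile=f"{device}@previous",
            tofile=f"{device}@current",
            lineterm="",
        )
    )


def changed_lines(diff):
    """Keep only added or removed lines, dropping file headers and context."""
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


if __name__ == "__main__":
    before = "hostname br1-rtr1\nntp server 10.0.0.1\n"
    after = "hostname br1-rtr1\nntp server 10.0.0.1\nntp server 192.0.2.10\n"
    for line in config_drift(before, after, "br1-rtr1"):
        print(line)
```

Because the diff covers the whole file, it catches changes in parts of the configuration that have no audit policy yet.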
Automate network state assessment – Validate that the network is functioning correctly. The best approach is to build a network source-of-truth database that defines how the network should operate. For example, do the important routes exist on each router, and do they come from the expected source? Which network neighbors should each device have? Are multi-link channels operating correctly? Are MPLS route distinguishers and route targets properly defined in each VRF? Use these checks to verify network state before and after a change to ensure that the network is functioning as expected. You’ll avoid applying a change to a network that’s not operating correctly.
Use the network state assessment to periodically verify the network’s operational state to proactively alert you to problems like the failure of a redundant network path.
Network state assessment can also be used to identify inconsistent physical connectivity. Let’s say that you have a large number of remote branches that use the same basic configuration, but due to various factors, the physical interfaces used to interconnect the devices at each site are not consistent. Identifying and correcting these differences may seem like busy-work, but they are a source of configuration differences that create opportunities for errors and complicate network automation.
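The neighbor check described above can be sketched as a comparison between a source-of-truth table and what the devices actually report. In the example below, both tables are plain dictionaries keyed by (device, interface). The device and interface names are hypothetical, and the collection of the observed data (for example, from LLDP or CDP) is assumed to happen elsewhere.

```python
# Hypothetical source of truth: which neighbor each (device, interface) should see.
EXPECTED_NEIGHBORS = {
    ("br1-rtr1", "uplink1"): "br1-sw1",
    ("br2-rtr1", "uplink1"): "br2-sw1",
    ("br3-rtr1", "uplink1"): "br3-sw1",
}

# Hypothetical observed state, as if gathered from the devices.
OBSERVED_NEIGHBORS = {
    ("br1-rtr1", "uplink1"): "br1-sw1",
    ("br2-rtr1", "uplink2"): "br2-sw1",  # cabled to a non-standard port
}


def assess_neighbors(expected, observed):
    """Return (device, interface, problem, expected, observed) tuples."""
    findings = []
    for (device, interface), want in sorted(expected.items()):
        got = observed.get((device, interface))
        if got is None:
            findings.append((device, interface, "missing", want, None))
        elif got != want:
            findings.append((device, interface, "mismatch", want, got))
    # Links that exist in the network but not in the source of truth.
    for device, interface in sorted(set(observed) - set(expected)):
        got = observed[(device, interface)]
        findings.append((device, interface, "unexpected", None, got))
    return findings


if __name__ == "__main__":
    for finding in assess_neighbors(EXPECTED_NEIGHBORS, OBSERVED_NEIGHBORS):
        print(finding)
```

Running the same check before and after a change, and on a schedule in between, covers both the change-validation and the proactive-alerting uses. A "missing" finding paired with an "unexpected" one at the same site is often a sign of inconsistent cabling rather than a failed link.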
Reduce Technical Debt with Gluware Automation
Gluware is a great tool to identify and remediate network technical debt. First of all, it doesn’t require software development and scripting skills. Next, it can be easily deployed in an existing network (a brownfield deployment). Both of these factors are a substantial advantage when planning a rapid rollout.
You can’t manage what you don’t know about, so inventory is the first step. In the Device Manager app, Gluware’s network discovery finds devices without doing ping sweeps, so it works in IPv6 networks. Network discovery is ideal as the first step because it is a read-only task. The inventory step gathers basic device information like model numbers, operating system version, and the configuration file. Once the inventory is complete, the technical debt identification process can begin.
The Gluware Device Manager app uses the device model, operating system version, and configuration to check the Cisco PSIRT (Product Security Incident Response Team) database to identify known security vulnerabilities. The model and OS version information should also be used to identify unplanned variations in the operating system on the same model of device.
Next, using the Config Drift and Audit app, you are able to build simple configuration audits. Create configuration policies for basic device services like DNS, NTP, SNMP, syslog, and login credentials. Use these policies to identify devices whose configurations deviate from the policies. You can begin remediation using manual processes and switch to automated configuration when you become comfortable with it. Configuration audit should be expanded to other parts of the configurations as you gain experience.
Since the majority of network outages are due to improper changes, the record of configuration drift will be the first place to look when diagnosing a network problem. It helps to give Gluware its own login account so that the configuration drift function can easily track who is making each change.
Gluware can perform network configuration validation and network validation of operational data using regular expressions. You can use this function to verify the operational state of the network before and after changes. The validation function can be used for pre-change and post-change checks even if you’re not quite ready to use automated configuration changes.
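To make the idea of regex-based validation concrete, here is a generic sketch. It is not Gluware's implementation or syntax. It extracts neighbor states from command output with a regular expression and compares a pre-change capture with a post-change capture. The output format ("neighbor <address> state <state>") is invented for the example; a real pattern would have to match your devices' actual output.

```python
import re

# Pattern for a made-up output format: "neighbor <address> state <state>".
NEIGHBOR_RE = re.compile(
    r"^neighbor\s+(?P<peer>\S+)\s+state\s+(?P<state>\S+)$", re.MULTILINE
)


def parse_states(output):
    """Map each neighbor address to its reported state."""
    return {m["peer"]: m["state"] for m in NEIGHBOR_RE.finditer(output)}


def compare_states(pre_output, post_output):
    """Return {peer: (state_before, state_after)} for every peer that changed."""
    before = parse_states(pre_output)
    after = parse_states(post_output)
    return {
        peer: (before.get(peer), after.get(peer))
        for peer in sorted(set(before) | set(after))
        if before.get(peer) != after.get(peer)
    }


if __name__ == "__main__":
    pre = "neighbor 10.0.0.2 state Established\nneighbor 10.0.0.3 state Established\n"
    post = "neighbor 10.0.0.2 state Established\nneighbor 10.0.0.3 state Idle\n"
    print(compare_states(pre, post))
```

An empty result means the change did not disturb the state you were watching; anything else is a reason to stop and investigate before proceeding.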
The Gluware Config Modeling app provides intelligent configuration management of each network feature across your network. Using declarative provisioning, the Gluware orchestration engine reads the current configured state of each feature, compares it to the intended state, and then makes only the changes necessary to add what is needed, remove what is now out of policy, and correct any incorrect statements. This provides an ongoing benefit: it keeps the network in policy and removes old, unnecessary configuration, eliminating that debt along the way.
The combination of functions in Gluware provides all the functionality needed to begin identifying and remediating the sources of network technical debt. It doesn’t matter whether you’re using manual change processes or automation to remediate configurations. The important factor is that you’re working to identify and eliminate network technical debt on your way to a more smoothly operating network.
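The read, compare, and change-only-what-differs cycle can be illustrated generically. The sketch below is not how the Gluware engine works internally; it is a minimal model that assumes a feature's configuration is a flat list of statements and that a statement is removed by prefixing it with "no " (an IOS-style convention). The function names and addresses are hypothetical.

```python
def plan_changes(current, intended):
    """Return the commands needed to move `current` to `intended`."""
    current_set, intended_set = set(current), set(intended)
    # Remove statements that are no longer in policy (IOS-style "no" form, assumed).
    removals = [f"no {line}" for line in current if line not in intended_set]
    # Add statements that the intended state requires but the device lacks.
    additions = [line for line in intended if line not in current_set]
    return removals + additions


def apply_plan(current, plan):
    """Simulate applying a plan to a list of statements (for illustration only)."""
    result = list(current)
    for command in plan:
        if command.startswith("no "):
            result.remove(command[3:])
        else:
            result.append(command)
    return result


if __name__ == "__main__":
    current = ["ntp server 192.0.2.10", "ntp server 10.0.0.1"]
    intended = ["ntp server 10.0.0.1", "ntp server 10.0.0.2"]
    plan = plan_changes(current, intended)
    print(plan)
    print(plan_changes(apply_plan(current, plan), intended))
```

The useful property is that a second run against an already compliant device produces an empty plan, so the process can be repeated safely to keep devices in policy.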