Tips for Rapidly Adopting Network Automation

May 12, 2020 | October 25th, 2020
Tips for Rapidly Adopting Network Automation - IT RedShirt iStock 11May20

by Guest Writer Terry Slattery, CCIE #1026, Principal Architect, NetCraftsmen

Adopting network automation doesn’t have to be a significant project with major workforce retraining and high risk to the organization. Here are tips that you can use to easily make the transition using Gluware.

Network Automation Plays a Critical Role

Is now the right time to start embracing automation? Most of the day is spent in the operational grind, leaving little time for adopting automation, or so it seems. However, automation plays a critical role in allowing network operations to move quickly while minimizing risk to the network.

The past few years have seen the need for automation become more strategic and more critical to keeping the business running. While you may think that now is not a good time to adopt automation, I make the argument that any time is the right time to start.

What are the Hurdles?

There are two big hurdles to rapid adoption of network automation. The first is education: learning the skills to perform automation. Gluware allows network engineers to adopt network automation without extensive re-education. To move quickly, you want to avoid needing the years of experience that are required to build good software development skills. Gluware makes this possible by using the configuration syntax that network engineers already know, allowing them to get directly involved in network automation.

Risk management is the second hurdle. Network engineers who have had to deal with the repercussions of a change gone wrong are frequently reluctant to apply automation for fear that a bad change will be quickly propagated across the entire network. While that risk is real, prudent selection of the tasks to automate and thorough testing processes can minimize it while still delivering automation’s benefits. The key is to start with no-risk tasks, learn how to use automation tools, then use that knowledge as the basis for more advanced tasks.

Another component of reducing risk is learning how and when to use automation tools. It is as important to understand when not to use a given tool as it is to understand when it can and should be used. This knowledge comes with experience. Along the way, you’ll identify manual processes that need to be replaced with automation-specific processes to reduce risk.

Adopting Network Automation in Steps

There is a progression of steps that facilitate adoption of automation while mitigating risk:

1. Use a Test Network

The first step to avoiding risk is to develop all your automations using a test lab. Ideally, you’ll have at least one physical version of each device that will be automated. Expand the test network with virtual device images.

When you’re developing and testing each of the steps below, start with one lab device to verify that your change functions as desired. When you can verify that the configuration change functions correctly, expand the scope to a few lab devices, then to the production network, also in stages.

The change verification process should also be automated (see the Operational Validation step below), regardless of how many devices it will touch. This is a form of test-driven network automation. In fact, there are benefits in creating the operational validation tests before you create the configuration update. This may seem like unnecessary busy-work at first, but it significantly reduces the risk of automation.
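
The staged expansion and automated verification described in this step can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Gluware's interface: `staged_rollout`, the `apply_change` and `validate` callables, and the device names are all hypothetical.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    stages: Sequence[Sequence[str]],
    apply_change: Callable[[str], None],
    validate: Callable[[str], bool],
) -> List[str]:
    """Apply a change stage by stage, halting at the first failed validation.

    Returns the devices that were changed and passed validation.
    """
    completed: List[str] = []
    for stage_number, devices in enumerate(stages, start=1):
        for device in devices:
            apply_change(device)
            if not validate(device):
                # Stop immediately so a bad change never reaches later stages.
                raise RuntimeError(
                    f"validation failed on {device} in stage {stage_number}; "
                    "halting before later stages"
                )
            completed.append(device)
    return completed


if __name__ == "__main__":
    # Hypothetical names: one lab device, a few lab devices, then production in stages.
    stages = [
        ["lab-sw1"],
        ["lab-sw2", "lab-rtr1"],
        ["prod-access-sw1"],
        ["prod-access-sw2", "prod-access-sw3"],
    ]
    changed: List[str] = []
    print(staged_rollout(stages, changed.append, lambda device: True))
```

Passing the validation function in as a parameter mirrors the test-driven approach: the check can be written and exercised before the change itself exists.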

2. Basic Configuration Verification

Start the automation journey with a configuration audit of simple network device configurations like DNS, NTP, SNMP, and Syslog. While these may seem to provide little value, they allow you to learn how Gluware works. Use configuration audits to create deployment standards and their associated configuration standards (policies). Processes can then evolve to correct deficiencies in the deployed configurations. I guarantee that you’ll find a lot of historical cruft in the deployed configurations for these services. Larger networks will have groups of devices that share a policy within a group but differ between groups, for example in regional server addresses.
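
To make the idea of auditing against a policy concrete, here is a minimal sketch in plain Python. It is not how Gluware implements audits; the policy contents, the IOS-style command keywords, and the function name `audit_config` are assumptions chosen for illustration, and the addresses come from documentation ranges.

```python
from typing import Dict, List, Set

# Hypothetical policy for one group of devices (for example, one region).
POLICY: Set[str] = {
    "ntp server 192.0.2.10",
    "ntp server 192.0.2.11",
    "logging host 192.0.2.20",
}

# Only lines that start with these keywords are covered by this policy.
AUDITED_PREFIXES = ("ntp server", "logging host")


def audit_config(config_text: str, policy: Set[str] = POLICY) -> Dict[str, List[str]]:
    """Report required lines that are missing and audited lines the policy does not allow."""
    deployed = {
        line.strip()
        for line in config_text.splitlines()
        if line.strip().startswith(AUDITED_PREFIXES)
    }
    return {
        "missing": sorted(policy - deployed),
        "unexpected": sorted(deployed - policy),
    }


if __name__ == "__main__":
    sample = (
        "hostname sw1\n"
        "ntp server 192.0.2.10\n"
        "ntp server 198.51.100.5\n"
        "logging host 192.0.2.20\n"
    )
    print(audit_config(sample))
```

The "unexpected" list is where historical cruft tends to show up. The audit only reports; nothing is changed on the device.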

Another simple no-risk check is to enable Gluware’s config drift tracking to identify day-to-day configuration changes. Since most network outages are caused by incorrect configuration changes, config drift tracking can speed up problem identification. A side benefit of config drift tracking is that you’ll have a backup of each device’s configuration.
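
Conceptually, drift tracking compares a saved configuration snapshot with the current one. The sketch below shows that comparison using Python's standard `difflib` module; it illustrates the concept only and is not Gluware's drift feature. The function name `config_drift` is hypothetical.

```python
import difflib
from typing import List


def config_drift(previous: str, current: str) -> List[str]:
    """Return unified-diff lines describing what changed between two snapshots."""
    return list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


if __name__ == "__main__":
    old = "hostname sw1\nntp server 192.0.2.10\n"
    new = "hostname sw1\nntp server 192.0.2.99\n"
    for line in config_drift(old, new):
        print(line)
```

An empty result means no drift. The stored snapshots double as the configuration backups mentioned above.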

As you go through this process, you’ll start to identify other network configuration items that you can check. Start to think about how you would approach the process of automating the correction of any exceptions.

You could begin remediating the exceptions that the audit finds, but it is better to save that for the first steps of doing configuration changes. Note that the only thing you’re doing at this point is reporting deviations from your configuration audit policies.

3. Operational Validation

The next read-only/no-risk step is to gather and verify network operational states, such as:

  • NTP has a peer relationship with its upstream clock source
  • Routing peers are correct, and the right routes are being received
  • Redundant pairs of interfaces and devices are operational (VRRP/HSRP, vPC, EtherChannel)
  • Root bridge selection is correct in spanning tree bridged domains
  • MPLS import/export lists
  • Access list entry validation

These checks are done by having Gluware issue show commands and validating network state from the output (a v3.6 feature). Since exceptions to the proper operational state may compromise your network, you should plan to use your existing procedures to correct problems. Note that you’re still not using automation to change the network.
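
As an example of validating state from show command output, the sketch below checks the first item in the list: that NTP is synchronized to an approved source. It is plain Python, not Gluware syntax. The sample text imitates the layout of Cisco IOS `show ntp associations` output with made-up values, and the parsing assumes IPv4 peers and that a leading `*` marks the synchronized peer.

```python
import re
from typing import Iterable, Optional

# Illustrative output in the style of Cisco IOS "show ntp associations";
# the addresses and counters are invented for this example.
SAMPLE_OUTPUT = """\
  address         ref clock       st   when   poll reach  delay  offset   disp
*~192.0.2.10      .GPS.            1     25     64   377  1.436   0.120  1.021
+~192.0.2.11      .GPS.            1     30     64   377  2.100   0.300  1.100
 * sys.peer, # selected, + candidate, - outlyer, x falseticker, ~ configured
"""

# A peer line that starts with "*" names the source the device is synchronized to.
# Requiring a digit after the marker keeps the legend line from matching.
SYS_PEER = re.compile(r"^\s*\*~?(\d[\d.]+)\s")


def synced_ntp_peer(show_output: str) -> Optional[str]:
    """Return the IPv4 address of the synchronized peer, or None if unsynchronized."""
    for line in show_output.splitlines():
        match = SYS_PEER.match(line)
        if match:
            return match.group(1)
    return None


def ntp_is_valid(show_output: str, approved_sources: Iterable[str]) -> bool:
    """True only when the device is synchronized to one of the approved sources."""
    return synced_ntp_peer(show_output) in set(approved_sources)


if __name__ == "__main__":
    print(ntp_is_valid(SAMPLE_OUTPUT, ["192.0.2.10", "192.0.2.11"]))
```

Output formats vary by platform and software version, so a pattern like this should be checked against real output from each device type in the lab.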

The operational states you’re defining are building a Network Source of Truth, which reflects the network’s operational baseline. Start with the low-hanging fruit. If there’s a commonly occurring problem, automate its check and run it periodically. Don’t forget to include PSIRT security vulnerability checking.

4. Automate Troubleshooting Data Collection

Collecting data to aid troubleshooting is one of the functions that extends the time it takes to understand and identify the cause of a problem. Automating the data collection process can reduce the time, especially in large networks. Create a process that runs a list of show commands on multiple devices and saves the results. It’s then easy to review the output and search for key text.

As with the above steps, you’re only issuing read-only commands—no network changes. Add troubleshooting validation to the source of truth as you diagnose common problems. This step should quickly begin to pay back in time savings. Gluware State Assessment can help with this.
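
The collect-and-search process described in this step might look like the following sketch. It is not Gluware State Assessment; it is plain Python in which `run_command` is a hypothetical callable standing in for whatever mechanism actually reaches the devices, and the function names are invented for illustration.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def collect_show_output(
    devices: Iterable[str],
    commands: Iterable[str],
    run_command: Callable[[str, str], str],
    output_dir: Path,
) -> List[Path]:
    """Run every command on every device and save one text file per device."""
    output_dir.mkdir(parents=True, exist_ok=True)
    command_list = list(commands)
    saved: List[Path] = []
    for device in devices:
        sections = [
            f"### {command}\n{run_command(device, command)}\n"
            for command in command_list
        ]
        path = output_dir / f"{device}.txt"
        path.write_text("\n".join(sections))
        saved.append(path)
    return saved


def search_output(paths: Iterable[Path], text: str) -> List[str]:
    """Return 'file: line' entries for every saved line that contains the text."""
    hits: List[str] = []
    for path in paths:
        for line in path.read_text().splitlines():
            if text in line:
                hits.append(f"{path.name}: {line}")
    return hits
```

Because `run_command` is passed in, the same collection logic can be tested against canned output in the lab before it is pointed at real devices.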

5. Simple, Low-Risk Changes

You should now have a good understanding of Gluware and can start making simple network changes. To minimize risk, select a simple, low-risk task like updating the NTP configs that don’t match your defined policy.

When the update has been successfully applied on your test network, expand it to a few less important devices on the operational network. If something goes wrong anywhere in this process, go back and determine why it didn’t work. You have to understand failures so you can avoid them in the future.

6. More Complex Changes

You should now have enough experience to proceed to more complex configuration changes. As with the above steps, testing is required to minimize risk. Test in the lab first, then on a few devices, then expand the number of devices per run.

Look at automating time-consuming tasks like the following:

  • Tasks that take a long time, are implemented on a few devices, are complex to get right, or where automation can simplify the repetitive sub-elements. Updates to complex ACLs are a good example, as is creating configurations for a new instance of a network building block.
  • Tasks that are simple but are performed regularly. VLAN creation and subnet allocation are representative examples.
  • Tasks that are repeated on many devices, such as updating local passwords.
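
Simple, regularly performed tasks such as VLAN creation lend themselves to templating. The sketch below renders an IOS-style VLAN stanza from a few parameters using only the Python standard library. It is an illustration rather than Gluware's approach: the template text, the placeholder names, and the choice of the first host address as the gateway are all assumptions.

```python
import ipaddress
from string import Template

# IOS-style VLAN and SVI stanza; the placeholder names are hypothetical.
VLAN_TEMPLATE = Template(
    "vlan $vlan_id\n"
    " name $name\n"
    "interface Vlan$vlan_id\n"
    " description $name\n"
    " ip address $gateway $netmask\n"
)


def render_vlan_config(vlan_id: int, name: str, subnet: str) -> str:
    """Render a VLAN stanza, rejecting invalid VLAN IDs and malformed subnets."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID {vlan_id} is outside 1-4094")
    # ip_network raises ValueError if the subnet has host bits set.
    network = ipaddress.ip_network(subnet)
    # Assumption: the first host address in the subnet is the gateway.
    gateway = network.network_address + 1
    return VLAN_TEMPLATE.substitute(
        vlan_id=vlan_id, name=name, gateway=gateway, netmask=network.netmask
    )


if __name__ == "__main__":
    print(render_vlan_config(110, "USERS", "10.20.30.0/24"))
```

Validating inputs before rendering catches typos such as an out-of-range VLAN ID before any configuration reaches a device.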

The processes you’ve developed in the prior steps will have helped you build an automation system that minimizes the risk of a major network outage. These steps can’t be short-cut. If you skip them, you’ll miss the education and insights that come from the progression. In this sense, network automation is a journey.

Operating system updates may be automated as well. I recommend significant testing of the deployment and operation to make sure that updates don’t cause other problems. Sometimes it is better to use a workaround than to do an OS upgrade. However, if required, Gluware can automate the process using OS Manager.

Gluware handles a lot of the complexities of configuration updates that other tools require that you explicitly handle with complex syntax or by programming.

Other Notes

Standardize your network design and configurations so that automation processes can be applied to more of the network. Use building-block designs that are repeated across the network to reduce the number of configuration variations. This also allows you to build a lab at a reasonable cost that creates a more realistic testing environment. If your network consists of many so-called snowflake designs, then time spent in standardizing them will result in enormous time savings.

Don’t overlook security configuration elements, like access lists to limit SNMP access or proper login authentication configuration.

Summary

Your first impression may be that you don’t have time to begin the automation journey. But it doesn’t take long before the benefits begin to accrue. It’s not just about saving time. It’s also about making the network more stable and reducing human error. Gluware is one of the advanced tools that can help network engineers achieve automation without extensive education requirements. That puts it on my list of network automation tools.