How to Improve Network Resiliency Using NS1 Connect Filter Chains

admin, February 26, 2024

Network service interruptions happen. It’s not a question of if, but when. Even cloud platforms and content delivery networks (CDNs) with 100% uptime SLAs are not immune.

The question is, what do you do when one of your network services goes down? If you lack redundant services, will you go offline? Or do you want to fail over to another provider while maintaining a smooth user experience? How does the failover process work on the back end? Will it be automated or manual?

Most midsize and large businesses have redundant systems in place to help them survive an outage. What they may not have is an automated mechanism to redirect traffic to those redundant systems when a core service is interrupted.

IBM NS1 Connect Filter Chain™ technology uses the power of DNS to automatically reroute traffic between service providers when there is a network service outage. By applying a few basic rules, NS1 Connect will monitor network health and switch endpoints as needed. You set the rules and priorities in advance, and everything after that happens automatically.

In the NS1 platform, filter chain configuration is applied to individual records within a DNS zone. The filter chain determines how NS1 processes queries for each record, and specifically what answer to return. Each filter chain uses its own logic to process queries. Depending on your operational or business requirements, you can create combinations of filters to achieve specific results.
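As a rough mental model, a filter chain can be pictured as an ordered list of functions, each of which takes the list of candidate answers and hands a (possibly shorter or reordered) list to the next. The sketch below is a local Python simulation of that idea, not the NS1 Connect API; the names `run_chain`, `keep_up`, and `first_one` and the example hostnames are hypothetical.

```python
from typing import Callable, Dict, List

Answer = Dict[str, object]
Filter = Callable[[List[Answer]], List[Answer]]


def run_chain(answers: List[Answer], chain: List[Filter]) -> List[Answer]:
    """Apply each filter in order; each one receives the previous filter's output."""
    for step in chain:
        answers = step(answers)
    return answers


def keep_up(answers: List[Answer]) -> List[Answer]:
    # Hypothetical stand-in for a health filter: drop answers marked as down.
    return [a for a in answers if a["up"]]


def first_one(answers: List[Answer]) -> List[Answer]:
    # Hypothetical stand-in for a filter that limits the answer count to one.
    return answers[:1]


answers = [
    {"host": "cdn-a.example.com", "up": False},
    {"host": "cdn-b.example.com", "up": True},
]
result = run_chain(answers, [keep_up, first_one])
print(result)
```

The order of the list matters: swapping the two filters here would select the first answer before checking its health and return nothing, which is why each of the chains below starts with a health or availability check.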

Of course, not everyone wants to handle failover traffic the same way. So we’ve put together a quick guide on how to build active-active, active-passive, and manual failover systems using filter chains.

Active-Active Failover

In this use case, NS1 or a third-party data source monitors the health of individual endpoints in the application delivery infrastructure. If the data shows an outage in one system, NS1 automatically routes traffic to a secondary system of your choice. These secondary systems are called “active-active” because they are already running as part of a load balancing system. If an outage occurs on one system, NS1 rebalances the load across the systems that are already active.

The first filter in the chain is “Up”. This filter tells the system whether the service provider’s endpoint is operational.

The second filter in the chain is “Shuffle” or “Weighted Shuffle”. If the “Up” filter reports that an endpoint is down, this filter automatically distributes that endpoint’s traffic among the remaining providers. Shuffle distributes traffic randomly, while Weighted Shuffle distributes traffic according to weights provided by the user.

Finally, specify how many answers DNS will provide to inbound queries. RFC 1912 requires that only one answer be returned for every CNAME query. The “Select First N” filter allows you to specify the number of answers returned to the requesting client, but it should generally be set to 1.
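The active-active chain described above can be sketched as a small local simulation. This is an illustration under stated assumptions, not NS1 code: the `up` and `weight` fields, the hostnames, and the function names are hypothetical, and the weighted shuffle shown here is one simple way to get a weight-biased ordering.

```python
import random
from collections import Counter


def up(answers):
    # Keep only endpoints currently reported as healthy.
    return [a for a in answers if a["up"]]


def weighted_shuffle(answers, rng):
    # Build a random ordering in which heavier answers tend to come first.
    pool = list(answers)
    ordered = []
    while pool:
        pick = rng.choices(pool, weights=[a["weight"] for a in pool], k=1)[0]
        pool.remove(pick)
        ordered.append(pick)
    return ordered


def select_first_n(answers, n=1):
    # Limit how many answers are returned to the client.
    return answers[:n]


def resolve(answers, rng):
    return select_first_n(weighted_shuffle(up(answers), rng), n=1)


endpoints = [
    {"host": "cdn-a.example.net", "up": True, "weight": 3},
    {"host": "cdn-b.example.net", "up": True, "weight": 1},
    {"host": "cdn-c.example.net", "up": False, "weight": 5},  # simulated outage
]

rng = random.Random(0)  # seeded so the simulation is repeatable
counts = Counter(resolve(endpoints, rng)[0]["host"] for _ in range(2000))
print(counts)
```

In this simulation the endpoint marked as down never appears in the results, however large its weight, and the remaining traffic splits roughly in proportion to the weights of the healthy endpoints.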

Active-Passive Failover

As in the active-active use case, NS1 or a third-party data source monitors the health of the application delivery infrastructure and routes traffic to the secondary system if the primary system goes down. The difference here is that the secondary system is not handling traffic until then. Secondary systems are held in reserve for redundancy and only operate when needed.

As in the previous example, the first filter in this chain is “Up”. Based on monitoring data, NS1 determines which underlying services are online.

The second filter in this chain is “Priority”. This filter creates logic to prioritize active systems over passive or backup systems. If a higher priority answer is available, it will be sorted first in the list of possible answers. Otherwise, NS1 continues to follow the priority list until it finds an available resource.

Finally, “Select First N” indicates how many answers to return. In this case, you want to return only one answer.
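A minimal sketch of this active-passive logic, again as a local simulation rather than NS1 code. It assumes that a lower `priority` number means a more preferred answer; that convention, the field names, and the hostnames are assumptions made for the example.

```python
def up(answers):
    # Keep only endpoints the monitoring data reports as online.
    return [a for a in answers if a["up"]]


def priority(answers):
    # Sort so the most preferred answer comes first
    # (assumption of this sketch: lower number = higher priority).
    return sorted(answers, key=lambda a: a["priority"])


def select_first_n(answers, n=1):
    return answers[:n]


def resolve(answers):
    return select_first_n(priority(up(answers)), n=1)


endpoints = [
    {"host": "backup.example.net", "up": True, "priority": 2},
    {"host": "primary.example.net", "up": True, "priority": 1},
]

normal = resolve(endpoints)

# Simulate monitoring data reporting the primary as down.
endpoints[1]["up"] = False
failed_over = resolve(endpoints)

print(normal[0]["host"], "->", failed_over[0]["host"])
```

While the primary is healthy it always wins, so the backup receives no traffic. Once the primary is filtered out, the backup becomes the first entry in the sorted list and is returned instead.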

Manual Failover

In some cases, you may want to make failover decisions only after you know more about the situation. In these cases, filter chains are the mechanism you use to implement the decision once you have chosen where to direct traffic. Instead of pointing a data feed at NS1, you use the same active-passive logic and manually mark services up or down when needed.

The first filter in this chain is “Up”. The difference here is that you manually define which services are up and down (instead of the data feed doing this for you).

The second filter in this chain is “Priority”, which ranks the active system ahead of the passive or backup system. If a higher priority answer is available, it will be sorted first in the list of possible answers. Otherwise, NS1 continues to follow the priority list until it finds an available resource.

Finally, “Select First N” indicates how many answers to return. In this case, you want to return only one answer.
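The manual variant can be pictured as the same chain, with the up/down status coming from a table that an operator edits by hand instead of from a data feed. The sketch below is a local simulation under that assumption; `manual_status`, the hostnames, and the priority convention (lower number first) are hypothetical.

```python
# Hypothetical operator-maintained status table; not an NS1 data structure.
manual_status = {
    "primary.example.net": True,
    "backup.example.net": True,
}

ANSWERS = [
    {"host": "primary.example.net", "priority": 1},
    {"host": "backup.example.net", "priority": 2},
]


def resolve(answers, status, n=1):
    # "Up": keep answers the operator has marked as up.
    live = [a for a in answers if status.get(a["host"], False)]
    # "Priority": lower number first (an assumption of this sketch).
    ranked = sorted(live, key=lambda a: a["priority"])
    # Limit the response to the first n answers.
    return [a["host"] for a in ranked[:n]]


before = resolve(ANSWERS, manual_status)

manual_status["primary.example.net"] = False  # operator decides to fail over
after = resolve(ANSWERS, manual_status)

manual_status["primary.example.net"] = True   # operator fails back
restored = resolve(ANSWERS, manual_status)

print(before, after, restored)
```

Nothing changes until someone edits the status table, which is the point of this pattern: the chain carries out the decision, but a person makes it.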

Multi-Cloud or Multi-CDN Availability

In the “active-active” scenario above, the filter chain uses simple up/down metrics to steer traffic. But sometimes service availability is more nuanced. For example, localized outages may result in reduced service quality. Although the service as a whole is technically “operational,” it may not be operating at optimal capacity. This filter chain allows you to use NS1 Connect’s advanced analytics tools as a data source to add some nuance to what is considered “up”.

The first filter in this chain is “Pulsar Availability Threshold”. This filter allows you to set a percentage value that determines, based on availability metrics, whether a service is eligible to receive traffic.

The second filter in the chain is “Weighted Shuffle”, which distributes traffic to other providers that meet the first filter’s definition of “available”. Traffic is distributed according to the weights you provide.

The third filter is “Pulsar Performance Sort”, which takes the weighted distribution from the previous filter, directs traffic to the fastest available services, and removes poorly performing services based on a threshold you define.

Finally, “Select First N” indicates how many answers to return. In this case, you want to return only one answer.
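To make the sequence concrete, here is a rough local simulation of the four steps as this article describes them. It does not reproduce how Pulsar measures availability or performance: the `availability` and `latency_ms` figures are made up, the thresholds are arbitrary, and every function name is a hypothetical stand-in.

```python
import random


def availability_threshold(answers, min_percent):
    # Keep services whose measured availability meets the threshold.
    return [a for a in answers if a["availability"] >= min_percent]


def weighted_shuffle(answers, rng):
    # Weight-biased random ordering of the remaining services.
    pool, ordered = list(answers), []
    while pool:
        pick = rng.choices(pool, weights=[a["weight"] for a in pool], k=1)[0]
        pool.remove(pick)
        ordered.append(pick)
    return ordered


def performance_sort(answers, max_latency_ms):
    # Drop services slower than the threshold, then put the fastest first.
    # sorted() is stable, so ties keep the order from the previous filter.
    fast_enough = [a for a in answers if a["latency_ms"] <= max_latency_ms]
    return sorted(fast_enough, key=lambda a: a["latency_ms"])


def select_first_n(answers, n=1):
    return answers[:n]


services = [
    {"host": "cdn-a.example.net", "availability": 99.9, "latency_ms": 40, "weight": 2},
    {"host": "cdn-b.example.net", "availability": 92.0, "latency_ms": 25, "weight": 1},
    {"host": "cdn-c.example.net", "availability": 99.5, "latency_ms": 30, "weight": 1},
    {"host": "cdn-d.example.net", "availability": 99.0, "latency_ms": 180, "weight": 1},
]

rng = random.Random(7)  # seeded so the simulation is repeatable
step1 = availability_threshold(services, min_percent=95.0)
step2 = weighted_shuffle(step1, rng)
step3 = performance_sort(step2, max_latency_ms=100)
result = [a["host"] for a in select_first_n(step3, n=1)]
print(result)
```

In this made-up data set, `cdn-b` has the lowest latency but is excluded first for falling below the availability threshold, and `cdn-d` is available but too slow. Of the two that remain, the faster one is returned.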

To learn more about how filter chains can be used to improve performance and resiliency and to reduce costs, see the resources below.

Avoid outages with resilient, redundant network services.

Senior Director, Product Marketing
