Software-Defined WAN Architecture Design

Production-ready guide to Software-Defined WAN (SD-WAN) architecture design, with implementation patterns, code examples, and anti-patterns for enterprise engineering teams.

TL;DR

This guide covers the design of a Software-Defined WAN (SD-WAN) architecture: core concepts, implementation patterns, a decision framework, and common anti-patterns. It is written for network engineers and architects. Key takeaway: the right approach depends on your team’s scale, existing infrastructure, and operational maturity.


Why This Matters

Implementing an SD-WAN architecture can significantly enhance network performance and reliability. Key business impacts include:

  • Reduced incident frequency by 40-60%: SD-WAN can automatically reroute traffic to avoid failed links, reducing downtime and improving user experience.
  • Improved application performance by 20-30%: SD-WAN can prioritize critical applications and ensure they receive the necessary bandwidth and quality of service.
  • Cost savings of up to 25%: By leveraging lower-cost broadband connections, SD-WAN can reduce reliance on expensive leased lines.
  • Enhanced security with integrated firewalls: SD-WAN can offer built-in security features, reducing the need for additional hardware and simplifying network management.

Core Concepts

Concept 1: Network Virtualization

Network virtualization involves abstracting the network functions and resources from the underlying hardware, allowing them to be managed and controlled through software. In an SD-WAN architecture, this means that the network can be dynamically reconfigured based on application requirements and network conditions.

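# Example of network virtualization in the style of NSX-T (from VMware)
# Illustrative pseudocode only: the command names are placeholders, not verified NSX-T syntax.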
nsx-t create virtual-network gw10001
nsx-t add-subnet gw10001 subnet1 10.0.0.0/16
nsx-t add-router gw10001 router1
nsx-t attach-router gw10001 router1 subnet1
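
As a vendor-neutral illustration of the abstraction described above, the following Python sketch models a virtual network purely as data, independent of any hardware. The `VirtualNetwork` class and its `add_subnet` method are hypothetical names chosen for this example, not part of any product API; only the standard-library `ipaddress` module is used.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class VirtualNetwork:
    """Hypothetical software-only description of an overlay network."""
    name: str
    cidr: str
    subnets: dict = field(default_factory=dict)

    def add_subnet(self, name: str, cidr: str) -> None:
        network = ipaddress.ip_network(self.cidr)
        subnet = ipaddress.ip_network(cidr)
        # Reject subnets that fall outside the virtual network's address space.
        if not subnet.subnet_of(network):
            raise ValueError(f"{cidr} is not inside {self.cidr}")
        # Reject overlap with subnets that are already defined.
        for existing in self.subnets.values():
            if subnet.overlaps(existing):
                raise ValueError(f"{cidr} overlaps {existing}")
        self.subnets[name] = subnet


vn = VirtualNetwork(name="gw10001", cidr="10.0.0.0/16")
vn.add_subnet("subnet1", "10.0.1.0/24")
vn.add_subnet("subnet2", "10.0.2.0/24")
print(sorted(vn.subnets))
```

Because the network is plain data, it can be validated and changed in software before anything is pushed to devices, which is the property the concept relies on.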

Concept 2: Policy-Based Routing

Policy-based routing allows traffic to be directed based on specific criteria such as application type, user, or time of day. This can be particularly useful in a heterogeneous network environment with multiple WAN connections.

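# Example of policy-based routing in the style of OpenDaylight (ODL)
# Illustrative pseudocode only: `odl` stands for a hypothetical REST client, and the
# payload schema and "/policy" endpoint are placeholders, not a verified ODL API.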
policy = {
    "policy-name": "critical-app-routing",
    "type": "routing-policy",
    "statements": [
        {
            "if": "app-type == 'critical'",
            "then": "route via 'ethernet1/1'"
        },
        {
            "if": "app-type == 'non-critical'",
            "then": "route via 'ethernet1/2'"
        }
    ]
}
odl.post("/policy", data=policy)
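
To make the matching logic concrete, here is a minimal, controller-independent sketch of first-match policy evaluation in Python. The `Rule` class and `select_interface` function are hypothetical names for this example; the two rules mirror the statements in the policy above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    app_type: str   # traffic class this rule matches
    interface: str  # WAN interface to use when it matches


# Mirrors the two statements in the policy above.
RULES = [
    Rule(app_type="critical", interface="ethernet1/1"),
    Rule(app_type="non-critical", interface="ethernet1/2"),
]


def select_interface(app_type: str, rules=RULES,
                     default: Optional[str] = None) -> Optional[str]:
    """Return the interface of the first rule matching app_type, else default."""
    for rule in rules:
        if rule.app_type == app_type:
            return rule.interface
    return default


print(select_interface("critical"))
print(select_interface("unknown", default="ethernet1/2"))
```

Rules are evaluated in order, so more specific rules should appear before general ones, and an explicit default avoids leaving unclassified traffic without a path.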

Concept 3: Automation and Orchestration

Automation and orchestration are essential for managing the dynamic nature of an SD-WAN environment. This includes automating the failover process, optimizing network performance, and ensuring compliance with security policies.

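# Example of automation using Ansible
# The URL, token, and policy variables are placeholders for your environment.
- name: Deploy SD-WAN policy
  ansible.builtin.uri:
    url: "http://sdwan-manager/api/v1/policy"
    method: POST
    # The uri module sends the request payload via `body`, not `data`.
    body: "{{ policy }}"
    body_format: json
    headers:
      Authorization: "Bearer {{ token }}"
    status_code: 201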
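
Automated failover, mentioned above, can be sketched as a pure function over link health measurements. This is a simplified illustration, not how any particular SD-WAN product decides: the `LinkHealth` class, the `pick_link` function, and the latency and loss thresholds are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkHealth:
    name: str
    up: bool
    latency_ms: float
    loss_pct: float


def pick_link(links, max_latency_ms=150.0, max_loss_pct=2.0) -> Optional[str]:
    """Pick the lowest-latency link that is up and within both thresholds."""
    healthy = [
        link for link in links
        if link.up
        and link.latency_ms <= max_latency_ms
        and link.loss_pct <= max_loss_pct
    ]
    if not healthy:
        return None  # no usable link; the caller decides how to degrade
    return min(healthy, key=lambda link: link.latency_ms).name


links = [
    LinkHealth("mpls", up=False, latency_ms=20.0, loss_pct=0.0),
    LinkHealth("broadband", up=True, latency_ms=45.0, loss_pct=0.5),
    LinkHealth("lte", up=True, latency_ms=80.0, loss_pct=1.0),
]
print(pick_link(links))  # mpls is down, so traffic moves to broadband
```

In practice the measurements would come from periodic probes, and some hysteresis is usually needed so traffic does not flap between links.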

Implementation Patterns

Pattern 1: Centralized Management

Centralized management allows for a single point of control and visibility over the entire network. This pattern is suitable for large-scale deployments with multiple locations.

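// Example of centralized management in the style of Juniper Contrail
// Illustrative pseudocode only: these calls are placeholders, not a verified Contrail API.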
contrail.create-policy("global", "route-target", "65000:1000")
contrail.create-policy("global", "vpn", "vpn-1000")
contrail.create-policy("global", "virtual-network", "vn-1000")
contrail.attach-policy("global", "route-target", "65000:1000", "virtual-network", "vn-1000")
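
The "single point of control and visibility" idea can be sketched as a controller that publishes a policy version and tracks which sites have confirmed it. The `Controller` class and site names below are hypothetical and exist only for this illustration.

```python
class Controller:
    """Hypothetical central controller that tracks policy versions per edge."""

    def __init__(self):
        self.desired_version = 0
        self.edge_versions = {}  # edge id -> last acknowledged version

    def register(self, edge_id):
        self.edge_versions[edge_id] = 0

    def publish(self):
        """Bump the policy version that every edge should converge to."""
        self.desired_version += 1
        return self.desired_version

    def acknowledge(self, edge_id, version):
        self.edge_versions[edge_id] = version

    def out_of_sync(self):
        """List edges that have not yet acknowledged the desired version."""
        return sorted(
            edge for edge, version in self.edge_versions.items()
            if version != self.desired_version
        )


controller = Controller()
for site in ("site-a", "site-b", "site-c"):
    controller.register(site)

version = controller.publish()
controller.acknowledge("site-a", version)
controller.acknowledge("site-b", version)
print(controller.out_of_sync())  # site-c has not confirmed the new policy
```

Tracking acknowledgements centrally is what gives operators one place to see which locations are running stale policy.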

Pattern 2: Edge Devices with Local Policy

Edge devices can operate independently with local policy management, providing greater flexibility and resilience in case of central management failures.

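// Example of edge device local policy in the style of Cisco SD-WAN
// Illustrative pseudocode only: these calls are placeholders, not a verified Cisco SD-WAN API.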
sdwan-edge.create-policy("local-policy", "route-map", "route-map-100")
sdwan-edge.set-policy("local-policy", "route-map", "route-map-100", "match", "ip", "prefix-list", "prefix-100")
sdwan-edge.set-policy("local-policy", "route-map", "route-map-100", "set", "metric", "500")
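
The resilience property described above can be sketched as an edge device that prefers the central policy but falls back to its local policy when the controller cannot be reached. `EdgeDevice`, `refresh`, and `controller_down` are hypothetical names for this example, and the fetch function stands in for whatever transport a real deployment uses.

```python
class EdgeDevice:
    """Hypothetical edge device that keeps a local policy as a fallback."""

    def __init__(self, local_policy):
        self.local_policy = dict(local_policy)
        self.active_policy = dict(local_policy)
        self.source = "local"

    def refresh(self, fetch_central_policy):
        """Use the controller's policy if reachable, otherwise the local one."""
        try:
            self.active_policy = dict(fetch_central_policy())
            self.source = "central"
        except ConnectionError:
            self.active_policy = dict(self.local_policy)
            self.source = "local"
        return self.source


def controller_down():
    # Stand-in for a failed call to the central manager.
    raise ConnectionError("controller unreachable")


edge = EdgeDevice({"critical": "ethernet1/1"})
print(edge.refresh(lambda: {"critical": "ethernet1/2"}))
print(edge.refresh(controller_down))
print(edge.active_policy)
```

The device keeps forwarding with a known-good policy during a controller outage instead of losing its configuration.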

Decision Framework

| Factor | Option A | Option B | Option C |
|---|---|---|---|
| Scalability | Centralized management can scale to thousands of locations. | Edge devices with local policy can scale to hundreds of locations. | Hybrid approach combining both centralized and edge management. |
| Complexity | Centralized management can become complex with large deployments. | Edge devices with local policy are simpler but may lack visibility. | Hybrid approach offers a balance between complexity and visibility. |
| Resilience | Centralized management can fail, leading to network downtime. | Edge devices with local policy can continue to function independently. | Hybrid approach can mitigate single points of failure. |
| Cost | Centralized management may require additional hardware and licenses. | Edge devices with local policy may require more local hardware. | Hybrid approach balances cost with performance and resilience. |

Anti-Patterns

| Anti-Pattern | What Happens | Fix |
|---|---|---|
| Over-reliance on Central Management | Network fails when central management fails. | Implement edge device local policy to provide redundancy. |
| Ignoring Security Policies | Network exposes critical data due to unsecured routes. | Implement integrated firewalls and enforce security policies. |
| Inadequate Monitoring | Network issues go unnoticed, leading to downtime. | Set up comprehensive monitoring and alerting systems. |
| Failure to Optimize Traffic | Network performance degrades due to suboptimal routing. | Use policy-based routing to optimize traffic paths. |
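
As one way to address the inadequate-monitoring anti-pattern, the sketch below checks per-site metrics against thresholds and returns alert messages. The metric names, the threshold values in `THRESHOLDS`, and the `evaluate` function are assumptions made for illustration; real limits depend on the applications carried over the WAN.

```python
# Hypothetical alert limits; tune these to your applications.
THRESHOLDS = {"latency_ms": 150.0, "loss_pct": 2.0, "jitter_ms": 30.0}


def evaluate(site, metrics, thresholds=THRESHOLDS):
    """Return one alert string per metric that exceeds its threshold."""
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        # Metrics that were not reported are skipped rather than treated as zero.
        if value is not None and value > limit:
            alerts.append(f"{site}: {name}={value} exceeds {limit}")
    return alerts


for alert in evaluate("branch-12", {"latency_ms": 210.0, "loss_pct": 0.4}):
    print(alert)
```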

Summary

Choosing the right SD-WAN architecture depends on your team’s scale, existing infrastructure, and operational maturity. While centralized management offers scalability and visibility, edge devices with local policy provide resilience and simplicity. A hybrid approach can balance these factors effectively. The right strategy will ensure network performance, reliability, and cost-efficiency tailored to your specific needs.

Detailed Implementation Patterns

Pattern 1: Centralized Management

Centralized management is ideal for large-scale deployments with multiple locations. It offers a single point of control and visibility over the entire network, making it easier to manage and troubleshoot.

Example Configuration

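# Example of centralized management in the style of Juniper Contrail
# Illustrative pseudocode only: these calls are placeholders, not a verified Contrail API.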
contrail.create-policy("global", "route-target", "65000:1000")
contrail.create-policy("global", "vpn", "vpn-1000")
contrail.create-policy("global", "virtual-network", "vn-1000")
contrail.attach-policy("global", "route-target", "65000:1000", "virtual-network", "vn-1000")

contrail.create-policy("global", "route-map", "route-map-100")
contrail.set-policy("global", "route-map", "route-map-100", "match", "ip", "prefix-list", "prefix-100")
contrail.set-policy("global", "route-map", "route-map-100", "set", "metric", "500")

contrail.create-policy("global", "application", "app-100")
contrail.set-policy("global", "application", "app-100", "match", "application", "type", "critical")
contrail.set-policy("global", "application", "app-100", "then", "route-map", "route-map-100")
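
The route-map logic in the example (match a prefix list, then set a metric) can be sketched in plain Python. The contents of `PREFIX_LIST` are an assumption, since the guide does not say what `prefix-100` contains, and `apply_route_map` is a hypothetical helper name.

```python
import ipaddress

# Assumed contents of "prefix-100"; the guide does not define them.
PREFIX_LIST = [ipaddress.ip_network("10.10.0.0/16")]


def apply_route_map(route_prefix, metric, prefix_list=PREFIX_LIST, new_metric=500):
    """Return new_metric if the route falls within the prefix list, else the original metric."""
    route = ipaddress.ip_network(route_prefix)
    if any(route.subnet_of(prefix) for prefix in prefix_list):
        return new_metric
    return metric


print(apply_route_map("10.10.5.0/24", metric=100))
print(apply_route_map("172.16.0.0/24", metric=100))
```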

Advantages

  • Scalability: Centralized management can scale to thousands of locations.
  • Visibility: Provides a single pane of glass for network management.
  • Control: Offers granular control over network policies and configurations.

Disadvantages

  • Complexity: Can become complex with large deployments.
  • Single Point of Failure: Network downtime can occur if central management fails.
  • Cost: May require additional hardware and licenses.

Best Practices

  • Redundancy: Implement redundancy in central management to ensure high availability.
  • Monitoring: Set up comprehensive monitoring and alerting systems to detect and respond to issues.
  • Documentation: Maintain detailed documentation to ensure consistency and ease of maintenance.
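
The redundancy practice above can be sketched as an ordered list of controllers with a reachability check. The hostnames and the `first_reachable` helper are hypothetical; a real check would probe the management plane rather than consult a set.

```python
def first_reachable(controllers, is_reachable):
    """Return the first controller that passes the reachability check, or None."""
    for controller in controllers:
        if is_reachable(controller):
            return controller
    return None


# Hypothetical primary and standby controllers, in order of preference.
CONTROLLERS = ["controller-1.example.net", "controller-2.example.net"]

down = {"controller-1.example.net"}  # simulate a failed primary
active = first_reachable(CONTROLLERS, lambda host: host not in down)
print(active)
```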

Pattern 2: Edge Devices with Local Policy

Edge devices with local policy management provide greater flexibility and resilience in case of central management failures. This pattern is suitable for environments with fewer locations.

Example Configuration

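# Example of edge device local policy in the style of Cisco SD-WAN
# Illustrative pseudocode only: these calls are placeholders, not a verified Cisco SD-WAN API.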
sdwan-edge.create-policy("local-policy", "route-map", "route-map-100")
sdwan-edge.set-policy("local-policy", "route-map", "route-map-100", "match", "ip", "prefix-list", "prefix-100")
sdwan-edge.set-policy("local-policy", "route-map", "route-map-100", "set", "metric", "500")

sdwan-edge.create-policy("local-policy", "application", "app-100")
sdwan-edge.set-policy("local-policy", "application", "app-100", "match", "application", "type", "critical")
sdwan-edge.set-policy("local-policy", "application", "app-100", "then", "route-map", "route-map-100")

sdwan-edge.create-policy("local-policy", "security", "firewall-rule-100")
sdwan-edge.set-policy("local-policy", "security", "firewall-rule-100", "match", "ip", "address", "192.168.1.0/24")
sdwan-edge.set-policy("local-policy", "security", "firewall-rule-100", "then", "permit")
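
The firewall rule in the example permits traffic from 192.168.1.0/24. A minimal sketch of that match, using only the standard-library `ipaddress` module, follows. The `is_permitted` helper is a hypothetical name, and the default-deny behavior for non-matching sources is an assumption the example above does not state.

```python
import ipaddress

# Source network taken from the firewall rule in the example above.
PERMITTED = [ipaddress.ip_network("192.168.1.0/24")]


def is_permitted(src_ip, permitted=PERMITTED):
    """Permit a source address only if it falls in a permitted network (assumed default deny)."""
    address = ipaddress.ip_address(src_ip)
    return any(address in network for network in permitted)


print(is_permitted("192.168.1.42"))
print(is_permitted("192.168.2.1"))
```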

Advantages

  • Simplicity: Edge devices are simpler to manage and configure.
  • Resilience: Network can continue to function independently if central management fails.
  • Cost: May require less central hardware and fewer licenses than centralized management.

Disadvantages

  • Lack of Visibility: Centralized visibility is limited, making troubleshooting more challenging.
  • Complexity: Keeping policies consistent becomes harder as the number of edge devices grows.
  • Cost: Additional hardware may be required at each site.

Best Practices

  • Redundancy: Ensure edge devices have redundancy to prevent single points of failure.
  • Monitoring: Set up comprehensive monitoring and alerting systems to detect and respond to issues.
  • Documentation: Maintain detailed documentation to ensure consistency and ease of maintenance.

Pattern 3: Hybrid Approach

A hybrid approach combines the benefits of centralized management and edge device local policy. This pattern is suitable for environments with a mix of locations and requirements.

Example Configuration

# Example of hybrid approach in the style of VMware NSX-T
# Illustrative pseudocode only: these calls are placeholders, not a verified NSX-T API.
nsx-t.create-policy("global", "route-target", "65000:1000")
nsx-t.create-policy("global", "vpn", "vpn-1000")
nsx-t.create-policy("global", "virtual-network", "vn-1000")
nsx-t.attach-policy("global", "route-target", "65000:1000", "virtual-network", "vn-1000")

nsx-t.create-policy("local", "route-map", "route-map-100")
nsx-t.set-policy("local", "route-map", "route-map-100", "match", "ip", "prefix-list", "prefix-100")
nsx-t.set-policy("local", "route-map", "route-map-100", "set", "metric", "500")

nsx-t.create-policy("local", "application", "app-100")
nsx-t.set-policy("local", "application", "app-100", "match", "application", "type", "critical")
nsx-t.set-policy("local", "application", "app-100", "then", "route-map", "route-map-100")

nsx-t.create-policy("local", "security", "firewall-rule-100")
nsx-t.set-policy("local", "security", "firewall-rule-100", "match", "ip", "address", "192.168.1.0/24")
nsx-t.set-policy("local", "security", "firewall-rule-100", "then", "permit")
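
One way to reason about the hybrid pattern is as a merge of global and local policy with explicit precedence. In the sketch below, local settings override global ones except for keys the controller locks. The `effective_policy` function, the key names, and the locking rule are assumptions made for illustration, not behavior of any specific product.

```python
def effective_policy(global_policy, local_policy, locked_keys=frozenset()):
    """Merge policies: local overrides global, except for controller-locked keys."""
    merged = dict(global_policy)
    for key, value in local_policy.items():
        if key not in locked_keys:
            merged[key] = value
    return merged


global_policy = {"critical_path": "mpls", "firewall": "strict"}
local_policy = {"critical_path": "broadband", "firewall": "permissive"}

# The site may choose its own path, but the firewall mode stays under central control.
print(effective_policy(global_policy, local_policy, locked_keys={"firewall"}))
```

Making the precedence rule explicit helps avoid the integration ambiguity that this pattern's disadvantages describe.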

Advantages

  • Balanced Visibility and Control: Offers a balance between centralized visibility and local control.
  • High Availability: Mitigates single points of failure by leveraging both centralized and edge management.
  • Cost-Efficiency: Balances cost with performance and resilience.

Disadvantages

  • Complexity: Requires more planning and management to ensure proper integration.
  • Cost: May require additional hardware and licenses.
  • Maintenance: Requires more effort to maintain and troubleshoot.

Best Practices

  • Integration: Ensure seamless integration between centralized and edge management.
  • Monitoring: Set up comprehensive monitoring and alerting systems to detect and respond to issues.
  • Documentation: Maintain detailed documentation to ensure consistency and ease of maintenance.

Decision Framework (Continued)

| Factor | Option A (Centralized Management) | Option B (Edge Devices with Local Policy) | Option C (Hybrid Approach) |
|---|---|---|---|
| Scalability | Centralized management can scale to thousands of locations. | Edge devices with local policy can scale to hundreds of locations. | Hybrid approach combining both centralized and edge management. |
| Complexity | Centralized management can become complex with large deployments. | Edge devices with local policy are simpler but may lack visibility. | Hybrid approach offers a balance between complexity and visibility. |
| Resilience | Centralized management can fail, leading to network downtime. | Edge devices with local policy can continue to function independently. | Hybrid approach can mitigate single points of failure. |
| Cost | Centralized management may require additional hardware and licenses. | Edge devices with local policy may require more local hardware. | Hybrid approach balances cost with performance and resilience. |
| Visibility | Provides a single pane of glass for network management. | Centralized visibility is limited. | Balanced visibility and control. |
| Control | Offers granular control over network policies and configurations. | Policy management can become complex. | Balanced control and visibility. |
| Redundancy | Redundancy can be implemented to ensure high availability. | Redundancy can be implemented to prevent single points of failure. | Redundancy can be implemented to ensure high availability. |
| Monitoring | Comprehensive monitoring and alerting systems are essential. | Comprehensive monitoring and alerting systems are essential. | Comprehensive monitoring and alerting systems are essential. |
| Documentation | Detailed documentation is necessary for consistency and maintenance. | Detailed documentation is necessary for consistency and maintenance. | Detailed documentation is necessary for consistency and maintenance. |

Anti-Patterns (Continued)

| Anti-Pattern | What Happens | Fix |
|---|---|---|
| Over-reliance on Central Management | Network fails when central management fails. | Implement edge device local policy to provide redundancy. |
| Ignoring Security Policies | Network exposes critical data due to unsecured routes. | Implement integrated firewalls and enforce security policies. |
| Inadequate Monitoring | Network issues go unnoticed, leading to downtime. | Set up comprehensive monitoring and alerting systems. |
| Failure to Optimize Traffic | Network performance degrades due to suboptimal routing. | Use policy-based routing to optimize traffic paths. |
| Insufficient Redundancy | Network downtime occurs due to single points of failure. | Implement redundancy in both centralized and edge management. |
| Lack of Detailed Documentation | Network management and troubleshooting become inconsistent and inefficient. | Maintain detailed documentation for consistency and ease of maintenance. |
| Ignoring Network Performance | Network performance degrades due to suboptimal configurations. | Use performance monitoring tools to optimize network configurations. |

Summary (Final)

The right SD-WAN architecture depends on your team’s scale, existing infrastructure, and operational maturity. Centralized management offers scalability and visibility, local policy on edge devices offers resilience and simplicity, and a hybrid approach balances the two. Weigh these trade-offs and apply the best practices for redundancy, monitoring, and documentation to design an SD-WAN architecture that meets your organization’s requirements.

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
