Nexus 9k VPC (back to back) and FHRP setup in 2 data centers

This post describes the setup of VPCs on a data center interconnect and HSRP as the first hop redundancy protocol for the VLAN interfaces (SVIs). The configuration was performed on Nexus 93180YC-EX switches running software version 7.0(3)I7(8). The switches have the system default switchport command set, so all ports are switchports by default, but this does not matter for the setup.

Background

This configuration is for a setup where the current network “core” is a Catalyst 6500 in VSS mode with a chassis in each data center. That has some benefits, such as a single management plane. The problem with this device was that the customer didn’t have enough line cards to provide redundancy to all connected devices. Two extra line cards per chassis cost almost as much as four Nexus 93180YC-EX switches plus some extra copper SFPs. The reason to go for this particular model is that it accepts both fiber and copper (1000BASE-T) SFPs in the same slot. The customer’s size did not warrant more switches, so this flexible switch model combined with copper SFPs was the reasonable choice.

Each data center has two Nexus switches configured in a VPC domain. The two pairs connect to each other over an LACP port-channel in a back-to-back VPC, which forms the DCI. Because the VLANs span both data centers, every Nexus carries the same set of SVIs, providing high availability. The SVIs in turn are configured with HSRP as the first hop redundancy protocol, and the HSRP traffic is isolated via an ACL so that each data center keeps its gateway local and traffic does not needlessly traverse the DCI.
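Schematically, the setup looks like this (switch names are illustrative):

DC1 (vpc domain 1):   Nexus-1A ==== peer link ==== Nexus-1B
                          \\                        //
                           \\=== Po999 / vpc 999 ==//
                           //  back-to-back (DCI)  \\
                          //                        \\
DC2 (vpc domain 2):   Nexus-2A ==== peer link ==== Nexus-2B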

The migration strategy will differ heavily per scenario and requirements. In this setup, I went with an L2 connection to the VSS so that SVIs could be moved per data center and rolled back to the VSS when necessary. After everything was connected to the Nexus switches and tested, the VSS was disconnected from them. The VSS remained operational by itself (without anything attached) for another week, just in case. Afterwards, the VSS was decommissioned and the fibers it had used were added to the Nexus LACP links as extra data links for the DCI (data center interconnect).
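As a sketch, the temporary link toward the VSS was simply another VPC port-channel, terminating on a multichassis EtherChannel on the VSS side. The interface and channel numbers below are hypothetical:

conf t
interface Ethernet1/47
  description To_VSS
  switchport mode trunk
  channel-group 100 mode active
  no shutdown
interface port-channel100
  description To_VSS
  switchport mode trunk
  vpc 100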

Back to Back VPC with HSRP for Nexus 9K switches

VPC configuration

To perform a VPC configuration, you need to activate the vpc feature with feature vpc. Below is a code snippet of the first Nexus switch.

vrf context keepalive

interface Ethernet1/48
  no switchport
  vrf member keepalive
  ip address 10.40.98.1/30
  no shutdown

vpc domain 1
  peer-switch
  role priority 1
  system-priority 8192
  peer-keepalive destination 10.40.98.2 source 10.40.98.1 vrf keepalive
  peer-gateway
  ip arp synchronize

The IP addresses are configured in a separate VRF that has no routing entries. The IP addresses are not used elsewhere in the network.

The other switch in the VPC pair needs to have the destination and source addresses reversed. It’s also important to remember that the secondary location needs to be configured with another VPC domain ID; Nexus VPC domains allow only two devices per domain, and the domain ID is used to derive the VPC system MAC. If both pairs use the same ID, the two ends of the DCI present the same LACP system ID, so LACP port-channels will not establish properly (non-LACP port-channels will establish, but you might experience problems with traffic forwarding).
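For completeness, a minimal sketch of the counterpart configurations. On the second DC1 switch, only the keepalive addresses change (the role priority of 2 is an assumption here):

vpc domain 1
  peer-switch
  role priority 2
  system-priority 8192
  peer-keepalive destination 10.40.98.1 source 10.40.98.2 vrf keepalive
  peer-gateway
  ip arp synchronize

The first DC2 switch gets its own domain ID and keepalive subnet (the 10.40.98.4/30 subnet is an assumption):

vpc domain 2
  peer-switch
  role priority 1
  system-priority 8192
  peer-keepalive destination 10.40.98.6 source 10.40.98.5 vrf keepalive
  peer-gateway
  ip arp synchronize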

I configured the peer-switch and peer-gateway because I want both Nexus switches to forward L2 and L3 traffic by themselves.
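Once both peers are configured, the peering can be verified; the keepalive status should be alive and the consistency parameters should match on both sides:

show vpc
show vpc peer-keepalive
show vpc consistency-parameters global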

Configuring the DCI

To configure the DCI, in this example I use port Ethernet1/46. After configuring the basics on the port, the rest can be applied to the port-channel. Below is a code snippet of the first Nexus switch.

conf t
interface Ethernet1/46
  description To_DC2
  switchport mode trunk
  channel-group 999 mode active
  no shutdown
interface port-channel999
  description To_DC2
  switchport mode trunk
  spanning-tree port type normal
  vpc 999

Later on, I added the additional links that were used by the VSS for extra bandwidth and redundancy. Those links have a similar config to Ethernet1/46.
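To confirm the DCI comes up as a single logical link on both VPC pairs, a few standard checks suffice; the member ports should be flagged as P (up in port-channel) and the trunk should carry the expected VLANs:

show port-channel summary
show vpc brief
show interface trunk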

HSRP configuration

So why use HSRP? First of all, for each VLAN that is routed by the Nexus switches, you want a single gateway IP - a VIP. Each Nexus switch has its own IP address assigned to the SVI, so using a first hop redundancy protocol makes the gateway transparent to the connected hosts. Second, HSRP is Cisco proprietary, and the design involves blocking HSRP traffic between the two data centers so that each data center has its own gateway per VLAN. This drastically reduces the traffic load on the DCI and the potential latency for a routed packet. Because the FHRP is to be blocked between the two data centers, it’s preferable to use HSRP for this; that way, VRRP is still available for other instances that need an FHRP, such as the firewall clusters. The firewalls wouldn’t be able to speak HSRP in the first place, because in this case they’re not Cisco branded. Therefore, choosing HSRP for the Nexus platform makes the most sense.

To perform the HSRP configuration, you need to activate the HSRP feature with feature hsrp. Below is a code snippet of the first Nexus switch.

interface Vlan100
  description SOME_VLAN
  no shutdown
  no ip redirects
  ip address 10.10.0.250/24
  no ipv6 redirects
  no ip ospf passive-interface
  no ip arp gratuitous hsrp duplicate
  hsrp version 2
  hsrp 100
    preempt delay reload 60
    priority 120
    ip 10.10.0.254

HSRP version 2 is configured here mainly so that the HSRP group number can match the VLAN ID: version 1 only supports group numbers 0-255, while version 2 extends the range to 0-4095. It’s a Cisco recommendation to use a different group per VLAN or subnet.
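Because HSRP hellos will be blocked on the DCI (see the next section), the DC2 pair can reuse the same group number and virtual IP; only the physical SVI addresses differ. Below is a sketch of the same SVI on the first DC2 switch, where 10.10.0.252/24 is an assumed address:

interface Vlan100
  description SOME_VLAN
  no shutdown
  no ip redirects
  ip address 10.10.0.252/24
  no ipv6 redirects
  no ip ospf passive-interface
  no ip arp gratuitous hsrp duplicate
  hsrp version 2
  hsrp 100
    preempt delay reload 60
    priority 120
    ip 10.10.0.254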

HSRP ACL

In this design, the idea is to get two HSRP primaries, one for each data center, so that traffic is routed on-site instead of traversing the DCI. To accomplish this, HSRP traffic has to be limited to each location: an access list that drops HSRP (UDP port 1985, to 224.0.0.2 for version 1 and 224.0.0.102 for version 2) has to be applied to the DCI port-channel, and gratuitous ARP for HSRP has to be suppressed on each VLAN SVI that is present in both locations.

NOTE: By default, the switches I was using did not allow port access-groups due to insufficient TCAM memory allocation. If you notice similar behavior (errors related to TCAM entries), refer to the troubleshooting section below.

ip access-list DENY_HSRP_IP
  10 deny udp any 224.0.0.2/32 eq 1985
  20 deny udp any 224.0.0.102/32 eq 1985
  30 permit ip any any
interface port-channel999
  ip port access-group DENY_HSRP_IP in
interface Vlan100
  no ip arp gratuitous hsrp duplicate
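With the ACL in place, each data center elects its own active router per group. On the switch with the highest priority in either site, the group should show as Active (and as Standby on its VPC peer):

show hsrp brief
show hsrp group 100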

Spanning Tree

You might also want to consider the spanning-tree topology. Whatever you do, make sure the two VPC peers always have the same spanning-tree priority. They are seen as one and the same switch by everything that is attached, including the Nexus switches in the other data center.

Because two VLANs are stretched to a tertiary location, I wanted to control spanning tree a bit more precisely. I used the long pathcost method because of the high-speed links and made the DC1 switches the primary root for all VLANs. However, setting that priority is not necessary when you don’t have to worry about another location or, as in my case, having the old VSS switches still attached for a short while.

NOTE: VLAN 3967 is as high as NX-OS lets you configure; VLANs 3968-4095 are reserved for internal use.

spanning-tree pathcost method long
spanning-tree vlan 1-3967 priority 4096
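The resulting root placement can be verified per VLAN; with the priority above, the DC1 pair should report itself as root:

show spanning-tree summary
show spanning-tree vlan 100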

Furthermore, it’s advisable to use a BPDU filter on the DCI and activate storm-control to limit broadcast traffic.

interface port-channel999
  spanning-tree bpdufilter enable
  storm-control broadcast level 1.00
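Both settings can be checked on the port-channel afterwards; the BPDU filter shows up in the spanning-tree detail, and the storm-control counters reveal whether the threshold has been exceeded:

show spanning-tree interface port-channel 999 detail
show interface port-channel999 counters storm-control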

Troubleshooting

When configuring this, I ran into a few issues myself and have provided the solutions below. The guides I used were written for the Nexus 7K and were therefore not directly applicable to the Nexus 9K platform.

The logs can show warnings relating to the TCAM region (ing-ifacl) for the ingress PACL.

2020 Jun 10 11:38:14 Switch01 %ACLQOS-SLOT1-2-ACLQOS_FAILED: ACLQOS failure: TCAM region is not configured for feature PACL class IPv4 direction ingress. Please configure TCAM region Ingress PACL [ing-ifacl] and retry the command.
2020 Jun 10 11:38:14 Switch01 %ETHPORT-5-IF_SEQ_ERROR: Error ("TCAM region is not configured. Please configure TCAM region and retry the command") communicating with MTS_SAP_ACLMGR for opcode MTS_OPC_ETHPM_BUNDLE_MEMBER_BRINGUP (RID_PORT: Ethernet1/46)
2020 Jun 10 11:38:14 Switch01 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel999 is down (No operational members)
2020 Jun 10 11:38:14 Switch01 last message repeated 1 time
2020 Jun 10 11:38:14 Switch01 %ETHPORT-5-IF_DOWN_ERROR_DISABLED: Interface Ethernet1/46 is down (Error disabled. Reason:TCAM region is not configured. Please configure TCAM region and retry the command)

To view the current allocation, you can use show system internal access-list globals. Here you can see how much TCAM is allocated per region and how much is free. You might need to free up memory elsewhere in order to allocate it to the ingress interface ACL (PACL) region.

Switch01(config)# show system internal access-list globals

slot  1
=======


  Atomic Update : ENABLED
  Default ACL   : DENY
  Bank Chaining : DISABLED
  Fabric path DNL : DISABLED
  NS Buffer Profile: Burst optimized
  Min Buffer Profile: all
  EOQ Class Stats: qos-group-0
  NS MCQ3 Alias: qos-group-3
  Ing PG Share: ENABLED
  IPG in Shape: DISABLED
  Classify ns-only : DISABLED
  Ing PG Min: NOT-DISABLED
  Ing PG Headroom reservation: 100
  OQ Drops Type: both
  OQ Stats Type:   [c0]: q 0 both
  [c1]: q 1 both
  [c2]: q 2 both
  [c3]: q 3 both
  [c4]: q 4 both
  [c5]: q 5 both
  [c6]: q 6 both
  [c7]: q 7 both
  [c8]: q 8 both
  [c9]: q 9 both
  peak count type: port
  counter 0 classes: 255
  counter 1 classes: 0
  OOBST Max records: 1000
  DPP Aging Period: 5000
  DPP Max Number of Packets: 120
  AFD ETRAP Aging Period: 50
  AFD ETRAP Byte Count: 1048555
  AFD ETRAP Bandwidth Threshold: 500
  ACL Inner Header Match : DISABLED
  ACL Inner Header Match : DISABLED

  LOU Threshold Value : 5

--------------------------------------------------------------------------------------
                 INSTANCE 0 TCAM Region Information:
--------------------------------------------------------------------------------------
Ingress:
--------
                    Region          TID     Base     Size     Width
--------------------------------------------------------------------------------------
                         NAT         13        0        0         1
                Ingress PACL          1        0        0         1
                Ingress VACL          2        0        0         1
                Ingress RACL          3        0     1792         1
               Ingress RBACL          4        0        0         1
              Ingress L2 QOS          5     1792      256         1
         Ingress L3/VLAN QOS          6     2048      512         1
                 Ingress SUP          7     2560      512         1
         Ingress L2 SPAN ACL          8     3072      256         1
    Ingress L3/VLAN SPAN ACL          9     3328      256         1
               Ingress FSTAT         10        0        0         1
                        SPAN         12     3584      512         1
            Ingress REDIRECT         14        0        0         1
                 Ingress NBM         30        0        0         1
-------------------------------------------------------------------------------------
Total configured size: 4096
Remaining free size: 0
Note: Ingress SUP region includes Redirect region

Egress:
--------
                    Region          TID     Base     Size     Width
--------------------------------------------------------------------------------------
                 Egress VACL         15        0        0         1
                 Egress RACL         16        0     1792         1
                  Egress SUP         18     1792      256         1
               Egress L2 QOS         19        0        0         1
          Egress L3/VLAN QOS         20        0        0         1
-------------------------------------------------------------------------------------
Total configured size: 2048
Remaining free size: 0


--------------------------------------------------------------------------------------
                 INSTANCE 1 TCAM Region Information:
--------------------------------------------------------------------------------------
Ingress:
--------
                    Region          TID     Base     Size     Width
--------------------------------------------------------------------------------------
                         NAT         13        0        0         1
                Ingress PACL          1        0        0         1
                Ingress VACL          2        0        0         1
                Ingress RACL          3        0     1792         1
               Ingress RBACL          4        0        0         1
              Ingress L2 QOS          5     1792      256         1
         Ingress L3/VLAN QOS          6     2048      512         1
                 Ingress SUP          7     2560      512         1
         Ingress L2 SPAN ACL          8     3072      256         1
    Ingress L3/VLAN SPAN ACL          9     3328      256         1
               Ingress FSTAT         10        0        0         1
                        SPAN         12     3584      512         1
            Ingress REDIRECT         14        0        0         1
                 Ingress NBM         30        0        0         1
-------------------------------------------------------------------------------------
Total configured size: 4096
Remaining free size: 0
Note: Ingress SUP region includes Redirect region

Egress:
--------
                    Region          TID     Base     Size     Width
--------------------------------------------------------------------------------------
                 Egress VACL         15        0        0         1
                 Egress RACL         16        0     1792         1
                  Egress SUP         18     1792      256         1
               Egress L2 QOS         19        0        0         1
          Egress L3/VLAN QOS         20        0        0         1
-------------------------------------------------------------------------------------
Total configured size: 2048
Remaining free size: 0

As you can see, I had 0 remaining free space, so I took some memory from the ingress RACL allocation and assigned it to the ingress PACL. TCAM regions on this platform are resized in increments of 256 entries.

conf t
 hardware access-list tcam region ing-racl 1536
 hardware access-list tcam region ing-ifacl 256
end

Save your configuration first, then reload the device and wait for it to come back.
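In other words:

copy running-config startup-config
reload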

After the reboot, you can verify the reallocated regions:

Switch01# sh hardware access-list tcam region
                                    NAT ACL[nat] size =    0
                        Ingress PACL [ing-ifacl] size =  256
                                     VACL [vacl] size =    0
                         Ingress RACL [ing-racl] size = 1536
                       Ingress RBACL [ing-rbacl] size =    0
                     Ingress L2 QOS [ing-l2-qos] size =  256
           Ingress L3/VLAN QOS [ing-l3-vlan-qos] size =  512
                           Ingress SUP [ing-sup] size =  512
     Ingress L2 SPAN filter [ing-l2-span-filter] size =  256
     Ingress L3 SPAN filter [ing-l3-span-filter] size =  256
                       Ingress FSTAT [ing-fstat] size =    0
                                     span [span] size =  512
                          Egress RACL [egr-racl] size = 1792
                            Egress SUP [egr-sup] size =  256
                 Ingress Redirect [ing-redirect] size =    0
                      Egress L2 QOS [egr-l2-qos] size =    0
            Egress L3/VLAN QOS [egr-l3-vlan-qos] size =    0
                           Ingress NBM [ing-nbm] size =    0
