So here is a quick post on how to create a highly available Dynamic VPN with automatic failover in Azure. It can be either an S2S (Site-to-Site) VPN between Azure and on-premises, or a VNET-to-VNET VPN.
Yeah, it’s a mouthful :P.
Before we begin, if you don’t know how to create a VPN in the Azure Resource Manager interface, check out the Microsoft post at: https://azure.microsoft.com/en-gb/documentation/articles/vpn-gateway-create-site-to-site-rm-powershell/.
If you ever wanted an automatic failover that protects against a VPN tunnel failure in Azure, until now it was not possible, but Microsoft has recently introduced a new option when creating a VPN tunnel via PowerShell which gives us this ability. The “-RoutingWeight” parameter of the “New-AzureRmVirtualNetworkGatewayConnection” command lets us set which tunnel is the default route for an AddressPrefix. The lower the value of RoutingWeight, the higher the priority (i.e. a weight of 10 takes priority over a weight of 20). The way this works is that if both tunnels are up, the tunnel with the lower value is used for that AddressPrefix, but if it goes down, traffic is automatically switched to the secondary tunnel, as shown below. It’s important to note that in ARM mode we can create multiple local sites with the same AddressPrefix.
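To make the selection rule concrete, here is a minimal Python sketch of the behaviour as described above. It is only a model of the logic, not Azure code: the `Tunnel` class, the `pick_tunnel` function and the tunnel names are made up for illustration, and it assumes the "lower weight wins" rule exactly as stated in this post.

```python
from dataclasses import dataclass


@dataclass
class Tunnel:
    name: str              # hypothetical label for a gateway connection
    routing_weight: int    # stands in for the -RoutingWeight value
    is_up: bool = True


def pick_tunnel(tunnels):
    """Return the live tunnel with the lowest routing weight, or None."""
    live = [t for t in tunnels if t.is_up]
    return min(live, key=lambda t: t.routing_weight, default=None)


primary = Tunnel("primary", routing_weight=10)
secondary = Tunnel("secondary", routing_weight=20)
tunnels = [primary, secondary]

print(pick_tunnel(tunnels).name)   # primary: both up, lower weight wins

primary.is_up = False              # simulate a failure of the primary tunnel
print(pick_tunnel(tunnels).name)   # secondary: automatic failover
```

Both tunnels point at the same AddressPrefix here, which is why the model only compares weights.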
So that was pretty easy, but suppose we had multiple prefixes: how would it work? Well, think of RoutingWeight like firewall rules: the lowest one is evaluated first, and if the AddressPrefix matches, that tunnel is used and processing stops; if not, evaluation continues to the next one. As shown below on the left, we can see how this applies to VNET-to-VNET VPN tunnels between Azure networks. On the right we see what happens to the traffic when Gateway2 goes down: obviously the traffic for Gateway2 will be dropped as there is no alternative, but the connection between Gateway1 and Gateway4 is re-routed automatically.
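The firewall-rule analogy can be sketched in a few lines of Python. Again, this is a model under assumptions rather than anything Azure exposes: the `Connection` class, the `route` function, the 10.x.0.0/16 prefixes and the existence of a second path through a "Gateway3" are all hypothetical, chosen only to mirror the Gateway1 side of the scenario described above.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Connection:
    name: str
    routing_weight: int
    prefixes: list         # AddressPrefixes of the local site behind this tunnel
    is_up: bool = True


def route(destination, connections):
    """Walk live connections from lowest weight up; the first prefix match wins."""
    addr = ipaddress.ip_address(destination)
    for conn in sorted(connections, key=lambda c: c.routing_weight):
        if not conn.is_up:
            continue
        if any(addr in ipaddress.ip_network(p) for p in conn.prefixes):
            return conn.name
    return None  # no live tunnel covers this destination, so traffic is dropped


# Hypothetical view from Gateway1: Gateway4's network is reachable both ways.
connections = [
    Connection("to-gateway2", 10, ["10.2.0.0/16", "10.4.0.0/16"]),
    Connection("to-gateway3", 20, ["10.3.0.0/16", "10.4.0.0/16"]),
]

print(route("10.4.0.5", connections))   # to-gateway2

connections[0].is_up = False            # Gateway2 goes down
print(route("10.4.0.5", connections))   # to-gateway3 (re-routed)
print(route("10.2.0.5", connections))   # None (no alternative for Gateway2)
```

Listing the shared prefix on both connections is what makes the failover possible; a prefix that appears on only one connection has nowhere else to go.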
Unfortunately this type of automatic failover is not foolproof, as it’s not aware of the links further down the line. Let me try to explain this with the diagram below and the following description. Let’s say Gateway2 is up, but for some reason only the tunnel between it and Gateway4 is down. Now Gateway4 will follow its rules: it will fail over traffic for Gateway1 to the alternative, but it will drop traffic for Gateway2. With me so far? However, while traffic on its way from Gateway4 to Gateway1 will go the correct way, any traffic coming back from Gateway1 to Gateway4 will still be sent to Gateway2, because Gateway1 is not aware of the tunnel problem further down the line. That in turn means the packets will be dropped at Gateway2, and communication between Gateway1 and Gateway4 will fail.
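This one-sided failure is easier to see when each gateway makes its decision using only the state of its own tunnels. The Python sketch below traces a packet hop by hop under that assumption. The topology is a guess at what the diagram shows: the `ROUTES` table, the weights, and the alternative path through a "gw3" are hypothetical, and `next_hop` and `trace` are illustrative helpers, not Azure behaviour verified against a real deployment.

```python
# Per gateway: (routing_weight, neighbour, sites reachable through that tunnel).
# Hypothetical topology: gw1 and gw4 can reach each other via gw2 or gw3.
ROUTES = {
    "gw1": [(10, "gw2", ["gw2", "gw4"]), (20, "gw3", ["gw3", "gw4"])],
    "gw2": [(10, "gw1", ["gw1"]), (10, "gw4", ["gw4"])],
    "gw3": [(10, "gw1", ["gw1"]), (10, "gw4", ["gw4"])],
    "gw4": [(10, "gw2", ["gw2", "gw1"]), (20, "gw3", ["gw3", "gw1"])],
}


def next_hop(gateway, dest, down):
    """Pick the lowest-weight live local tunnel that covers dest."""
    for _weight, neighbour, reachable in sorted(ROUTES[gateway], key=lambda r: r[0]):
        if frozenset((gateway, neighbour)) in down:
            continue  # a gateway only sees the state of its own tunnels
        if dest in reachable:
            return neighbour
    return None


def trace(src, dest, down, max_hops=8):
    """Follow a packet from src to dest; return (path, outcome)."""
    path, here = [src], src
    while here != dest and len(path) <= max_hops:
        here = next_hop(here, dest, down)
        if here is None:
            return path, "dropped"
        path.append(here)
    return path, ("delivered" if here == dest else "dropped")


down = {frozenset(("gw2", "gw4"))}   # only the gw2 <-> gw4 tunnel has failed

print(trace("gw4", "gw1", down))     # (['gw4', 'gw3', 'gw1'], 'delivered')
print(trace("gw1", "gw4", down))     # (['gw1', 'gw2'], 'dropped')
```

In this model gw1 keeps preferring gw2 because its own tunnel to gw2 is healthy, so the return traffic dies at gw2 even though a working path through gw3 exists.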