Tuesday, January 5, 2016

Understanding Azure Traffic Manager

Azure Traffic Manager provides globally aware DNS resolution for Azure services located in different regions, using an intelligent set of processes based on the configuration you specify and the geographical location of the user's local DNS server (LDNS). It's easiest to understand by looking at an example. Consider the scenario shown below.

In this example, a Traffic Manager profile (a specific set of configurations for an address) is created for savtechwebapp1.trafficmanager.net and configured with three endpoints, which are separate deployments of the application hosted in the Europe, US, and Asia Azure regions.
The Traffic Manager profile name always has trafficmanager.net as the DNS suffix, but you set the prefix part of the name which must be unique within the Traffic Manager service. However, this prefix will not actually be seen by end users.
The organization's own DNS zone contains a CNAME (alias) record for the public DNS name (known as a vanity domain in this case, because it is the friendly public name that end users see). The alias record points to the Traffic Manager profile name (i.e., webapp.savilltech.net is an alias that resolves to savtechwebapp1.trafficmanager.net).
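The alias chain described above can be illustrated with a minimal sketch. The record table is a toy stand-in for real DNS. The endpoint host name savtechwebapp1-eu.cloudapp.net and the address 203.0.113.10 (from a documentation-only range) are assumptions made up for illustration, as is the idea that the profile happens to hand out the Europe endpoint for this query.

```python
# Hypothetical, simplified record table: name -> (record type, value).
RECORDS = {
    # Vanity name in the organization's own zone (from the example above).
    "webapp.savilltech.net": ("CNAME", "savtechwebapp1.trafficmanager.net"),
    # Assumed for illustration: the profile answers with the Europe endpoint.
    "savtechwebapp1.trafficmanager.net": ("CNAME", "savtechwebapp1-eu.cloudapp.net"),
    # Assumed endpoint name and documentation-range IP address.
    "savtechwebapp1-eu.cloudapp.net": ("A", "203.0.113.10"),
}


def resolve(name, records=RECORDS, max_hops=8):
    """Follow CNAME records until an A record is found; return (chain, ip)."""
    chain = [name]
    for _ in range(max_hops):
        record_type, value = records[name]
        if record_type == "A":
            return chain, value
        # CNAME: continue the lookup with the alias target.
        name = value
        chain.append(name)
    raise RuntimeError("CNAME chain too long")


chain, ip = resolve("webapp.savilltech.net")
print(" -> ".join(chain), "->", ip)
```

The end user only ever types the first name in the chain; the trafficmanager.net name appears solely as an intermediate hop.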
There are several load-balancing options for Traffic Manager profiles that control how requests are distributed. The most common is Performance, which attempts to resolve requests to the Azure service that's closest to the requesting user's local DNS server. This is shown in the example:
  1. A user in the requesting organization enters webapp.savilltech.net in a web browser.
  2. The user's computer sends the DNS request for webapp.savilltech.net to the DNS server named in its DNS configuration, which is the organization's DNS server. That local DNS server performs a recursive lookup to resolve the name.
  3. The authoritative DNS servers for the target DNS domain (i.e., savilltech.net) have an alias record for webapp, which points to the Traffic Manager profile name savtechwebapp1.trafficmanager.net.
  4. The local DNS server now resolves the returned Traffic Manager record, which resolves via the Traffic Manager service. The Traffic Manager service attempts to ascertain the geographically closest Azure service based on a network ICMP latency map between DNS servers and Azure regions. The closest endpoint that is available is returned to the client.
Note that the ICMP latency is measured between DNS servers and Azure regions; thus, whichever Azure region is considered closest is always based on the user's local DNS server and not on where the user is physically located.
Therefore, if you have users in Asia who are using a DNS server in the US, those users would be directed to services in the US even if there were a service physically located in Asia.
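As a rough sketch of why the DNS server's location matters, the selection can be modeled as a lookup in a latency table keyed by the local DNS server rather than by the user. The table values and the names ldns-us and ldns-asia are invented for illustration; the real latency map is internal to the Traffic Manager service.

```python
# Hypothetical latency map (milliseconds) from a local DNS server to each region.
LATENCY_MS = {
    "ldns-us": {"US": 20, "Europe": 95, "Asia": 180},
    "ldns-asia": {"US": 170, "Europe": 210, "Asia": 25},
}


def pick_performance(ldns, online, latency=LATENCY_MS):
    """Return the online region with the lowest latency from the given LDNS."""
    candidates = {
        region: ms for region, ms in latency[ldns].items() if region in online
    }
    if not candidates:
        raise LookupError("no endpoint online")
    return min(candidates, key=candidates.get)


all_regions = {"US", "Europe", "Asia"}

# A user physically in Asia whose computer queries a US-based DNS server:
# only the DNS server's position is visible, so the US endpoint is chosen.
print(pick_performance("ldns-us", all_regions))

# The same user behind an Asia-based DNS server gets the Asia endpoint.
print(pick_performance("ldns-asia", all_regions))
```

Nothing about the user's own location enters the function, which is exactly the limitation described above.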
For users of global DNS servers, there is no high degree of confidence that they will be directed to the closest Azure service. Also, the network latency maps are generated based on ICMP echo requests (pings) to DNS servers. This means that if the local DNS server blocks ICMP, the latency information cannot be ascertained (however, this isn't common, as this ICMP approach is an industry standard).
I previously mentioned different types of load-balancing options, which are shown below:

  • Performance - Requests are directed to the Azure service that is closest to the local DNS server and is online.
  • Round Robin - Requests are directed in a round-robin fashion between all endpoints that are online.
  • Failover - All requests are directed to the first endpoint, and if that endpoint is not available, the requests go to the next endpoint. This is completely independent of the local DNS server's physical location.
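The two options that do not depend on latency data, Round Robin and Failover, can be sketched in a few lines. This is only a model of the behavior described in the list, not Traffic Manager's actual implementation; the class and function names are hypothetical, and the set of online endpoints is assumed to be supplied by some separate health check.

```python
class RoundRobin:
    """Rotate through the endpoints in order, skipping any that are offline."""

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self._next = 0  # position of the next endpoint to try

    def pick(self, online):
        # Try each endpoint at most once, starting where the last pick stopped.
        for _ in range(len(self.endpoints)):
            endpoint = self.endpoints[self._next % len(self.endpoints)]
            self._next += 1
            if endpoint in online:
                return endpoint
        raise LookupError("no endpoint online")


def pick_failover(endpoints, online):
    """Return the first endpoint in priority order that is online."""
    for endpoint in endpoints:
        if endpoint in online:
            return endpoint
    raise LookupError("no endpoint online")


endpoints = ["Europe", "US", "Asia"]
rr = RoundRobin(endpoints)
print([rr.pick({"Europe", "US", "Asia"}) for _ in range(4)])
print(pick_failover(endpoints, {"US", "Asia"}))
```

Neither function takes the local DNS server as an input, which reflects the point that these methods are independent of where the request comes from.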
To ascertain whether an endpoint is online, the Traffic Manager service sends a request every 30 seconds using HTTP or HTTPS, optionally on a custom port, to a specific URL, for example a specific application on the web service.
If four requests go unanswered (or if it takes longer than 10 seconds for each request) then the endpoint is considered unavailable and requests will not be directed there. The requests will continue every 30 seconds, and when the endpoint responds again, it will be considered online and requests will once again be directed to the endpoint as appropriate.
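The monitoring rule above amounts to a small state machine, sketched here under two assumptions that the text does not state explicitly: the four failures are counted consecutively, and a single good response is enough to bring the endpoint back online. The class name and method are hypothetical, and the actual HTTP or HTTPS request is left to the caller.

```python
PROBE_INTERVAL_S = 30          # caller is expected to probe this often
PROBE_TIMEOUT_S = 10           # slower responses count as unanswered
FAILURES_BEFORE_OFFLINE = 4    # assumed to mean consecutive failures


class EndpointHealth:
    """Track whether one endpoint should currently receive traffic."""

    def __init__(self):
        self.online = True
        self.consecutive_failures = 0

    def record_probe(self, responded, elapsed_s=0.0):
        """Record one probe result and return the resulting online state."""
        ok = responded and elapsed_s <= PROBE_TIMEOUT_S
        if ok:
            # Assumption: one successful probe restores the endpoint.
            self.consecutive_failures = 0
            self.online = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= FAILURES_BEFORE_OFFLINE:
                self.online = False
        return self.online


health = EndpointHealth()
for _ in range(4):
    health.record_probe(responded=False)
print(health.online)                                       # offline after four misses
print(health.record_probe(responded=True, elapsed_s=0.2))  # back online
```

A response that arrives after more than 10 seconds is treated the same as no response at all, matching the timeout described above.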
Traffic Manager endpoints are cloud services, which means each cloud service used needs an endpoint defined for port 80, port 443, or whatever custom port you use in the monitoring settings. This means the endpoints could point to a single VM in each cloud service or to a load-balanced set in each cloud service.
