How to Build Availability Report for AWS Route 53

Overview

AWS Route 53 provides tools to automate DNS configuration to reliably connect external user requests to infrastructure running in AWS. In addition to domain registration, AWS provides dynamic routing services, including latency-based routing, GeoDNS, Geoproximity, and Weighted Round Robin (WRR).

A core Route 53 functionality is the ability to configure health checks which monitor the health of an application and can route incoming traffic to healthy endpoints.

An automation procedure, such as DNS fail-over or service restart, can be initiated by Route 53 once the health check status drops below a certain threshold.

Geographic Distribution

Route 53 executes health checks from different parts of the world independently verify outage and latency information. The latency and connection times collected by the checkers vary widely depending on the geographic proximity of the monitored endpoint to one of the AWS regions used for health checking.

  • us-east-1
  • us-west-1
  • us-west-2
  • sa-east-1
  • ap-southeast-1
  • ap-southeast-2
  • ap-northeast-1

Access Security

AWS publishes a list of IP ranges used by health checker nodes. Your network administrators may need to allow inbound traffic from ROUTE53_HEALTHCHECKS addresses.

{
  "ip_prefix": "192.0.2.0/24",
  "region": "us-east-1",
  "service": "ROUTE53_HEALTHCHECKS"
}

Healthy Endpoint

For HTTP and HTTPS checks, Route 53 considers the endpoint healthy if the service establishes a connection within ten seconds and the endpoint returns an HTTP status code of 2xx or 3xx within two seconds.

For TCP checks, Route 53 determines the endpoint status is healthy if the service establishes TCP connection within ten seconds.

The timeouts are hardcoded.

Monitoring Frequency

Route 53 schedules health checks in multiple regions independently using the same monitoring interval of 30 or 10 seconds (Fast mode).

  • 0.5 requests per second for standard frequency of 30 seconds.
  • 2.0 requests per second for Fast mode frequency of 10 seconds.

Individual checkers are not synchronized, the rate at which requests arrive is uneven.

Route 53 supports health checks HTTP, HTTPS, and TCP protocols.

The services considers the endpoint to be in a Healthy state when the specified percentage of checkers establish a TCP connection and (for HTTP/S) received a 2xx/3xx response code from the server. The response also contains the specified keyword if String Matching is enabled.

When specifying paths for HTTP/S endpoints, factor in the increased traffic sent to the target service to avoid causing excessive load on the server.

HTTPS

Health checks cannot be used to monitor validity of SSL certificates as part of HTTPS endpoint monitoring. Specifically, the service reports Healthy status even if the SSL certificate is expired, self-signed, or otherwise invalid.

Metrics

Built-in monitoring charts display endpoint health statistics for a period of up to 2 weeks.

Route 53 CloudWatch metrics are available only in the us-east-1 region as specified in the Developer Guide (see section To view Route 53 metrics on the CloudWatch console).

Service Availability Dashboards

Offload health check statistics to ATSD and create consolidated dashboards with custom thresholds for alerts and notifications.

Configuration

Prerequisites

  • Create an AWS IAM account to query CloudWatch statistics.
  • Ensure 4 GB RAM is available for the ATSD sandbox container.

Launch ATSD Sandbox

Create an import directory in the current directory:

mkdir import
cd import

Mount this directory to the Docker container to pass AWS credentials to the CloudWatch data collector without exposing sensitive information as environment variables.

Create an aws.propeties file in the import directory and replace KEY and SECRET with AWS Access Key ID and Secret Access Key respectively.

accessKeyId=KEY
secretAccessKey=SECRET

Launch the ATSD sandbox container on a Docker host:

docker run -d -p 8443:8443 -p 9443:9443 -p 8081:8081 \
  --name=atsd-sandbox \
  --volume=$(pwd)/import:/import \
  --env ATSD_IMPORT_PATH='https://github.com/axibase/atsd-use-cases/raw/master/integrations/aws/route53-health-checks/resources/aws-route53-xml.zip' \
  --env COLLECTOR_IMPORT_PATH='https://raw.githubusercontent.com/axibase/atsd-use-cases/master/integrations/aws/route53-health-checks/resources/job_aws_aws-route53.xml' \
  --env COLLECTOR_CONFIG='job_aws_aws-route53.xml:aws.properties' \
  axibase/atsd-sandbox:latest

The sandbox container includes both ATSD and Axibase Collector instances.

Use the Collector instance in the sandbox container to retrieve Route 53 statistics from AWS CloudWatch and store the statistics in ATSD.

Wait until the sandbox is initialized and All applications started is displayed by the start logs.

docker logs -f atsd-sandbox
[Collector] 2018-03-29 17:47:40,329 Job 'aws-route53' completed.
[Collector] 2018-03-29 17:47:40,330 All jobs completed.
[Collector] Checking Collector web-interface port 9443 ...
[Collector] Collector web interface:
[Collector] https://172.17.0.2:9443
[Collector] Collector start completed.
[Collector] For more details see logfile in /opt/axibase-collector/logs/axibase-collector.log
[Collector] Account 'axibase' created.
All applications started

Log in to ATSD using axibase username and axibase password at https://atsd_hostname:8443/.

Health Check Setup Attribute Copy

Configure a cron scheduled task to copy health check attributes into ATSD sandbox as described by ATSD Integration Documentation

Results

Consolidated View

View all working Route 53 health checks on the AWS Route53 tab.

Service Level Reporting

Availability Portal

The built-in portal displays availability statistics.