Explanation: Understanding Route53 Infrastructure
Introduction
This document explains the architecture and design decisions behind Norton's Route53 infrastructure. It covers how the Terraform module is laid out, how hosted zones and records are configured, how the CI/CD pipeline applies changes, and the rationale behind the choices made — including a few quirks that are easy to miss the first time you work on DNS. Read the how-to guide first if you need practical step-by-step instructions for adding records or zones.
The Problem: One Source of Truth for DNS
DNS records are easy to change in the AWS Console — and that is exactly the problem. A single ad-hoc edit can take down an entire product, and there is no merge request, no review, and no record of who did what or why. Add a few engineers and a few years of "quick fixes," and the zone becomes a pile of records that nobody fully understands.
The self-service Route53 setup solves this by treating DNS the same way we treat application code: every change goes through Git, every change is reviewed, and every change is applied by an automated pipeline. Once a zone is in Terraform, the AWS Console becomes a read-only window into it.
DNS is high-stakes infrastructure: a wrong record breaks production for everyone, and the blast radius is global. The pipeline keeps the keys to that room in a place where the door cannot be opened without a witness.
Architecture Overview
Module Structure
The Route53 Terraform module lives at aws/route53/ in the Infrastructure repository:
aws/route53/
├── route53.tf # Hosted zones + records (standard and alias)
├── health_checks.tf # Route53 health checks
├── locals.tf # Combines records from JSON files and inline declarations
├── variables.tf # Input variable definitions
└── outputs.tf # Zone IDs, name servers, health check IDs
Each environment has its own root configuration at accounts/{environment}/Route53/ which calls the module:
accounts/{development|production}/Route53/
├── main.tf # Module call
├── terraform.tfvars # Hosted zones declared here
├── backend.tf # GitLab-managed state
├── variables.tf
├── versions.tf
└── route53-records/ # One JSON file per zone, holding records
    ├── wwnorton-com.json
    ├── wwnorton-net.json
    └── ...
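As a rough illustration of how a root configuration might call the module, the sketch below shows one possible main.tf. The relative source path follows from the two layouts above, assuming accounts/ and aws/ both sit at the repository root. The input names hosted_zones and records_dir are hypothetical and may not match the module's real variables.tf.

```hcl
# accounts/development/Route53/main.tf (illustrative sketch only)

variable "hosted_zones" {
  description = "Hypothetical input: zone key => zone settings."
  type        = any
  default     = {}
}

module "route53" {
  # Assumes accounts/ and aws/ are both at the repository root.
  source = "../../../aws/route53"

  # Hypothetical input names; check the module's variables.tf for the real ones.
  hosted_zones = var.hosted_zones
  records_dir  = "${path.module}/route53-records"
}
```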
How Zones and Records Are Configured
Configuration is split between two files for a reason:
- terraform.tfvars declares the hosted zones: the top-level DNS containers and their tags, comments, and (for private zones) VPC associations.
- route53-records/{zone-key}.json holds the actual records for that zone: A, CNAME, MX, TXT, alias records, and so on.
The module supports inline records (declared directly in tfvars) as well, but the JSON file approach is what you will see in production. Large zones with hundreds of records are far easier to scan and review as a JSON file than as nested HCL.
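To make the split concrete, a zone declaration in terraform.tfvars might look roughly like the sketch below. Only records_file is named in this document. The variable name hosted_zones, the other attribute names, the VPC ID, and the private zone's file name are assumptions for illustration.

```hcl
# terraform.tfvars (illustrative sketch; attribute names other than
# records_file are assumptions, not the module's confirmed schema)
hosted_zones = {
  "wwnorton-com" = {
    name         = "wwnorton.com"
    comment      = "Public zone"
    private      = false
    records_file = "route53-records/wwnorton-com.json"
    tags = {
      Team = "platform" # hypothetical tag
    }
  }

  "wwnorton-com-private" = {
    name         = "wwnorton.com"
    comment      = "Private view for internal services"
    private      = true
    vpc_ids      = ["vpc-0123456789abcdef0"]                   # placeholder VPC ID
    records_file = "route53-records/wwnorton-com-private.json" # hypothetical file name
  }
}
```

The zone declaration stays short, and the bulk of the review surface (the records themselves) lives in the referenced JSON file.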
Hosted Zones
A hosted zone is the Route53 container that holds all the records for a single domain. Norton runs two flavors:
- Public zones: answer queries from the public internet. Used in the production account for customer-facing domains like wwnorton.com and the author sites.
- Private zones: answer queries only from VPCs that are explicitly associated with the zone. Used for internal hostnames and for splitting public and private views of the same domain.
Public vs Private Resolution
The same domain name can exist as a private zone in multiple accounts. wwnorton.com lives as a public zone in production and as a private zone in both production and development. A request resolves against the zone associated with the requester's VPC — if there is no association, the resolver falls back to the public zone.
Why two zones for the same name? Internal services often need different targets than external clients (e.g. an app calls another app via an internal load balancer instead of the external one). Splitting public and private views lets us serve the right answer in the right context without exposing internal hostnames or IPs to the internet.
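In plain Terraform, the split view described above comes down to two aws_route53_zone resources with the same name, where only the private one carries a vpc block. This is a minimal sketch, not the module's actual code, and the VPC ID is a placeholder.

```hcl
# Public view: answers queries from the internet.
resource "aws_route53_zone" "public" {
  name    = "wwnorton.com"
  comment = "Public zone"
}

# Private view: same name, but only answers inside associated VPCs.
resource "aws_route53_zone" "private" {
  name    = "wwnorton.com"
  comment = "Private zone"

  vpc {
    vpc_id = "vpc-0123456789abcdef0" # placeholder VPC ID
  }
}
```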
Deletion Protection
Every hosted zone is created with lifecycle { prevent_destroy = true }. Terraform will refuse to plan a destruction of a zone, which means a stray for_each change or accidental key rename cannot delete a zone and take its records with it. To remove a zone intentionally, the lifecycle block has to be edited first — a deliberate, reviewable action.
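A minimal sketch of the pattern, assuming a hypothetical zones input that maps a zone key to a domain name (the module's real input is richer than this):

```hcl
variable "zones" {
  description = "Hypothetical input: zone key => domain name."
  type        = map(string)
  default     = { "wwnorton-com" = "wwnorton.com" }
}

resource "aws_route53_zone" "this" {
  for_each = var.zones
  name     = each.value

  lifecycle {
    # Any plan that would destroy one of these zones fails with an
    # error instead of proceeding.
    prevent_destroy = true
  }
}
```

Renaming the key "wwnorton-com" here would normally plan a destroy of the old instance and a create of the new one. With prevent_destroy set, that plan errors out. Note that Terraform only honors prevent_destroy while the resource block is still present in the configuration.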
Records
The module supports every record type that AWS does and every routing policy that Route53 offers. Records come from two sources:
- External JSON files referenced by records_file in the zone declaration. This is the standard pattern.
- Inline records declared directly inside the zone object in tfvars. Useful for one-off records during a quick setup, but rarely seen in production zones.
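One way a locals.tf could combine the two sources is sketched below. It assumes each JSON file contains a top-level array of record objects and that inline records live under a records attribute on the zone object. Both are assumptions; the real module may be structured differently.

```hcl
variable "hosted_zones" {
  description = "Hypothetical input: zone key => zone settings."
  type        = any
  default     = {}
}

locals {
  # Records loaded from each zone's JSON file. try() falls back to an
  # empty list when records_file is absent; note that it also hides a
  # mistyped path, so a real module may want stricter handling.
  file_records = {
    for zone_key, zone in var.hosted_zones :
    zone_key => try(jsondecode(file("${path.root}/${zone.records_file}")), [])
  }

  # Records declared inline on the zone object, if any.
  inline_records = {
    for zone_key, zone in var.hosted_zones :
    zone_key => try(zone.records, [])
  }

  # One combined list of records per zone.
  zone_records = {
    for zone_key, zone in var.hosted_zones :
    zone_key => concat(local.file_records[zone_key], local.inline_records[zone_key])
  }
}
```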
Standard vs Alias Records
The module produces two slightly different Terraform resources depending on whether the record is an alias to an AWS resource or a standard record with a TTL and value list.
Alias records are a Route53-specific extension to DNS and are the right choice for pointing at AWS resources: Route53 answers with the target's current IP addresses, they work at the apex of a domain (where CNAMEs are not allowed), and queries against aliases to AWS resources are free. Use them whenever the target is an ALB, NLB, CloudFront distribution, S3 static website, API Gateway, or another Route53 record in the same hosted zone.
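The difference between the two shapes is easiest to see side by side. The sketch below uses the aws_route53_record resource directly rather than the module; the zone IDs, record names, and load balancer hostname are placeholders.

```hcl
# Standard record: explicit TTL and a list of values.
resource "aws_route53_record" "standard" {
  zone_id = "Z0000000EXAMPLE"   # placeholder hosted zone ID
  name    = "docs.wwnorton.com" # hypothetical record name
  type    = "CNAME"
  ttl     = 300
  records = ["target.example.com"]
}

# Alias record: no ttl and no records list. The alias block names the
# AWS target instead.
resource "aws_route53_record" "alias" {
  zone_id = "Z0000000EXAMPLE"
  name    = "wwnorton.com" # zone apex, where a CNAME is not allowed
  type    = "A"

  alias {
    name                   = "my-alb-1234567890.us-east-1.elb.amazonaws.com" # placeholder
    zone_id                = "ZEXAMPLEALBZONE"                               # placeholder; normally the load balancer's zone_id attribute
    evaluate_target_health = true
  }
}
```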
How Records Get Their Unique Key
Terraform resources need a stable, unique key for every record. The module builds it from the zone, the record name, the type, and the routing-policy set_identifier:
{zone_key}-{record_name}-{type}-{set_identifier|default}
Two practical consequences:
- The same name + type can exist multiple times within a zone if each entry has a different set_identifier (used for weighted, latency, geolocation, and failover routing).
- Renaming the zone key, the record name, or the type will cause Terraform to plan a destroy + recreate. For a CNAME this is a brief ripple; for the apex A record it is a real outage. Always read the plan carefully.
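The key format above can be reproduced with a for expression. This is a sketch under assumptions: the flattened local.records list and its attribute names are hypothetical, and the module may build its keys differently in detail.

```hcl
locals {
  # Hypothetical flattened input; the module's real shape may differ.
  records = [
    { zone_key = "wwnorton-com", name = "www", type = "A", set_identifier = "blue" },
    { zone_key = "wwnorton-com", name = "www", type = "A", set_identifier = "green" },
    { zone_key = "wwnorton-com", name = "mail", type = "MX", set_identifier = null },
  ]

  # Key format: {zone_key}-{record_name}-{type}-{set_identifier|default}
  # coalesce() substitutes "default" when set_identifier is null.
  records_by_key = {
    for r in local.records :
    "${r.zone_key}-${r.name}-${r.type}-${coalesce(r.set_identifier, "default")}" => r
  }
}

output "record_keys" {
  # keys() returns the map keys in lexical order:
  # wwnorton-com-mail-MX-default, wwnorton-com-www-A-blue, wwnorton-com-www-A-green
  value = keys(local.records_by_key)
}
```

Because the key is also the resource address under for_each, changing any component of it changes the address, which is why a rename shows up in the plan as a destroy plus a create.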
Routing Policies
A single record can be backed by several different "answers" depending on which routing policy is configured. The module supports all of Route53's policies; pick the one that matches the failure or distribution mode you need.
| Policy | What It Does | When to Use |
|---|---|---|
| Simple (default) | Returns a single answer. | Most records. The apex A of a domain, a static CNAME, an MX. |
| Weighted | Distributes traffic across multiple answers by weight. | Gradual rollouts (e.g. 90/10 between two ALBs), A/B testing, blue-green migrations. |
| Latency | Returns the answer with the lowest latency for the requester's region. | Multi-region deployments where the user should hit the closest endpoint. |
| Geolocation | Routes based on the requester's continent, country, or US subdivision. | Compliance-driven routing (data residency), region-specific content, or sending UK traffic to the UK author site. |
| Failover | Primary/secondary pair backed by a health check. The secondary only answers when the primary is unhealthy. | Active/passive disaster recovery. |
| Multivalue | Returns up to eight healthy answers, randomized. | Cheap pseudo-load-balancing for systems that can tolerate client-side retries. |
Every policy except simple requires a set_identifier, multivalue included. That is what lets Route53 (and the module) tell two records of the same name and type apart.
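As an example of a non-simple policy, a 90/10 weighted split might look like the following when written directly against aws_route53_record. The zone ID, record name, and targets are placeholders, and the module's own input format for routing policies is not shown here.

```hcl
resource "aws_route53_record" "app_blue" {
  zone_id        = "Z0000000EXAMPLE"  # placeholder hosted zone ID
  name           = "app.wwnorton.com" # hypothetical record name
  type           = "CNAME"
  ttl            = 60
  records        = ["blue.example.com"]
  set_identifier = "blue" # distinguishes this entry from "green"

  weighted_routing_policy {
    weight = 90
  }
}

resource "aws_route53_record" "app_green" {
  zone_id        = "Z0000000EXAMPLE"
  name           = "app.wwnorton.com"
  type           = "CNAME"
  ttl            = 60
  records        = ["green.example.com"]
  set_identifier = "green"

  weighted_routing_policy {
    weight = 10
  }
}
```

Both resources share the same name and type; only set_identifier and the weight differ.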
Health Checks
Health checks live independently of records and are referenced by health_check_id. They are used to:
- Drive failover routing: when the primary's health check fails, Route53 treats the primary as unhealthy and answers from the secondary instead.
- Filter weighted/latency/multivalue answers — Route53 silently drops unhealthy answers from the response.
- Page on-call via CloudWatch alarms tied to the check's metric.
The module exposes the full set of check types — HTTP, HTTPS, HTTP/HTTPS string-match, TCP, calculated (combining other checks), and CloudWatch metric. The most common in practice are HTTPS checks against /healthz endpoints.
Health checks are billed per check, per month, with surcharges for optional features such as HTTPS, string matching, and fast intervals. They are not free. Add them where they actively drive routing or alerting, not as decoration on records that nothing reads.
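A minimal sketch of an HTTPS health check driving a failover pair, written directly against the AWS provider resources. The hostnames, zone ID, interval, and threshold are illustrative values, not Norton's actual settings.

```hcl
resource "aws_route53_health_check" "app_primary" {
  type              = "HTTPS"
  fqdn              = "primary.example.com" # placeholder endpoint
  port              = 443
  resource_path     = "/healthz"
  request_interval  = 30 # seconds between checks
  failure_threshold = 3  # consecutive failures before "unhealthy"
}

resource "aws_route53_record" "app_primary" {
  zone_id         = "Z0000000EXAMPLE"  # placeholder hosted zone ID
  name            = "app.wwnorton.com" # hypothetical record name
  type            = "CNAME"
  ttl             = 60
  records         = ["primary.example.com"]
  set_identifier  = "primary"
  health_check_id = aws_route53_health_check.app_primary.id

  failover_routing_policy {
    type = "PRIMARY"
  }
}

resource "aws_route53_record" "app_secondary" {
  zone_id        = "Z0000000EXAMPLE"
  name           = "app.wwnorton.com"
  type           = "CNAME"
  ttl            = 60
  records        = ["secondary.example.com"]
  set_identifier = "secondary"

  failover_routing_policy {
    type = "SECONDARY"
  }
}
```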
Delegation Sets
A delegation set is a fixed set of four name servers that Route53 will reuse across multiple hosted zones. Without one, every new zone gets a fresh, random set — which means anyone who owns the registered domain has to update its NS records at the registrar every time.
The module supports reusable delegation sets (var.delegation_sets). When a zone declares a delegation_set_id, all of its NS records come from that fixed set. This is the right pattern when you plan to spin up multiple related zones (e.g. a primary domain plus several vanity domains that should share name servers).
Norton does not currently use delegation sets in production — most zones run on AWS-assigned NS records — but the support is there in the module if a future zone family needs it.
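For reference, a reusable delegation set shared by two zones could be sketched as follows in plain Terraform. The reference name and domains are hypothetical, and how var.delegation_sets maps onto these resources inside the module is not shown.

```hcl
resource "aws_route53_delegation_set" "shared" {
  reference_name = "shared-vanity-domains" # hypothetical name
}

resource "aws_route53_zone" "vanity" {
  for_each = toset(["example-one.com", "example-two.com"])

  name = each.value

  # Every zone created here gets the same four name servers.
  # delegation_set_id applies to public zones only and cannot be
  # combined with a vpc block.
  delegation_set_id = aws_route53_delegation_set.shared.id
}
```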
Controlled Subdomains for Load Balancers
Most application records under wwnorton.com and wwnorton.net do not point straight at an AWS load balancer hostname. They go through a layer of controlled subdomains — short, stable CNAMEs like alb-external-prod.wwnorton.com that resolve to the actual ELB. Application records then CNAME to the controlled subdomain.
The pattern buys two things:
- One change, many redirects. Changing the controlled subdomain target moves every application that resolves through it — within the 60-second TTL — without touching every individual record.
- Maintenance mode. Pointing alb-external-prod at maintenance.wwnorton.com redirects all production apps to the maintenance page in a single record edit.
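The two-layer pattern can be sketched with plain aws_route53_record resources. The load balancer hostname, zone ID, application record name, and the application record's TTL are placeholders; only the controlled subdomain name and its 60-second TTL come from this document.

```hcl
# Controlled subdomain: the only record that knows the real load balancer.
resource "aws_route53_record" "alb_external_prod" {
  zone_id = "Z0000000EXAMPLE" # placeholder hosted zone ID
  name    = "alb-external-prod.wwnorton.com"
  type    = "CNAME"
  ttl     = 60
  records = ["my-alb-1234567890.us-east-1.elb.amazonaws.com"] # placeholder ELB hostname
}

# Application record: points at the controlled subdomain, never at the ELB.
resource "aws_route53_record" "app" {
  zone_id = "Z0000000EXAMPLE"
  name    = "app.wwnorton.com" # hypothetical application record
  type    = "CNAME"
  ttl     = 300 # assumed value
  records = ["alb-external-prod.wwnorton.com"]
}
```

Switching to maintenance mode would then mean changing only the records value on the first resource to maintenance.wwnorton.com. The application record is untouched, so its own TTL does not delay the switch.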
The full registry of controlled subdomains and the maintenance procedure live in docs/aws/route53/controlled-subdomains.md in the Infrastructure repository, and the how-to guide cross-links the relevant sections.
CI/CD Pipeline Flow
The Route53 pipeline follows the same shape as every other resource in the Infrastructure repository — fmt, validate, plan, apply — with one important quirk you need to understand before your first MR.
What Happens on a Merge Request
On a merge request, the pipeline runs the read-only stages (fmt, validate, and plan) and posts the plan as a comment on the MR so reviewers can see exactly which records will change. Nothing is applied at this point.
What Happens on Merge to Main
Apply on Route53 is a manual pipeline step. The plan job runs automatically on merge, but the apply job has to be started by hand. DNS is sensitive enough that we want a person to look at the plan one more time and press the button.
OPA Policy Guardrails
Route53 does not currently have OPA policies in policies/accounts/{environment}/. S3, RDS, ElastiCache, MemoryDB, and Amazon MQ each have a policy bundle; Route53 does not yet.
In practice, the guardrails for Route53 today are:
- prevent_destroy on every hosted zone. Terraform refuses to remove a zone without a deliberate code change.
- CODEOWNERS on the Infrastructure repository. Record changes require Platform team review.
- Manual apply step — somebody has to look at the plan and press the button.
- Plan visibility — every record change appears in the MR plan comment, and a destroy of an apex record is hard to miss.
The Platform team is tracking OPA support for Route53 as future work. Likely first checks: enforcing TTL ranges on hot records (the controlled subdomains live at TTL 60s on purpose), preventing allow_overwrite = true outside of explicit migration MRs, and rejecting deletes of records on the controlled-subdomain registry.
Environment Differences
| Aspect | Development account | Production account |
|---|---|---|
| Public zones | None | Most customer-facing domains (wwnorton.com, author sites, etc.) |
| Private zones | wwnorton.com, wwnorton.net (associated to dev VPCs) | wwnorton.com (associated to prod EKS VPCs) |
| Manual apply | Required | Required |
| prevent_destroy | Enabled on all zones | Enabled on all zones |
| Records source | Mostly route53-records/*.json | Mostly route53-records/*.json |
Production carries the public-facing domains and is where the controlled-subdomain pattern earns its keep. Development hosts only private zones because there is no business reason to expose internal dev hostnames to the public internet.
Account Structure
- Development account (637244866643): private zones for dev/QA/staging.
- Production account (100478842646): all public zones plus the production private zone for wwnorton.com.
Each account has its own GitLab-managed Terraform state file, keyed under development-route53 or production-route53.
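GitLab-managed state is served through Terraform's generic http backend, so a backend.tf for either account can be as small as the sketch below. This illustrates the general pattern and is not a copy of the repository's file.

```hcl
# backend.tf (illustrative sketch)
terraform {
  # GitLab-managed state uses Terraform's "http" backend. The state
  # address, lock endpoints, and credentials are typically supplied at
  # init time by the CI job rather than committed here.
  backend "http" {}
}
```

With this setup, the state name (development-route53 or production-route53) appears as the last path segment of the state address, which in GitLab takes the form https://<gitlab-host>/api/v4/projects/<project-id>/terraform/state/<state-name>.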
Design Rationale
Why JSON Files Instead of Inline HCL
Production zones contain hundreds of records — DKIM keys, ACM validation CNAMEs, vendor verification TXT records, all the application records. Diffing that volume of nested HCL is a slog; diffing JSON is fine, and most tooling (editors, GitLab, GitHub) renders JSON record diffs cleanly out of the box.
The module supports inline records too, so a small zone with five records can stay self-contained in tfvars without spawning a JSON file.
Why a Manual Apply Step
DNS errors are immediately visible to every user of the affected domain. Applying automatically on merge would shave a few minutes off the change cycle, but it would also remove the last human checkpoint before a record goes live. The explicit click is cheap insurance against a typo in a domain name.
Why prevent_destroy on Zones
Recreating a hosted zone gives it a new set of name servers, which means the registrar's NS records have to be updated — and until they are, the domain is unresolvable. prevent_destroy makes that an impossible accident: the only way to remove a zone is to first edit the lifecycle block, which is itself a reviewable change.
Why No OPA Yet
Route53 self-service is newer than S3 and RDS. The team prioritized building the module and migrating zones in before adding policy enforcement on top. Manual apply, CODEOWNERS, and prevent_destroy cover the highest-risk failure modes while OPA support is being scoped — see the OPA section above.
References
Internal (Norton)
- How-to guide: Managing Route53 with Terraform
- Infrastructure Repository: wwnorton/ops/infrastructure
- Route53 module source: aws/route53/
- Controlled subdomains registry: docs/aws/route53/controlled-subdomains.md
External (AWS & HashiCorp)
- AWS Route53 Documentation: Amazon Route 53 Developer Guide
- Routing Policies: Choosing a routing policy
- Health Checks: How Route53 checks the health of your resources
- Terraform Resources: aws_route53_zone, aws_route53_record