Explanation: Understanding Route53 Infrastructure

Introduction

This document explains the architecture and design decisions behind Norton's Route53 infrastructure. It covers how the Terraform module is laid out, how hosted zones and records are configured, how the CI/CD pipeline applies changes, and the rationale behind the choices made — including a few quirks that are easy to miss the first time you work on DNS. Read the how-to guide first if you need practical step-by-step instructions for adding records or zones.

The Problem: One Source of Truth for DNS

DNS records are easy to change in the AWS Console — and that is exactly the problem. A single ad-hoc edit can take down an entire product, and there is no merge request, no review, and no record of who did what or why. Add a few engineers and a few years of "quick fixes," and the zone becomes a pile of records that nobody fully understands.

The self-service Route53 setup solves this by treating DNS the same way we treat application code: every change goes through Git, every change is reviewed, and every change is applied by an automated pipeline. Once a zone is in Terraform, the AWS Console becomes a read-only window into it.

DNS is high-stakes infrastructure: a wrong record breaks production for everyone, and the blast radius is global. The pipeline keeps the keys to that room in a place where the door cannot be opened without a witness.

Architecture Overview

Module Structure

The Route53 Terraform module lives at aws/route53/ in the Infrastructure repository:

aws/route53/
├── route53.tf        # Hosted zones + records (standard and alias)
├── health_checks.tf  # Route53 health checks
├── locals.tf         # Combines records from JSON files and inline declarations
├── variables.tf      # Input variable definitions
└── outputs.tf        # Zone IDs, name servers, health check IDs

Each environment has its own root configuration at accounts/{environment}/Route53/ which calls the module:

accounts/{development|production}/Route53/
├── main.tf                # Module call
├── terraform.tfvars       # Hosted zones declared here
├── backend.tf             # GitLab-managed state
├── variables.tf
├── versions.tf
└── route53-records/       # One JSON file per zone, holding records
    ├── wwnorton-com.json
    ├── wwnorton-net.json
    └── ...
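
As a rough sketch, the module call in main.tf might look like the following. The relative source path is derived from the directory layout above; the input name `hosted_zones` is an assumption for illustration, not a confirmed module input.

```hcl
# accounts/production/Route53/main.tf (illustrative sketch, not the real file)

variable "hosted_zones" {
  description = "Hosted zones keyed by zone key (assumed input name)."
  type        = any
  default     = {}
}

module "route53" {
  # Three levels up from accounts/{environment}/Route53/ to the repository
  # root, then down into the shared module directory.
  source = "../../../aws/route53"

  # Hypothetical input: the values themselves come from terraform.tfvars.
  hosted_zones = var.hosted_zones
}
```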

How Zones and Records Are Configured

Configuration is split between two files for a reason:

terraform.tfvars declares the hosted zones — the top-level DNS containers and their tags, comments, and (for private zones) VPC associations.
route53-records/{zone-key}.json holds the actual records for that zone — A, CNAME, MX, TXT, alias records, and so on.

The module supports inline records (declared directly in tfvars) as well, but the JSON file approach is what you will see in production. Large zones with hundreds of records are far easier to scan and review as a JSON file than as nested HCL.
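
To make the split concrete, a zone declaration in terraform.tfvars could be shaped roughly like this. Only `records_file` is named in this document; the variable name `hosted_zones` and the other attribute names are assumptions chosen for illustration.

```hcl
# accounts/production/Route53/terraform.tfvars (illustrative shape only)

hosted_zones = {
  # The zone key matches the JSON file name under route53-records/.
  "wwnorton-com" = {
    name    = "wwnorton.com"                 # assumed attribute name
    comment = "Public zone for wwnorton.com" # assumed attribute name

    # Records for this zone live in a separate JSON file.
    records_file = "route53-records/wwnorton-com.json"

    tags = {
      ManagedBy = "terraform" # hypothetical tag
    }
  }
}
```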

Hosted Zones

A hosted zone is the Route53 container that holds all the records for a single domain. Norton runs two flavors:

Public zones answer queries from the public internet. They are used in the production account for customer-facing domains such as wwnorton.com and the author sites.
Private zones answer queries only from VPCs that are explicitly associated with the zone. They are used for internal hostnames and for splitting public and private views of the same domain.

Public vs Private Resolution

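The same domain name can exist as a private zone in multiple accounts. wwnorton.com lives as a public zone in production and as a private zone in both production and development. A request resolves against the private zone associated with the requester's VPC; if no private zone is associated, the resolver falls back to the public zone. The fallback applies to the zone as a whole, not to individual records: once a VPC is associated with a private zone for wwnorton.com, a name that is missing from that private zone returns NXDOMAIN rather than the public answer.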

Why two zones for the same name? Internal services often need different targets than external clients (e.g. an app calls another app via an internal load balancer instead of the external one). Splitting public and private views lets us serve the right answer in the right context without exposing internal hostnames or IPs to the internet.
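
A minimal sketch of the two views, written against the AWS provider directly rather than through the Norton module. The domain `example.com` and the variable `vpc_id` are placeholders.

```hcl
variable "vpc_id" {
  description = "VPC that should see the private view (placeholder)."
  type        = string
}

# Public view: answers queries from the internet.
resource "aws_route53_zone" "public" {
  name    = "example.com"
  comment = "Public view"
}

# Private view of the same name: the vpc block is what makes the zone
# private, and only resolvers inside the associated VPC see it.
resource "aws_route53_zone" "private" {
  name    = "example.com"
  comment = "Private view"

  vpc {
    vpc_id = var.vpc_id
  }
}
```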

Deletion Protection

Every hosted zone is created with lifecycle { prevent_destroy = true }. Terraform will refuse to plan a destruction of a zone, which means a stray for_each change or accidental key rename cannot delete a zone and take its records with it. To remove a zone intentionally, the lifecycle block has to be edited first — a deliberate, reviewable action.
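
The guard described above can be sketched as follows. The variable shape is hypothetical; the `lifecycle` block is standard Terraform. Because lifecycle arguments must be literal values, the flag cannot be switched off through a variable, which is why removing a zone requires a code change.

```hcl
variable "hosted_zones" {
  description = "Zones keyed by zone key (hypothetical shape)."
  type = map(object({
    name    = string
    comment = optional(string, "Managed by Terraform")
  }))
}

resource "aws_route53_zone" "this" {
  for_each = var.hosted_zones

  name    = each.value.name
  comment = each.value.comment

  lifecycle {
    # Any plan that would destroy one of these zones fails with an error,
    # including one caused by removing or renaming a for_each key.
    prevent_destroy = true
  }
}
```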

Records

The module supports every record type that AWS does and every routing policy that Route53 offers. Records come from two sources:

External JSON files referenced by records_file in the zone declaration. This is the standard pattern.
Inline records declared directly inside the zone object in tfvars. Useful for one-off records during a quick setup, but rarely seen in production zones.
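
One way a locals.tf could combine the two sources is sketched below. This is not the module's actual code: the variable shape, the attribute names other than `records_file`, and the local names are all assumptions.

```hcl
variable "hosted_zones" {
  type = map(object({
    name         = string
    records_file = optional(string) # path relative to the root configuration
    records = optional(list(object({ # inline records (assumed shape)
      name    = string
      type    = string
      ttl     = optional(number, 300)
      records = optional(list(string), [])
    })), [])
  }))
}

locals {
  # Records decoded from each zone's JSON file. Zones without a file are
  # skipped by the if clause.
  file_records = {
    for zone_key, zone in var.hosted_zones :
    zone_key => jsondecode(file("${path.root}/${zone.records_file}"))
    if zone.records_file != null
  }

  # File records first, then inline records, per zone. try() supplies an
  # empty list for zones that have no entry in file_records.
  records_by_zone = {
    for zone_key, zone in var.hosted_zones :
    zone_key => concat(try(local.file_records[zone_key], []), zone.records)
  }
}
```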

Standard vs Alias Records

The module creates one of two slightly different Terraform resources for each record, depending on whether the record is an alias to an AWS resource or a standard record with a TTL and a list of values.

Alias records are a Route53-specific feature and are the right choice for pointing at AWS resources: Route53 answers with the target's current IP addresses directly, they work at the apex of a domain (where CNAMEs are not allowed), and Route53 does not charge for alias queries that target AWS resources. Use them whenever the target is an ALB, NLB, CloudFront distribution, S3 static website, API Gateway, or another Route53 record in the same hosted zone.
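
The difference between the two shapes can be sketched with the AWS provider's `aws_route53_record` resource. The names, TTL, and variables below are placeholders, not values from the Norton module.

```hcl
variable "zone_id" {
  type = string
}

variable "alb_dns_name" {
  description = "DNS name of the target load balancer (placeholder)."
  type        = string
}

variable "alb_zone_id" {
  description = "Hosted zone ID that AWS publishes for the load balancer."
  type        = string
}

# Standard record: a TTL plus a list of values.
resource "aws_route53_record" "standard" {
  zone_id = var.zone_id
  name    = "docs.example.com"
  type    = "CNAME"
  ttl     = 300
  records = ["target.example.net"]
}

# Alias record: no ttl and no records, just an alias block. This form is
# allowed at the zone apex, where a CNAME is not.
resource "aws_route53_record" "alias" {
  zone_id = var.zone_id
  name    = "example.com"
  type    = "A"

  alias {
    name                   = var.alb_dns_name
    zone_id                = var.alb_zone_id
    evaluate_target_health = true
  }
}
```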

How Records Get Their Unique Key

Terraform resources need a stable, unique key for every record. The module builds it from the zone, the record name, the type, and the routing-policy set_identifier:

{zone_key}-{record_name}-{type}-{set_identifier|default}

Two practical consequences:

The same name + type can exist multiple times within a zone if each entry has a different set_identifier (used for weighted, latency, geolocation, and failover routing).
Renaming the zone key, the record name, or the type will cause Terraform to plan a destroy + recreate. For a CNAME this is a brief ripple; for the apex A record it is a real outage. Always read the plan carefully.
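
The key format above could be built with a nested for expression along these lines. The sample records and local names are hypothetical; only the key format comes from this document.

```hcl
locals {
  # Hypothetical input: records grouped by zone key.
  records_by_zone = {
    "wwnorton-com" = [
      { name = "www", type = "A", set_identifier = "blue" },
      { name = "www", type = "A", set_identifier = "green" },
      { name = "mail", type = "MX" },
    ]
  }

  # One map entry per record, keyed as
  # {zone_key}-{record_name}-{type}-{set_identifier|default}.
  # This sample produces the keys:
  #   wwnorton-com-www-A-blue
  #   wwnorton-com-www-A-green
  #   wwnorton-com-mail-MX-default
  records = merge([
    for zone_key, zone_records in local.records_by_zone : {
      for r in zone_records :
      "${zone_key}-${r.name}-${r.type}-${lookup(r, "set_identifier", "default")}" => r
    }
  ]...)
}

output "record_keys" {
  value = keys(local.records)
}
```

A resource that uses `for_each = local.records` tracks each record under its key, which is why changing any component of the key shows up in the plan as a destroy and a create rather than an in-place update.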

Routing Policies

A single record can be backed by several different "answers" depending on which routing policy is configured. The module supports all of Route53's policies; pick the one that matches the failure or distribution mode you need.

| Policy | What It Does | When to Use |
| --- | --- | --- |
| Simple (default) | Returns a single answer. | Most records. The apex `A` of a domain, a static CNAME, an MX. |
| Weighted | Distributes traffic across multiple answers by weight. | Gradual rollouts (e.g. 90/10 between two ALBs), A/B testing, blue-green migrations. |
| Latency | Returns the answer with the lowest latency for the requester's region. | Multi-region deployments where the user should hit the closest endpoint. |
| Geolocation | Routes based on the requester's continent, country, or US subdivision. | Compliance-driven routing (data residency), region-specific content, or sending UK traffic to the UK author site. |
| Failover | Primary/secondary pair backed by a health check. The secondary only answers when the primary is unhealthy. | Active/passive disaster recovery. |
| Multivalue | Returns up to eight healthy answers, randomized. | Cheap pseudo-load-balancing for systems that can tolerate client-side retries. |

Every policy except simple requires a set_identifier, multivalue included. That identifier is what lets Route53 (and the module) tell two records of the same name and type apart.
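
As an illustration of how set_identifier separates records that share a name and type, here is a sketch of a 90/10 weighted pair using the AWS provider directly. Hostnames and the variable are placeholders.

```hcl
variable "zone_id" {
  type = string
}

# Same name and type twice; set_identifier tells the two entries apart.
resource "aws_route53_record" "app_blue" {
  zone_id        = var.zone_id
  name           = "app.example.com"
  type           = "CNAME"
  ttl            = 60
  records        = ["blue.example.com"]
  set_identifier = "blue"

  weighted_routing_policy {
    weight = 90
  }
}

resource "aws_route53_record" "app_green" {
  zone_id        = var.zone_id
  name           = "app.example.com"
  type           = "CNAME"
  ttl            = 60
  records        = ["green.example.com"]
  set_identifier = "green"

  weighted_routing_policy {
    weight = 10
  }
}
```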

Health Checks

Health checks live independently of records and are referenced by health_check_id. They are used to:

Drive failover routing: when the primary's health check fails, Route53 treats the primary as unhealthy and answers from the secondary instead.
Filter weighted/latency/multivalue answers: Route53 silently drops unhealthy answers from the response.
Page on-call: CloudWatch alarms can be tied to the check's metric.

The module exposes the full set of check types — HTTP, HTTPS, HTTP/HTTPS string-match, TCP, calculated (combining other checks), and CloudWatch metric. The most common in practice are HTTPS checks against /healthz endpoints.

Health checks are not free: Route53 bills each check monthly, and optional features such as HTTPS, string matching, and fast-interval checking cost extra. Add them where they actively drive routing or alerting, not as decoration on records that nothing reads.
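
A minimal sketch of a health check driving a failover pair, written against the AWS provider directly. The hostnames, thresholds, and the variable are placeholders rather than Norton's actual values.

```hcl
variable "zone_id" {
  type = string
}

# HTTPS check against a /healthz endpoint.
resource "aws_route53_health_check" "primary" {
  fqdn              = "primary.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/healthz"
  failure_threshold = 3
  request_interval  = 30
}

# Primary answer, served while the health check passes.
resource "aws_route53_record" "primary" {
  zone_id         = var.zone_id
  name            = "app.example.com"
  type            = "CNAME"
  ttl             = 60
  records         = ["primary.example.com"]
  set_identifier  = "primary"
  health_check_id = aws_route53_health_check.primary.id

  failover_routing_policy {
    type = "PRIMARY"
  }
}

# Secondary answer, served only when the primary is unhealthy.
resource "aws_route53_record" "secondary" {
  zone_id        = var.zone_id
  name           = "app.example.com"
  type           = "CNAME"
  ttl            = 60
  records        = ["secondary.example.com"]
  set_identifier = "secondary"

  failover_routing_policy {
    type = "SECONDARY"
  }
}
```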

Delegation Sets

A delegation set is a fixed set of four name servers that Route53 can reuse across multiple hosted zones. Without one, every new zone gets a fresh set assigned by Route53, which means whoever manages the registered domain has to update its NS records at the registrar for each new zone.

The module supports reusable delegation sets (var.delegation_sets). When a zone declares a delegation_set_id, all of its NS records come from that fixed set. This is the right pattern when you plan to spin up multiple related zones (e.g. a primary domain plus several vanity domains that should share name servers).
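
For reference, a reusable delegation set shared by two public zones could be sketched like this with the AWS provider directly. The reference name and domains are placeholders; how the module maps var.delegation_sets onto these resources is not shown here.

```hcl
resource "aws_route53_delegation_set" "shared" {
  reference_name = "shared-vanity-domains" # placeholder
}

# Both zones receive the same four name servers.
resource "aws_route53_zone" "primary" {
  name              = "example.com"
  delegation_set_id = aws_route53_delegation_set.shared.id
}

resource "aws_route53_zone" "vanity" {
  name              = "example.org"
  delegation_set_id = aws_route53_delegation_set.shared.id
}

output "shared_name_servers" {
  value = aws_route53_delegation_set.shared.name_servers
}
```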

Norton does not currently use delegation sets in production — most zones run on AWS-assigned NS records — but the support is there in the module if a future zone family needs it.

Controlled Subdomains for Load Balancers

Most application records under wwnorton.com and wwnorton.net do not point straight at an AWS load balancer hostname. They go through a layer of controlled subdomains — short, stable CNAMEs like alb-external-prod.wwnorton.com that resolve to the actual ELB. Application records then CNAME to the controlled subdomain.

The pattern buys two things:

One change, many redirects. Changing the controlled subdomain target moves every application that resolves through it — within the 60-second TTL — without touching every individual record.
Maintenance mode. Pointing alb-external-prod at maintenance.wwnorton.com redirects all production apps to the maintenance page in a single record edit.
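
The chain described above can be sketched as two records. The application hostname, its TTL, and the variables are hypothetical; the controlled subdomain name and its 60-second TTL come from this document.

```hcl
variable "zone_id" {
  type = string
}

variable "alb_dns_name" {
  description = "DNS name of the real load balancer (placeholder)."
  type        = string
}

# Controlled subdomain: the single record that points at the real ELB.
# Changing this target (for example to maintenance.wwnorton.com) moves
# every application that resolves through it.
resource "aws_route53_record" "controlled" {
  zone_id = var.zone_id
  name    = "alb-external-prod.wwnorton.com"
  type    = "CNAME"
  ttl     = 60
  records = [var.alb_dns_name]
}

# Application record: points at the controlled subdomain, never at the
# ELB hostname directly. "someapp" is a hypothetical application name.
resource "aws_route53_record" "app" {
  zone_id = var.zone_id
  name    = "someapp.wwnorton.com"
  type    = "CNAME"
  ttl     = 300
  records = ["alb-external-prod.wwnorton.com"]
}
```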

The full registry of controlled subdomains and the maintenance procedure live in docs/aws/route53/controlled-subdomains.md in the Infrastructure repository, and the how-to guide cross-links the relevant sections.

CI/CD Pipeline Flow

The Route53 pipeline follows the same shape as every other resource in the Infrastructure repository — fmt, validate, plan, apply — with one important quirk you need to understand before your first MR.

What Happens on a Merge Request

What Happens on Merge to Main

Apply on Route53 is a manual pipeline step. The plan job runs automatically on merge, but the apply job has to be started by hand. DNS is sensitive enough that we want a person to look at the plan one more time and press the button.

OPA Policy Guardrails

Route53 does not currently have OPA policies in policies/accounts/{environment}/. S3, RDS, ElastiCache, MemoryDB, and Amazon MQ each have a policy bundle; Route53 does not yet.

In practice, the guardrails for Route53 today are:

prevent_destroy on every hosted zone — Terraform refuses to remove a zone without a deliberate code change.
CODEOWNERS on the Infrastructure repository — record changes require Platform team review.
Manual apply step — somebody has to look at the plan and press the button.
Plan visibility — every record change appears in the MR plan comment, and a destroy of an apex record is hard to miss.

The Platform team is tracking OPA support for Route53 as future work. Likely first checks: enforcing TTL ranges on hot records (the controlled subdomains live at TTL 60s on purpose), preventing allow_overwrite = true outside of explicit migration MRs, and rejecting deletes of records on the controlled-subdomain registry.

Environment Differences

| Aspect | Development account | Production account |
| --- | --- | --- |
| Public zones | None | Most customer-facing domains (`wwnorton.com`, author sites, etc.) |
| Private zones | `wwnorton.com`, `wwnorton.net` (associated to dev VPCs) | `wwnorton.com` (associated to prod EKS VPCs) |
| Manual apply | Required | Required |
| `prevent_destroy` | Enabled on all zones | Enabled on all zones |
| Records source | Mostly `route53-records/*.json` | Mostly `route53-records/*.json` |

Production carries the public-facing domains and is where the controlled-subdomain pattern earns its keep. Development hosts only private zones because there is no business reason to expose internal dev hostnames to the public internet.

Account Structure

Development account (637244866643): private zones for dev/QA/staging.
Production account (100478842646): all public zones plus the production private zone for wwnorton.com.

Each account has its own GitLab-managed Terraform state file, keyed under development-route53 or production-route53.
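
GitLab-managed state is served through Terraform's generic HTTP backend, so backend.tf is typically close to empty. The sketch below assumes the address and credentials are injected by the pipeline; the exact mechanism used in the Infrastructure repository is not described in this document.

```hcl
# accounts/production/Route53/backend.tf (illustrative sketch)

terraform {
  # The state address, lock endpoints, and credentials are normally supplied
  # at `terraform init` time (for example with -backend-config arguments or
  # TF_HTTP_* environment variables) rather than committed here.
  #
  # For GitLab-managed state the address usually has the form:
  #   https://<gitlab-host>/api/v4/projects/<project-id>/terraform/state/production-route53
  backend "http" {}
}
```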

Design Rationale

Why JSON Files Instead of Inline HCL

Production zones contain hundreds of records — DKIM keys, ACM validation CNAMEs, vendor verification TXT records, all the application records. Diffing that volume of nested HCL is a slog; diffing JSON is fine, and most tooling (editors, GitLab, GitHub) renders JSON record diffs cleanly out of the box.

The module supports inline records too, so a small zone with five records can stay self-contained in tfvars without spawning a JSON file.

Why a Manual Apply Step

DNS errors are immediately visible to every user of the affected domain. Applying automatically on merge would shave a few minutes off the change cycle, but it would also remove the last human checkpoint before a record goes live. The explicit click is cheap insurance against a typo in a domain name.

Why `prevent_destroy` on Zones

Recreating a hosted zone gives it a new set of name servers, which means the registrar's NS records have to be updated, and until they are, the domain is unresolvable. prevent_destroy guards against that accident: Terraform fails the plan for any change that would destroy a zone, so the only way to remove one is to first edit the lifecycle block, which is itself a reviewable change.

Why No OPA Yet

Route53 self-service is newer than S3 and RDS. The team prioritized building the module and migrating zones in before adding policy enforcement on top. Manual apply, CODEOWNERS, and prevent_destroy cover the highest-risk failure modes while OPA support is being scoped — see the OPA section above.
