Terraform at Scale: Managing 40+ AWS Accounts
On a pharma project I managed Terraform across 40+ AWS accounts in a Digital-SDLC organization with SOC2 compliance requirements. At that scale, Terraform stops being “write some HCL and run apply” and becomes a software engineering problem. Module design, state isolation, CI/CD pipelines, and access control all need deliberate architecture.
Module Taxonomy
I built and maintained 20+ reusable Terraform modules organized by domain:
- Networking — VPC with standardized CIDR allocation, Transit Gateway attachments, Route 53 zones, NAT Gateway configuration
- Compute — ECS Fargate services, Lambda functions with associated IAM roles, EC2 launch templates
- Data — RDS (PostgreSQL, MySQL), S3 buckets with versioning and lifecycle policies, DynamoDB tables
- Security — IAM roles and policies, KMS keys with cross-account grants, WAF web ACLs, Security Hub configuration
- Observability — CloudWatch log groups, metric alarms, CloudTrail organization trails, dashboard templates
- DR — AWS Backup plans with cross-account vaulting, replication configurations
Every module follows the same contract: consistent variable naming (environment, project, tags), well-defined outputs that downstream modules can reference, and auto-generated documentation.
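As a rough illustration of that contract, a module's interface might look like the sketch below. The three variable names come from the text; the validation rule, the placeholder CIDR, the aws_vpc.this resource, and the vpc_id output are assumptions made for the example, not the actual modules.

```hcl
# variables.tf: inputs shared by every module (names from the contract above)
variable "environment" {
  description = "Deployment environment, e.g. dev, staging, prod."
  type        = string

  validation {
    # The allowed values are an assumption for illustration.
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}

variable "project" {
  description = "Project identifier used in resource names and tags."
  type        = string
}

variable "tags" {
  description = "Additional tags merged into every resource."
  type        = map(string)
  default     = {}
}

# main.tf: a stand-in resource so the output below resolves
resource "aws_vpc" "this" {
  cidr_block = "10.0.0.0/16" # placeholder CIDR

  tags = merge(var.tags, {
    Environment = var.environment
    Project     = var.project
  })
}

# outputs.tf: values downstream modules can reference
output "vpc_id" {
  description = "ID of the VPC created by this module."
  value       = aws_vpc.this.id
}
```

The description fields are what a documentation generator such as terraform-docs reads, so keeping them filled in is part of the contract.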
Module Design Principles
The biggest mistake I see in large Terraform codebases is modules that do too much. A VPC module shouldn’t also configure Transit Gateway attachments — those have different lifecycles and different teams responsible for them.
I keep modules focused on a single resource or a tightly coupled group. A VPC module creates the VPC, subnets, route tables, and NACLs. A Transit Gateway attachment module takes a VPC ID as input and handles the attachment. Composition happens at the root module level, not inside the modules themselves.
Versioning matters. Every module lives in its own repository with semantic versioning. Workload teams pin to specific versions and upgrade on their own schedule. A breaking change in the VPC module doesn’t force every team to update simultaneously.
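A minimal sketch of what composition and pinning can look like in a workload's root module. The repository URLs, version tags, the private_subnet_ids output, and the transit_gateway_id variable are hypothetical; only the pattern (one module per concern, wired together through outputs, each pinned to a tag) reflects the text.

```hcl
# Root module: composition happens here, not inside the modules.
variable "transit_gateway_id" {
  description = "ID of the shared Transit Gateway."
  type        = string
}

module "vpc" {
  # Hypothetical repository; ?ref pins the module to a released tag.
  source = "git::https://github.com/example-org/terraform-aws-vpc.git?ref=v3.2.0"

  environment = "prod"
  project     = "example"
  tags        = { Owner = "platform" }
}

module "tgw_attachment" {
  # Versioned independently of the VPC module.
  source = "git::https://github.com/example-org/terraform-aws-tgw-attachment.git?ref=v1.4.1"

  environment = "prod"
  project     = "example"

  # The two modules are connected only through outputs and inputs.
  vpc_id             = module.vpc.vpc_id
  subnet_ids         = module.vpc.private_subnet_ids # hypothetical output
  transit_gateway_id = var.transit_gateway_id
}
```

Upgrading then becomes a one-line change to the ref in the workload's own repository, reviewed and planned like any other change.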
State Management
State files live in a centralized S3 bucket in the management account with DynamoDB locking. Each workload gets its own state key path — no shared state files, no cross-team lock contention.
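For illustration, a per-workload backend block might look like this. The bucket name, lock table name, and key layout are assumptions; the region matches the one used in the pipeline example.

```hcl
terraform {
  backend "s3" {
    # Hypothetical names; the bucket and lock table live in the management account.
    bucket         = "example-org-terraform-state"
    dynamodb_table = "terraform-state-locks"
    region         = "eu-west-1"
    encrypt        = true

    # One key per workload and environment, so no two teams share a state file or a lock.
    key = "workloads/example-service/prod/terraform.tfstate"
  }
}
```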
Cross-account deployment uses IAM roles. CI/CD pipelines assume a deployment role in the target account via OIDC federation. Interactive development uses Identity Center (SSO) with permission sets scoped to specific accounts.
The state bucket itself has versioning enabled and a lifecycle policy that keeps 90 days of state history. When someone runs a bad apply, we can recover the previous state without scrambling.
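A sketch of the bucket configuration described above, using the AWS provider's separate versioning and lifecycle resources. The bucket name and rule ID are placeholders; the 90-day window comes from the text.

```hcl
resource "aws_s3_bucket" "state" {
  bucket = "example-org-terraform-state" # hypothetical name
}

resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_lifecycle_configuration" "state" {
  # Noncurrent-version rules only make sense once versioning is on.
  depends_on = [aws_s3_bucket_versioning.state]

  bucket = aws_s3_bucket.state.id

  rule {
    id     = "expire-old-state-versions"
    status = "Enabled"

    filter {} # apply to every object in the bucket

    noncurrent_version_expiration {
      noncurrent_days = 90 # keep 90 days of superseded state versions
    }
  }
}
```

With this in place, recovery amounts to finding the previous version of the state object and copying it back over the current key.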
CI/CD: GitHub Actions with OIDC
No static AWS credentials in CI/CD. Every pipeline authenticates through OIDC federation with GitHub Actions:
```yaml
name: Terraform Deploy

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  id-token: write # required to request the GitHub OIDC token
  contents: read

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          # Deployment role in the target account (account ID omitted here)
          role-to-assume: arn:aws:iam::role/GitHubActionsDeployment
          aws-region: eu-west-1
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.9.x"
      - name: Terraform Init
        run: terraform init
      - name: Terraform Plan
        run: terraform plan -out=tfplan
      - name: Terraform Apply
        # Runs only for pushes to main; pull requests stop after the plan
        if: github.ref == 'refs/heads/main'
        run: terraform apply tfplan
```
The IAM trust policy restricts which repositories and branches can assume the role by matching the sub claim of the GitHub-issued token. A PR from a fork can’t trigger an apply against production. The IAM OIDC identity provider is registered for GitHub’s issuer, so only tokens signed by GitHub’s token service are trusted.
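A trust policy along these lines could be expressed in Terraform as follows. This is a sketch under assumptions: the repository name example-org/example-infra is hypothetical, the OIDC provider is assumed to already exist in the account, and the role name is taken from the workflow above.

```hcl
# Look up the existing GitHub OIDC provider in this account.
data "aws_iam_openid_connect_provider" "github" {
  url = "https://token.actions.githubusercontent.com"
}

data "aws_iam_policy_document" "github_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [data.aws_iam_openid_connect_provider.github.arn]
    }

    # The token must have been requested for AWS STS.
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Only workflow runs on the main branch of one repository (hypothetical name).
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:example-org/example-infra:ref:refs/heads/main"]
    }
  }
}

resource "aws_iam_role" "github_deploy" {
  name               = "GitHubActionsDeployment"
  assume_role_policy = data.aws_iam_policy_document.github_trust.json
}
```

Runs triggered by pull requests carry a different sub value, so they do not match this condition. A setup that also plans on pull requests would need a separate role with read-only permissions and its own trust condition.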
Each account has its own deployment role with least-privilege permissions scoped to the resources Terraform manages. The networking account role can modify VPCs and Transit Gateway. It can’t touch RDS instances. The data account role is the inverse.
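To make the scoping concrete, a permissions policy for the networking account's role might look like the sketch below. The action list is an illustrative assumption, not the real policy, and a production policy would likely narrow the resources further.

```hcl
data "aws_iam_policy_document" "networking_deploy" {
  statement {
    sid    = "ManageVpcAndTransitGateway"
    effect = "Allow"

    actions = [
      "ec2:Describe*",
      "ec2:CreateVpc",
      "ec2:DeleteVpc",
      "ec2:ModifyVpcAttribute",
      "ec2:CreateSubnet",
      "ec2:DeleteSubnet",
      "ec2:CreateRouteTable",
      "ec2:DeleteRouteTable",
      "ec2:CreateRoute",
      "ec2:DeleteRoute",
      "ec2:CreateTransitGatewayVpcAttachment",
      "ec2:DeleteTransitGatewayVpcAttachment",
      "ec2:CreateTags",
    ]

    resources = ["*"]
  }

  # No rds:* actions are granted, so this role cannot modify RDS instances.
}

resource "aws_iam_role_policy" "networking_deploy" {
  name   = "networking-deploy" # hypothetical policy name
  role   = "GitHubActionsDeployment"
  policy = data.aws_iam_policy_document.networking_deploy.json
}
```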
Pre-Commit Hooks
Every Terraform repository runs pre-commit hooks:
- terraform fmt: consistent formatting, no style debates in reviews
- terraform validate: catches syntax errors and missing provider configurations
- terraform-docs: regenerates module documentation from variables, outputs, and descriptions
- tflint: provider-specific linting (deprecated arguments, invalid instance types)
- checkov: static security analysis (S3 buckets without encryption, security groups with 0.0.0.0/0)
These hooks run locally before every commit and again in CI as a gate. The combination catches the vast majority of issues before a human reviewer even looks at the PR.
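As one example of what sits behind these hooks, a tflint configuration could look roughly like this. The plugin version and the choice of enabled rules are assumptions for illustration.

```hcl
# .tflint.hcl

# Built-in Terraform language rules.
plugin "terraform" {
  enabled = true
  preset  = "recommended"
}

# AWS-specific rules, e.g. invalid instance types.
plugin "aws" {
  enabled = true
  version = "0.32.0" # example version; pin to a release you have vetted
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

# Rules that back up the module contract: naming and documented interfaces.
rule "terraform_naming_convention" {
  enabled = true
}

rule "terraform_documented_variables" {
  enabled = true
}

rule "terraform_documented_outputs" {
  enabled = true
}
```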
The AFT Pipeline
New accounts flow through Account Factory for Terraform. The process:
- A team submits a PR with an account request (email, name, OU, compliance tags)
- PR gets reviewed and merged
- AFT provisions the account through Control Tower
- Account customizations run automatically — VPC creation, Transit Gateway attachment, CloudTrail configuration, Security Hub enablement, IAM baseline roles
- The account appears in Identity Center with the appropriate permission sets
From PR to usable account takes about 30 minutes. The customizations are themselves Terraform modules — the same ones used for day-2 operations. No separate “account setup” scripts that drift from the main codebase.
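The account request in the first step is itself a small Terraform file. A sketch in the shape AFT's account request module expects is shown below; every value (email addresses, names, OU, tags, customization name) is a placeholder.

```hcl
module "example_workload_prod" {
  source = "./modules/aft-account-request"

  control_tower_parameters = {
    AccountEmail              = "aws+example-workload-prod@example.com"
    AccountName               = "example-workload-prod"
    ManagedOrganizationalUnit = "Workloads"
    SSOUserEmail              = "platform-team@example.com"
    SSOUserFirstName          = "Platform"
    SSOUserLastName           = "Team"
  }

  # Compliance tags applied to the new account (placeholder keys and values).
  account_tags = {
    Compliance  = "soc2"
    Environment = "prod"
  }

  change_management_parameters = {
    change_requested_by = "example-team"
    change_reason       = "New production account for example workload"
  }

  # Selects which set of account customizations runs after provisioning.
  account_customizations_name = "workload-baseline"
}
```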
What Makes It Work
The scale isn’t the hard part. 40 accounts with consistent patterns are easier to manage than 5 accounts with ad-hoc configurations. The discipline is what makes it work: versioned modules, isolated state, OIDC-only authentication, automated compliance checks, and a provisioning pipeline that enforces the same baseline everywhere.
Every shortcut — shared state files, inline resources instead of modules, manually provisioned accounts — becomes a liability at this scale. The upfront investment in structure pays for itself within the first quarter.