0. Setup & Environment

Terraform is a single binary — install it and you are ready to go. You do not need cloud credentials to start learning; the Docker provider lets you practice the full workflow locally.

Installation

# macOS
brew tap hashicorp/tap
brew install hashicorp/tap/terraform

# Verify
terraform --version
# Terraform v1.7.x on darwin_arm64

# Shell completions (optional)
terraform -install-autocomplete

LocalStack (optional — fake AWS locally)

# Spin up LocalStack — emulates AWS services on port 4566
docker run -d --name localstack \
  -p 4566:4566 \
  -e SERVICES=s3,dynamodb,iam \
  localstack/localstack

# Confirm it is healthy
curl -s http://localhost:4566/_localstack/health | python3 -m json.tool
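To point Terraform at LocalStack instead of real AWS, override the provider's endpoints. A sketch assuming the three services started above; the dummy credentials are what LocalStack expects:

```
provider "aws" {
  region     = "us-east-1"
  access_key = "test"   // LocalStack accepts any non-empty credentials
  secret_key = "test"

  s3_use_path_style           = true
  skip_credentials_validation = true
  skip_metadata_api_check     = true
  skip_requested_account_id   = true

  endpoints {
    s3       = "http://localhost:4566"
    dynamodb = "http://localhost:4566"
    iam      = "http://localhost:4566"
  }
}
```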

Quick start with the Docker provider

mkdir ~/tf-refresher && cd ~/tf-refresher

cat > main.tf << 'EOF'
terraform {
  required_providers {
    docker = {
      source  = "kreuzwerker/docker"
      version = "~> 3.0"
    }
  }
}

provider "docker" {}

resource "docker_image" "nginx" {
  name = "nginx:latest"
}

resource "docker_container" "web" {
  name  = "tf-nginx"
  image = docker_image.nginx.image_id

  ports {
    internal = 80
    external = 8080
  }
}
EOF

terraform init    # download provider plugin
terraform plan    # preview changes
terraform apply -auto-approve  # create resources

curl localhost:8080  # nginx welcome page

terraform destroy -auto-approve  # tear everything down
Why Docker for practice
The Docker provider lets you run the complete Terraform lifecycle — init, plan, apply, destroy — with zero cloud credentials and zero cost. Every concept in this guide applies identically to AWS, GCP, or Azure.

1. Core Concepts

Infrastructure as Code

IaC means describing your infrastructure in version-controlled text files rather than clicking through a console. Terraform uses a declarative model: you specify the desired end state and Terraform figures out the sequence of API calls to get there.

Approach     | Model          | Example                   | Pros / Cons
Declarative  | Desired state  | Terraform, CloudFormation | + Idempotent, + drift detection / - less flexible logic
Imperative   | Step-by-step   | Bash + AWS CLI, Ansible   | + Full control / - hard to reason about state, not idempotent
Programmatic | Code as config | Pulumi, CDK               | + Real language loops / - complexity, larger dependency tree
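The declarative model is what makes terraform apply idempotent: the config below describes one bucket, and applying it a second time changes nothing (bucket name is illustrative):

```
resource "aws_s3_bucket" "assets" {
  bucket = "acme-assets"
}

# 1st apply: creates the bucket
# 2nd apply: "No changes. Your infrastructure matches the configuration."
```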

The Terraform Workflow

  1. Write — author .tf files describing resources
  2. Init — terraform init downloads providers and modules
  3. Plan — terraform plan computes the diff between desired state and current state
  4. Apply — terraform apply executes the plan and updates the state file
  5. Destroy — terraform destroy deletes all managed resources

Terraform vs Alternatives

Tool              | Language     | State           | Multi-cloud | Best for
Terraform         | HCL          | Explicit file   | Yes         | Multi-cloud, community modules
CloudFormation    | YAML/JSON    | AWS-managed     | AWS only    | AWS-native, tight IAM integration
Pulumi            | TS/Python/Go | Service/file    | Yes         | Teams preferring real languages
CDK for Terraform | TS/Python    | Terraform state | Yes         | CDK-familiar teams moving to TF
Ansible           | YAML         | None (push)     | Yes         | Config management, not provisioning

2. HCL Syntax

HashiCorp Configuration Language (HCL) is designed to be human-readable and machine-parseable. It feels like a cross between JSON and a scripting language.
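HCL also has a machine-friendly JSON variant: the same resource can be written in a .tf.json file, which is handy when config is generated by other tools (an equivalent sketch):

```
{
  "resource": {
    "aws_s3_bucket": {
      "assets": {
        "bucket": "acme-assets"
      }
    }
  }
}
```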

Block Types

// terraform block — global settings
terraform {
  required_version = ">= 1.5"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

// provider block — configure a provider
provider "aws" {
  region = "us-east-1"
}

// resource block — declare a managed resource
resource "aws_s3_bucket" "my_bucket" {
  bucket = "acme-assets-prod"
  tags = {
    Environment = "production"
    Team        = "platform"
  }
}

// data block — read existing resource (not managed here)
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]  // Canonical
  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-*-22.04-amd64-server-*"]
  }
}

// variable block — input parameter
variable "instance_type" {
  type        = string
  description = "EC2 instance type"
  default     = "t3.micro"
}

// output block — expose value after apply
output "bucket_arn" {
  value       = aws_s3_bucket.my_bucket.arn
  description = "ARN of the S3 bucket"
}

// locals block — computed values used within module
locals {
  common_tags = {
    Project   = "acme"
    ManagedBy = "terraform"
  }
  bucket_name = "${var.environment}-${var.project}-assets"
}

Types

// Primitive types
variable "name"     { type = string }
variable "replicas" { type = number }  // "count" is reserved and cannot be used as a variable name
variable "enabled"  { type = bool   }

// Collection types
variable "azs" {
  type    = list(string)
  default = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

variable "tags" {
  type    = map(string)
  default = { env = "prod", team = "platform" }
}

variable "cidr_set" {
  type    = set(string)   // like list but unordered, no duplicates
  default = ["10.0.0.0/24", "10.0.1.0/24"]
}

// Structural types
variable "db_config" {
  type = object({
    engine  = string
    version = string
    port    = number
  })
  default = {
    engine  = "postgres"
    version = "16"
    port    = 5432
  }
}

variable "subnet_cidrs" {
  type = tuple([string, string, string])
  // Fixed-length, each element can differ in type
}

Expressions & Interpolation

// String interpolation
resource "aws_instance" "web" {
  tags = {
    Name = "web-${var.environment}-${count.index}"
  }
}

// Heredoc (multi-line string)
resource "aws_iam_policy" "example" {
  policy = <<-EOT
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "s3:GetObject",
          "Resource": "arn:aws:s3:::${aws_s3_bucket.my_bucket.id}/*"
        }
      ]
    }
  EOT
}

// Comments
// single-line comment
# also single-line
/* multi-line
   comment */

3. Resources

Resource Blocks

The resource block is the workhorse of Terraform. The first label is the resource type (provider_resourcetype), and the second is the local name used to reference it.

// Syntax: resource "<TYPE>" "<NAME>" { ... }
// Reference: <TYPE>.<NAME>.<ATTRIBUTE>

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  tags = { Name = "main-vpc" }
}

resource "aws_subnet" "public" {
  vpc_id            = aws_vpc.main.id  // implicit dependency
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a"
}

Meta-Arguments

// depends_on — explicit dependency when the relationship is not visible in attributes
resource "aws_iam_role_policy_attachment" "attach" {
  role       = aws_iam_role.worker.name   // implicit dependency — no depends_on needed here
  policy_arn = aws_iam_policy.worker_policy.arn
}

resource "aws_instance" "worker" {
  iam_instance_profile = aws_iam_instance_profile.worker.name
  // The attachment is never referenced, but the instance needs the policy at boot
  depends_on = [aws_iam_role_policy_attachment.attach]
}

// count — create N identical resources
resource "aws_instance" "web" {
  count         = 3
  ami           = data.aws_ami.ubuntu.id
  instance_type = var.instance_type
  tags          = { Name = "web-${count.index}" }
}
// Reference: aws_instance.web[0].id, aws_instance.web[*].id (splat)

// for_each — create one resource per map/set element (preferred over count)
variable "buckets" {
  type    = map(string)
  default = {
    assets  = "us-east-1"
    backups = "us-west-2"
  }
}

resource "aws_s3_bucket" "buckets" {
  for_each = var.buckets
  bucket   = "acme-${each.key}"
  // each.key = "assets", each.value = "us-east-1"
}
// Reference: aws_s3_bucket.buckets["assets"].id

// lifecycle — control resource replacement behaviour
resource "aws_db_instance" "primary" {
  identifier = "prod-db"
  // ...

  lifecycle {
    create_before_destroy = true   // zero-downtime replacement
    prevent_destroy       = true   // guard against accidental deletion
    ignore_changes        = [password]  // don't drift-detect this field
    replace_triggered_by  = [aws_db_subnet_group.main]  // force replace if dep changes
  }
}

Implicit vs Explicit Dependencies

// Implicit: Terraform sees the reference and builds the graph automatically
resource "aws_security_group" "web_sg" { ... }

resource "aws_instance" "web" {
  vpc_security_group_ids = [aws_security_group.web_sg.id]
  // Terraform knows: create web_sg BEFORE web
}

// Explicit: use depends_on when the relationship is not visible in attributes
resource "null_resource" "bootstrap" {
  depends_on = [aws_db_instance.primary, aws_elasticache_cluster.cache]
  // bootstrap only after both DB and cache are ready
}

4. Variables & Outputs

Input Variables

// variables.tf

variable "region" {
  type        = string
  description = "AWS region to deploy into"
  default     = "us-east-1"
}

variable "instance_count" {
  type        = number
  description = "Number of EC2 instances"
  default     = 1

  validation {
    condition     = var.instance_count >= 1 && var.instance_count <= 10
    error_message = "instance_count must be between 1 and 10."
  }
}

variable "db_password" {
  type        = string
  description = "Database root password"
  sensitive   = true   // masked in plan/apply output and state
}

variable "allowed_cidrs" {
  type        = list(string)
  description = "CIDR blocks allowed to reach the load balancer"
  default     = ["0.0.0.0/0"]
}

Variable Precedence (highest to lowest)

  1. -var and -var-file CLI flags (later flags win over earlier ones)
  2. *.auto.tfvars files (auto-loaded in alphabetical order)
  3. terraform.tfvars.json (auto-loaded)
  4. terraform.tfvars (auto-loaded)
  5. TF_VAR_name environment variables
  6. Variable default value
# terraform.tfvars — committed (no secrets!)
region         = "us-east-1"
instance_count = 3

# secrets.auto.tfvars — gitignored
db_password = "hunter2"

# CLI override (highest priority)
terraform apply -var="region=eu-west-1" -var-file="prod.tfvars"

# Environment variable
export TF_VAR_db_password="hunter2"
terraform apply

Output Values

// outputs.tf
output "vpc_id" {
  value       = aws_vpc.main.id
  description = "ID of the main VPC"
}

output "instance_public_ips" {
  value       = aws_instance.web[*].public_ip
  description = "Public IPs of all web instances"
}

output "db_endpoint" {
  value     = aws_db_instance.primary.endpoint
  sensitive = true  // not shown in console output
}
# After apply, access outputs
terraform output vpc_id
terraform output -json  # all outputs as JSON (useful in CI pipelines)

# In a parent module calling a child module:
# module.networking.vpc_id

Local Values

locals {
  // Combine common tags into one map to reuse everywhere
  common_tags = merge(var.extra_tags, {
    Project     = var.project
    Environment = var.environment
    ManagedBy   = "terraform"
  })

  // Compute derived values
  is_prod        = var.environment == "production"
  instance_type  = local.is_prod ? "t3.medium" : "t3.micro"
  bucket_prefix  = "${var.project}-${var.environment}"
}

resource "aws_instance" "web" {
  instance_type = local.instance_type
  tags          = local.common_tags
}

5. State Management

The state file (terraform.tfstate) is Terraform's memory. It maps resource addresses in your config to real infrastructure IDs. Without it, Terraform cannot know what exists and what needs to change.
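Internally the state is JSON: each entry ties a config address (type + name) to the real-world ID and attributes, roughly like this (heavily abbreviated sketch; IDs are illustrative):

```
{
  "version": 4,
  "resources": [
    {
      "type": "aws_instance",
      "name": "web",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        { "attributes": { "id": "i-0abc123def456", "instance_type": "t3.micro" } }
      ]
    }
  ]
}
```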

Never commit state to Git
State files can contain secrets (passwords, private keys). Always add *.tfstate and *.tfstate.backup to .gitignore. Use a remote backend instead.

Remote Backends

// S3 + DynamoDB backend (most common for AWS teams)
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"
    key            = "prod/networking/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"  // for state locking
    encrypt        = true
  }
}

// Terraform Cloud / HCP Terraform
terraform {
  cloud {
    organization = "acme-corp"
    workspaces {
      name = "prod-networking"
    }
  }
}

// GCS backend (GCP)
terraform {
  backend "gcs" {
    bucket = "acme-terraform-state"
    prefix = "prod/networking"
  }
}

State Commands

# List all resources in state
terraform state list

# Show details of one resource
terraform state show aws_instance.web[0]

# Move a resource (e.g. after renaming in config)
terraform state mv aws_instance.app aws_instance.web

# Remove a resource from state (stop managing, don't destroy)
terraform state rm aws_s3_bucket.legacy

# Pull remote state to stdout
terraform state pull

# Push local state to remote (use with extreme caution)
terraform state push terraform.tfstate

# Detect drift between state and real infrastructure
# (terraform refresh is deprecated; use -refresh-only instead)
terraform plan -refresh-only    # preview drift
terraform apply -refresh-only   # accept drift into state

State Locking

When using a remote backend, Terraform acquires a lock before any operation that writes state. This prevents two engineers from applying simultaneously and corrupting the state file. DynamoDB provides the lock table for S3 backends.

# If a previous run crashed and left a stale lock:
terraform force-unlock <LOCK_ID>
# Get the LOCK_ID from the error message Terraform printed

Recommended .gitignore

# .gitignore for a Terraform project
*.tfstate
*.tfstate.backup
*.tfstate.lock.info
.terraform/
# note: do NOT ignore .terraform.lock.hcl; commit it to pin provider versions
crash.log
override.tf
override.tf.json
*_override.tf
*_override.tf.json
*.auto.tfvars  # if they contain secrets
Commit .terraform.lock.hcl
The lock file (.terraform.lock.hcl) pins exact provider versions and checksums. Committing it ensures every team member and CI job uses the same provider build. It is the package-lock.json equivalent for Terraform.
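If CI runs on a different OS than your laptop, record checksums for every platform up front so the lock file validates everywhere:

```
terraform providers lock \
  -platform=darwin_arm64 \
  -platform=linux_amd64
```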

6. Providers

Providers are plugins that translate Terraform resource blocks into API calls. Each provider is independently versioned and downloaded during terraform init.

Provider Configuration

// Required providers declaration
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"    // >= 5.0, < 6.0
    }
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = ">= 4.0, < 5.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0"
    }
  }
}

provider "aws" {
  region  = var.aws_region
  profile = "prod"  // from ~/.aws/credentials

  default_tags {
    tags = local.common_tags
  }
}

Multiple Provider Instances (Aliases)

// Deploy to two regions simultaneously
provider "aws" {
  region = "us-east-1"
}

provider "aws" {
  alias  = "west"
  region = "us-west-2"
}

resource "aws_s3_bucket" "primary" {
  bucket = "acme-assets-east"
  // uses the default provider (us-east-1)
}

resource "aws_s3_bucket" "replica" {
  provider = aws.west  // explicitly use the aliased provider
  bucket   = "acme-assets-west"
}

// Passing alias into a module
module "west_infra" {
  source = "./modules/networking"
  providers = {
    aws = aws.west
  }
}

Common Providers

Provider     | Source                | Use case
AWS          | hashicorp/aws         | EC2, S3, RDS, IAM, VPC, ...
Google Cloud | hashicorp/google      | GCE, GCS, GKE, Cloud SQL, ...
Azure        | hashicorp/azurerm     | VMs, Blob Storage, AKS, ...
Kubernetes   | hashicorp/kubernetes  | Deployments, Services, ConfigMaps
Helm         | hashicorp/helm        | Install Helm charts into K8s
Docker       | kreuzwerker/docker    | Local containers, images
Cloudflare   | cloudflare/cloudflare | DNS, Workers, Pages, WAF
GitHub       | integrations/github   | Repos, teams, branch protection
Random       | hashicorp/random      | Unique IDs, passwords, suffixes
Null         | hashicorp/null        | Triggers, local-exec provisioners

7. Modules

A module is a directory of .tf files treated as a unit. Modules are the primary mechanism for code reuse and encapsulation in Terraform. Every Terraform configuration is itself a module (the root module).

Standard Module Structure

modules/
  vpc/
    main.tf        # resource definitions
    variables.tf   # input variables
    outputs.tf     # outputs exposed to callers
    versions.tf    # terraform + required_providers
    README.md      # usage documentation
// modules/vpc/variables.tf
variable "cidr_block" {
  type        = string
  description = "VPC CIDR block"
}

variable "public_subnet_cidrs" {
  type        = list(string)
  description = "CIDRs for public subnets (one per AZ)"
}

variable "private_subnet_cidrs" {
  type        = list(string)
  description = "CIDRs for private subnets (one per AZ)"
}

variable "tags" {
  type    = map(string)
  default = {}
}

// modules/vpc/main.tf
resource "aws_vpc" "this" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags                 = merge(var.tags, { Name = "main-vpc" })
}

resource "aws_subnet" "public" {
  count             = length(var.public_subnet_cidrs)
  vpc_id            = aws_vpc.this.id
  cidr_block        = var.public_subnet_cidrs[count.index]
  availability_zone = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true
  tags = merge(var.tags, { Name = "public-${count.index}" })
}

// modules/vpc/outputs.tf
output "vpc_id"            { value = aws_vpc.this.id }
output "public_subnet_ids" { value = aws_subnet.public[*].id }

Calling a Module

// root main.tf — calling a local module
module "networking" {
  source = "./modules/vpc"

  cidr_block           = "10.0.0.0/16"
  public_subnet_cidrs  = ["10.0.1.0/24", "10.0.2.0/24"]
  private_subnet_cidrs = ["10.0.10.0/24", "10.0.11.0/24"]
  tags                 = local.common_tags
}

// Use outputs from the module
resource "aws_lb" "web" {
  subnets = module.networking.public_subnet_ids
}

// Calling a module from the Terraform Registry
module "rds" {
  source  = "terraform-aws-modules/rds/aws"
  version = "~> 6.0"

  identifier = "prod-postgres"
  engine     = "postgres"
  engine_version = "16"
  instance_class = "db.t3.medium"
  // ...
}

// Calling a module from Git
module "vpc" {
  source = "git::https://github.com/acme/tf-modules.git//vpc?ref=v2.1.0"
}
Module design principles
A good module has a single responsibility (a VPC module should not also create EC2 instances). Expose only the variables callers need to customise; use sensible defaults for everything else. The Terraform AWS Modules organisation on GitHub is an excellent reference for production-quality module design.

8. Expressions & Functions

Built-in Functions

// String functions
upper("hello")         // "HELLO"
lower("WORLD")         // "world"
trimspace("  hi  ")    // "hi"
format("%s-%d", "web", 3)  // "web-3"
replace("a-b-c", "-", "_") // "a_b_c"

// Collection functions
length(["a","b","c"])     // 3
concat(["a"], ["b","c"])  // ["a","b","c"]
flatten([["a","b"],["c"]]) // ["a","b","c"]
merge({a=1},{b=2})        // {a=1,b=2}
distinct(["a","b","a"])   // ["a","b"]
sort(["c","a","b"])       // ["a","b","c"]
keys({a=1,b=2})           // ["a","b"]
values({a=1,b=2})         // [1,2]
lookup({a=1,b=2}, "a", 0) // 1 (third arg = default)
contains(["a","b"], "a")  // true
toset(["a","a","b"])      // {"a","b"} — deduplicate

// Numeric
max(1,2,3)    // 3
min(1,2,3)    // 1
abs(-5)       // 5
ceil(1.2)     // 2
floor(1.8)    // 1

// Encoding
jsonencode({key = "value"})  // "{\"key\":\"value\"}"
jsondecode("{\"key\":\"value\"}") // {key = "value"}
base64encode("hello")        // "aGVsbG8="
base64decode("aGVsbG8=")     // "hello"

// Networking
cidrsubnet("10.0.0.0/16", 8, 0)  // "10.0.0.0/24"
cidrsubnet("10.0.0.0/16", 8, 1)  // "10.0.1.0/24"
cidrhost("10.0.0.0/24", 5)       // "10.0.0.5"

// File
file("${path.module}/user_data.sh")  // read file contents
templatefile("tpl.sh", { name = "web" })  // render template

// Type conversion
tostring(42)    // "42"
tonumber("42")  // 42
tobool("true")  // true

// Error handling
try(aws_instance.maybe[0].id, "")   // return "" if expression errors
can(regex("^\\d+$", var.input))     // true if expression succeeds
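cidrsubnet and cidrhost are plain subnet arithmetic: carve extra prefix bits, then index into the block. A quick way to sanity-check expected values outside Terraform is Python's ipaddress module (illustrative only, not part of Terraform):

```python
import ipaddress

# cidrsubnet("10.0.0.0/16", 8, 1): carve 8 extra prefix bits from the /16
# (yielding /24s) and take subnet number 1
base = ipaddress.ip_network("10.0.0.0/16")
subnets = list(base.subnets(prefixlen_diff=8))
print(subnets[0])  # 10.0.0.0/24
print(subnets[1])  # 10.0.1.0/24

# cidrhost("10.0.0.0/24", 5): host number 5 inside the block
print(ipaddress.ip_network("10.0.0.0/24")[5])  # 10.0.0.5
```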

Conditional Expressions

// Ternary: condition ? true_val : false_val
resource "aws_instance" "web" {
  instance_type = var.environment == "production" ? "t3.medium" : "t3.micro"
  monitoring    = var.environment == "production"  // already a bool; "? true : false" is redundant
}

// Conditional count (create resource only in prod)
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  count = var.environment == "production" ? 1 : 0
  // ...
}

// Conditional for_each
resource "aws_route53_record" "aliases" {
  for_each = var.enable_dns ? toset(var.domains) : toset([])
  // ...
}

For Expressions

// Transform a list
[for s in var.names : upper(s)]
// ["ALICE","BOB","CHARLIE"]

// Filter a list
[for s in var.names : s if length(s) > 3]

// Build a map from a list
{for s in var.names : s => upper(s)}
// {alice = "ALICE", bob = "BOB"}

// Invert a map (swap keys and values)
{for k, v in var.tags : v => k}

// Practical example: create one SG rule per CIDR
resource "aws_security_group_rule" "ingress" {
  for_each = toset(var.allowed_cidrs)

  type        = "ingress"
  from_port   = 443
  to_port     = 443
  protocol    = "tcp"
  cidr_blocks = [each.value]
  security_group_id = aws_security_group.web.id
}

Dynamic Blocks

// dynamic generates repeated nested blocks from a list/map
variable "ingress_rules" {
  type = list(object({
    port        = number
    protocol    = string
    cidr_blocks = list(string)
  }))
  default = [
    { port = 80,  protocol = "tcp", cidr_blocks = ["0.0.0.0/0"] },
    { port = 443, protocol = "tcp", cidr_blocks = ["0.0.0.0/0"] },
  ]
}

resource "aws_security_group" "web" {
  name   = "web-sg"
  vpc_id = aws_vpc.main.id

  dynamic "ingress" {
    for_each = var.ingress_rules
    content {
      from_port   = ingress.value.port
      to_port     = ingress.value.port
      protocol    = ingress.value.protocol
      cidr_blocks = ingress.value.cidr_blocks
    }
  }
}

9. Provisioners & Lifecycle

Lifecycle Block

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"

  lifecycle {
    // Create the replacement BEFORE destroying the original
    // Useful for stateless services behind a load balancer
    create_before_destroy = true

    // Block any terraform destroy targeting this resource
    prevent_destroy = true

    // Do not detect drift for these attributes
    // Useful when external systems mutate a field (e.g. auto-updated AMI)
    ignore_changes = [
      ami,
      tags["LastUpdated"],
    ]

    // Force replacement if another resource changes
    replace_triggered_by = [
      aws_launch_template.web.latest_version
    ]
  }
}

Provisioners — Use Sparingly

Avoid provisioners when possible
Provisioners are a last resort. They run outside Terraform's plan/apply model, make state unreliable, and can cause half-applied configs. Prefer cloud-init user_data, AWS SSM, or Ansible for post-boot configuration.
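The usual replacement for remote-exec is cloud-init user data: the instance configures itself at boot and Terraform only ships the script (a sketch; the package list is illustrative):

```
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"

  // Runs once at first boot — no SSH connection from Terraform needed
  user_data = <<-EOT
    #!/bin/bash
    apt-get update -y
    apt-get install -y nginx
  EOT
}
```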
resource "aws_instance" "bastion" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"

  // local-exec: run on the machine running terraform
  provisioner "local-exec" {
    command = "echo ${self.public_ip} >> known_hosts"
  }

  // remote-exec: SSH into the new resource and run commands
  provisioner "remote-exec" {
    inline = [
      "sudo apt-get update -y",
      "sudo apt-get install -y nginx",
    ]
    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }

  // file: copy a file to the remote instance
  provisioner "file" {
    source      = "configs/nginx.conf"
    destination = "/tmp/nginx.conf"
    connection { ... }
  }

  // on_failure: continue or fail (default: fail)
  provisioner "local-exec" {
    command    = "notify-slack.sh ${self.id}"
    on_failure = continue
  }
}

// null_resource / terraform_data: run provisioners without a real resource
// Use case: trigger a script when inputs change
resource "terraform_data" "db_migrate" {
  triggers_replace = [aws_db_instance.primary.address]

  provisioner "local-exec" {
    command = "flyway migrate -url=jdbc:postgresql://${aws_db_instance.primary.address}/app"
  }
}

10. Workspaces & Environments

Terraform Workspaces

Workspaces give each environment its own state file while sharing the same configuration directory. The default workspace is named default.
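With the local backend, the default workspace keeps terraform.tfstate at the project root while every other workspace gets its own file under terraform.tfstate.d/:

```
terraform.tfstate          # default workspace
terraform.tfstate.d/
  staging/
    terraform.tfstate
  production/
    terraform.tfstate
```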

# Create and switch to a new workspace
terraform workspace new staging
terraform workspace new production

# List all workspaces (* marks the active one)
terraform workspace list
#   default
# * staging
#   production

# Switch workspace
terraform workspace select production

# Show current workspace
terraform workspace show

# Delete a workspace (must not be active)
terraform workspace delete staging
// Use terraform.workspace in config
locals {
  instance_type = {
    default    = "t3.micro"
    staging    = "t3.small"
    production = "t3.medium"
  }
}

resource "aws_instance" "web" {
  instance_type = local.instance_type[terraform.workspace]
}

resource "aws_db_instance" "primary" {
  // Only create a Multi-AZ replica in production
  multi_az = terraform.workspace == "production"
}

Workspaces vs Directory-per-Environment

Approach          | How                                  | Pros                                      | Cons
Workspaces        | One config dir, multiple state files | Simple, DRY config                        | Easy to apply to the wrong env; config must handle all envs
Directory-per-env | envs/staging/, envs/prod/            | Full isolation, different configs per env | Code duplication; harder to keep in sync
Terragrunt        | Wrapper with DRY includes            | Best of both: isolation + DRY             | Additional tool to learn
Recommendation for teams
For most production setups, use directory-per-environment combined with shared modules. It gives you the isolation needed to safely apply staging changes without risking production state, and it makes code review easier since a PR for staging only touches staging files.
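A typical layout under this approach: each environment is its own root module with its own backend config, both consuming the shared modules (a sketch):

```
modules/
  vpc/
  ecs-service/
envs/
  staging/
    main.tf           # calls ../../modules/vpc etc.
    backend.tf        # backend with staging state key
    terraform.tfvars
  prod/
    main.tf
    backend.tf        # backend with prod state key
    terraform.tfvars
```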

11. Testing & CI/CD

Built-in Checks

# Validate HCL syntax and internal references
terraform validate

# Auto-format all .tf files
terraform fmt
terraform fmt -check      # exit non-zero if any file needs formatting (CI mode)
terraform fmt -recursive  # format all subdirectories

# Show what will change before committing
terraform plan -out=tfplan  # save plan to file
terraform show tfplan       # inspect saved plan

# Apply saved plan (no interactive prompt)
terraform apply tfplan

tflint — Linter

# Install
brew install tflint

# Run (catches deprecated syntax, wrong instance types, etc.)
tflint --init  # download ruleset plugins
tflint

# With AWS ruleset
cat .tflint.hcl << 'EOF'
plugin "aws" {
  enabled = true
  version = "0.27.0"
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}
EOF
tflint

Security Scanning

# Checkov — broad policy library, SAST for IaC
pip install checkov
checkov -d .
checkov -d . --framework terraform --quiet

# tfsec — Terraform-specific security scanner
brew install tfsec
tfsec .
tfsec . --no-colour --format json > tfsec-report.json

# Trivy — multi-purpose scanner including IaC
brew install trivy
trivy config .

Terratest — Integration Tests

// test/vpc_test.go (Go)
package test

import (
    "testing"
    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
)

func TestVPCModule(t *testing.T) {
    t.Parallel()

    opts := &terraform.Options{
        TerraformDir: "../modules/vpc",
        Vars: map[string]interface{}{
            "cidr_block":           "10.0.0.0/16",
            "public_subnet_cidrs":  []string{"10.0.1.0/24"},
            "private_subnet_cidrs": []string{"10.0.10.0/24"},
        },
    }

    defer terraform.Destroy(t, opts)
    terraform.InitAndApply(t, opts)

    vpcID := terraform.Output(t, opts, "vpc_id")
    assert.NotEmpty(t, vpcID)
    assert.Regexp(t, "^vpc-", vpcID)
}
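Terratest suites run with the standard Go tooling. Allow a generous timeout since real resources are created and destroyed (the module path below is a hypothetical placeholder):

```
cd test
go mod init example.com/infra-tests   # hypothetical module path
go mod tidy
go test -v -timeout 30m
```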

GitHub Actions CI

# .github/workflows/terraform.yml
name: Terraform CI

on:
  pull_request:
    paths: ['**.tf', '**.tfvars']
  push:
    branches: [main]

permissions:
  contents: read
  pull-requests: write

jobs:
  terraform:
    runs-on: ubuntu-latest
    environment: staging

    steps:
      - uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "~1.7"

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id:     ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region:            us-east-1

      - name: Terraform Format Check
        run: terraform fmt -check -recursive

      - name: Terraform Init
        run: terraform init -backend-config="key=staging/terraform.tfstate"

      - name: Terraform Validate
        run: terraform validate

      - name: tflint
        uses: terraform-linters/setup-tflint@v4
        with: { tflint_version: latest }
      - run: tflint --init && tflint

      - name: Checkov
        uses: bridgecrewio/checkov-action@v12
        with:
          directory: .
          soft_fail: true

      - name: Terraform Plan
        id: plan
        shell: bash
        run: |
          set -o pipefail  # fail the step if plan fails, even though output is piped to tee
          terraform plan -no-color -out=tfplan 2>&1 | tee plan_output.txt

      - name: Comment Plan on PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('plan_output.txt', 'utf8');
            const truncated = plan.length > 65000 ? plan.slice(0, 65000) + '\n...truncated' : plan;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '## Terraform Plan\n```\n' + truncated + '\n```'
            });

      # Only apply on merge to main
      - name: Terraform Apply
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: terraform apply tfplan  # a saved plan applies without an interactive prompt

12. Production Patterns

Remote State Bootstrap

Before any other infrastructure can use an S3 backend, you need to create the S3 bucket and DynamoDB table. This is a chicken-and-egg problem — solve it with a one-time bootstrap module using local state.

// bootstrap/main.tf — run ONCE with local state, then never touch again
terraform {
  // Intentionally no backend block — uses local state
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }
}

resource "aws_s3_bucket" "tfstate" {
  bucket        = "acme-terraform-state-${data.aws_caller_identity.current.account_id}"
  force_destroy = false  // protect against accidental deletion

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id
  versioning_configuration { status = "Enabled" }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "tfstate" {
  bucket                  = aws_s3_bucket.tfstate.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_dynamodb_table" "tflock" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

  lifecycle { prevent_destroy = true }
}

data "aws_caller_identity" "current" {}
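Once the bucket and lock table exist, add the backend "s3" block from section 5 to each stack and move its state into S3; Terraform prompts before copying the existing local state (bucket and table names assume the bootstrap config above):

```
terraform init -migrate-state
# "Do you want to copy existing state to the new backend?" -> yes
```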

Three-Tier VPC Module

// modules/vpc/main.tf
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_vpc" "this" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = merge(var.tags, { Name = "${var.name}-vpc" })
}

resource "aws_internet_gateway" "this" {
  vpc_id = aws_vpc.this.id
  tags   = merge(var.tags, { Name = "${var.name}-igw" })
}

resource "aws_subnet" "public" {
  count                   = length(var.public_subnets)
  vpc_id                  = aws_vpc.this.id
  cidr_block              = var.public_subnets[count.index]
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true
  tags = merge(var.tags, { Name = "${var.name}-public-${count.index + 1}", Tier = "public" })
}

resource "aws_subnet" "private" {
  count             = length(var.private_subnets)
  vpc_id            = aws_vpc.this.id
  cidr_block        = var.private_subnets[count.index]
  availability_zone = data.aws_availability_zones.available.names[count.index]
  tags = merge(var.tags, { Name = "${var.name}-private-${count.index + 1}", Tier = "private" })
}

// NAT Gateway (one per AZ for HA, or one for cost savings)
resource "aws_eip" "nat" {
  count  = var.single_nat_gateway ? 1 : length(var.public_subnets)
  domain = "vpc"
}

resource "aws_nat_gateway" "this" {
  count         = var.single_nat_gateway ? 1 : length(var.public_subnets)
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
  depends_on    = [aws_internet_gateway.this]
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.this.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.this.id
  }
}

resource "aws_route_table_association" "public" {
  count          = length(aws_subnet.public)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table" "private" {
  count  = length(var.private_subnets)
  vpc_id = aws_vpc.this.id
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = var.single_nat_gateway ? aws_nat_gateway.this[0].id : aws_nat_gateway.this[count.index].id
  }
}

resource "aws_route_table_association" "private" {
  count          = length(aws_subnet.private)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}
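The module above references several input variables it never declares. A matching variables.tf might look like this (names taken from the references; defaults are assumptions):

```hcl
// modules/vpc/variables.tf
variable "name" {
  type = string
}

variable "cidr_block" {
  type    = string
  default = "10.0.0.0/16"
}

variable "public_subnets" {
  type    = list(string)
  default = ["10.0.1.0/24", "10.0.2.0/24"]
}

variable "private_subnets" {
  type    = list(string)
  default = ["10.0.11.0/24", "10.0.12.0/24"]
}

variable "single_nat_gateway" {
  type    = bool
  default = true  // one NAT gateway saves cost; set false for per-AZ HA
}

variable "tags" {
  type    = map(string)
  default = {}
}
```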

ECS/Fargate Service

// Minimal ECS Fargate service with ALB
data "aws_region" "current" {}  // referenced below for the awslogs region

resource "aws_ecs_cluster" "main" {
  name = "${var.name}-cluster"
}

resource "aws_ecs_task_definition" "app" {
  family                   = var.name
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = var.cpu
  memory                   = var.memory
  execution_role_arn       = aws_iam_role.ecs_exec.arn
  task_role_arn            = aws_iam_role.ecs_task.arn

  container_definitions = jsonencode([{
    name  = var.name
    image = "${var.ecr_repository_url}:${var.image_tag}"
    portMappings = [{ containerPort = var.container_port, protocol = "tcp" }]
    environment = [for k, v in var.env_vars : { name = k, value = v }]
    secrets = [for k, v in var.secrets : { name = k, valueFrom = v }]
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = "/ecs/${var.name}"
        "awslogs-region"        = data.aws_region.current.name
        "awslogs-stream-prefix" = "ecs"
      }
    }
  }])
}

resource "aws_ecs_service" "app" {
  name            = var.name
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.app.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = var.name
    container_port   = var.container_port
  }

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  lifecycle {
    ignore_changes = [task_definition]  // allow external deploys via CI
  }
}
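The task definition references aws_iam_role.ecs_exec, which is not shown above. One way to define it, attaching the AWS-managed execution policy:

```hcl
data "aws_iam_policy_document" "ecs_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "ecs_exec" {
  name               = "${var.name}-ecs-exec"
  assume_role_policy = data.aws_iam_policy_document.ecs_assume.json
}

resource "aws_iam_role_policy_attachment" "ecs_exec" {
  role       = aws_iam_role.ecs_exec.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
```

The task role (ecs_task) follows the same assume-role pattern, with application-specific permissions attached instead of the execution policy.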

S3 + CloudFront Static Site

resource "aws_s3_bucket" "site" {
  bucket = var.domain_name
}

resource "aws_s3_bucket_public_access_block" "site" {
  bucket                  = aws_s3_bucket.site.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_cloudfront_origin_access_control" "site" {
  name                              = var.domain_name
  origin_access_control_origin_type = "s3"
  signing_behavior                  = "always"
  signing_protocol                  = "sigv4"
}

resource "aws_cloudfront_distribution" "site" {
  enabled             = true
  default_root_object = "index.html"
  aliases             = [var.domain_name]

  origin {
    domain_name              = aws_s3_bucket.site.bucket_regional_domain_name
    origin_id                = "s3"
    origin_access_control_id = aws_cloudfront_origin_access_control.site.id
  }

  default_cache_behavior {
    allowed_methods        = ["GET", "HEAD", "OPTIONS"]
    cached_methods         = ["GET", "HEAD"]
    target_origin_id       = "s3"
    viewer_protocol_policy = "redirect-to-https"
    compress               = true
    // forwarded_values is the legacy approach; newer configs use cache_policy_id
    forwarded_values {
      query_string = false
      cookies { forward = "none" }
    }
  }

  custom_error_response {
    error_code         = 404
    response_code      = 200
    response_page_path = "/index.html"  // SPA fallback
  }

  viewer_certificate {
    acm_certificate_arn      = var.acm_cert_arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.2_2021"
  }

  restrictions {
    geo_restriction { restriction_type = "none" }
  }
}

// Bucket policy granting CloudFront access
data "aws_iam_policy_document" "site_bucket_policy" {
  statement {
    principals {
      type        = "Service"
      identifiers = ["cloudfront.amazonaws.com"]
    }
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.site.arn}/*"]
    condition {
      test     = "StringEquals"
      variable = "AWS:SourceArn"
      values   = [aws_cloudfront_distribution.site.arn]
    }
  }
}

resource "aws_s3_bucket_policy" "site" {
  bucket = aws_s3_bucket.site.id
  policy = data.aws_iam_policy_document.site_bucket_policy.json
}
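To serve the site on the custom domain, the usual final step is a Route 53 alias record pointing at the distribution. A sketch, assuming an existing hosted zone (var.zone_id is hypothetical):

```hcl
resource "aws_route53_record" "site" {
  zone_id = var.zone_id  // hypothetical: ID of your existing hosted zone
  name    = var.domain_name
  type    = "A"

  alias {
    name                   = aws_cloudfront_distribution.site.domain_name
    zone_id                = aws_cloudfront_distribution.site.hosted_zone_id
    evaluate_target_health = false
  }
}
```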

State File Organization Strategies

| Strategy | State key pattern | Best for |
|---|---|---|
| Monolith | prod/terraform.tfstate | Small teams, simple infra |
| Per-service | prod/networking.tfstate, prod/ecs.tfstate | Independent service teams |
| Per-env per-service | staging/networking.tfstate, prod/networking.tfstate | Strict env isolation |
| Per-account | Separate S3 bucket per AWS account | Multi-account orgs, security boundary |
Blast radius principle
Split state files to minimise the blast radius of a bad apply. Networking changes rarely — keep it in its own state. Application changes happen frequently — keep them separate. This also speeds up plan/apply times since Terraform only refreshes the relevant subset of resources.
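When state is split per service, downstream stacks read upstream outputs via the terraform_remote_state data source. A sketch, assuming the networking state exports a private_subnet_ids output and the bucket name used earlier:

```hcl
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "acme-terraform-state-123456789012"  // assumed bucket name
    key    = "prod/networking.tfstate"
    region = "us-east-1"
  }
}

// e.g. feed the VPC's private subnets into the ECS service
locals {
  private_subnet_ids = data.terraform_remote_state.networking.outputs.private_subnet_ids
}
```

This keeps the dependency one-directional: the app stack reads networking outputs, but a bad apply in the app stack can never touch networking resources.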
Quick Reference: Most-Used Commands
# Setup
terraform init                     # download providers, init backend
terraform init -upgrade            # upgrade provider versions within constraints
terraform init -reconfigure        # switch to a different backend

# Day-to-day
terraform fmt -recursive           # format all .tf files
terraform validate                 # check syntax + references
terraform plan                     # preview changes
terraform plan -out=tfplan         # save plan for auditing or apply
terraform apply                    # apply with confirmation prompt
terraform apply -auto-approve      # apply without prompt (CI only)
terraform apply tfplan             # apply a saved plan exactly
terraform destroy                  # tear down all resources
terraform destroy -target=aws_instance.web  # destroy specific resource

# State
terraform state list               # list managed resources
terraform state show TYPE.NAME     # inspect a specific resource
terraform state mv OLD NEW         # rename resource in state
terraform state rm TYPE.NAME       # untrack resource (don't destroy)
terraform apply -refresh-only      # sync state with real infra (terraform refresh is deprecated)

# Debugging
TF_LOG=DEBUG terraform apply       # verbose provider API calls
TF_LOG_PATH=./debug.log terraform plan  # write logs to file
terraform console                  # REPL for testing expressions

# Workspaces
terraform workspace list
terraform workspace new staging
terraform workspace select production

# Outputs
terraform output                   # all outputs
terraform output -json             # JSON for scripting
terraform output -raw bucket_name  # raw string (no quotes)
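The output commands above assume outputs declared in the configuration, e.g. (bucket_name is an illustrative name):

```hcl
output "bucket_name" {
  value       = aws_s3_bucket.site.bucket
  description = "Name of the static site bucket"
}
```

terraform output -raw bucket_name then prints the bare string with no quotes or newline, which makes it safe to capture in shell scripts.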