Managing Cloud Infrastructure with Terraform
A practical guide to migrating manually managed cloud infrastructure to Terraform Infrastructure as Code (IaC), building reproducible, version-controlled infrastructure
Problem
Required Tools
Terraform - HashiCorp's IaC tool. Declaratively defines and provisions infrastructure using HCL (HashiCorp Configuration Language).
AWS Provider - The official hashicorp/aws provider for creating and managing AWS resources (VPC, EC2, RDS, S3, etc.) from Terraform.
S3 + DynamoDB backend - Stores tfstate files remotely so they can be shared across team members; DynamoDB-based state locking prevents concurrent modifications.
Linter (e.g., TFLint, tfsec) - Checks Terraform code for syntax errors, security issues, and best-practice violations.
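The S3 bucket and DynamoDB lock table referenced by the backend configuration must exist before the first terraform init. A one-time bootstrap sketch using the AWS CLI (bucket and table names are placeholders matching the examples below; note that Terraform's S3 backend requires the lock table's partition key to be a string attribute named LockID):

```shell
# One-time backend bootstrap (run once per AWS account/region)
aws s3api create-bucket \
  --bucket my-terraform-state \
  --region ap-northeast-2 \
  --create-bucket-configuration LocationConstraint=ap-northeast-2

# Versioning lets you recover earlier state revisions
aws s3api put-bucket-versioning \
  --bucket my-terraform-state \
  --versioning-configuration Status=Enabled

# Lock table: the partition key MUST be a string named "LockID"
aws dynamodb create-table \
  --table-name terraform-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
```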
Solution Steps
Terraform Initial Setup and Provider Configuration
Install the Terraform CLI and configure the AWS provider. Using a backend block to store tfstate remotely in S3 enables state sharing and locking among team members. Pinning provider versions with required_providers ensures consistent behavior across the team.
# Install Terraform (macOS)
brew install terraform
# Project directory structure
mkdir -p infra/{modules,environments/{dev,staging,prod}}
cd infra
# environments/dev/main.tf - Provider and backend configuration
terraform {
required_version = ">= 1.6.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.30"
}
}
# S3 remote backend (tfstate storage)
backend "s3" {
bucket = "my-terraform-state"
key = "dev/terraform.tfstate"
region = "ap-northeast-2"
dynamodb_table = "terraform-locks" # For state locking
encrypt = true
}
}
provider "aws" {
region = var.aws_region
default_tags {
tags = {
Environment = var.environment
ManagedBy = "terraform"
Project = var.project_name
}
}
}
# Download providers and initialize backend with terraform init
terraform init

Define VPC Network Infrastructure
Define VPC, subnets, internet gateway, NAT gateway, and route tables as code. Separating public subnets (ALB, Bastion) from private subnets (EC2, RDS) is a security fundamental. Placing subnets across 2 or more Availability Zones (AZs) ensures high availability.
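The cidrsubnet() calls below carve /24 subnets out of the /16 VPC CIDR: the second argument (8) extends the /16 prefix by 8 bits to /24, and the third argument selects which /24. The same arithmetic can be sketched with Python's ipaddress module:

```python
import ipaddress

def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Mirror Terraform's cidrsubnet() for IPv4: extend the prefix by
    newbits, then pick the netnum-th resulting subnet."""
    network = ipaddress.ip_network(prefix)
    return str(list(network.subnets(prefixlen_diff=newbits))[netnum])

# Public subnets use netnum 0 and 1; private subnets are offset by 10
print(cidrsubnet("10.0.0.0/16", 8, 0))   # 10.0.0.0/24
print(cidrsubnet("10.0.0.0/16", 8, 1))   # 10.0.1.0/24
print(cidrsubnet("10.0.0.0/16", 8, 10))  # 10.0.10.0/24
```

The offset of 10 for private subnets keeps the two ranges from ever colliding, and leaves room to grow the public tier.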
# modules/vpc/main.tf
# AZ lookup used by the subnets below
data "aws_availability_zones" "available" {
state = "available"
}
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = { Name = "${var.project}-vpc" }
}
# Public subnets (2 AZs)
resource "aws_subnet" "public" {
count = 2
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index)
availability_zone = data.aws_availability_zones.available.names[count.index]
map_public_ip_on_launch = true
tags = { Name = "${var.project}-public-${count.index + 1}" }
}
# Private subnets (2 AZs)
resource "aws_subnet" "private" {
count = 2
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
availability_zone = data.aws_availability_zones.available.names[count.index]
tags = { Name = "${var.project}-private-${count.index + 1}" }
}
# Internet Gateway
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = { Name = "${var.project}-igw" }
}
# NAT Gateway (for private subnet outbound internet access)
resource "aws_eip" "nat" { domain = "vpc" }
resource "aws_nat_gateway" "main" {
allocation_id = aws_eip.nat.id
subnet_id = aws_subnet.public[0].id
tags = { Name = "${var.project}-nat" }
}
# Route tables
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
}
resource "aws_route_table" "private" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.main.id
}
}
# Associate each subnet with its route table (without this, the routes are never used)
resource "aws_route_table_association" "public" {
count = 2
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
resource "aws_route_table_association" "private" {
count = 2
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private.id
}

Define EC2 Instances and Security Groups
Manage EC2 instances, AMI selection, and security groups (firewalls) with Terraform. Minimize inbound rules in security groups and only allow SSH access through a Bastion host for safety. Running initial setup scripts with user_data fully automates server provisioning.
# modules/ec2/main.tf
resource "aws_security_group" "app" {
name_prefix = "${var.project}-app-"
vpc_id = var.vpc_id
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
security_groups = [var.alb_sg_id] # Allow access only from ALB
}
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
security_groups = [var.bastion_sg_id] # Allow SSH only from Bastion
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
lifecycle {
create_before_destroy = true
}
}
# Look up latest Amazon Linux 2023 AMI
data "aws_ami" "amazon_linux" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["al2023-ami-2023*-x86_64"] # excludes the -minimal- variant
}
}
resource "aws_instance" "app" {
count = var.instance_count
ami = data.aws_ami.amazon_linux.id
instance_type = var.instance_type
subnet_id = var.private_subnet_ids[count.index % length(var.private_subnet_ids)]
vpc_security_group_ids = [aws_security_group.app.id]
key_name = var.key_pair_name
user_data = <<-EOF
#!/bin/bash
yum update -y
yum install -y docker
systemctl start docker
systemctl enable docker
docker pull ${var.docker_image}
docker run -d -p 80:3000 ${var.docker_image}
EOF
tags = { Name = "${var.project}-app-${count.index + 1}" }
}

RDS Database and Variable/Output Management
Place RDS instances in private subnets and configure automatic backups and Multi-AZ. Store sensitive values (DB passwords, etc.) in terraform.tfvars (added to .gitignore) or use AWS Secrets Manager. Output blocks allow other modules or CI/CD pipelines to reference information about created resources.
# modules/rds/main.tf
resource "aws_db_subnet_group" "main" {
name = "${var.project}-db-subnet"
subnet_ids = var.private_subnet_ids
}
resource "aws_security_group" "db" {
name_prefix = "${var.project}-db-"
vpc_id = var.vpc_id
ingress {
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [var.app_sg_id] # Access only from app servers
}
}
resource "aws_db_instance" "main" {
identifier = "${var.project}-db"
engine = "postgres"
engine_version = "16.1"
instance_class = var.db_instance_class
allocated_storage = 20
max_allocated_storage = 100 # Auto storage scaling
db_name = var.db_name
username = var.db_username
password = var.db_password # Injected from tfvars or Secrets Manager
db_subnet_group_name = aws_db_subnet_group.main.name
vpc_security_group_ids = [aws_security_group.db.id]
multi_az = var.environment == "prod"
backup_retention_period = 7
skip_final_snapshot = var.environment != "prod"
tags = { Name = "${var.project}-db" }
}
# variables.tf - Variable definitions
variable "project" { type = string }
variable "environment" { type = string }
variable "aws_region" {
type = string
default = "ap-northeast-2"
}
variable "db_password" {
type = string
sensitive = true
}
variable "instance_type" {
type = string
default = "t3.micro"
}
variable "db_instance_class" {
type = string
default = "db.t3.micro"
}
# outputs.tf - Output definitions
output "db_endpoint" {
value = aws_db_instance.main.endpoint
}
output "vpc_id" {
value = aws_vpc.main.id
}

Terraform Modularization and Per-Environment Configuration
Structuring infrastructure into reusable modules allows managing dev/staging/prod environments with the same code. Modules consist of inputs (variables), resources, and outputs, with per-environment tfvars files injecting different values. Use Git tags or Terraform Registry for module version management.
# environments/dev/main.tf - Module calls
module "vpc" {
source = "../../modules/vpc"
project = var.project_name
vpc_cidr = "10.0.0.0/16"
}
module "ec2" {
source = "../../modules/ec2"
project = var.project_name
vpc_id = module.vpc.vpc_id
private_subnet_ids = module.vpc.private_subnet_ids
alb_sg_id = module.alb.security_group_id
bastion_sg_id = module.bastion.security_group_id
instance_count = 2
instance_type = "t3.small"
docker_image = var.docker_image
}
module "rds" {
source = "../../modules/rds"
project = var.project_name
environment = "dev"
vpc_id = module.vpc.vpc_id
private_subnet_ids = module.vpc.private_subnet_ids
app_sg_id = module.ec2.security_group_id
db_instance_class = "db.t3.micro"
db_name = "myapp"
db_username = "admin"
db_password = var.db_password
}
# environments/dev/terraform.tfvars
project_name = "myapp"
aws_region = "ap-northeast-2"
docker_image = "myapp:latest"
# environments/prod/terraform.tfvars (production has different specs)
project_name = "myapp"
aws_region = "ap-northeast-2"
docker_image = "myapp:v1.2.3"
# Per-environment execution
cd environments/dev && terraform plan -var-file="terraform.tfvars"
cd ../prod && terraform plan -var-file="terraform.tfvars"

terraform plan/apply Workflow and CI/CD Integration
The safe workflow is to preview changes with terraform plan first, then apply them with terraform apply. Automatically displaying plan results as comments on PRs enables code review for infrastructure changes. Integrating with GitHub Actions allows automatic apply when merging to the main branch.
# Basic workflow
terraform fmt -check # Code format check
terraform validate # Syntax validation
terraform plan -out=plan.tfplan # Generate change plan
terraform apply plan.tfplan # Apply the plan
# .github/workflows/terraform.yml
name: Terraform CI/CD
on:
  pull_request:
    paths: ['infra/**']
  push:
    branches: [main]
    paths: ['infra/**']
jobs:
  terraform:
    runs-on: ubuntu-latest
    env:
      # Job-level credentials: init (S3 backend), plan, and apply all need AWS access
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    defaults:
      run:
        working-directory: infra/environments/prod
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.6
      - name: Terraform Init
        run: terraform init
      - name: Terraform Format Check
        run: terraform fmt -check -recursive
      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color -out=plan.tfplan
      # Post plan results as PR comment
      - name: Comment Plan on PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const plan = `${{ steps.plan.outputs.stdout }}`;
            github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `### Terraform Plan\n\`\`\`\n${plan}\n\`\`\``
            });
      # Auto apply on main merge
      - name: Terraform Apply
        if: github.ref == 'refs/heads/main'
        run: terraform apply -auto-approve plan.tfplan

Core Code
Core structure of a Terraform project. VPC/EC2/RDS are separated into modules, dev/staging/prod are managed with per-environment tfvars, and state is shared via S3 backend.
# Terraform IaC Core Structure
# ==============================
# infra/
# ├── modules/
# │ ├── vpc/ (main.tf, variables.tf, outputs.tf)
# │ ├── ec2/
# │ └── rds/
# └── environments/
# ├── dev/ (main.tf, terraform.tfvars)
# ├── staging/
# └── prod/
terraform {
required_version = ">= 1.6.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.30"
}
}
backend "s3" {
bucket = "my-terraform-state"
key = "prod/terraform.tfstate"
region = "ap-northeast-2"
dynamodb_table = "terraform-locks"
encrypt = true
}
}
provider "aws" {
region = "ap-northeast-2"
default_tags { tags = { ManagedBy = "terraform" } }
}
module "vpc" {
source = "../../modules/vpc"
vpc_cidr = "10.0.0.0/16"
}
module "app" {
source = "../../modules/ec2"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
instance_type = "t3.small"
instance_count = 2
}
module "db" {
source = "../../modules/rds"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
app_sg_id = module.app.security_group_id
}

Common Mistakes
Committing tfstate files to Git, exposing sensitive information (DB passwords, etc.)
tfstate files contain all resource state and sensitive information in plaintext. Always use an S3 + DynamoDB backend for remote storage, and add *.tfstate and *.tfvars to .gitignore.
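A minimal .gitignore baseline for the repository root (some teams do commit non-sensitive tfvars, so adjust to your policy):

```gitignore
# Keep state, secrets, and local caches out of Git
*.tfstate
*.tfstate.*
*.tfvars
.terraform/
crash.log
```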
Running terraform apply directly without plan, causing unintended resource deletion
Always preview changes with terraform plan -out=plan.tfplan first, then apply only that plan with terraform apply plan.tfplan. In CI/CD, strictly follow the plan -> approve -> apply sequence.
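The plan -> approve -> apply sequence as concrete commands; terraform show renders a saved plan file so the approval step reviews exactly what will be applied:

```shell
terraform plan -out=plan.tfplan   # write the exact change set to a file
terraform show plan.tfplan        # human-readable review of the saved plan
terraform apply plan.tfplan       # apply only what was reviewed, nothing else
```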
Defining all resources in a single main.tf without modules, making management impossible
Separate resources into functional modules (vpc, ec2, rds, iam, etc.) and call modules from per-environment directories. Clearly defining module interfaces (variables/outputs) increases reusability.
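A sketch of what a clean module interface looks like, plus version pinning via a Git tag (the repository URL is a placeholder; Terraform's git:: source address with a ?ref query pins the module to a tag):

```hcl
# modules/vpc/variables.tf - the module's inputs
variable "project" { type = string }
variable "vpc_cidr" { type = string }

# modules/vpc/outputs.tf - the module's outputs
output "vpc_id" { value = aws_vpc.main.id }
output "private_subnet_ids" { value = aws_subnet.private[*].id }

# Consumers can pin a shared module to a released Git tag
module "vpc" {
  source   = "git::https://github.com/example-org/terraform-modules.git//vpc?ref=v1.2.0"
  project  = "myapp"
  vpc_cidr = "10.0.0.0/16"
}
```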