Managing Cloud Infrastructure with Terraform
A practical guide to migrating manually managed cloud infrastructure to Terraform Infrastructure as Code (IaC), building reproducible, version-controlled infrastructure
Problem
Required Tools
Terraform - HashiCorp's IaC tool. Declaratively defines and provisions infrastructure using HCL (HashiCorp Configuration Language).
AWS Provider - The official hashicorp/aws provider for creating and managing AWS resources (VPC, EC2, RDS, S3, etc.) from Terraform.
S3 + DynamoDB backend - Stores tfstate files remotely so they can be shared across team members; DynamoDB-based state locking prevents concurrent modifications.
Linter (e.g., TFLint, tfsec) - Checks Terraform code for syntax errors, security issues, and best-practice violations.
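The S3 bucket and DynamoDB lock table referenced by the backend configuration must exist before the first terraform init. A one-time bootstrap sketch using the AWS CLI (bucket and table names are placeholders matching the examples below; note that Terraform's S3 backend requires the lock table's partition key to be a string attribute named LockID):

```shell
# One-time backend bootstrap (run once per AWS account/region)
aws s3api create-bucket \
  --bucket my-terraform-state \
  --region ap-northeast-2 \
  --create-bucket-configuration LocationConstraint=ap-northeast-2

# Versioning lets you recover earlier state revisions
aws s3api put-bucket-versioning \
  --bucket my-terraform-state \
  --versioning-configuration Status=Enabled

# Lock table: the partition key MUST be a string named "LockID"
aws dynamodb create-table \
  --table-name terraform-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
```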
Solution Steps
Terraform Initial Setup and Provider Configuration
Install the Terraform CLI and configure the AWS provider. Using a backend block to store tfstate remotely in S3 enables state sharing and locking among team members. Pinning provider versions with required_providers ensures consistent behavior across the team.
# Install Terraform (macOS)
brew install terraform
# Project directory structure
mkdir -p infra/{modules,environments/{dev,staging,prod}}
cd infra
# environments/dev/main.tf - Provider and backend configuration
terraform {
required_version = ">= 1.6.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.30"
}
}
# S3 remote backend (tfstate storage)
backend "s3" {
bucket = "my-terraform-state"
key = "dev/terraform.tfstate"
region = "ap-northeast-2"
dynamodb_table = "terraform-locks" # For state locking
encrypt = true
}
}
provider "aws" {
region = var.aws_region
default_tags {
tags = {
Environment = var.environment
ManagedBy = "terraform"
Project = var.project_name
}
}
}
# Download providers and initialize backend with terraform init
terraform init

Define VPC Network Infrastructure
Define VPC, subnets, internet gateway, NAT gateway, and route tables as code. Separating public subnets (ALB, Bastion) from private subnets (EC2, RDS) is a security fundamental. Placing subnets across 2 or more Availability Zones (AZs) ensures high availability.
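The cidrsubnet() calls below carve /24 subnets out of the /16 VPC CIDR: the second argument (8) extends the /16 prefix by 8 bits to /24, and the third argument selects which /24. The same arithmetic can be sketched with Python's ipaddress module:

```python
import ipaddress

def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Mirror Terraform's cidrsubnet() for IPv4: extend the prefix by
    newbits, then pick the netnum-th resulting subnet."""
    network = ipaddress.ip_network(prefix)
    return str(list(network.subnets(prefixlen_diff=newbits))[netnum])

# Public subnets use netnum 0 and 1; private subnets are offset by 10
print(cidrsubnet("10.0.0.0/16", 8, 0))   # 10.0.0.0/24
print(cidrsubnet("10.0.0.0/16", 8, 1))   # 10.0.1.0/24
print(cidrsubnet("10.0.0.0/16", 8, 10))  # 10.0.10.0/24
```

The offset of 10 for private subnets keeps the two ranges from ever colliding, and leaves room to grow the public tier.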
# modules/vpc/main.tf
# AZ lookup used by the subnets below
data "aws_availability_zones" "available" {
state = "available"
}
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = { Name = "${var.project}-vpc" }
}
# Public subnets (2 AZs)
resource "aws_subnet" "public" {
count = 2
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index)
availability_zone = data.aws_availability_zones.available.names[count.index]
map_public_ip_on_launch = true
tags = { Name = "${var.project}-public-${count.index + 1}" }
}
# Private subnets (2 AZs)
resource "aws_subnet" "private" {
count = 2
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
availability_zone = data.aws_availability_zones.available.names[count.index]
tags = { Name = "${var.project}-private-${count.index + 1}" }
}
# Internet Gateway
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = { Name = "${var.project}-igw" }
}
# NAT Gateway (for private subnet outbound internet access)
resource "aws_eip" "nat" { domain = "vpc" }
resource "aws_nat_gateway" "main" {
allocation_id = aws_eip.nat.id
subnet_id = aws_subnet.public[0].id
tags = { Name = "${var.project}-nat" }
}
# Route tables
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
}
resource "aws_route_table" "private" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.main.id
}
}
# Associate each subnet with its route table (without this, the routes are never used)
resource "aws_route_table_association" "public" {
count = 2
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
resource "aws_route_table_association" "private" {
count = 2
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private.id
}

Define EC2 Instances and Security Groups
Manage EC2 instances, AMI selection, and security groups (firewalls) with Terraform. Minimize inbound rules in security groups and only allow SSH access through a Bastion host for safety. Running initial setup scripts with user_data fully automates server provisioning.
# modules/ec2/main.tf
resource "aws_security_group" "app" {
name_prefix = "${var.project}-app-"
vpc_id = var.vpc_id
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
security_groups = [var.alb_sg_id] # Allow access only from ALB
}
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
security_groups = [var.bastion_sg_id] # Allow SSH only from Bastion
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
lifecycle {
create_before_destroy = true
}
}
# Look up latest Amazon Linux 2023 AMI
data "aws_ami" "amazon_linux" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["al2023-ami-2023*-x86_64"] # excludes the -minimal- variant
}
}
resource "aws_instance" "app" {
count = var.instance_count
ami = data.aws_ami.amazon_linux.id
instance_type = var.instance_type
subnet_id = var.private_subnet_ids[count.index % length(var.private_subnet_ids)]
vpc_security_group_ids = [aws_security_group.app.id]
key_name = var.key_pair_name
user_data = <<-EOF
#!/bin/bash
yum update -y
yum install -y docker
systemctl start docker
systemctl enable docker
docker pull ${var.docker_image}
docker run -d -p 80:3000 ${var.docker_image}
EOF
tags = { Name = "${var.project}-app-${count.index + 1}" }
}

RDS Database and Variable/Output Management
Place RDS instances in private subnets and configure automatic backups and Multi-AZ. Store sensitive values (DB passwords, etc.) in terraform.tfvars (added to .gitignore) or use AWS Secrets Manager. Output blocks allow other modules or CI/CD pipelines to reference information about created resources.
# modules/rds/main.tf
resource "aws_db_subnet_group" "main" {
name = "${var.project}-db-subnet"
subnet_ids = var.private_subnet_ids
}
resource "aws_security_group" "db" {
name_prefix = "${var.project}-db-"
vpc_id = var.vpc_id
ingress {
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [var.app_sg_id] # Access only from app servers
}
}
resource "aws_db_instance" "main" {
identifier = "${var.project}-db"
engine = "postgres"
engine_version = "16.1"
instance_class = var.db_instance_class
allocated_storage = 20
max_allocated_storage = 100 # Auto storage scaling
db_name = var.db_name
username = var.db_username
password = var.db_password # Injected from tfvars or Secrets Manager
db_subnet_group_name = aws_db_subnet_group.main.name
vpc_security_group_ids = [aws_security_group.db.id]
multi_az = var.environment == "prod"
backup_retention_period = 7
skip_final_snapshot = var.environment != "prod"
tags = { Name = "${var.project}-db" }
}
# variables.tf - Variable definitions
variable "project" { type = string }
variable "environment" { type = string }
variable "aws_region" {
type = string
default = "ap-northeast-2"
}
variable "db_password" {
type = string
sensitive = true
}
variable "instance_type" {
type = string
default = "t3.micro"
}
variable "db_instance_class" {
type = string
default = "db.t3.micro"
}
# outputs.tf - Output definitions
output "db_endpoint" {
value = aws_db_instance.main.endpoint
}
output "vpc_id" {
value = aws_vpc.main.id
}

Terraform Modularization and Per-Environment Configuration
Structuring infrastructure into reusable modules allows managing dev/staging/prod environments with the same code. Modules consist of inputs (variables), resources, and outputs, with per-environment tfvars files injecting different values. Use Git tags or Terraform Registry for module version management.
# environments/dev/main.tf - Module calls
module "vpc" {
source = "../../modules/vpc"
project = var.project_name
vpc_cidr = "10.0.0.0/16"
}
module "ec2" {
source = "../../modules/ec2"
project = var.project_name
vpc_id = module.vpc.vpc_id
private_subnet_ids = module.vpc.private_subnet_ids
alb_sg_id = module.alb.security_group_id
bastion_sg_id = module.bastion.security_group_id
instance_count = 2
instance_type = "t3.small"
docker_image = var.docker_image
}
module "rds" {
source = "../../modules/rds"
project = var.project_name
environment = "dev"
vpc_id = module.vpc.vpc_id
private_subnet_ids = module.vpc.private_subnet_ids
app_sg_id = module.ec2.security_group_id
db_instance_class = "db.t3.micro"
db_name = "myapp"
db_username = "admin"
db_password = var.db_password
}
# environments/dev/terraform.tfvars
project_name = "myapp"
aws_region = "ap-northeast-2"
docker_image = "myapp:latest"
# environments/prod/terraform.tfvars (production has different specs)
project_name = "myapp"
aws_region = "ap-northeast-2"
docker_image = "myapp:v1.2.3"
# Per-environment execution
cd environments/dev && terraform plan -var-file="terraform.tfvars"
cd ../prod && terraform plan -var-file="terraform.tfvars"

terraform plan/apply Workflow and CI/CD Integration
The safe workflow is to preview changes with terraform plan first, then apply them with terraform apply. Automatically displaying plan results as comments on PRs enables code review for infrastructure changes. Integrating with GitHub Actions allows automatic apply when merging to the main branch.
# Basic workflow
terraform fmt -check # Code format check
terraform validate # Syntax validation
terraform plan -out=plan.tfplan # Generate change plan
terraform apply plan.tfplan # Apply the plan
# .github/workflows/terraform.yml
name: Terraform CI/CD
on:
  pull_request:
    paths: ['infra/**']
  push:
    branches: [main]
    paths: ['infra/**']
jobs:
  terraform:
    runs-on: ubuntu-latest
    env:
      # Job-level credentials: init (S3 backend), plan, and apply all need AWS access
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    defaults:
      run:
        working-directory: infra/environments/prod
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.6
      - name: Terraform Init
        run: terraform init
      - name: Terraform Format Check
        run: terraform fmt -check -recursive
      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color -out=plan.tfplan
      # Post plan results as PR comment
      - name: Comment Plan on PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const plan = `${{ steps.plan.outputs.stdout }}`;
            github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `### Terraform Plan\n\`\`\`\n${plan}\n\`\`\``
            });
      # Auto apply on main merge
      - name: Terraform Apply
        if: github.ref == 'refs/heads/main'
        run: terraform apply -auto-approve plan.tfplan

Core Code
Core structure of a Terraform project. VPC/EC2/RDS are separated into modules, dev/staging/prod are managed with per-environment tfvars, and state is shared via S3 backend.
# Terraform IaC Core Structure
# ==============================
# infra/
# ├── modules/
# │ ├── vpc/ (main.tf, variables.tf, outputs.tf)
# │ ├── ec2/
# │ └── rds/
# └── environments/
# ├── dev/ (main.tf, terraform.tfvars)
# ├── staging/
# └── prod/
terraform {
required_version = ">= 1.6.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.30"
}
}
backend "s3" {
bucket = "my-terraform-state"
key = "prod/terraform.tfstate"
region = "ap-northeast-2"
dynamodb_table = "terraform-locks"
encrypt = true
}
}
provider "aws" {
region = "ap-northeast-2"
default_tags { tags = { ManagedBy = "terraform" } }
}
module "vpc" {
source = "../../modules/vpc"
vpc_cidr = "10.0.0.0/16"
}
module "app" {
source = "../../modules/ec2"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
instance_type = "t3.small"
instance_count = 2
}
module "db" {
source = "../../modules/rds"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
app_sg_id = module.app.security_group_id
}

Common Mistakes
Committing tfstate files to Git, exposing sensitive information (DB passwords, etc.)
tfstate files contain all resource state and sensitive information in plaintext. Always use an S3 + DynamoDB backend for remote storage, and add *.tfstate and *.tfvars to .gitignore.
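A minimal .gitignore baseline for the repository root (some teams do commit non-sensitive tfvars, so adjust to your policy):

```gitignore
# Keep state, secrets, and local caches out of Git
*.tfstate
*.tfstate.*
*.tfvars
.terraform/
crash.log
```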
Running terraform apply directly without plan, causing unintended resource deletion
Always preview changes with terraform plan -out=plan.tfplan first, then apply only that plan with terraform apply plan.tfplan. In CI/CD, strictly follow the plan -> approve -> apply sequence.
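The plan -> approve -> apply sequence as concrete commands; terraform show renders a saved plan file so the approval step reviews exactly what will be applied:

```shell
terraform plan -out=plan.tfplan   # write the exact change set to a file
terraform show plan.tfplan        # human-readable review of the saved plan
terraform apply plan.tfplan       # apply only what was reviewed, nothing else
```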
Defining all resources in a single main.tf without modules, making management impossible
Separate resources into functional modules (vpc, ec2, rds, iam, etc.) and call modules from per-environment directories. Clearly defining module interfaces (variables/outputs) increases reusability.
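A sketch of what a clean module interface looks like, plus version pinning via a Git tag (the repository URL is a placeholder; Terraform's git:: source address with a ?ref query pins the module to a tag):

```hcl
# modules/vpc/variables.tf - the module's inputs
variable "project" { type = string }
variable "vpc_cidr" { type = string }

# modules/vpc/outputs.tf - the module's outputs
output "vpc_id" { value = aws_vpc.main.id }
output "private_subnet_ids" { value = aws_subnet.private[*].id }

# Consumers can pin a shared module to a released Git tag
module "vpc" {
  source   = "git::https://github.com/example-org/terraform-modules.git//vpc?ref=v1.2.0"
  project  = "myapp"
  vpc_cidr = "10.0.0.0/16"
}
```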