Intro
This article focuses on infrastructure deployment with Terraform from a local machine. Before putting a script into a pipeline, I usually test how it behaves from the local environment. Initially I didn’t plan to write this post, because I wanted to cover all the material in another one. But while working on my Terraform script and explaining some of its aspects, I realized it would be better to split the narrative logically rather than lump everything together.
Preparing a Terraform script
The script describes a basic infrastructure with an EKS cluster, a VPC, two IAM roles (developer and manager), etc. At the next stage the script will be added to the CI pipeline and all variables will be turned into GitLab CI/CD variables. Only the main components are shown in the diagram; the script itself is provided at the end of this post (see The TF-script).
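As a quick preview of that future step (the variable name here is only an assumption about how the pipeline might be wired, not part of the final setup): GitLab exposes CI/CD variables to jobs as environment variables, and Terraform automatically reads any environment variable prefixed with TF_VAR_ as an input variable, so the flags below can be driven from the pipeline without touching the code:

# Hypothetical GitLab CI/CD variable: TF_VAR_deploy_metrics_server = "false"
# The same effect can be reproduced locally:
export TF_VAR_deploy_metrics_server=false
terraform plan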

Control flow
To make the script flexible enough, I use conditional expressions with the count meta-argument, for example for the Metrics Server Helm release:
variable "deploy_metrics_server" { description = "Flag to control metrics server deployment" type = bool default = true } # Metrics server resource "helm_release" "metrics_server" { count = var.deploy_metrics_server ? 1 : 0 name = "metrics-server" repository = "https://charts.bitnami.com/bitnami" chart = "metrics-server" namespace = "metrics-server" version = "7.2.16" create_namespace = true set { name = "apiService.create" value = "true" } depends_on = [ aws_eks_node_group.general ] }
If deploy_metrics_server is set to true, count evaluates to 1 and an instance of the Metrics Server is deployed in the cluster; otherwise count is 0 and the resource is skipped.
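To flip the flag for a single run without editing the defaults, the value can also be passed on the command line; this is plain Terraform CLI usage rather than anything specific to this script:

terraform plan  -var="deploy_metrics_server=false"
terraform apply -var="deploy_metrics_server=false"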
Terraform modules
I decided not to use TF modules, because I have run into a lot of problems several times when it was time to update a script. A module is essentially a custom assembly of many disparate resources, aimed at convenience and quick deployment, with all the functionality hidden inside. It looks like something you can just take and use, but when the module's major version changes, many components turn out to be incompatible because of conflicting parameters that either have to be added or have to be removed.
For example, I once had a hell of a battle switching the EKS module from version 17 to version 20. It was legacy code that had to be kept working, and initially it caused no problems, but one day I needed to upgrade it: version 17 was already too old, and node groups, which I desperately needed, simply do not exist there, only worker groups. Having failed with such a drastic jump, I decided to upgrade sequentially, from 17 to 18, from 18 to 19, from 19 to 20, but that was problematic too: the module stubbornly refused to work because of fundamental differences in the authentication methods.
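For context only (this snippet is not part of my script, and the exact inputs depend on the module's major version): with the community module the cluster definition collapses into a single pinned block, and it is precisely that version pin that drags all the hidden internals along when you bump it:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"   # bumping 17 -> 18 -> 19 -> 20 is where the breakage showed up

  cluster_name    = "staging-test-nest"
  cluster_version = "1.31"
  # ... dozens of inputs whose names and semantics change between major versions
}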
Kubernetes Provider
One of the stumbling blocks was getting credentials for the providers, particularly for Kubernetes:
provider "kubernetes" { host = data.aws_eks_cluster.eks.endpoint cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks.certificate_authority[0].data) exec { api_version = "client.authentication.k8s.io/v1beta1" args = ["eks", "get-token", "--cluster-name", aws_eks_cluster.eks.name] command = "aws" } }
After a couple of hours of trial and error, the reason was found:
Based on the provided configuration, it seems that you are affected by a bug in Terraform Cloud where in some circumstances when using that authentication method the awscli executable which should be installed on Terraform Cloud agent node gets installed slower, making it unavailable at the time of awscli command execution. This is the reason why sometimes the run is successful, but sometimes it fails.
So the Kubernetes provider should use a token instead:
provider "kubernetes" { host = data.aws_eks_cluster.eks.endpoint cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks.certificate_authority[0].data) token = data.aws_eks_cluster_auth.eks.token }
The TF-script
# Local variables
locals {
  env         = "staging"
  region      = "eu-central-1"
  zoneA       = "eu-central-1a"
  zoneB       = "eu-central-1b"
  zoneC       = "eu-central-1c"
  eks_version = "1.31"
  eks_name    = "test-nest"
}

# Variables
variable "deploy_metrics_server" {
  description = "Flag to control metrics server deployment"
  type        = bool
  default     = true
}

variable "create_developer_user" {
  description = "Flag to control developer user creation"
  type        = bool
  default     = true
}

variable "create_manager_user" {
  description = "Flag to control manager user creation"
  type        = bool
  default     = true
}

# Data
data "aws_eks_cluster" "eks" {
  name = aws_eks_cluster.eks.name
}

data "aws_eks_cluster_auth" "eks" {
  name = aws_eks_cluster.eks.name
}

# Providers
provider "aws" {
  region                   = local.region
  profile                  = "sobercounsel"
  shared_credentials_files = ["~/.aws/credentials"]
}

terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.53"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "2.35.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "2.16.1"
    }
  }
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.eks.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.eks.token
  }
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.eks.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.eks.token
}

# Networking
resource "aws_vpc" "aws-vpc" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name = "${local.env}-vpc"
  }
}

resource "aws_internet_gateway" "aws-igw" {
  vpc_id = aws_vpc.aws-vpc.id

  tags = {
    Name = "${local.env}-igw"
  }
}

resource "aws_subnet" "privateA" {
  vpc_id            = aws_vpc.aws-vpc.id
  cidr_block        = "10.0.0.0/19"
  availability_zone = local.zoneA

  tags = {
    Name                                                   = "${local.env}-private-${local.zoneA}"
    "kubernetes.io/role/internal-elb"                      = "1"
    "kubernetes.io/cluster/${local.env}-${local.eks_name}" = "owned"
  }
}

resource "aws_subnet" "privateB" {
  vpc_id            = aws_vpc.aws-vpc.id
  cidr_block        = "10.0.32.0/19"
  availability_zone = local.zoneB

  tags = {
    Name                                                   = "${local.env}-private-${local.zoneB}"
    "kubernetes.io/role/internal-elb"                      = "1"
    "kubernetes.io/cluster/${local.env}-${local.eks_name}" = "owned"
  }
}

resource "aws_subnet" "privateC" {
  vpc_id            = aws_vpc.aws-vpc.id
  cidr_block        = "10.0.64.0/19"
  availability_zone = local.zoneC

  tags = {
    Name                                                   = "${local.env}-private-${local.zoneC}"
    "kubernetes.io/role/internal-elb"                      = "1"
    "kubernetes.io/cluster/${local.env}-${local.eks_name}" = "owned"
  }
}

resource "aws_subnet" "publicA" {
  vpc_id                  = aws_vpc.aws-vpc.id
  cidr_block              = "10.0.96.0/19"
  availability_zone       = local.zoneA
  map_public_ip_on_launch = true

  tags = {
    Name                                                   = "${local.env}-private-${local.zoneA}"
    "kubernetes.io/role/elb"                               = "1"
    "kubernetes.io/cluster/${local.env}-${local.eks_name}" = "owned"
  }
}

resource "aws_subnet" "publicB" {
  vpc_id                  = aws_vpc.aws-vpc.id
  cidr_block              = "10.0.128.0/19"
  availability_zone       = local.zoneB
  map_public_ip_on_launch = true

  tags = {
    Name                                                   = "${local.env}-private-${local.zoneB}"
    "kubernetes.io/role/elb"                               = "1"
    "kubernetes.io/cluster/${local.env}-${local.eks_name}" = "owned"
  }
}

resource "aws_subnet" "publicC" {
  vpc_id                  = aws_vpc.aws-vpc.id
  cidr_block              = "10.0.160.0/19"
  availability_zone       = local.zoneC
  map_public_ip_on_launch = true

  tags = {
    Name                                                   = "${local.env}-private-${local.zoneC}"
    "kubernetes.io/role/elb"                               = "1"
    "kubernetes.io/cluster/${local.env}-${local.eks_name}" = "owned"
  }
}

resource "aws_eip" "aws-eip" {
  domain = "vpc"

  tags = {
    Name = "${local.env}-nat"
  }
}

resource "aws_nat_gateway" "aws-nat-gw" {
  allocation_id = aws_eip.aws-eip.id
  subnet_id     = aws_subnet.publicA.id

  tags = {
    Name = "${local.env}-nat"
  }

  depends_on = [aws_internet_gateway.aws-igw]
}

resource "aws_route_table" "aws-rt-private" {
  vpc_id = aws_vpc.aws-vpc.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.aws-nat-gw.id
  }

  tags = {
    Name = "${local.env}-private"
  }
}

resource "aws_route_table" "aws-rt-public" {
  vpc_id = aws_vpc.aws-vpc.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.aws-igw.id
  }

  tags = {
    Name = "${local.env}-public"
  }
}

resource "aws_route_table_association" "privateA" {
  subnet_id      = aws_subnet.privateA.id
  route_table_id = aws_route_table.aws-rt-private.id
}

resource "aws_route_table_association" "privateB" {
  subnet_id      = aws_subnet.privateB.id
  route_table_id = aws_route_table.aws-rt-private.id
}

resource "aws_route_table_association" "privateC" {
  subnet_id      = aws_subnet.privateC.id
  route_table_id = aws_route_table.aws-rt-private.id
}

resource "aws_route_table_association" "publicA" {
  subnet_id      = aws_subnet.publicA.id
  route_table_id = aws_route_table.aws-rt-public.id
}

resource "aws_route_table_association" "publicB" {
  subnet_id      = aws_subnet.publicB.id
  route_table_id = aws_route_table.aws-rt-public.id
}

resource "aws_route_table_association" "publicC" {
  subnet_id      = aws_subnet.publicC.id
  route_table_id = aws_route_table.aws-rt-public.id
}

# EKS
resource "aws_iam_role" "eks" {
  name = "${local.env}-${local.eks_name}-eks-cluster"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "eks.amazonaws.com"
      }
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "eks" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks.name
}

resource "aws_eks_cluster" "eks" {
  name     = "${local.env}-${local.eks_name}"
  version  = local.eks_version
  role_arn = aws_iam_role.eks.arn

  vpc_config {
    endpoint_private_access = false
    endpoint_public_access  = true

    subnet_ids = [
      aws_subnet.privateA.id,
      aws_subnet.privateB.id,
      aws_subnet.privateC.id
    ]
  }

  access_config {
    authentication_mode                         = "API"
    bootstrap_cluster_creator_admin_permissions = true
  }

  depends_on = [aws_iam_role_policy_attachment.eks]
}

# Nodes
resource "aws_iam_role" "nodes" {
  name = "${local.env}-${local.eks_name}-eks-nodes"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      }
    }
  ]
}
POLICY
}

# This policy now includes AssumeRoleForPodIdentity for the Pod Identity Agent
resource "aws_iam_role_policy_attachment" "amazon_eks_worker_node_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.nodes.name
}

resource "aws_iam_role_policy_attachment" "amazon_eks_cni_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.nodes.name
}

resource "aws_iam_role_policy_attachment" "amazon_ec2_container_registry_read_only" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.nodes.name
}

resource "aws_eks_node_group" "general" {
  cluster_name    = aws_eks_cluster.eks.name
  version         = local.eks_version
  node_group_name = "general"
  node_role_arn   = aws_iam_role.nodes.arn

  subnet_ids = [
    aws_subnet.privateA.id,
    aws_subnet.privateB.id,
    aws_subnet.privateC.id
  ]

  capacity_type  = "SPOT"
  instance_types = ["t3.small"]

  scaling_config {
    desired_size = 2
    max_size     = 10
    min_size     = 1
  }

  update_config {
    max_unavailable = 1
  }

  labels = {
    role = "general"
  }

  depends_on = [
    aws_iam_role_policy_attachment.amazon_eks_worker_node_policy,
    aws_iam_role_policy_attachment.amazon_eks_cni_policy,
    aws_iam_role_policy_attachment.amazon_ec2_container_registry_read_only,
  ]

  # Allow external changes without Terraform plan difference
  lifecycle {
    ignore_changes = [scaling_config[0].desired_size]
  }
}

# K8S Roles & Role Bindings
## Developer
resource "kubernetes_cluster_role" "viewer" {
  metadata {
    name = "viewer"
  }

  rule {
    api_groups = ["*"]
    resources = [
      "namespaces",
      "pods",
      "configmaps",
      "secrets",
      "services"
    ]
    verbs = [
      "get",
      "list",
      "watch"
    ]
  }
}

resource "kubernetes_cluster_role_binding" "viewer-binding" {
  metadata {
    name = "viewer-binding"
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }

  subject {
    kind      = "Group"
    name      = "viewer-group"
    api_group = "rbac.authorization.k8s.io"
  }
}

## Manager
resource "kubernetes_cluster_role_binding" "admin-binding" {
  metadata {
    name = "admin-binding"
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }

  subject {
    kind      = "User"
    name      = "admin"
    api_group = "rbac.authorization.k8s.io"
  }

  subject {
    kind      = "ServiceAccount"
    name      = "default"
    namespace = "kube-system"
  }

  subject {
    kind      = "Group"
    name      = "manager-group"
    api_group = "rbac.authorization.k8s.io"
  }
}

# IAM
## Developer
resource "aws_iam_user" "developer" {
  count = var.create_developer_user ? 1 : 0
  name  = "LongView"
}

resource "aws_iam_policy" "developer_eks" {
  count = var.create_developer_user ? 1 : 0
  name  = "AmazonEKSDeveloperPolicy"

  policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eks:DescribeCluster",
        "eks:ListClusters"
      ],
      "Resource": "*"
    }
  ]
}
POLICY
}

resource "aws_iam_user_policy_attachment" "developer_eks" {
  count      = var.create_developer_user ? 1 : 0
  user       = aws_iam_user.developer[0].name
  policy_arn = aws_iam_policy.developer_eks[0].arn
}

resource "aws_eks_access_entry" "developer" {
  count             = var.create_developer_user ? 1 : 0
  cluster_name      = aws_eks_cluster.eks.name
  principal_arn     = aws_iam_user.developer[0].arn
  kubernetes_groups = ["viewer-group"]
}

## Manager
data "aws_caller_identity" "current" {}

resource "aws_iam_role" "eks_admin" {
  count = var.create_manager_user ? 1 : 0
  name  = "${local.env}-${local.eks_name}-eks-admin"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Principal": {
        "AWS": "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
      }
    }
  ]
}
POLICY
}

resource "aws_iam_policy" "eks_admin" {
  count = var.create_manager_user ? 1 : 0
  name  = "AmazonEKSAdminPolicy"

  policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eks:*"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "iam:PassedToService": "eks.amazonaws.com"
        }
      }
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "eks_admin" {
  count      = var.create_manager_user ? 1 : 0
  role       = aws_iam_role.eks_admin[0].name
  policy_arn = aws_iam_policy.eks_admin[0].arn
}

resource "aws_iam_user" "manager" {
  count = var.create_manager_user ? 1 : 0
  name  = "WithinReason"
}

resource "aws_iam_policy" "eks_assume_admin" {
  count = var.create_manager_user ? 1 : 0
  name  = "AmazonEKSAssumeAdminPolicy"

  policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sts:AssumeRole"
      ],
      "Resource": "${aws_iam_role.eks_admin[0].arn}"
    }
  ]
}
POLICY
}

resource "aws_iam_user_policy_attachment" "manager" {
  count      = var.create_manager_user ? 1 : 0
  user       = aws_iam_user.manager[0].name
  policy_arn = aws_iam_policy.eks_assume_admin[0].arn
}

# Best practice: use IAM roles due to temporary credentials
resource "aws_eks_access_entry" "manager" {
  count             = var.create_manager_user ? 1 : 0
  cluster_name      = aws_eks_cluster.eks.name
  principal_arn     = aws_iam_role.eks_admin[0].arn
  kubernetes_groups = ["manager-group"]
}

# Metrics server
resource "helm_release" "metrics_server" {
  count            = var.deploy_metrics_server ? 1 : 0
  name             = "metrics-server"
  repository       = "https://charts.bitnami.com/bitnami"
  chart            = "metrics-server"
  namespace        = "metrics-server"
  version          = "7.2.16"
  create_namespace = true

  set {
    name  = "apiService.create"
    value = "true"
  }

  depends_on = [aws_eks_node_group.general]
}
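Since this run happens from a local machine against the "sobercounsel" AWS profile configured in the provider block, the deployment itself is the usual Terraform sequence (destroy is listed only as a reminder for tearing the staging environment down):

terraform init
terraform plan
terraform apply
# and when the environment is no longer needed:
terraform destroy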
There are tons of videos on YouTube that help with understanding and putting such a script together. Links to one such series can be found in the references.
References
- Mirrored repo «Infrastellar»
- Create AWS VPC using Terraform: AWS EKS Kubernetes Tutorial – Part 1
- Create AWS EKS Cluster using Terraform: AWS EKS Kubernetes Tutorial – Part 2
- Add IAM User & IAM Role to AWS EKS: AWS EKS Kubernetes Tutorial – Part 3
- AWS Load Balancer Controller Tutorial (TLS): AWS EKS Kubernetes Tutorial – Part 6
- Solution: Error getting credentials
- Inconsistent “getting credentials: exec: executable aws failed with exit code 1” errors #2011
- Terraform tips & tricks: loops, if-statements, and gotchas
- AWS EKS cluster + GitLab CI (remote server)
- Terraform Best Practices. Code structure