EKS Cluster on KodeKloud AWS Playground via Terraform
What this runbook is
A complete, verified operational record of provisioning a production-pattern EKS cluster on a KodeKloud AWS Playground account using Terraform. In this environment an AWS Organizations SCP silently blocks several standard EKS operations. Every architectural decision in this runbook exists because the standard approach was blocked by the SCP, and each workaround was tested against it.
End state:
- VPC with public/private subnets across 3 AZs
- Bastion host in a public subnet (SSH-accessible)
- EKS 1.35 control plane with private API endpoint
- Three AL2023 self-managed worker nodes joined and Ready
- Full kubectl access from the bastion
For every error encountered during development, see the companion runbook: KodeKloud EKS Terraform — Challenges and Fixes →
SCP Constraints — Non-Negotiable on This Platform
KodeKloud playground accounts operate under an AWS Organizations SCP that blocks specific actions. These are the confirmed constraints:
| Blocked Action | Consequence | Workaround |
|---|---|---|
| iam:PassRole (non-whitelisted name) | Cluster creation fails unless the role name is exactly eksClusterRole | Create roles with whitelisted names in a separate iam-eks.tf |
| iam:TagPolicy | KMS encryption IAM policy creation fails | Disable all KMS encryption in the cluster config |
| eks:CreateNodegroup | Managed node groups impossible regardless of tool | Self-managed nodes via CloudFormation |
| eks:AssociateAccessPolicy | Cannot attach Kubernetes policies to access entries | Use bootstrap_cluster_creator_admin_permissions = true directly on the cluster resource |
| eks:DeleteAddon | terraform destroy fails on addon deletion | preserve = true on all EKS addon resources |
| logs:DeleteLogGroup | terraform destroy fails on CW log group deletion | Do not manage the log group in Terraform |
Why the terraform-aws-modules/eks/aws module does not work here
The EKS module v21.x silently drops bootstrap_cluster_creator_admin_permissions from the cluster's access_config block when create_iam_role = false. Since iam:PassRole requires the whitelisted name eksClusterRole, create_iam_role must be false — which means the module leaves the cluster with no admin access. The workaround is to bypass the module and use a raw aws_eks_cluster resource where the setting is passed directly to the AWS API.
Repository Structure
microservices-demo/terraform/eks-kodekloud/
├── terraform.tf # Provider requirements (aws ~> 6.42)
├── variables.tf # All input variables
├── data.tf # AMI lookup, caller identity, region, public IP
├── vpc.tf # VPC via terraform-aws-modules/vpc (no SCP issues)
├── bastion.tf # Bastion via terraform-aws-modules/ec2-instance
├── iam-eks.tf # SCP-whitelisted IAM roles: eksClusterRole + eksNodeRole
├── eks.tf # Raw aws_eks_cluster + OIDC provider + addons
└── outputs.tf # All outputs
Dev Machine
I use SilverStack Dev Machine — a custom root filesystem on iximiuz Labs, which I maintain with all DevOps tools pre-installed (kubectl, eksctl, terraform, helm, helmfile, aws cli, etc.). No local machine setup is required.
Phase 1 — IAM Roles (iam-eks.tf)
The SCP allows iam:PassRole only for roles named exactly eksClusterRole and eksNodeRole. The EKS module auto-generates role names that do not match (e.g., silver-stack-eks-cluster-20260610...), so roles must be created explicitly before anything else.
# iam-eks.tf
resource "aws_iam_role" "eks_cluster_role" {
  name = "eksClusterRole" # exact whitelisted name, do not change

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "eks.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })

  tags = { Name = "eksClusterRole" }
}

resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
  role       = aws_iam_role.eks_cluster_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}

resource "aws_iam_role" "eks_node_role" {
  name = "eksNodeRole" # exact whitelisted name, do not change

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })

  tags = { Name = "eksNodeRole" }
}

resource "aws_iam_role_policy_attachment" "node_worker_policy" {
  role       = aws_iam_role.eks_node_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
}

resource "aws_iam_role_policy_attachment" "node_cni_policy" {
  role       = aws_iam_role.eks_node_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
}

resource "aws_iam_role_policy_attachment" "node_ecr_policy" {
  role       = aws_iam_role.eks_node_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
}

resource "aws_iam_role_policy_attachment" "node_ssm_policy" {
  role       = aws_iam_role.eks_node_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
If the roles already exist from a previous session
KodeKloud playground accounts sometimes retain IAM resources across sessions. If terraform apply fails with EntityAlreadyExists:
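One way to recover, assuming the leftover roles should be adopted rather than deleted, is to import them into Terraform state before re-running the apply. The sketch below only prints the import commands so they can be reviewed first. The resource addresses are the ones defined in iam-eks.tf; the helper name print_role_imports is hypothetical. Policy attachments use the role-name/policy-ARN import ID format of aws_iam_role_policy_attachment.

```shell
# Hypothetical helper: prints terraform import commands for the two
# whitelisted roles and the policy attachments defined in iam-eks.tf.
print_role_imports() {
  policy_prefix="arn:aws:iam::aws:policy"
  echo "terraform import aws_iam_role.eks_cluster_role eksClusterRole"
  echo "terraform import aws_iam_role.eks_node_role eksNodeRole"
  # Attachment import IDs have the form <role-name>/<policy-arn>.
  echo "terraform import aws_iam_role_policy_attachment.eks_cluster_policy eksClusterRole/${policy_prefix}/AmazonEKSClusterPolicy"
  echo "terraform import aws_iam_role_policy_attachment.node_worker_policy eksNodeRole/${policy_prefix}/AmazonEKSWorkerNodePolicy"
  echo "terraform import aws_iam_role_policy_attachment.node_cni_policy eksNodeRole/${policy_prefix}/AmazonEKS_CNI_Policy"
  echo "terraform import aws_iam_role_policy_attachment.node_ecr_policy eksNodeRole/${policy_prefix}/AmazonEC2ContainerRegistryReadOnly"
  echo "terraform import aws_iam_role_policy_attachment.node_ssm_policy eksNodeRole/${policy_prefix}/AmazonSSMManagedInstanceCore"
}

# Usage: review the output, then run the lines that apply, e.g.
#   print_role_imports
```

An attachment that does not exist on the leftover role cannot be imported; that line can be skipped, since the next terraform apply creates the attachment.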
Phase 2 — Network and Bastion
vpc.tf and bastion.tf use the standard community modules. Neither triggers SCP restrictions — the SCP targets IAM and EKS, not VPC or EC2. No changes to the module defaults are required for KodeKloud compatibility.
Key decisions:
- Private subnets for EKS worker nodes (no direct internet exposure)
- Public subnets for the bastion and NAT gateway
- Single NAT gateway (dev/staging cost tradeoff)
- Bastion IP lock: the security group restricts SSH ingress to the single IP that ran terraform apply, via data.http.my_ip
The bastion requires an IAM role attached if SSM Session Manager access is needed. With AmazonSSMManagedInstanceCore attached to eksNodeRole, nodes can be accessed via SSM without a key pair after they join the cluster.
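The bastion IP lock amounts to turning the caller's public IP into a /32 CIDR. The sketch below reproduces that transformation in shell so the value can be checked before an apply, for example after switching networks. The lookup endpoint https://checkip.amazonaws.com is an assumption, since this runbook does not show the URL behind data.http.my_ip, and the helper name to_ssh_cidr is hypothetical.

```shell
# Turn a raw IP lookup response (often padded with whitespace or a
# trailing newline) into a /32 CIDR suitable for an SSH ingress rule.
to_ssh_cidr() {
  ip=$(printf '%s' "$1" | tr -d '[:space:]')
  case "$ip" in
    "" | *[!0-9.]*)
      echo "not an IPv4 address: '$1'" >&2
      return 1
      ;;
  esac
  printf '%s/32\n' "$ip"
}

# Usage (network call; the endpoint is an assumption):
#   to_ssh_cidr "$(curl -s https://checkip.amazonaws.com)"
```

If the printed CIDR differs from the one in the bastion security group, SSH from the current network will be refused until the rule is updated by another apply.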
Phase 3 — EKS Cluster (eks.tf)
This file uses raw aws_eks_cluster and individual aws_eks_addon resources rather than the terraform-aws-modules/eks/aws module. See "Why the terraform-aws-modules/eks/aws module does not work here" under the constraints table for the reason the module cannot be used reliably here.
# eks.tf
resource "aws_security_group" "eks_additional" {
  name        = "${var.project_name}-eks-additional-sg"
  description = "Allow bastion host to reach EKS API on port 443"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description     = "HTTPS from bastion host"
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    security_groups = [aws_security_group.bastion_sg.id]
  }

  egress {
    description = "Allow all outbound"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = { Name = "${var.project_name}-eks-additional-sg" }
}

resource "aws_eks_cluster" "this" {
  name     = "${var.project_name}-eks"
  role_arn = aws_iam_role.eks_cluster_role.arn
  version  = var.kubernetes_version

  vpc_config {
    subnet_ids              = module.vpc.private_subnets
    endpoint_private_access = true
    endpoint_public_access  = false
    security_group_ids      = [aws_security_group.eks_additional.id]
  }

  access_config {
    authentication_mode                         = "API_AND_CONFIG_MAP"
    bootstrap_cluster_creator_admin_permissions = true
  }

  depends_on = [aws_iam_role_policy_attachment.eks_cluster_policy]
}

data "tls_certificate" "eks" {
  url = aws_eks_cluster.this.identity[0].oidc[0].issuer
}

resource "aws_iam_openid_connect_provider" "eks" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.this.identity[0].oidc[0].issuer

  tags = { Name = "${var.project_name}-eks-oidc" }
}

resource "aws_eks_addon" "vpc_cni" {
  cluster_name                = aws_eks_cluster.this.name
  addon_name                  = "vpc-cni"
  resolve_conflicts_on_create = "OVERWRITE"
  preserve                    = true
}

resource "aws_eks_addon" "kube_proxy" {
  cluster_name                = aws_eks_cluster.this.name
  addon_name                  = "kube-proxy"
  resolve_conflicts_on_create = "OVERWRITE"
  preserve                    = true
}

resource "aws_eks_addon" "pod_identity_agent" {
  cluster_name                = aws_eks_cluster.this.name
  addon_name                  = "eks-pod-identity-agent"
  resolve_conflicts_on_create = "OVERWRITE"
  preserve                    = true
}
bootstrap_cluster_creator_admin_permissions = true is the critical field
This passes bootstrapClusterCreatorAdminPermissions: true directly to the AWS CreateCluster API call. EKS handles the admin access entry internally at creation time — no eks:AssociateAccessPolicy call is made, so the SCP is never triggered. Omitting this field (or relying on the module to set it) results in a cluster the lab user cannot access.
CoreDNS is intentionally absent from the addon list
The aws_eks_addon resource waits up to 20 minutes for Active status. CoreDNS stays Degraded until worker nodes exist to schedule its pods, so on a fresh cluster with no nodes the apply hangs and eventually times out. EKS installs CoreDNS automatically as a built-in Kubernetes deployment; it activates once self-managed nodes join the cluster in Phase 5. No Terraform resource and no manual step are needed.
preserve = true on every addon
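eks:DeleteAddon is blocked by the KodeKloud SCP. preserve = true keeps the add-on's in-cluster resources in place when the Terraform resource is deleted. Because the AWS provider passes preserve as a parameter of the DeleteAddon request, an AccessDeniedException on teardown is still possible; if it occurs, remove the addon resources from state before destroying, as in Phase 7.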
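Once self-managed nodes have joined, the CoreDNS behaviour described above can be confirmed from the bastion. The following is a minimal sketch, assuming the deployment is named coredns in the kube-system namespace (the EKS default); the helper names replicas_ready and coredns_status are hypothetical.

```shell
# Succeeds when a "ready/desired" string shows all desired replicas ready.
# An empty ready count (no pods scheduled yet) is treated as 0.
replicas_ready() {
  ready=${1%%/*}
  desired=${1##*/}
  [ -n "$desired" ] && [ "$desired" -gt 0 ] && [ "${ready:-0}" -eq "$desired" ]
}

# Prints e.g. "2/2"; the ready part is empty while no replica is ready.
coredns_status() {
  kubectl -n kube-system get deployment coredns \
    -o jsonpath='{.status.readyReplicas}/{.spec.replicas}'
}

# Usage on the bastion:
#   replicas_ready "$(coredns_status)" && echo "CoreDNS is ready"
```

Before any node exists the status has an empty ready count, so the check fails, which matches the Degraded state that would stall an aws_eks_addon resource.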
Phase 4 — Apply Terraform
Step 1: Clone and configure Terraform in the KodeKloud Lab
Use Iximiuz playground as Dev Machine
See Dev Machine section for more details.
# Clone and enter the repo
git clone https://github.com/ibtisam-iq/silver-stack.git
cd silver-stack/terraform/aws/eks-kodekloud
# Configure lab credentials
aws configure
aws sts get-caller-identity # confirm identity before applying
# If re-running in the same terminal after a previous lab session, wipe state
rm -f terraform.tfstate terraform.tfstate.backup
terraform init
terraform apply
Stale state file from an expired lab
KodeKloud credentials expire with the session, but the local terraform.tfstate persists across sessions. If the state references resources in the old account (visible as aws_account_id changing in the plan output), delete the state file before applying. Terraform will create everything fresh in the new account. See Challenges runbook → Stale state after lab restart.
Expected apply time: 12 to 15 minutes (EKS control plane creation dominates).
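The stale-state check described above can be automated before each apply. The sketch below scans a local state file for account IDs embedded in ARNs and compares them with the current lab account. It assumes the state is the default local terraform.tfstate in JSON form; the helper names are hypothetical.

```shell
# Print the distinct 12-digit account IDs found in ARNs inside a state file.
# AWS-managed policy ARNs (account field "aws") are ignored by the pattern.
state_account_ids() {
  grep -oE 'arn:aws:[a-z0-9-]*:[a-z0-9-]*:[0-9]{12}:' "$1" \
    | awk -F: '{ print $5 }' | sort -u
}

# Succeeds when the state file is absent or references only the given account.
state_matches_account() {
  file=$1
  current=$2
  [ -f "$file" ] || return 0
  others=$(state_account_ids "$file" | grep -v "^${current}\$" || true)
  [ -z "$others" ]
}

# Usage:
#   acct=$(aws sts get-caller-identity --query Account --output text)
#   state_matches_account terraform.tfstate "$acct" \
#     || echo "State references another account: delete it before applying"
```

This only reports a mismatch; deleting the state file remains a manual decision, as in the step above.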
Step 2: Associate OIDC provider manually (no iam:PassRole triggered here)
Phase 5 — Self-Managed Nodes (CloudFormation)
eks:CreateNodegroup is blocked unconditionally. Worker nodes are provisioned via the AWS-provided EKS CloudFormation node template after Terraform apply.
5.1 — SSH to Bastion and Configure Credentials
Install Tools on bastion
Reference: Install Tools on bastion
# Use the ssh command from Terraform outputs
ssh -i silver-stack-eks-bastion-key.pem ubuntu@<bastion_public_ip>
# On the bastion
aws configure
# Enter the same KodeKloud lab credentials used on the dev machine
aws sts get-caller-identity # confirm before proceeding
aws eks update-kubeconfig --region us-east-1 --name silver-stack-eks
kubectl get nodes # should return "No resources found" — NOT Forbidden
Verify kubectl before proceeding
If kubectl get nodes returns a Forbidden or authentication error, stop here and consult the Challenges runbook. Proceeding with broken kubectl access means nodes will join but you will not be able to verify or manage them.
5.2 — Collect Cluster Metadata
CLUSTER_NAME=silver-stack-eks
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
REGION=us-east-1
BASTION_KEY="${CLUSTER_NAME}-bastion-key"
VPC_ID=$(aws eks describe-cluster \
--name "$CLUSTER_NAME" \
--query "cluster.resourcesVpcConfig.vpcId" \
--output text)
CLUSTER_SG=$(aws eks describe-cluster \
--name "$CLUSTER_NAME" \
--query "cluster.resourcesVpcConfig.clusterSecurityGroupId" \
--output text)
# Private subnets for nodes — they are already tagged by vpc.tf
SUBNET_IDS=$(aws ec2 describe-subnets \
--filters "Name=vpc-id,Values=$VPC_ID" \
"Name=tag:kubernetes.io/role/internal-elb,Values=1" \
--query "Subnets[*].SubnetId" \
--output text | tr '\t' ',')
API_SERVER=$(aws eks describe-cluster --name "$CLUSTER_NAME" \
--query "cluster.endpoint" --output text)
CA_DATA=$(aws eks describe-cluster --name "$CLUSTER_NAME" \
--query "cluster.certificateAuthority.data" --output text)
SERVICE_CIDR=$(aws eks describe-cluster --name "$CLUSTER_NAME" \
--query "cluster.kubernetesNetworkConfig.serviceIpv4Cidr" --output text)
K8S_VERSION=$(aws eks describe-cluster --name "$CLUSTER_NAME" \
--query "cluster.version" --output text)
AUTH_MODE=$(aws eks describe-cluster --name "$CLUSTER_NAME" \
--query "cluster.accessConfig.authenticationMode" --output text)
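case "$AUTH_MODE" in
  API) AUTH_MODE_PARAM="EKS API" ;;
  API_AND_CONFIG_MAP) AUTH_MODE_PARAM="EKS API and ConfigMap" ;;
  CONFIG_MAP) AUTH_MODE_PARAM="ConfigMap" ;;
  # Any other value leaves AUTH_MODE_PARAM unset, so report it explicitly
  *) echo "Unexpected authentication mode: '$AUTH_MODE'" >&2 ;;
esac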
echo "VPC: $VPC_ID | SG: $CLUSTER_SG | Subnets: $SUBNET_IDS"
echo "API: $API_SERVER | CIDR: $SERVICE_CIDR | K8s: $K8S_VERSION"
echo "Auth: $AUTH_MODE -> $AUTH_MODE_PARAM"
Verify none of the variables are blank before continuing. A blank value in the CF parameters file will cause the stack to fail silently or use defaults.
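The blank-value check can be done mechanically rather than by eye. The sketch below is one way to do it, with a hypothetical helper name require_vars. It also rejects the literal string None, which is what the AWS CLI prints with --output text when a --query expression matches nothing.

```shell
# Fail if any named variable is unset, empty, or the literal "None".
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name-}"
    if [ -z "$value" ] || [ "$value" = "None" ]; then
      echo "missing value: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_vars CLUSTER_NAME VPC_ID CLUSTER_SG SUBNET_IDS API_SERVER \
#     CA_DATA SERVICE_CIDR K8S_VERSION AUTH_MODE_PARAM \
#     || echo "Fix the variables listed above before continuing"
```

Each offending variable is named on stderr, so a single run shows everything that still needs a value.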
5.3 — Confirm Template Parameter Set (Optional)
The accepted parameter keys differ between CloudFormation template versions. Always print the authoritative list before writing the params file.
aws cloudformation get-template-summary \
--template-url https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2025-11-26/amazon-eks-nodegroup.yaml \
--query "Parameters[].ParameterKey" \
--output text | tr '\t' '\n' | sort
5.4 — Build Parameters File
cat > /tmp/cf-params.json << EOF
[
{"ParameterKey": "ClusterName", "ParameterValue": "$CLUSTER_NAME"},
{"ParameterKey": "ClusterControlPlaneSecurityGroup", "ParameterValue": "$CLUSTER_SG"},
{"ParameterKey": "ApiServerEndpoint", "ParameterValue": "$API_SERVER"},
{"ParameterKey": "CertificateAuthorityData", "ParameterValue": "$CA_DATA"},
{"ParameterKey": "ServiceCidr", "ParameterValue": "$SERVICE_CIDR"},
{"ParameterKey": "AuthenticationMode", "ParameterValue": "$AUTH_MODE_PARAM"},
{"ParameterKey": "NodeGroupName", "ParameterValue": "${CLUSTER_NAME}-nodes"},
{"ParameterKey": "NodeInstanceType", "ParameterValue": "t3.medium"},
{"ParameterKey": "NodeImageIdSSMParam", "ParameterValue": "/aws/service/eks/optimized-ami/$K8S_VERSION/amazon-linux-2023/x86_64/standard/recommended/image_id"},
{"ParameterKey": "NodeVolumeSize", "ParameterValue": "20"},
{"ParameterKey": "VpcId", "ParameterValue": "$VPC_ID"},
{"ParameterKey": "Subnets", "ParameterValue": "$SUBNET_IDS"},
{"ParameterKey": "KeyName", "ParameterValue": "$BASTION_KEY"},
{"ParameterKey": "NodeAutoScalingGroupMinSize", "ParameterValue": "1"},
{"ParameterKey": "NodeAutoScalingGroupMaxSize", "ParameterValue": "5"},
{"ParameterKey": "NodeAutoScalingGroupDesiredCapacity", "ParameterValue": "3"}
]
EOF
cat /tmp/cf-params.json # verify no blank values
AuthenticationMode uses display strings, not API enum values
The CloudFormation template's AuthenticationMode parameter accepts EKS API, EKS API and ConfigMap, or ConfigMap — not the API enum values API, API_AND_CONFIG_MAP, CONFIG_MAP. The case block above translates them. Using the raw enum produces ValidationError: Parameter 'AuthenticationMode' must be one of AllowedValues.
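The key list from 5.3 can be checked against the parameters file built above, which catches keys the template version does not accept before the stack is launched. This is a minimal sketch that parses the file with grep and sed rather than a JSON tool, so it assumes the one-parameter-per-entry layout shown above; the helper names and the /tmp/accepted-keys.txt path are hypothetical.

```shell
# Extract ParameterKey names from a CloudFormation parameters JSON file.
param_keys() {
  grep -oE '"ParameterKey"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" \
    | sed -E 's/.*"([^"]*)"$/\1/' | sort -u
}

# Print keys used in the params file ($1) that are absent from the
# accepted list ($2, one key per line).
unknown_param_keys() {
  accepted_sorted=$(mktemp)
  keys_sorted=$(mktemp)
  sort -u "$2" > "$accepted_sorted"
  param_keys "$1" > "$keys_sorted"
  comm -23 "$keys_sorted" "$accepted_sorted"
  rm -f "$accepted_sorted" "$keys_sorted"
}

# Usage: save the output of the 5.3 command to /tmp/accepted-keys.txt, then
#   unknown_param_keys /tmp/cf-params.json /tmp/accepted-keys.txt
```

Empty output means every key in the file is accepted by the template.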
5.5 — Launch the Stack
aws cloudformation create-stack \
--stack-name eks-nodes-stack \
--template-url https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2025-11-26/amazon-eks-nodegroup.yaml \
--parameters file:///tmp/cf-params.json \
--capabilities CAPABILITY_IAM
# Poll until CREATE_COMPLETE (~5 min)
watch -n 10 "aws cloudformation describe-stacks \
--stack-name eks-nodes-stack \
--query 'Stacks[0].StackStatus' --output text"
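The watch loop above keeps running after the stack has failed. As an alternative, the sketch below polls the same describe-stacks query and stops on the first terminal status. The helper names are hypothetical, and treating every ROLLBACK_* and DELETE_* status as a failed creation is an assumption made for this runbook's create-only use case.

```shell
# Classify a CloudFormation stack status as complete, failed, or pending.
classify_stack_status() {
  case "$1" in
    CREATE_COMPLETE) echo complete ;;
    CREATE_FAILED | ROLLBACK_* | DELETE_*) echo failed ;;
    *) echo pending ;;
  esac
}

# Poll until the stack reaches a terminal state.
# $1 = stack name, $2 = seconds between polls (default 10).
wait_for_stack() {
  stack=$1
  interval=${2:-10}
  while :; do
    status=$(aws cloudformation describe-stacks --stack-name "$stack" \
      --query 'Stacks[0].StackStatus' --output text) || return 1
    case "$(classify_stack_status "$status")" in
      complete) echo "$status"; return 0 ;;
      failed) echo "$status" >&2; return 1 ;;
    esac
    sleep "$interval"
  done
}

# Usage:
#   wait_for_stack eks-nodes-stack
```

The AWS CLI also ships a built-in waiter, aws cloudformation wait stack-create-complete --stack-name eks-nodes-stack, which blocks until creation finishes and returns non-zero on failure.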
5.6 — Join Nodes to the Cluster
The AL2023 template with AuthenticationMode = "EKS API and ConfigMap" automatically creates an EC2_LINUX access entry for the node role, so nodes should register on their own. Verify first:
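A minimal way to run that verification, assuming kubectl is already configured on the bastion as in 5.1. The helper name count_ready_nodes is hypothetical; it counts only nodes whose STATUS column is exactly Ready.

```shell
# Count Ready nodes from `kubectl get nodes --no-headers` output on stdin.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage on the bastion:
#   kubectl get nodes --no-headers | count_ready_nodes
#   aws eks list-access-entries --cluster-name "$CLUSTER_NAME"
```

With the desired capacity from 5.4, the count should reach 3, and the access entry list should include the node role ARN.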
If nodes are NotReady or absent after 5 minutes, apply the aws-auth ConfigMap manually:
NODE_ROLE_ARN=$(aws cloudformation describe-stacks \
--stack-name eks-nodes-stack \
--query "Stacks[0].Outputs[?OutputKey=='NodeInstanceRole'].OutputValue" \
--output text)
curl -O https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2020-10-29/aws-auth-cm.yaml
sed -i "s|<ARN of instance role (not instance profile)>|$NODE_ROLE_ARN|g" aws-auth-cm.yaml
aws eks update-kubeconfig --region $REGION --name $CLUSTER_NAME
kubectl apply -f aws-auth-cm.yaml
Phase 6 — Verification
# All system pods running
kubectl get pods -A
# Three nodes Ready
kubectl get nodes
# OIDC provider present (needed for IRSA in later phases)
aws iam list-open-id-connect-providers
# Cluster access config
aws eks describe-cluster \
--name $CLUSTER_NAME \
--query "cluster.accessConfig"
Expected kubectl get nodes output:
NAME STATUS ROLES AGE VERSION
ip-10-0-1-20.ec2.internal Ready <none> 10m v1.35.x-eks-xxxxxxx
ip-10-0-2-67.ec2.internal Ready <none> 10m v1.35.x-eks-xxxxxxx
ip-10-0-3-52.ec2.internal Ready <none> 10m v1.35.x-eks-xxxxxxx
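Phase 7 — Cleanup
terraform destroy on KodeKloud has three known SCP-related failure modes. The recommended approach avoids all of them. Delete the CloudFormation node stack first (see the "CloudFormation node stack" note below), then run: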
# Step 1: Remove addon resources from state (eks:DeleteAddon blocked)
# Once they are out of state, destroy cannot issue a DeleteAddon call,
# and the plan is cleaner.
terraform state rm 'aws_eks_addon.vpc_cni'
terraform state rm 'aws_eks_addon.kube_proxy'
terraform state rm 'aws_eks_addon.pod_identity_agent'
# Step 2: Destroy the rest
terraform destroy
CloudFormation node stack
terraform destroy does not manage the CloudFormation node stack (it was created manually via AWS CLI). Delete it separately:
Wait for DELETE_COMPLETE before running terraform destroy, otherwise the VPC deletion will fail because the node security group is still attached.
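The deletion and wait described above can be sketched as follows, assuming the stack name eks-nodes-stack used in 5.5. The helper name delete_node_stack is hypothetical; both subcommands are standard AWS CLI CloudFormation commands.

```shell
# Delete the node stack and block until CloudFormation reports completion.
# $1 = stack name (defaults to the name used in 5.5).
delete_node_stack() {
  stack=${1:-eks-nodes-stack}
  aws cloudformation delete-stack --stack-name "$stack" || return 1
  aws cloudformation wait stack-delete-complete --stack-name "$stack"
}

# Usage, before terraform destroy:
#   delete_node_stack && terraform destroy
```

The waiter returns non-zero if the stack ends in DELETE_FAILED, in which case terraform destroy should not be started yet.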
Related Runbooks
| Topic | Link |
|---|---|
| Every error encountered during this setup | KodeKloud EKS Terraform — Challenges and Fixes |
| EKS cluster via eksctl (manual approach) | EKS on KodeKloud AWS Playground |
| EBS CSI driver installation | Install EBS CSI Driver |
| AWS Load Balancer Controller | Deploy AWS Load Balancer Controller |