End-to-End Platform Engineering: eksctl, Helmfile Orchestration, and AWS Managed Services on Amazon EKS
Overview
This is a complete operational record of deploying the Retail Store Sample App onto Amazon EKS cluster ibtisam-iq-eks-cluster (us-east-1, Kubernetes 1.34). It documents every infrastructure decision, command, and validation step taken - from IAM roles and CloudFormation node groups through Helmfile orchestration, EBS storage, ALB Ingress, AWS-managed databases, and observability.
About the Application
The Retail Store Sample App is a deliberately polyglot microservices e-commerce store, originally authored by the AWS Containers team and forked at ibtisam-iq/retail-store-sample-app.
It models the kind of heterogeneous stack found in real-world platform engineering - five independent services, three language runtimes (Java, Go, Node.js), and four different persistence backends.
Since this project marked my transition from monolithic 3-tier architectures to polyglot microservices, I conducted an in-depth analysis of the application's source code and inter-service communication. My detailed architectural breakdowns for each service can be found in the repository's runbooks directory.
| Service | Language | Role | Database |
|---|---|---|---|
| UI | Java | Store frontend - routes all user traffic | None (calls all services) |
| Catalog | Go | Product catalog REST API | MySQL / MariaDB |
| Cart | Java | Shopping cart state management | DynamoDB / In-memory |
| Orders | Java | Order processing and persistence | PostgreSQL + SQS (on EKS) |
| Checkout | Node.js | Checkout orchestration | Redis / ElastiCache |
What I Built on Top
The upstream repository ships the application source code and base Helm charts. Everything below is original work I authored on top of that foundation.
Per-service values-*.yaml overrides
Each service ships with a base values.yaml inside its own chart/ directory. I studied each one and authored additional override files on top - one per deployment scenario - so the same chart can be deployed across different target environments without touching the chart itself. Each service has its own dedicated runbook documenting every override decision.
Three Helmfile configurations
Rather than running five separate `helm install` commands in the right order every time, I authored three Helmfile configurations - one per deployment target - each declaring all five releases with explicit dependency ordering via the `needs:` key:
| Helmfile | Target | Storage | Message Broker | UI Exposure |
|---|---|---|---|---|
| helmfile-baremetal-ephemeral.yaml | Any Kubernetes cluster | Ephemeral (no PVC) | In-memory | NodePort |
| helmfile-baremetal-persistent.yaml | Bare-metal with local-path | PVC | RabbitMQ | NodePort |
| helmfile-eks.yaml | AWS EKS | gp3 EBS PVC | AWS SQS | ALB Ingress |
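As a rough illustration of what one of these Helmfiles might look like, the sketch below declares three of the five releases and orders them with `needs:`. The namespace, chart paths, and values file names are assumptions made for the example, not copied from the repository; only the `releases`, `values`, and `needs` keys are standard Helmfile syntax.

```yaml
# Hypothetical excerpt in the spirit of helmfile-eks.yaml.
# Namespace, chart paths, and values file names are illustrative assumptions.
releases:
  - name: catalog
    namespace: retail-store
    chart: ./src/catalog/chart
    values:
      # Override file layered on top of the chart's own values.yaml
      - ./src/catalog/chart/values-eks.yaml

  - name: cart
    namespace: retail-store
    chart: ./src/cart/chart
    values:
      - ./src/cart/chart/values-eks.yaml

  - name: ui
    namespace: retail-store
    chart: ./src/ui/chart
    values:
      - ./src/ui/chart/values-eks.yaml
    # needs entries use the form <namespace>/<release name>
    needs:
      - retail-store/catalog
      - retail-store/cart
```

With a file shaped like this, Helmfile builds a dependency graph from the `needs:` entries and processes `ui` only after `catalog` and `cart`.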
Any cluster, any Helmfile
The ephemeral and persistent Helmfiles are not bare-metal-exclusive. They can run on any Kubernetes cluster - kubeadm, EKS, GKE - wherever the referenced values-*.yaml assumptions hold. The EKS Helmfile is the one that requires AWS-specific infrastructure: EBS CSI driver, ALB Ingress Controller, DynamoDB, SQS, and ACM - which is exactly what this runbook provisions.
This runbook
The final Helmfile command for this deployment is one line: a `helmfile -f` invocation pointed at helmfile-eks.yaml.
But that single command only works after an entire infrastructure stack has been built correctly. This runbook is the record of everything that had to exist before that command could succeed.
Related Runbooks
| Topic | Link |
|---|---|
| kubeadm cluster bootstrap (SilverStack) | Cluster Bootstrap Runbook |
| EKS provisioning on KodeKloud Playground | EKS on KodeKloud Runbook |
Getting Started
Phases
The deployment spans Phases 0 through 10, documented across six runbooks. Each runbook has step-by-step commands, configurations, and verification steps.
- Phase 0-3: AWS Infrastructure. Prerequisites, IAM Roles, EKS Control Plane, Self-Managed Nodes
- Phase 4-5: Cluster Add-ons & ACM. ALB Controller, EBS CSI Driver, ACM TLS Certificate
- Phase 6: Application Resources. DynamoDB Table, SQS Queue, SNS Topic & Lambda
- Phase 7: Microservices Deployment. Deploying all 5 services using Helmfile
- Phase 8-9: Observability Stack. kube-prometheus-stack, CloudWatch Container Insights
- Phase 10: End-to-End Validation. ALB validation, HTTPS testing, DNS resolution, Stack review
Key Decisions
These are the architectural and engineering decisions made across the deployment phases to accommodate environment constraints and optimize the platform.
Infrastructure and Lab Constraints
- Simplified VPC Creation via eksctl. Instead of explicitly defining a VPC and its private subnets in the cluster manifest, I relied on `eksctl`'s automated provisioning. This kept the infrastructure code simple while still placing the EKS cluster in private subnets. (Phase 1-3)
- Dev Machine over Bastion for Administration. Instead of installing `kubectl`, `helmfile`, and other DevOps tools on a bastion host, I administered the cluster directly from my SilverStack Dev Machine, which had the complete toolchain pre-installed. (Phase 1-3)
- Bastion Host for Node Troubleshooting. While cluster administration was done from the Dev Machine, a bastion host was still used to SSH into the self-managed worker nodes for troubleshooting. When nodes initially failed to join the cluster because of an incorrect authentication mode, SSH access via the bastion proved critical for identifying the issue before tearing down and recreating the node stack with the correct parameters. (Phase 1-3)
- Authentication Mode: API_AND_CONFIG_MAP. I explicitly set the authentication mode to `API_AND_CONFIG_MAP` (the mode introduced in 2023 that honors EKS Access Entries alongside the traditional aws-auth ConfigMap), so that access can be granted through either mechanism rather than through the ConfigMap alone. (Phase 1-3)
- IAM Roles via Terraform, not eksctl. The KodeKloud lab user lacks `iam:PassRole` permissions, so allowing `eksctl` to create roles automatically during cluster creation would fail. Roles were pre-provisioned via Terraform and explicitly passed to `eksctl`. (Phase 1-3)
- Self-Managed Nodes over Managed Node Groups. EKS Managed Node Groups also trigger the blocked `iam:PassRole` permission. To work around this, worker nodes were deployed manually using an AWS CloudFormation template. (Phase 1-3)
- Delayed OIDC Association. OIDC was disabled in `cluster.yaml` (`withOIDC: false`) because it also triggers permission failures during cluster creation. It was associated manually via `eksctl utils` after the control plane was up. (Phase 1-3)
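The decisions above could translate into an eksctl cluster manifest along the following lines. This is a minimal sketch, not the actual `cluster.yaml` from the repository: the account ID and role name are placeholders, and only the cluster name, region, version, `withOIDC: false`, and the authentication mode come from this page.

```yaml
# Sketch of a cluster.yaml reflecting the decisions listed above.
# The role ARN (account ID and role name) is a placeholder, not a real value.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: ibtisam-iq-eks-cluster
  region: us-east-1
  version: "1.34"

iam:
  # OIDC is left off at creation time and associated afterwards
  withOIDC: false
  # Cluster role pre-provisioned outside eksctl (Terraform in this project)
  serviceRoleARN: arn:aws:iam::111122223333:role/example-eks-cluster-role

accessConfig:
  # Honor both EKS Access Entries and the aws-auth ConfigMap
  authenticationMode: API_AND_CONFIG_MAP

# No vpc: section, so eksctl provisions the VPC and subnets itself.
# No nodeGroups or managedNodeGroups: worker nodes come from a separate
# CloudFormation stack.
```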
Deployment Orchestration
- Why Helmfile instead of ArgoCD? Because this was my first deep-dive into microservices, I intentionally prioritized mastering deployment orchestration, multi-environment configurations, and release dependencies natively via Helmfile before abstracting the workflow behind a GitOps controller (which I subsequently implemented in my next project).
- Layered `values-*.yaml` Strategy. Instead of generating a monolithic values file (`helm show values`) and manually editing it, I kept the upstream charts pristine and authored dedicated, decoupled `values-*.yaml` override files for each service. This let the Helmfile orchestration cover three distinct targets: bare-metal ephemeral, bare-metal persistent, and fully-managed EKS.
- Single Runtime Override. Because of the layered `values-*.yaml` strategy, all infrastructure configuration is pre-defined. The only manual edit required during the entire deployment phase is injecting the dynamically generated ACM Certificate ARN into the UI service's ingress values file right before running the final `helmfile -f` command.
- Helmfile for Dependency Management. Instead of running five separate `helm install` commands, Helmfile declares all five releases. Explicit dependency ordering (`needs:`) makes Helmfile install a release only after the releases it depends on, so databases come up before the microservices that use them. (Phase 7)
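To make the single runtime override concrete, a UI override file might look roughly like the sketch below. The `ingress.*` key layout is an assumption about how the UI chart exposes its Ingress settings, and the certificate ARN is a placeholder; the annotation keys themselves are standard AWS Load Balancer Controller annotations.

```yaml
# Hypothetical UI override file for the EKS target.
# The ingress.* key layout is an assumption about the chart's values schema.
ingress:
  enabled: true
  className: alb
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
    # The one value edited by hand before deploying (placeholder ARN):
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:111122223333:certificate/REPLACE-WITH-CERTIFICATE-ID
```

Keeping the ARN in one override file means the chart and the Helmfile stay untouched when the certificate is reissued.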
Cloud-Native Integrations
- CloudWatch over ELK Stack. While I used the ELK stack with Beats in my other microservices project, this deployment is oriented towards native AWS EKS integrations. CloudWatch Container Insights via Fluent Bit fits that focus and centralizes logs within the AWS ecosystem. (Phase 8-9)
- Offloading State to AWS Managed Services. Rather than running every stateful dependency inside the cluster, the EKS Helmfile configuration binds the microservices to DynamoDB (Cart) and SQS/SNS (Orders) via IAM Roles for Service Accounts (IRSA). (Phase 6)
- Shared ALB via Ingress Group. The UI, Prometheus, and Grafana all share a single Application Load Balancer using `alb.ingress.kubernetes.io/group.name: ecom-eks`. The ALB routes traffic based on the Host header, avoiding the cost of multiple load balancers. (Phase 8-9)
- gp3 as Default StorageClass. `gp2` was patched out as the default and `gp3` was set as the default `StorageClass` for the EBS CSI driver, providing a cheaper and more performant storage baseline for the stateful databases (MySQL, PostgreSQL). (Phase 4-5)
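The last two decisions can be sketched as plain Kubernetes manifests. This is an illustrative example rather than the manifests used in the project: the Ingress name, namespace, hostname, and backend service are hypothetical, while the `group.name` value, the `ebs.csi.aws.com` provisioner, and the default-class annotation are taken from this page or from standard Kubernetes and AWS conventions.

```yaml
# gp3 StorageClass marked as the cluster default
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
---
# An Ingress joining the shared ALB through the ingress group.
# Name, namespace, host, and backend service are hypothetical.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
  annotations:
    alb.ingress.kubernetes.io/group.name: ecom-eks
spec:
  ingressClassName: alb
  rules:
    - host: grafana.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 80
```

Every Ingress carrying the same `group.name` is merged onto one ALB, and the `host` field in each rule is what lets the load balancer tell the UI, Prometheus, and Grafana apart.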
Screenshots
| # | Screenshot | Phase | Link |
|---|---|---|---|
| 01 | CloudFormation EKS Cluster Stack | 10 | 01-cloudformation-eks-cluster-stack-create-complete.png |
| 02 | EKS Cluster Resources with Self-Managed Nodes | 3 | 02-eks-cluster-resources-self-managed-nodes.png |
| 03 | ALB Resource Map and Target Groups | 10 | 03-alb-resource-map-and-target-groups.png |
| 04 | Retail Store Live over HTTPS | 10 | 04-retail-store-live-over-https.png |
| 05 | CloudWatch Container Insights Log Groups | 9 | 05-cloudwatch-container-insights-log-groups.png |