May 13, 2026

From ENI Math to eBPF: Migrating EKS Off the AWS VPC CNI

Why EKS's native VPC CNI hits a ceiling on pod density, and a step-by-step migration from VPC CNI to Cilium overlay networking on EKS 1.35.

VPC CNI gives every pod a real VPC IP. Sounds elegant on paper. Then you run kubectl describe pod, see FailedScheduling with “Too many pods”, and start questioning the design.

Every pod IP is a secondary address on a node’s ENI, and the per-instance-type limits are baked in. A t3.medium caps at 17 pods. A t3.large at 35. An m5.4xlarge, despite having 64 GB of RAM, gets you 234. The formula is fixed:

Max pods per node = (Max ENIs × (IPv4 per ENI − 1)) + 2

You can’t tune this. Prefix delegation helps a bit (each ENI slot gets a /28 prefix of 16 addresses instead of a single IP), but you trade IP density for IP waste, because a whole /28 is reserved even when only a few of its addresses are in use, and you’re still capped by ENI count.
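The formula is easy to sanity-check with shell arithmetic. This is a minimal sketch: max_pods is a helper name I made up for illustration, and the ENI and per-ENI IPv4 counts passed in are the published limits for these three instance types.

```shell
# max_pods ENIS IPS_PER_ENI
# Applies the formula from the text: (ENIs * (IPv4 per ENI - 1)) + 2.
# max_pods is a hypothetical helper, not part of any AWS tooling.
max_pods() {
  echo $(( $1 * ($2 - 1) + 2 ))
}

max_pods 3 6    # t3.medium  (3 ENIs, 6 IPv4 each)  -> 17
max_pods 3 12   # t3.large   (3 ENIs, 12 IPv4 each) -> 35
max_pods 8 30   # m5.4xlarge (8 ENIs, 30 IPv4 each) -> 234
```

For any other instance type, look up its ENI and IPv4-per-ENI limits in the EC2 documentation before trusting the result.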

There’s a second wall behind the first one. Even if you could pack more pods per node, would your VPC tolerate it? The default VPC’s 172.31.0.0/16 has 65,536 addresses on paper, but they’re split across subnets and shared with everything else, and every pod burns one. You hit a ceiling on cluster-wide pod count well before you hit your CPU budget.

At that point your options narrow:

1. Bigger instances. Burns IPs faster. Doesn’t fix anything.
2. Prefix delegation. Buys time. Increases waste.
3. Custom networking with a secondary CIDR (CGNAT 100.64.0.0/16 etc.). Works, but now you’re managing two IP spaces and the secondary CIDR has its own routing quirks.
4. Overlay CNI. Move pods off the VPC IP space entirely. Pod count bounded only by node CPU/memory.

I went with option 4. This post is the walkthrough.

The upside is straightforward. Pod density isn’t capped by ENI math anymore, just RAM and CPU. Pod CIDR is whatever I want (I picked 10.244.0.0/16). VPC IP budget stays untouched. And on Cilium specifically, you also get the eBPF data plane and Hubble for flow visibility, which alone is worth the migration even if you weren’t IP-constrained.

The downside is real too. ALB can’t target pods directly anymore, so you get two-stage load balancing: ALB → node → eBPF → pod. VXLAN encapsulation adds about 50 bytes per packet, so bandwidth drops slightly and latency creeps up. Admission webhooks need hostNetwork: true because the EKS API server has no route to overlay pod IPs (this one bit me, more later). And pod security groups stop working — you trade them for Cilium network policies, which are more powerful but a different mental model.
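To put the VXLAN cost in numbers, here is a rough sketch of the MTU arithmetic. It assumes the roughly 50-byte overhead quoted above, and the 9001-byte figure is the jumbo-frame MTU AWS supports inside a VPC. inner_mtu is a name I picked for illustration. Cilium normally detects the MTU on its own, so this is for reasoning, not configuration.

```shell
# Approximate VXLAN-over-IPv4 overhead per packet, as quoted in the text.
VXLAN_OVERHEAD=50

# inner_mtu LINK_MTU: bytes left for the pod-level packet after encapsulation.
# Hypothetical helper for illustration only.
inner_mtu() {
  echo $(( $1 - VXLAN_OVERHEAD ))
}

inner_mtu 9001   # jumbo frames inside a VPC -> 8951
inner_mtu 1500   # standard Ethernet         -> 1450
```

At a 1500-byte MTU the overhead is about 3% of each full-size packet. With jumbo frames it is closer to half a percent.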

For ten pods on a wide-open VPC, native VPC CNI is fine. For workloads pushing hundreds of pods on a constrained VPC, overlay pays for itself fast. Pick deliberately.

The starting state

The cluster I migrated:

EKS 1.35, region ap-south-1, name minnie-wallace.
4 nodes, managed node group, t3.medium, AL2023 (kernel 6.1, recent enough for the eBPF features Cilium relies on).
VPC CNI add-on active. Default VPC 172.31.0.0/16. Three public subnets.
Three apps (nginx, httpd, echo), one per namespace, sharing a single ALB via the AWS Load Balancer Controller’s IngressGroup feature. Paths /nginx, /httpd/, /echo.
ALB targeting pods directly via target-type: ip.

Traffic path before migration:

    Client → ALB ──→ Pod IP directly (in VPC CIDR 172.31.0.0/16)
                          │
                          └─→ Pod density per node gated by ENI math
                              (t3.medium: 17 pods max)
                          └─→ Pod IPs consume from VPC subnet budget
                          └─→ ALB target group registers individual pod IPs
                              (target count = pod count)

What I wanted after: same three apps serving the same paths externally, but pods on Cilium overlay (10.244.0.0/16), eBPF handling Services, Hubble running for visibility.

The plan

Six phases. Order is the entire thing — get it wrong and you break pod networking or lock yourself out mid-migration.

Phase	What	Why this order
1	Switch ALB to `target-type: instance`	Decouples ALB from pod IPs before any CNI work. The rest of the migration can happen without touching ingress.
2	Install Cilium alongside VPC CNI	Cilium runs as a DaemonSet on every node but doesn’t touch existing pods. Sets up the eBPF data plane while VPC CNI stays active.
3	Patch `aws-node` to skip new nodes	Label existing nodes so aws-node stays put. Patch the DaemonSet’s nodeSelector. New nodes joining the cluster won’t get VPC CNI.
4	Roll nodes one at a time	Drain, terminate, ASG provisions replacement. Each replacement comes up Cilium-only. Pods scheduled on it get overlay IPs.
5	Remove VPC CNI add-on	All nodes are now Cilium-only. Delete the EKS managed add-on.
6	Enable Hubble + kube-proxy replacement	Pure eBPF data plane. Cilium owns Services, kube-proxy is gone. Hubble UI for flow visibility.

The insight in this ordering is Phase 1. Make the ALB CNI-agnostic before any CNI work and the rest of the migration doesn’t involve ingress.

Phase 3 is the other one. You can’t just delete the aws-node DaemonSet — it breaks pod networking on every node instantly. The trick is a nodeSelector that current nodes match but future ones won’t.

Phase 1 — Decouple the ALB from pod IPs

Nothing about the CNI changes here. Only how the ALB reaches pods.

Services from ClusterIP to NodePort:

kubectl -n default patch svc nginx -p '{"spec":{"type":"NodePort"}}'
kubectl -n httpd   patch svc httpd -p '{"spec":{"type":"NodePort"}}'
kubectl -n echo    patch svc echo  -p '{"spec":{"type":"NodePort"}}'

Ingresses from target-type: ip to target-type: instance:

kubectl -n default annotate ingress nginx \
  alb.ingress.kubernetes.io/target-type=instance --overwrite
kubectl -n httpd annotate ingress httpd \
  alb.ingress.kubernetes.io/target-type=instance --overwrite
kubectl -n echo annotate ingress echo \
  alb.ingress.kubernetes.io/target-type=instance --overwrite

The AWS LB Controller reconciles in about 30 seconds. It creates new target groups (instance type this time), registers the 4 nodes on each Service’s NodePort, updates listener rules, deletes the old IP-type target groups. You might catch one 502 during the cutover. Not worth maintenance-windowing for.

Quick check that all three paths still respond:

ALB=$(kubectl get ingress nginx -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
for path in nginx httpd/ echo; do
  echo "/$path → $(curl -s -o /dev/null -w '%{http_code}' http://$ALB/$path)"
done

Three 200s. New baseline. From here, the ALB doesn’t know or care which CNI is below it.

Phase 2 — Install Cilium alongside VPC CNI

Helm install with overlay/VXLAN mode and cluster-pool IPAM:

helm repo add cilium https://helm.cilium.io/
helm repo update cilium

helm install cilium cilium/cilium \
  --namespace kube-system \
  --version 1.18.3 \
  --set ipam.mode=cluster-pool \
  --set ipam.operator.clusterPoolIPv4PodCIDRList="{10.244.0.0/16}" \
  --set ipam.operator.clusterPoolIPv4MaskSize=24 \
  --set routingMode=tunnel \
  --set tunnelProtocol=vxlan \
  --set kubeProxyReplacement=false \
  --set bpf.masquerade=false \
  --set cni.exclusive=false

A few of these matter more than the rest.

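ipam.mode=cluster-pool makes the operator carve the pod CIDR into /24 slices, one per node. A /16 holds 256 such slices, so this pool covers up to 256 nodes, and each node gets a little under 256 usable pod IPs from its slice (a few addresses in each block are reserved).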

cni.exclusive=false is critical during migration. Cilium installs its CNI config at /etc/cni/net.d/05-cilium.conflist but leaves VPC CNI’s 10-aws.conflist alone. Both coexist on each node throughout the rolling phase.

kubeProxyReplacement=false and bpf.masquerade=false are conservative for now. Kube-proxy still handles Services. iptables still handles masquerade. We’ll flip both later in Phase 6 as a bundle (the reason these are a bundle is the first blocker below).

kubectl -n kube-system get pods -l k8s-app=cilium
kubectl -n kube-system get pods -l name=cilium-operator

Four cilium agent pods, two operator pods, all Running. Nothing in the data plane has actually changed yet. VPC CNI is still active, every existing pod still has a VPC IP. Cilium is installed and idle.
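The cluster-pool sizing is worth checking before you commit to a pod CIDR. A small sketch of the arithmetic for the values used in this install (10.244.0.0/16 carved into /24 blocks). slices and addresses are helper names I made up, not Cilium tooling.

```shell
# slices POD_PREFIX NODE_MASK: how many per-node blocks fit in the pod CIDR.
slices() { echo $(( 1 << ($2 - $1) )); }

# addresses NODE_MASK: raw IPv4 addresses in one per-node block
# (a few of these are reserved, so usable pod IPs are slightly fewer).
addresses() { echo $(( 1 << (32 - $1) )); }

slices 16 24                                     # -> 256 node blocks
addresses 24                                     # -> 256 addresses per node
echo $(( $(slices 16 24) * $(addresses 24) ))    # -> 65536 addresses in total
```

If you expect more than 256 nodes, or far fewer pods per node, a different mask size or a larger pod CIDR is the knob to turn.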

Phase 3 — Stop VPC CNI from claiming new nodes

This is where the trick lives. We want aws-node to keep running on the existing 4 nodes (so their pods don’t lose networking) but to skip any new node that joins the cluster from this point on.

Patch the DaemonSet’s nodeSelector to a label only the existing nodes have.

# Label every current node first
kubectl label nodes --all io.cilium/aws-node-enabled=true

# Then patch the DaemonSet
kubectl -n kube-system patch daemonset aws-node \
  --type='strategic' \
  -p='{"spec":{"template":{"spec":{"nodeSelector":{"io.cilium/aws-node-enabled":"true"}}}}}'

The order matters and it cost me a few panic minutes the first time I read about this pattern. If you patch first, the DaemonSet controller sees no node matches the new selector and immediately evicts aws-node from every running node. That breaks VPC CNI for every pod on every node. Label first, patch second. By the time the selector changes, every existing node already matches it.

kubectl -n kube-system get ds aws-node
# DESIRED=4, CURRENT=4, READY=4
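If you want the label-first rule enforced rather than remembered, a guard like this works. It is a sketch under my own naming (safe_to_patch and patch_aws_node_if_safe are hypothetical helpers). The kubectl calls are the same ones used above, wrapped so the patch only runs when every node already carries the label.

```shell
# safe_to_patch TOTAL LABELED: succeeds only when there is at least one node
# and every node carries the label.
safe_to_patch() {
  [ "$1" -gt 0 ] && [ "$1" -eq "$2" ]
}

# patch_aws_node_if_safe: counts nodes, then applies the nodeSelector patch
# only if the counts match. Hypothetical wrapper, not an official tool.
patch_aws_node_if_safe() {
  # tr strips the padding some wc implementations add around the number
  total=$(kubectl get nodes --no-headers | wc -l | tr -d ' ')
  labeled=$(kubectl get nodes -l io.cilium/aws-node-enabled=true --no-headers | wc -l | tr -d ' ')
  if safe_to_patch "$total" "$labeled"; then
    kubectl -n kube-system patch daemonset aws-node --type=strategic \
      -p '{"spec":{"template":{"spec":{"nodeSelector":{"io.cilium/aws-node-enabled":"true"}}}}}'
  else
    echo "refusing to patch: $labeled of $total nodes labeled" >&2
    return 1
  fi
}
```

Calling patch_aws_node_if_safe before labeling prints the refusal and leaves the DaemonSet alone.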

How this looks in practice:

Existing nodes (labeled):
    node-1 [io.cilium/aws-node-enabled=true] → aws-node ✓ + cilium ✓
    node-2 [io.cilium/aws-node-enabled=true] → aws-node ✓ + cilium ✓
    node-3 [io.cilium/aws-node-enabled=true] → aws-node ✓ + cilium ✓
    node-4 [io.cilium/aws-node-enabled=true] → aws-node ✓ + cilium ✓

Future nodes (no label, joining via ASG):
    node-N [no label]                        → aws-node ✗ + cilium ✓

Phase 4’s job: replace every existing node, one at a time, with a fresh unlabeled one.

Phase 4 — Rolling node replacement

For each of the 4 nodes:

NODE=ip-172-31-x-y.ap-south-1.compute.internal
INSTANCE_ID=$(kubectl get node $NODE -o jsonpath='{.spec.providerID}' | awk -F/ '{print $NF}')

kubectl cordon $NODE
kubectl drain $NODE \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --force \
  --grace-period=60

aws ec2 terminate-instances --instance-ids $INSTANCE_ID --region ap-south-1

The drain evicts pods to remaining nodes (still on VPC CNI, so they still get VPC IPs). The ASG sees one fewer instance and provisions a replacement. The new instance joins the cluster without the io.cilium/aws-node-enabled label. Patched aws-node DaemonSet doesn’t schedule on it. Only Cilium is there. Any pod scheduled on this new node gets a 10.244.x.x overlay IP from the operator-allocated /24 slice.

Wait for the new node to be Ready:

kubectl get nodes -w

Then check it:

NEW_NODE=<paste node name>

kubectl -n kube-system get pods -l k8s-app=aws-node -o wide | grep $NEW_NODE
# Empty. aws-node is NOT here.

kubectl get pods -A -o wide | grep $NEW_NODE
# Any pod on this node should have a 10.244.x.x IP.
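Eyeballing IPs gets tedious by the third node. Here is a sketch that lists any non-hostNetwork pod on a node whose IP is outside the overlay. in_overlay and non_overlay_pods are names I made up, and the pattern match only works because 10.244.0.0/16 ends on an octet boundary.

```shell
# in_overlay IP: succeeds when IP is inside 10.244.0.0/16.
# Plain string match, valid only because the CIDR is a /16.
in_overlay() {
  case "$1" in
    10.244.*.*) return 0 ;;
    *) return 1 ;;
  esac
}

# non_overlay_pods NODE: prints "namespace/name ip" for every pod on NODE
# that is not hostNetwork and has an IP outside the overlay.
non_overlay_pods() {
  kubectl get pods -A --field-selector "spec.nodeName=$1" \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"|"}{.status.podIP}{"|"}{.spec.hostNetwork}{"\n"}{end}' |
  while IFS='|' read -r pod ip hostnet; do
    [ "$hostnet" = "true" ] && continue   # node IP is expected here
    [ -z "$ip" ] && continue              # pod has no IP yet
    in_overlay "$ip" || echo "$pod $ip"
  done
}
```

Empty output from non_overlay_pods "$NEW_NODE" means every workload pod on that node is on the overlay.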

The cluster spends a few minutes in a hybrid state — some nodes on VPC CNI, others on Cilium. For stateless apps that don’t need cross-CNI pod-to-pod talk, this isn’t a problem. The ALB → NodePort → eBPF-or-iptables → pod path works on every node either way.

When you’re done with all 4:

kubectl get pods -A -o wide
# Every workload pod IP is in 10.244.x.x.
# Pods with hostNetwork: true still use node IPs. That's correct.

Phase 5 — Remove VPC CNI

VPC CNI add-on is doing nothing at this point. Delete it.

aws eks delete-addon \
  --cluster-name minnie-wallace \
  --addon-name vpc-cni \
  --region ap-south-1

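The aws-node DaemonSet disappears in about 30 seconds. The scaffolding label left with the old nodes (the replacements never had it), so this cleanup is a no-op unless a labeled node somehow survived the roll: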

kubectl label nodes --all io.cilium/aws-node-enabled-

Verify:

kubectl -n kube-system get ds
# No aws-node line.

Migration is functionally done at this point. But not on full Cilium yet. Kube-proxy is still running, masquerade is still iptables. Phase 6 closes that out.

Phase 6 — Full eBPF: kube-proxy replacement + Hubble

Two upgrades.

First, enable kube-proxy replacement and BPF masquerade together. They’re a bundle, which is the first blocker below.

You need the EKS API endpoint, because Cilium with kube-proxy replacement (KPR) takes over translation of the in-cluster kubernetes Service. The agent needs to reach the API server to install its rules before it can resolve the kubernetes ClusterIP through its own eBPF, so it has to be told the endpoint directly.

aws eks describe-cluster --name minnie-wallace --region ap-south-1 \
  --query 'cluster.endpoint' --output text
# https://XXXX.gr7.ap-south-1.eks.amazonaws.com

Use only the hostname, no scheme. (This is the second blocker. I’ll be honest about it below.)

helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --version 1.18.3 \
  --reuse-values \
  --set kubeProxyReplacement=true \
  --set bpf.masquerade=true \
  --set k8sServiceHost=XXXX.gr7.ap-south-1.eks.amazonaws.com \
  --set k8sServicePort=443

kubectl -n kube-system rollout restart ds/cilium
kubectl -n kube-system rollout status ds/cilium

Check the eBPF data plane is fully on:

kubectl -n kube-system exec ds/cilium -- cilium status | grep -iE 'proxy|masquerading'
# KubeProxyReplacement:    True   [ens5 ...]
# Masquerading:            BPF    [ens5, ...]   10.244.0.0/16

KubeProxyReplacement should read True and Masquerading should read BPF. Once both do, delete kube-proxy:

kubectl -n kube-system delete ds kube-proxy
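Since deleting kube-proxy is the one step here you can’t casually undo, it may be worth gating it on the status check instead of reading the output by eye. A sketch, assuming the status lines look like the sample above. ebpf_ready is a hypothetical helper name.

```shell
# ebpf_ready: reads `cilium status` text on stdin and succeeds only if
# KubeProxyReplacement is True and Masquerading is BPF.
# Assumes the line format shown in the sample output above.
ebpf_ready() {
  status=$(cat)
  printf '%s\n' "$status" | grep -Eq '^KubeProxyReplacement:[[:space:]]+True' &&
    printf '%s\n' "$status" | grep -Eq '^Masquerading:[[:space:]]+BPF'
}

# Usage (not executed here):
#   kubectl -n kube-system exec ds/cilium -- cilium status | ebpf_ready \
#     && kubectl -n kube-system delete ds kube-proxy
```

The check only looks at one agent pod, so it is a sanity gate, not proof that every node has switched over.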

Then Hubble. (If your ALB controller is still on overlay pod IPs at this point, the next command will fail. Read blocker 3 first.)

helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --version 1.18.3 \
  --reuse-values \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set hubble.metrics.enabled='{dns,drop,tcp,flow,port-distribution,icmp,httpV2}'

kubectl -n kube-system rollout restart ds/cilium

kubectl -n kube-system port-forward svc/hubble-ui 12000:80
# http://localhost:12000

Full eBPF data plane in place. Pods communicate via Cilium. Services translated via eBPF. Masquerading via eBPF. Hubble seeing every flow.

The three blockers

The migration itself is fairly well documented. What’s less documented is what breaks along the way. Three things got me. None of them are showstoppers but each cost real time and each is worth knowing about up front.

Blocker 1: `bpf.masquerade` doesn’t work without `kubeProxyReplacement`

My first Helm install had bpf.masquerade=true and kubeProxyReplacement=false. The idea was to get the eBPF masquerade benefits (less iptables, better Hubble visibility) but keep kube-proxy in place as a safety net.

Cilium agents crash-looped on startup. The fatal log:

unable to initialize BPF masquerade support:
BPF masquerade requires NodePort (--enable-node-port="true")

BPF masquerade depends on Cilium’s eBPF NodePort handling. eBPF NodePort handling is part of kube-proxy replacement. So they’re really one thing, not two, and you can’t pick masquerade while keeping kube-proxy.

I had two options. Disable BPF masquerade for now and let iptables masquerade carry the load (works fine with kube-proxy). Or just enable kube-proxy replacement immediately, accepting that I’d lose the safety net during migration.

I picked the first one. Flipped both bits together later in Phase 6.

helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --version 1.18.3 \
  --reuse-values \
  --set bpf.masquerade=false

Not obvious from the values reference, only obvious from the crash log.

Blocker 2: don’t paste `https://` into `k8sServiceHost`

The EKS endpoint AWS hands you:

https://XXXX.gr7.ap-south-1.eks.amazonaws.com

That’s what describe-cluster returns. I copy-pasted it directly into --set k8sServiceHost=....

Cilium constructs https://[<value>]:443 internally and ended up with:

https://[https://XXXX.gr7.ap-south-1.eks.amazonaws.com]:443

Which obviously doesn’t parse. The agent log:

host must be a URL or a host:port pair:
"https://[https://XXXX.gr7.ap-south-1.eks.amazonaws.com]:443"

Strip the https://. Just the hostname.

helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --version 1.18.3 \
  --reuse-values \
  --set k8sServiceHost=XXXX.gr7.ap-south-1.eks.amazonaws.com

Mortifying error. Took me five minutes to spot. Including it here because I’d bet the next person to read this migration writeup will also paste the URL straight from aws eks describe-cluster.
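One way to make the paste mistake impossible is to strip the scheme in the shell before the value ever reaches Helm. A small sketch: strip_scheme and eks_api_host are helper names I made up, and the aws call is the same describe-cluster query used in Phase 6.

```shell
# strip_scheme URL: drops a leading https:// or http:// and a trailing slash.
strip_scheme() {
  host=${1#https://}
  host=${host#http://}
  printf '%s\n' "${host%/}"
}

# eks_api_host CLUSTER REGION: bare API hostname, ready for k8sServiceHost.
# Hypothetical wrapper around the describe-cluster call from Phase 6.
eks_api_host() {
  strip_scheme "$(aws eks describe-cluster --name "$1" --region "$2" \
    --query 'cluster.endpoint' --output text)"
}

# Usage (not executed here):
#   --set k8sServiceHost="$(eks_api_host minnie-wallace ap-south-1)"
```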

Blocker 3: the ALB controller webhook can’t be reached after migration

This is the most architecturally interesting failure of the whole exercise. Hubble install failed with:

Error: UPGRADE FAILED: failed to create resource: Internal error occurred:
failed calling webhook "mservice.elbv2.k8s.aws": failed to call webhook:
Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/mutate-v1-service":
Address is not allowed

Walking through what happened. The AWS LB Controller registers a mutating admission webhook on Service resources. Whenever a Service is created or updated anywhere in the cluster, the EKS API server calls this webhook before persisting the Service. The webhook is reached through a Service that points to the controller’s pods.

Before migration, those controller pods had VPC IPs (172.31.x.x). The EKS managed API server has an ENI in your VPC. It could reach those pods via standard VPC routing.

After migration, those controller pods are on 10.244.x.x. That CIDR only exists inside the cluster, encapsulated in VXLAN tunnels between Cilium-enabled nodes. The EKS API server has no route to it. So when Helm tries to create the hubble-ui Service, the API server calls the webhook, hits the overlay IP, and AWS networking rejects the packet entirely. Hence “Address is not allowed.”

Fix is one line: put the controller pods on host network so they use the node’s VPC IP again.

helm upgrade aws-load-balancer-controller eks/aws-load-balancer-controller \
  --namespace kube-system \
  --reuse-values \
  --set hostNetwork=true

Pods recreate with node IPs. Webhook reachable again. Service create/update works.

This isn’t a one-off fix. It’s a permanent architectural consequence of running a self-managed overlay on EKS. Anything in your cluster that registers an admission webhook the API server has to call (ALB controller, cert-manager, Kyverno, OPA Gatekeeper, ExternalDNS in some configs) needs to be on hostNetwork: true. It’s not specific to Cilium, and other managed platforms can hit the same wall whenever the control plane has no route into the pod network. It’s just what overlay CNI means here: pod IPs aren’t reachable from outside the cluster, and the managed control plane is outside the cluster.

If I were doing this migration over again, the very first thing I’d do, before Phase 1, is put every webhook-registering controller on hostNetwork: true.
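Finding those controllers doesn’t have to be guesswork. A sketch that lists every Service an admission webhook points at, which is the set of workloads to review for hostNetwork before going overlay. webhook_services is a hypothetical helper name, and webhooks configured with a URL instead of a Service are filtered out.

```shell
# webhook_services: prints unique "namespace/service" targets of all
# mutating and validating admission webhooks in the cluster.
webhook_services() {
  for kind in mutatingwebhookconfigurations validatingwebhookconfigurations; do
    kubectl get "$kind" \
      -o jsonpath='{range .items[*].webhooks[*]}{.clientConfig.service.namespace}{"/"}{.clientConfig.service.name}{"\n"}{end}'
  done | grep -v '^/$' | sort -u   # a bare "/" means the webhook uses a URL, not a Service
}
```

Each line maps to a Service. The pods behind that Service are the ones the API server has to reach.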

End state

After all six phases:

kubectl get nodes
# 4 nodes, all on Cilium.

kubectl get pods -A -o wide
# Every workload pod IP in 10.244.x.x range.
# hostNetwork pods (cilium-agent, ALB controller) still use node IPs. Correct.

kubectl -n kube-system get ds
# cilium, cilium-envoy, eks-node-monitoring-agent, eks-pod-identity-agent.
# No aws-node. No kube-proxy.

kubectl -n kube-system exec ds/cilium -- cilium status
# KubeProxyReplacement:    True
# Masquerading:            BPF
# Hubble Relay:            OK

ALB=$(kubectl get ingress nginx -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
curl http://$ALB/nginx
curl http://$ALB/httpd/
curl http://$ALB/echo

Three 200s.

Traffic path after migration:

    Client → ALB ──→ Node:NodePort ──→ Cilium eBPF ──→ Pod IP (10.244.x.x)
                          │                   │
                          │                   └─→ Cross-node = VXLAN tunnel
                          │
                          └─→ Pod density gated only by node CPU/memory
                          └─→ Pod IPs from overlay CIDR, VPC untouched
                          └─→ ALB target group registers nodes, not pods
                              (target count = node count)

Quick load-distribution sanity check:

for i in {1..2000}; do curl -s http://$ALB/nginx > /dev/null & done; wait

for p in $(kubectl -n default get pods -l app=nginx -o name); do
  count=$(kubectl -n default logs $p 2>/dev/null | wc -l)
  echo "$p: $count"
done

All 10 nginx pods get traffic. Distribution is uneven: the ALB picks a node, Cilium then picks a backend pod per connection, and a short burst from a single curl source is a poor sample of either stage. Real traffic from many clients distributes much more evenly. No pod sits idle.
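Raw line counts are hard to compare at a glance. A small sketch that turns the "pod: count" lines printed by the loop above into percentages. share is a name I made up, and the counts still include any startup log lines, so treat the output as approximate.

```shell
# share: reads "name: count" lines on stdin and prints each name with its
# percentage of the total count.
share() {
  awk -F': ' '{ name[NR] = $1; count[NR] = $2; total += $2 }
       END { for (i = 1; i <= NR; i++)
               printf "%s %.1f%%\n", name[i], (total ? 100 * count[i] / total : 0) }'
}

# Usage (not executed here): pipe the output of the per-pod loop into it.
#   for p in ...; do echo "$p: $count"; done | share
```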

Hubble UI populates with live flows:

kubectl -n kube-system port-forward svc/hubble-ui 12000:80

Open http://localhost:12000 and you get the service map. Pick a namespace, watch traffic light up in real time.

What I’d do differently

A few honest notes after the fact.

Enable kube-proxy replacement and BPF masquerade upfront. I deferred them for “migration safety” and ended up doing two Helm upgrades instead of one. The risk of doing both together was smaller than I’d convinced myself.

Put the ALB controller on hostNetwork: true before Phase 1, not after Phase 5 in a panic. If you know you’re going overlay, this is non-optional.

Fix the IMDS hop limit on the node group launch template from day one. I hit a separate Cilium agent issue (context deadline exceeded reading instance metadata) before this migration, and worked around it by passing vpcId to Helm explicitly. The proper fix is HttpPutResponseHopLimit: 2 on the launch template. I’m still on the workaround. I’ll fix it next time I touch the node group.

Make sure every Deployment has readiness probes. With target-type: ip, ALB was health-checking each pod directly. With target-type: instance, that responsibility moves up to kubelet. Without a readiness probe, kubelet has nothing to check, broken pods stay in rotation, and you have no idea. My three demo apps had no probes, which I’d never get away with in production.
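For the demo apps, adding a probe can be as small as one JSON Patch per Deployment. This is a sketch under assumptions: the probe targets the first container, and the path and port you pass (for example / on port 80 for nginx) are guesses about your app, not something the cluster can infer. readiness_patch and add_readiness_probe are hypothetical helper names.

```shell
# readiness_patch PATH PORT: JSON Patch adding an HTTP readiness probe to the
# first container of a Deployment's pod template. Timing values are
# illustrative defaults, not recommendations.
readiness_patch() {
  printf '[{"op":"add","path":"/spec/template/spec/containers/0/readinessProbe","value":{"httpGet":{"path":"%s","port":%s},"periodSeconds":5,"failureThreshold":3}}]\n' "$1" "$2"
}

# add_readiness_probe NAMESPACE DEPLOYMENT PATH PORT
add_readiness_probe() {
  kubectl -n "$1" patch deployment "$2" --type=json -p "$(readiness_patch "$3" "$4")"
}

# Usage (not executed here):
#   add_readiness_probe default nginx / 80
```

Patching the pod template triggers a rolling restart, so run it when a rollout is acceptable.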

What’s next

This post ends at “migration successful, eBPF data plane fully active.” A second post covers the architectural consequences — the stuff nobody warns you about once you’ve actually done it:

Why your ALB target group shows 4 entries when you have 10 pods (and why that’s right).
Two-stage load balancing and the sampling artifacts it creates.
Where pod-level health checking actually lives now.
externalTrafficPolicy: Cluster vs Local and when each makes sense.
Topology spread constraints, because pod scheduling distribution is your problem now.

VPC IP budget intact. Pod density gated only by CPU and RAM. Full flow visibility via Hubble. eBPF data plane end-to-end.

The blockers above are the real value of writing this down. The migration mechanics — label nodes before patching, drain in order, hostNetwork for every webhook controller — those are the things I’ll remember next time. Everything else, I can recover from. These I’d rather not relearn the hard way.

Follow-up: After the Migration: What target-type: instance Actually Changes on EKS Overlay — covers the architectural consequences once the cluster is actually running, including why ALB target groups show node counts instead of pod counts and how traffic actually flows post-migration.