May 16, 2026 · 10 min read

After the Migration: What `target-type: instance` Actually Changes on EKS Overlay

A follow-up to the Cilium migration: what changes when ALB targets nodes instead of pods, how traffic actually flows through NodePort + kube-proxy replacement, and the operational surprises that surface after the migration is 'done'.

Kubernetes EKS Cilium eBPF ALB Networking kube-proxy

A few days after the Cilium migration I wrote up in the first post, someone asked me why the ALB target group showed 4 entries when the nginx Deployment had 10 replicas. Fair question. I had a half-answer ready, but the full answer turned out to be more interesting than I expected.

The migration itself is the visible work. You install Cilium, drain nodes, watch overlay IPs replace VPC IPs, delete kube-proxy, feel briefly powerful. But the architectural shift that comes with target-type: instance — what it actually does to your traffic pattern, your health checking, your scheduling — only surfaces once the cluster is running and you start poking at it.

This post is the things I didn’t know I needed to understand until the migration was done.

The 4-targets-for-10-pods question

ALB target group: 4 entries. Deployment: 10 nginx pods, all Running. New engineer’s first instinct is to assume something is broken. It isn’t.

When you switched from target-type: ip to target-type: instance in Phase 1 of the migration, you changed what the ALB actually targets. Previously, ALB registered individual pod IPs. Now it registers nodes — specifically, each node listening on the Service’s NodePort. The ALB no longer knows anything about pods.

You have 4 nodes. Target group has 4 entries.

The pod count is irrelevant at the ALB layer. Whether you have 10 nginx pods or 200, the ALB only sees nodes. What happens after a request reaches a node is none of the ALB’s business anymore.

How traffic actually flows now

Before:

Client → ALB ──→ Pod IP directly
                 (1 hop, ALB owns load balancing)

After:

Client → ALB ──→ Node:NodePort ──→ Cilium eBPF ──→ Pod IP
                 (2 hops, two load-balancing stages)

Stage one: ALB picks one of your 4 nodes. The selection is roughly round-robin or least-outstanding-requests depending on what you’ve configured. ALB doesn’t care which node, just that it’s healthy.

Stage two: the node receives the packet on its NodePort. Cilium’s eBPF service handler picks one of the 10 nginx pods. Could be a pod on this same node, could be a pod three nodes away. With externalTrafficPolicy: Cluster (the default), it really is any of them. Cilium chooses a backend once per connection from the full pool: at random by default, or with a consistent 5-tuple hash if you’ve switched the load-balancing algorithm to Maglev. If the chosen pod is on another node, the packet gets VXLAN-encapsulated and tunneled there.

This is the two-stage load balancing tax of overlay networking on managed Kubernetes. It costs one extra hop. It costs a hash decision. It costs about 50 bytes of VXLAN header. None of these individually matter much. They add up to maybe a few hundred microseconds of additional latency per request, which is nothing for HTTP and noticeable for things like high-frequency RPC. Pick your workload.
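Where the 50 bytes come from, as a quick sketch. It assumes VXLAN over IPv4 with no IP options; the frame sizes passed in are illustrative, not measurements from this cluster.

```python
# Per-packet overhead added by VXLAN encapsulation over IPv4.
OUTER_ETHERNET = 14  # outer MAC header
OUTER_IPV4 = 20      # outer IP header, no options
OUTER_UDP = 8        # VXLAN rides on UDP
VXLAN_HEADER = 8     # flags + 24-bit VNI

overhead = OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER


def overhead_fraction(inner_frame_bytes: int) -> float:
    """Share of the on-wire frame that is encapsulation rather than payload."""
    return overhead / (inner_frame_bytes + overhead)


print(overhead)                           # 50
print(round(overhead_fraction(1500), 3))  # 0.032
print(round(overhead_fraction(100), 3))   # 0.333
```

The fixed cost is small against full-size frames and proportionally larger for tiny packets, which fits the point that chatty RPC workloads notice it more than ordinary HTTP.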

What they buy you is the freedom to put pods anywhere, on any node, with the ALB not needing to know.
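The two stages can be put into numbers with a small model. This is a sketch under two assumptions that are mine, not measured: the ALB spreads requests evenly over nodes, and each node picks uniformly from all pods (externalTrafficPolicy: Cluster). The node names are placeholders.

```python
from fractions import Fraction


def two_stage(pods_per_node):
    """Return (share of requests per pod, fraction of requests forwarded to another node)."""
    node_count = len(pods_per_node)
    total_pods = sum(pods_per_node.values())
    node_share = Fraction(1, node_count)  # stage one: ALB spreads evenly over nodes
    pod_pick = Fraction(1, total_pods)    # stage two: node picks uniformly from all pods
    # every pod is reachable through every node, so its share sums over all nodes
    per_pod = node_count * node_share * pod_pick
    cross_node = sum(
        node_share * Fraction(total_pods - local, total_pods)
        for local in pods_per_node.values()
    )
    return per_pod, cross_node


per_pod, cross_node = two_stage({"node-a": 3, "node-b": 3, "node-c": 2, "node-d": 2})
print(per_pod)     # 1/10
print(cross_node)  # 3/4
```

Under these assumptions every pod expects the same 10% regardless of where it sits, and about three quarters of requests take the extra cross-node hop on a 4-node cluster.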

Why Hubble shows fewer pods than you expect

First time I opened Hubble UI after the migration, I had 10 nginx pods running but the service map showed 3 for nginx. Where were the other 7?

Hubble shows pods that have had recent traffic, not pods that exist. The other 7 weren’t broken or invisible to Cilium. They just hadn’t received any requests yet, so they weren’t part of any flow Hubble had captured.

This took me a minute to internalize. Hubble is a flow visualizer. If a pod sits there with zero connections, Hubble has nothing to draw. Run a couple thousand requests and the picture changes:

# $ALB holds the ALB's DNS name; fires 2000 curls in the background, then waits for all of them
for i in {1..2000}; do curl -s -o /dev/null "http://$ALB/nginx" & done; wait

After this, all 10 pods appear in the service map. The shape of the graph is the shape of your traffic, not your inventory.
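The inventory-versus-traffic distinction is just a set difference. A minimal sketch with made-up pod names and a simplified, hypothetical flow record shape (real Hubble flow records carry many more fields):

```python
inventory = {f"nginx-{i}" for i in range(10)}  # hypothetical pod names

# Simplified, hypothetical flow records: only the destination pod is kept.
flows = [
    {"destination": {"pod_name": "nginx-2"}},
    {"destination": {"pod_name": "nginx-5"}},
    {"destination": {"pod_name": "nginx-2"}},
    {"destination": {"pod_name": "nginx-9"}},
]


def pods_with_traffic(flow_records):
    """Pods that appear as a destination in at least one captured flow."""
    return {
        record["destination"]["pod_name"]
        for record in flow_records
        if record.get("destination", {}).get("pod_name")
    }


drawn = pods_with_traffic(flows) & inventory  # what a flow map can draw
silent = inventory - drawn                    # running, but no flows yet
print(len(drawn), len(silent))  # 3 7
```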

The uneven distribution problem

After that 2000-request burst plus a steady-state loop across all three apps, the cumulative per-pod counts for nginx looked like this:

nginx-...-29glm    287 req
nginx-...-7f29l    875 req
nginx-...-bvlcr    332 req
nginx-...-hpmp6    635 req
nginx-...-pf4pk    585 req
nginx-...-ptfkc    384 req
nginx-...-q9pkx    470 req
nginx-...-qdrmb    416 req
nginx-...-qj78n    774 req
nginx-...-tgwmt    594 req

All 10 hit, none idle, but the spread is 287 to 875. About 3x between the busiest and the quietest. For comparison, httpd and echo (3 pods each) came out almost perfectly balanced: 996/1031/1096 and 1029/1044/1061.
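To put numbers on “lumpy” versus “balanced”, here is a small sketch that summarizes the counts above. The max/min ratio and coefficient of variation are my choice of summary statistics, not something Hubble reports.

```python
from statistics import mean, pstdev

counts = {
    "nginx": [287, 875, 332, 635, 585, 384, 470, 416, 774, 594],
    "httpd": [996, 1031, 1096],
    "echo": [1029, 1044, 1061],
}


def spread(values):
    """Return (total, max/min ratio, coefficient of variation)."""
    return sum(values), max(values) / min(values), pstdev(values) / mean(values)


for app, values in counts.items():
    total, ratio, cv = spread(values)
    print(f"{app:6} total={total:5} max/min={ratio:.2f} cv={cv:.2f}")
```

nginx comes out at a max/min ratio of about 3.05, against roughly 1.10 for httpd and 1.03 for echo.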

Why the asymmetry? Three things compound.

Pod-per-node skew. Quick check on how your pods are distributed:

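# Print only the node name per pod; counting columns with awk breaks when RESTARTS reads like "1 (5m ago)"
kubectl -n default get pods -l app=nginx --no-headers \
  -o custom-columns=NODE:.spec.nodeName | sort | uniq -c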

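The default scheduler does best-effort spreading but doesn’t guarantee balance. With 10 pods on 4 nodes you typically end up with 3-3-2-2 or 4-3-2-1. The ALB sends ~25% of traffic to each node. Whenever a node forwards only to its own pods (externalTrafficPolicy: Local, for instance), a node with 4 pods splits that 25% four ways and a node with 1 pod gets the full 25%. Result: a 4x imbalance baked in at the pod level before Cilium even makes a decision. Under Cluster, where every node picks from all 10 pods, placement matters much less for the per-pod counts, so treat this as the first thing to rule out rather than the whole explanation.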
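The 4x figure is plain arithmetic. A sketch, assuming the ALB splits traffic evenly across nodes and each node forwards only to its own pods:

```python
from fractions import Fraction


def per_pod_share_local_only(pods_per_node):
    """Share of total traffic each pod gets when a node only forwards to its own pods."""
    nodes_with_pods = [count for count in pods_per_node if count > 0]
    node_share = Fraction(1, len(nodes_with_pods))  # ALB splits evenly across nodes
    return [node_share / count for count in nodes_with_pods]


shares = per_pod_share_local_only([4, 3, 2, 1])
print([str(share) for share in shares])  # ['1/16', '1/12', '1/8', '1/4']
print(max(shares) / min(shares))         # 4
```

With a 3-3-2-2 layout the same function gives a ratio of 1.5, which is why an even spread is worth enforcing.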

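Source-tuple narrowness. Cilium makes its backend choice per connection, and with Maglev enabled it does so by hashing the 5-tuple (source IP, source port, dest IP, dest port, protocol). Behind an ALB, the node never sees my laptop’s address: the ALB terminates the client connection and opens its own connections to the NodePort, so the source side of the tuple comes from a handful of ALB IPs, and kept-alive ALB connections can carry many requests to the same pod. From a single test client, that leaves little variety to spread over 10 backends. Certain backends end up favored over others, especially in shorter test runs. Real traffic from many clients averages out far better.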

Test session bias. I’d run an earlier 2000-curl burst against just nginx during testing. Those requests skewed the nginx counts before the steady-state loop even started. Httpd and echo were clean.

The takeaway is that the distribution shouldn’t be this lumpy in production. It’s lumpy in a contrived test from a single client. Hundreds of source IPs plus topology spread on the Deployment should flatten it out.
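A toy simulation of why few distinct connections look lumpy and many look flat. It assumes one uniformly random backend choice per connection, with every request on that connection following it. That is a simplification of the real datapath, and the connection counts are illustrative.

```python
import random


def max_pod_share(connections, requests_per_connection, backends=10, seed=1):
    """Largest share of requests landing on a single backend."""
    rng = random.Random(seed)
    hits = [0] * backends
    for _ in range(connections):
        # one backend choice per connection; every request on it follows
        hits[rng.randrange(backends)] += requests_per_connection
    return max(hits) / sum(hits)


few = max_pod_share(connections=20, requests_per_connection=100)
many = max_pod_share(connections=2000, requests_per_connection=1)
print(f"20 connections:   busiest pod gets {few:.0%}")
print(f"2000 connections: busiest pod gets {many:.0%}")
```

Both runs send 2000 requests to 10 pods. The only difference is how many independent choices were made, which is the role client diversity plays in production.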

Pod health checking moved

This is the consequence that worried me most when I noticed it.

Previously, with target-type: ip, the ALB health-checked each pod directly. If pod 7 went bad — started returning 500s, OOMed, deadlocked — the ALB’s health check would fail, ALB would yank it from the target group, and traffic would stop going to it within about 30 seconds.

Now, with target-type: instance, the ALB health-checks the node’s NodePort, not individual pods. Under the default Cluster policy the check is forwarded like any other request, so the node passes as long as some ready pod, on any node, answers it. Per-pod failures are invisible to the ALB.

This sounds bad. It isn’t, but only because the responsibility moved up the stack, not because it disappeared.

Pod becomes unhealthy
    ↓
kubelet's readiness probe fails on that pod
    ↓
Pod marked NotReady, removed from Endpoints/EndpointSlice
    ↓
Cilium watches Endpoints, updates its eBPF service map
    on every node (within ~2 seconds)
    ↓
No new requests go to that pod, from any node

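Once a probe fails, this path is faster than ALB-based per-pod health checking. ALB health check intervals are typically 10-30 seconds, and it takes at least two consecutive failures to mark a target unhealthy. The kubelet → Endpoints → Cilium flow propagates in 2-5 seconds, cluster-wide, on top of however long the readiness probe takes to trip. With a tight probe, that’s a clear improvement.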
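A back-of-the-envelope comparison of the two detection paths. The probe period (5 s), failure threshold (2), and ALB unhealthy threshold (2) are assumed values for illustration; probe timeouts are ignored, so treat the results as rough upper bounds rather than measurements.

```python
def readiness_path_seconds(period_s, failure_threshold, propagation_s):
    """Approximate worst case from a pod going bad to its removal from the service map."""
    return period_s * failure_threshold + propagation_s


def alb_path_seconds(interval_s, unhealthy_threshold):
    """Approximate worst case for an ALB to mark a target unhealthy."""
    return interval_s * unhealthy_threshold


print(readiness_path_seconds(5, 2, 5))  # 15
print(alb_path_seconds(10, 2))          # 20
print(alb_path_seconds(30, 2))          # 60
```

The gap comes mostly from the probe period, so a lazily tuned readiness probe can give the advantage back.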

But there’s a load-bearing assumption underneath: your Deployment has a readiness probe defined.

Without one, kubelet has nothing to check, no failure ever propagates, and broken pods stay in rotation indefinitely. With target-type: ip, ALB was your safety net. Now there is no safety net. The probe is the safety net.

kubectl -n default get deployment nginx \
  -o jsonpath='{.spec.template.spec.containers[*].readinessProbe}'

If that returns empty, fix it before something breaks at 2am.

readinessProbe:
  httpGet:
    path: /nginx/
    port: 80
  initialDelaySeconds: 2
  periodSeconds: 5
  failureThreshold: 2

My three demo apps had no probes, which I’d never get away with in production. Operational rule for any cluster on overlay: every Deployment ships with a real readiness probe. No exceptions.
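One way to enforce that rule is a small audit over every Deployment. A sketch that walks the structure `kubectl get deployments -A -o json` returns; the sample data here is hand-written for illustration, not output from this cluster.

```python
def containers_missing_readiness(deployment_list):
    """Return (namespace, deployment, container) for each container without a readinessProbe."""
    missing = []
    for item in deployment_list.get("items", []):
        meta = item["metadata"]
        for container in item["spec"]["template"]["spec"]["containers"]:
            if not container.get("readinessProbe"):
                missing.append((meta.get("namespace", "default"), meta["name"], container["name"]))
    return missing


# Hand-written sample in the shape of a Deployment list.
sample = {
    "items": [
        {
            "metadata": {"namespace": "default", "name": "nginx"},
            "spec": {"template": {"spec": {"containers": [
                {"name": "nginx", "readinessProbe": {"httpGet": {"path": "/nginx/", "port": 80}}},
            ]}}},
        },
        {
            "metadata": {"namespace": "default", "name": "echo"},
            "spec": {"template": {"spec": {"containers": [{"name": "echo"}]}}},
        },
    ]
}

print(containers_missing_readiness(sample))  # [('default', 'echo', 'echo')]
```

Against a real cluster, the same function can take `json.load(sys.stdin)` with the kubectl output piped in.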

The ALB still does something useful. It health-checks the NodePort, which catches whole-node failures: if a node is down, unreachable, or its datapath is broken, the check fails, ALB marks that node unhealthy and stops sending it traffic. (With externalTrafficPolicy: Local, a node whose local pods for the Service have all died fails the check too.) Two layers of health checking, just split differently than before:

  • Pod-level: kubelet readiness probe → Endpoints → Cilium eBPF. Fast, granular, every pod.
  • Node-level: ALB → NodePort health check. Slower, coarse, whole-node failures only.

Both are doing something. Neither is doing what the other was.

externalTrafficPolicy: Cluster vs Local

The default is Cluster. Every node, on receiving NodePort traffic, can route to any pod in the cluster, including pods on other nodes, via VXLAN tunnel. Source IP gets SNAT’d to the node IP, so by the time the packet reaches the pod, you’ve lost the address of whoever connected to the NodePort.

Local flips this. A node will only route NodePort traffic to its local pods. If a node has no local pods for that Service, the NodePort returns nothing, ALB marks the node unhealthy, and traffic stops going there. Source IP is preserved because there’s no SNAT involved. One caveat behind an ALB: the ALB is a proxy, so the preserved source is the ALB’s own address, and the original client IP arrives in the X-Forwarded-For header under either policy.

Use Cluster when:

  • Pods are unevenly distributed and you don’t have topology spread set up.
  • You don’t need the original client IP at the pod (most apps).
  • You want any node to be able to absorb load even if its local pods are dead.

Use Local when:

  • You need the packet’s real source IP at the pod. Audit logging, geographic routing, anything IP-based that can’t rely on X-Forwarded-For.
  • You can guarantee pods are spread across nodes (topology spread or anti-affinity).
  • You want tighter failure isolation: a node with all-local-pods-dead drops out of the ALB pool cleanly.

For my three demo apps, Cluster is the right default. Stateless, don’t care about source IPs, benefit from being routable cluster-wide. For something like a rate-limiter that blocks by source IP at the packet level, I’d flip to Local and add topology spread constraints to make sure every node has at least one pod.

kubectl -n default patch svc nginx -p '{"spec":{"externalTrafficPolicy":"Local"}}'

One-line change. Don’t flip it without thinking through pod distribution first though.
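What “thinking through pod distribution” means for the ALB target group, as a simplified model. It assumes the nodes themselves are up, so the health check outcome depends only on ready pods. Node names and counts are placeholders.

```python
def alb_healthy_nodes(policy, ready_pods_per_node):
    """Nodes expected to pass the ALB NodePort health check under each policy."""
    if policy == "Cluster":
        # any node can forward to any ready pod in the cluster
        any_ready = any(count > 0 for count in ready_pods_per_node.values())
        return sorted(ready_pods_per_node) if any_ready else []
    if policy == "Local":
        # a node only answers if it hosts a ready pod itself
        return sorted(node for node, count in ready_pods_per_node.items() if count > 0)
    raise ValueError(f"unknown externalTrafficPolicy: {policy}")


layout = {"node-a": 4, "node-b": 3, "node-c": 3, "node-d": 0}
print(alb_healthy_nodes("Cluster", layout))  # ['node-a', 'node-b', 'node-c', 'node-d']
print(alb_healthy_nodes("Local", layout))    # ['node-a', 'node-b', 'node-c']
```

Flipping to Local with this layout quietly removes node-d from rotation, which is fine if intended and surprising if not.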

Topology spread is your problem now

This is where the architectural change starts having operational consequences.

With target-type: ip, pod distribution didn’t matter much for traffic balancing. ALB knew about each pod individually and balanced across them regardless of which node they sat on.

With target-type: instance and externalTrafficPolicy: Cluster, distribution matters less for balance, because every node picks from the full pool, but it still sets your blast radius: a node holding 4 of 10 pods takes 40% of capacity with it when it dies. With target-type: instance and externalTrafficPolicy: Local, distribution matters because nodes without local pods don’t get traffic for that Service at all.

Either way, you’re now responsible for making the scheduler do a good job spreading pods across nodes. The default behavior is best-effort and works fine for most workloads. When it doesn’t, you reach for explicit constraints:

spec:
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: nginx

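This tells the scheduler to keep the difference in pod count between any two nodes within 1. With 10 pods on 4 nodes that means 3-3-2-2 instead of whatever the scheduler felt like. The constraint only applies at scheduling time, so existing pods aren’t rebalanced until they’re rescheduled, for example by a rollout restart.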

whenUnsatisfiable: ScheduleAnyway makes it a preference. If for some reason it can’t be satisfied (node taints, resource pressure), pods still schedule somewhere. Use DoNotSchedule if you’d rather leave pods Pending than violate the spread — fine for stateless apps, painful for stateful ones.
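The maxSkew rule itself is simple enough to model. This is a sketch of the eligibility check only (a node is allowed if placing a pod there keeps it within maxSkew of the emptiest node), not the scheduler’s real scoring, and it assumes four empty, untainted nodes with placeholder names.

```python
def place(replicas, nodes, max_skew=1):
    """Place replicas one at a time, honoring a hostname-level maxSkew."""
    counts = {node: 0 for node in nodes}
    for _ in range(replicas):
        lowest = min(counts.values())
        # eligible: the node's count after placement stays within max_skew of the minimum
        eligible = [node for node in nodes if counts[node] + 1 - lowest <= max_skew]
        target = min(eligible, key=lambda node: counts[node])
        counts[target] += 1
    return counts


result = place(10, ["node-a", "node-b", "node-c", "node-d"])
print(sorted(result.values(), reverse=True))          # [3, 3, 2, 2]
print(max(result.values()) - min(result.values()))    # 1
```

With DoNotSchedule, the eligibility check is a hard filter. With ScheduleAnyway, it only influences ranking, so real clusters can drift from this ideal.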

I’d add a constraint like this to every meaningful Deployment in a production cluster. Cheap insurance against the failure modes above.

Is this actually worth it?

Honest question, and the right time to ask it is now, after you’ve seen what the migration costs operationally.

The trade-off is clear. You gave up direct ALB-to-pod targeting in exchange for VPC IP space liberation. You picked up an extra hop in the data path. You added a risk of lumpy backend selection that only really materializes in low-traffic tests. You shifted pod health checking responsibility from ALB to kubelet. You made pod distribution something you have to actually think about.

Worth it?

Depends on your scale.

If you’re running 30 pods across 5 nodes on a wide VPC with no IP pressure, no. Native VPC CNI is simpler, gives you per-pod ALB targeting, and there’s nothing to gain from overlay except the eBPF features. You can get those via Cilium chaining mode (Cilium for eBPF, VPC CNI still owns IPAM) without taking on any of the architectural baggage in this post.

If you’re running 300 pods and your VPC’s /16 is already 60% allocated to other things, yes. Without the migration you’ll hit IP exhaustion within months. With it, pod count is a memory and CPU question, which is a question you can actually answer with bigger nodes.

The middle ground is the interesting case. Maybe you have 80 pods today but you know you’re growing. The migration cost is mostly one-time engineering effort. The architectural baggage is permanent but manageable. The eBPF data plane and Hubble visibility are genuinely valuable independent of the IP story. If you’re going to do this, doing it earlier is cheaper than doing it later — a 4-node test cluster is a much friendlier place to learn the failure modes than a 40-node production one.

I’d do it again. I’d also fix the IMDS hop limit, put the ALB controller on hostNetwork first, define readiness probes on every Deployment, and add topology spread constraints to anything that runs more than one replica. Most of the “what I’d do differently” lessons are about getting the operational prep right, not about regretting the migration itself.

What this series covered

Post 1 was the doing. Install Cilium, patch aws-node, roll nodes, delete kube-proxy, enable Hubble, hit three blockers along the way and write them down so the next person doesn’t.

This post was the understanding. Why the ALB target group looks different. Where load balancing actually happens. What moved when pod health checking left the ALB. How externalTrafficPolicy and topology spread fit into the new picture.

What I didn’t cover, and what’s worth its own writeup someday: Hubble in detail (flow filters, policy correlation, dynamic metrics), CiliumNetworkPolicy as a replacement for pod security groups (more powerful, completely different mental model), and the proper way to expose Hubble UI to a team without putting it on the public internet.

The migration was the point of entry. Everything Cilium gives you after is still worth exploring.