Migrating off Ingress NGINX

I still remember the first time I migrated a service from Apache Mesos to Kubernetes in 2017. That was the moment I fell in love with Kubernetes. I drank the Kool-Aid.

For the next few days, I migrated service after service until I was fully committed. Most of the applications were simple enough: add a few ingress annotations, wire things up, and voilà. Houston, we have liftoff.

Back then, IngressClass did not exist yet, so annotations were the name of the game. Using nginx-ingress was the standard choice, and for a long time, that worked well.

Fast-forward a couple of years. I joined a company that, like me, had fully embraced Kubernetes. That is where I got deeper exposure to Istio and Spinnaker. This was the era when the recommended Istio installation method kept changing: first Helm, then the operator, back to Helm, then istioctl. To get a service running, you needed to understand VirtualService and DestinationRule.

It was not especially hard, but for smaller teams, it could feel like overkill. So for many workloads, the default remained simple: use NGINX.

Then came the sad announcement that the community Ingress NGINX controller was being retired, followed by the rise of Gateway API CRDs. Gateway API is cool, and I like the direction, but migrating is not exactly straightforward when you have hundreds of microservice ingresses, external-dns, cert-manager magic, and fancy NGINX target rewrites sprinkled everywhere.
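To make the rewrite case concrete, here is a rough sketch of what a single rewrite-target ingress might become as a Gateway API HTTPRoute, written as a Terraform kubernetes_manifest. Every name in it (the orders service, the example-gateway Gateway, the gateway-system namespace, the hostname, the port) is hypothetical, and the URLRewrite filter is only an approximation of what the nginx.ingress.kubernetes.io/rewrite-target annotation does, so treat it as a starting point rather than a drop-in translation.

```hcl
# Sketch only: assumes the Gateway API CRDs are already installed and that a
# Gateway named "example-gateway" (hypothetical) allows routes from this namespace.
resource "kubernetes_manifest" "orders_route" {
  manifest = {
    apiVersion = "gateway.networking.k8s.io/v1"
    kind       = "HTTPRoute"
    metadata = {
      name      = "orders"
      namespace = "orders"
    }
    spec = {
      # Attach the route to the shared Gateway (hypothetical name and namespace).
      parentRefs = [{
        name      = "example-gateway"
        namespace = "gateway-system"
      }]
      hostnames = ["orders.example.com"]
      rules = [{
        # Match requests under /api ...
        matches = [{
          path = {
            type  = "PathPrefix"
            value = "/api"
          }
        }]
        # ... and replace the matched /api prefix with / before forwarding,
        # roughly what a rewrite-target of "/" did on the old Ingress.
        filters = [{
          type = "URLRewrite"
          urlRewrite = {
            path = {
              type               = "ReplacePrefixMatch"
              replacePrefixMatch = "/"
            }
          }
        }]
        backendRefs = [{
          name = "orders"
          port = 8080
        }]
      }]
    }
  }
}
```

One of these is easy. Hundreds of them, each with slightly different regexes and annotations, is where the migration effort actually goes.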

When I read the announcement, I thought to myself: maybe I should have stuck with Istio.

I asked a few colleagues, many of whom run their own Kubernetes clusters, what they were using these days. Cilium seemed to be the poison of choice. So I spent a few days kicking the tires and giving it a real shot.

Here are my observations.

1. I am uncomfortable with how much eBPF changes host networking

I do not love how Cilium uses eBPF to modify host networking in order to do its magic.

To be clear, I understand why this is powerful. eBPF is impressive, and Cilium is doing some genuinely interesting things with it. But I still have a hard time accepting that uninstalling a Helm chart may require a node restart to fully clean up networking behavior.

Call me old-fashioned, but Cilium is not part of the core Kubernetes control plane, last I checked. If I install something with Helm, I expect to be able to uninstall it cleanly without rebooting nodes.

That may be a simplistic expectation, especially for a CNI, but it is still an operational concern.

2. The documentation leaves too many important trade-offs unclear

The documentation around certain configuration options feels vague or incomplete. There are plenty of examples, but it is not always clear what the operational trade-offs are.

For example, take this deployment:

resource "helm_release" "cilium" {
  name       = "cilium"
  namespace  = "kube-system"
  repository = "https://helm.cilium.io/"
  chart      = "cilium"
  timeout    = 600
  wait       = false
  version    = "1.18.4"

  values = [
    <<-EOT
      cni:
        chainingMode: aws-cni
        exclusive: false
      routingMode: native
      gatewayAPI:
        enabled: true
      kubeProxyReplacement: true
      enableIPv4Masquerade: false
      encryption:
        enabled: true
        type: wireguard
      nodePort:
        enabled: true
      ingressController:
        enabled: true
        loadbalancerMode: shared
    EOT
  ]
}

What are the real trade-offs between direct chaining and non-direct chaining? If you are running a mesh, does it matter? What changes when kubeProxyReplacement is enabled in combination with AWS CNI chaining? What behavior should I expect from Gateway API and the ingress controller when using shared load balancer mode?

Maybe the answers are there, scattered across multiple pages, GitHub issues, and Slack threads. But that is part of the problem. These are not niche configuration details; they affect how traffic flows through the cluster.

For something this foundational, I want crisp documentation that explains not just what a flag does, but when I should use it, when I should avoid it, and what I am giving up either way.

3. AWS NLB support still has sharp edges

Using AWS Network Load Balancers with Cilium was another area where things felt less polished than I expected.

There are open issues and subtle behaviors that can quickly become relevant depending on how your services, health checks, source IP preservation, and Gateway API resources are configured.

This is exactly the kind of thing that makes me nervous in production. The happy path may work, but Kubernetes ingress traffic is rarely just the happy path. Real clusters have legacy annotations, DNS automation, certificate automation, rewrites, redirects, cross-zone load balancing assumptions, and workloads that rely on behavior nobody remembers documenting.

Once you start migrating hundreds of ingresses, the edge cases matter more than the demo.

So where did I land?

After spending a few days with Cilium, I can see why people like it. It is powerful, modern, and clearly pushing Kubernetes networking forward. The eBPF-based approach opens doors that older networking models simply cannot.

But for my use case, it also felt like too much operational surface area.

I wanted a straightforward replacement for NGINX Ingress with a clear migration path, predictable AWS NLB behavior, and documentation that made the trade-offs obvious. Instead, I found myself debugging networking assumptions, reading GitHub issues, and trying to decide whether I was adopting a better ingress story or a much larger networking platform.

So, at least for now, I am back to using Istio. Here is my deployment file:

locals {
  istio_chart_url     = "https://istio-release.storage.googleapis.com/charts"
  istio_chart_version = "1.29.2"

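  # Split the multi-document CRD bundle on "---" and decode each chunk;
  # chunks that fail to decode become null.
  _gateway_crd_raw  = [for d in split("---", file("${path.module}/crd/gateway-v1.5.1.yaml")) : try(yamldecode(trimspace(d)), null)]
  # Drop the nulls so only real CRD documents remain.
  _gateway_crd_docs = [for doc in local._gateway_crd_raw : doc if doc != null]
  # Key each CRD by its index for for_each, and strip the top-level "status"
  # field so it is never passed to kubernetes_manifest.
  gateway_crds      = { for i, doc in local._gateway_crd_docs : tostring(i) => { for k, v in doc : k => v if k != "status" } }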
  gateway_tolerations = [
    {
      key      = "EssentialOnly"
      operator = "Exists"
    },
    {
      key      = "CriticalAddonsOnly"
      operator = "Exists"
    },
  ]
}

resource "kubernetes_manifest" "gateway_api_crds" {
  for_each = local.gateway_crds
  manifest = each.value
}

resource "kubernetes_namespace_v1" "istio" {
  metadata {
    name = "istio-system"
  }
}

resource "helm_release" "istio_base" {
  name       = "istio-base"
  chart      = "base"
  namespace  = kubernetes_namespace_v1.istio.metadata[0].name
  repository = local.istio_chart_url
  version    = local.istio_chart_version

  depends_on = [
    kubernetes_namespace_v1.istio
  ]
}

resource "helm_release" "istio_cni" {
  name       = "istio-cni"
  chart      = "cni"
  namespace  = kubernetes_namespace_v1.istio.metadata[0].name
  repository = local.istio_chart_url
  version    = local.istio_chart_version

  values = [
    yamlencode({
      profile = "ambient"
    })
  ]

  depends_on = [
    helm_release.istio_base
  ]
}

resource "helm_release" "istiod" {
  name       = "istiod"
  chart      = "istiod"
  namespace  = kubernetes_namespace_v1.istio.metadata[0].name
  repository = local.istio_chart_url
  version    = local.istio_chart_version

  values = [
    yamlencode({
      meshConfig = {
        accessLogFile = "/dev/stdout"
      }
      profile     = "ambient"
      tolerations = local.gateway_tolerations
    })
  ]

  depends_on = [
    helm_release.istio_base
  ]
}

resource "helm_release" "ztunnel" {
  name       = "ztunnel"
  chart      = "ztunnel"
  namespace  = kubernetes_namespace_v1.istio.metadata[0].name
  repository = local.istio_chart_url
  version    = local.istio_chart_version

  depends_on = [
    helm_release.istio_base
  ]
}

resource "helm_release" "istio_ingress_gateway_public" {
  name       = "istio-public"
  chart      = "gateway"
  namespace  = kubernetes_namespace_v1.istio.metadata[0].name
  repository = local.istio_chart_url
  version    = local.istio_chart_version

  values = [
    yamlencode({
      service = {
        type = "LoadBalancer"
        annotations = {
          "service.beta.kubernetes.io/aws-load-balancer-type"                              = "nlb"
          "service.beta.kubernetes.io/aws-load-balancer-scheme"                            = "internet-facing"
          "service.beta.kubernetes.io/aws-load-balancer-proxy-protocol"                    = "*"
          "service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled" = "true"
          "service.beta.kubernetes.io/aws-load-balancer-backend-protocol"                  = "tcp"
          "service.beta.kubernetes.io/aws-load-balancer-ssl-ports"                         = "443"
        }
      }
      tolerations = local.gateway_tolerations
    })
  ]

  depends_on = [
    helm_release.istiod
  ]
}

resource "helm_release" "istio_ingress_gateway_private" {
  name       = "istio-private"
  chart      = "gateway"
  namespace  = kubernetes_namespace_v1.istio.metadata[0].name
  repository = local.istio_chart_url
  version    = local.istio_chart_version

  values = [
    yamlencode({
      service = {
        type = "LoadBalancer"
        annotations = {
          "service.beta.kubernetes.io/aws-load-balancer-type"                              = "nlb"
          "service.beta.kubernetes.io/aws-load-balancer-scheme"                            = "internal"
          "service.beta.kubernetes.io/aws-load-balancer-proxy-protocol"                    = "*"
          "service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled" = "true"
          "service.beta.kubernetes.io/aws-load-balancer-backend-protocol"                  = "tcp"
          "service.beta.kubernetes.io/aws-load-balancer-ssl-ports"                         = "443"
          "service.beta.kubernetes.io/aws-load-balancer-internal"                          = "true"
        }
      }
      tolerations = local.gateway_tolerations
    })
  ]

  depends_on = [
    helm_release.istiod
  ]
}
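
One thing worth calling out: both gateways set the aws-load-balancer-proxy-protocol annotation, which means the NLB will prepend a PROXY protocol header to each connection, and the gateway pods have to be told to expect it. Below is a minimal sketch of one way to do that, assuming the gateway chart's podAnnotations value and Istio's proxy.istio.io/config gatewayTopology setting behave as I understand them for this chart version. The local name gateway_proxy_protocol_values is made up for illustration, and you should verify the behavior against your Istio release before relying on it.

```hcl
locals {
  # Hypothetical extra values for the "gateway" chart. The annotation asks the
  # gateway's Envoy listeners to accept the PROXY protocol header that the NLB
  # adds. Assumption: podAnnotations and gatewayTopology.proxyProtocol are
  # supported by the chart and Istio version in use.
  gateway_proxy_protocol_values = {
    podAnnotations = {
      "proxy.istio.io/config" = jsonencode({
        gatewayTopology = {
          proxyProtocol = {}
        }
      })
    }
  }
}
```

To use it, add yamlencode(local.gateway_proxy_protocol_values) as a second entry in the values list of each gateway helm_release. Helm merges the entries in order, so the service annotations and tolerations above stay as they are.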