1. Executive Summary
nginx-ingress is being retired: security patches end on March 31, 2026. We must migrate to a new ingress solution. The final choice is not yet decided; options under consideration include self-hosted Istio, GCP Cloud Service Mesh, Envoy Gateway, and other ingress controllers.
Proposed approach: Use Istio as an interim placeholder to unblock application teams and establish the new traffic ingress path, while keeping the final decision open. This allows parallel preparation without committing to a specific long-term solution prematurely.
2. Context & Motivation
| Item | Detail |
|---|---|
| Trigger | nginx-ingress retirement — no security patches after March 31, 2026 |
| Current State | nginx-ingress (likely with Ingress resources) handling all external traffic |
| Final Options | Self-hosted Istio, GCP Cloud Service Mesh (CSM), Envoy Gateway, other ingress controllers |
| Gap | Final decision not yet made; migration cannot wait |
| Interim Solution | Deploy Istio as placeholder — coexists with nginx-ingress, no disruption to current services |
3. Migration Strategy Overview
Phase 1: Install new gateway (Istio) alongside nginx-ingress
Phase 2: Enable GCP Private Service Connect for new gateway
Phase 3: Application teams create Gateway resources → regression/functional/performance tests
Phase 4: (Optional) CoreDNS + ExternalDNS for test DNS automation
Phase 5: DNS cutover to new IP
Phase 6: Observe, then decommission nginx-ingress
Network traffic flow:

```mermaid
graph TD
subgraph External_Network ["Outside GCP / On-Prem"]
C[Client / Consumer]
end
subgraph Consumer_Project ["Other GCP Projects (Consumer)"]
C_PSC[PSC Endpoint]
end
subgraph Shared_VPC ["GCP Shared VPC (Networking Project)"]
DNS[Cloud DNS]
subgraph Legacy_Stack ["Legacy Entrance (To be Retired)"]
ILB_Old[Internal Load Balancer]
MIG[MIG Proxy VMs]
end
subgraph Modern_Stack ["Shared VPC Entrance"]
PSC_EP[PSC Endpoint / Forwarding Rule]
end
end
subgraph Service_Project ["GKE Service Project (Producer)"]
subgraph GKE_Cluster ["GKE Cluster"]
subgraph Data_Plane ["Traffic Management"]
direction LR
NGX[Nginx-Ingress Controller]
IST[Istio Ingress Gateway]
SA[PSC Service Attachment]
end
subgraph Workloads ["Application Layer"]
APP[Backend Microservices]
end
end
end
%% Path 1: Legacy Path
C -- "1\. api-gateway.example.com" --> ILB_Old
ILB_Old -- "2\. Forward" --> MIG
MIG -- "3\. Proxy" --> NGX
NGX -- "4\. Route" --> APP
%% Path 2: Shared VPC PSC Path
C -. "1\. api-gateway.example.com" .-> PSC_EP
PSC_EP == "2\. PSC Tunnel" ==> SA
%% Path 3: Cross-Project PSC Path
C_PSC == "Direct Project-to-Project Access" ==> SA
%% Internal Routing
SA ==> IST
IST -. "4\. Gateway API / VirtualService" .-> APP
%% Styling
style Legacy_Stack fill:#fff1f0,stroke:#cf1322,stroke-width:2px,stroke-dasharray: 5 5
style Modern_Stack fill:#f6ffed,stroke:#389e0d,stroke-width:2px
style Consumer_Project fill:#e6fffb,stroke:#13c2c2,stroke-width:2px
style IST fill:#e6f7ff,stroke:#1890ff,stroke-width:2px
style NGX fill:#f5f5f5,stroke:#bfbfbf,color:#8c8c8c
style APP fill:#efdbff,stroke:#722ed1,stroke-width:2px
style SA fill:#fffbe6,stroke:#faad14,stroke-width:2px
```
Key principle: Both gateways coexist. nginx-ingress continues serving production traffic. Istio is validated in parallel with zero impact.
4. Detailed Migration Steps
Step 1 — Install the New Gateway (Istio) into Clusters
Goal: Deploy Istio ingress gateway as an interim solution alongside the existing nginx-ingress.
Co-existence mechanism:
- nginx-ingress watches `Ingress` resources (or the `IngressClass` assigned to it)
- The Istio ingress gateway watches `Gateway` and `VirtualService` resources
- No conflict: they watch different Kubernetes resource types
Considerations:
- Keep nginx-ingress running unchanged; do NOT modify existing `Ingress` resources
- Allocate a separate IP / LoadBalancer for the Istio gateway (avoid port conflicts)
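As a concrete starting point, below is a minimal install sketch, assuming an `IstioOperator`-based install; the gateway name, namespaces, and profile are illustrative assumptions, not decisions:

```yaml
# Minimal IstioOperator sketch: istiod plus one dedicated ingress gateway.
# Names, namespaces, and the profile are illustrative assumptions.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: interim-ingress
  namespace: istio-system
spec:
  profile: minimal                  # istiod only; gateways declared explicitly
  components:
    ingressGateways:
      - name: istio-ingressgateway
        enabled: true
        namespace: istio-ingress    # assumed dedicated gateway namespace
        k8s:
          service:
            type: LoadBalancer      # gets its own IP, separate from nginx-ingress
```

Because the gateway gets its own Service and IP, nothing about the existing nginx-ingress LoadBalancer changes.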
Step 2 — Enable GCP Private Service Connect for the New Gateway
Goal: Use GCP Private Service Connect (PSC) as the network entrance for the new gateway, providing private, secure access without exposing traffic to the public internet and without depending on the VM proxies.
Producer side:
- Create an internal passthrough Network Load Balancer for the Istio gateway (Service with internal LB annotation)
- Create a ServiceAttachment to publish the service via PSC
Consumer side (access from other projects and from non-GCP networks):
- Create a PSC endpoint (forwarding rule) in the shared VPC pointing to the `ServiceAttachment` (for non-GCP access)
- (Optional) Create a PSC endpoint inside the consumer's GCP project, so traffic no longer needs to go through the GKEOUT proxy
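A producer-side sketch on GKE, using the documented `networking.gke.io` internal-LB annotation and `ServiceAttachment` CRD; the service names, namespace, ports, and NAT subnet are placeholders:

```yaml
# Internal passthrough Network Load Balancer in front of the Istio gateway pods.
apiVersion: v1
kind: Service
metadata:
  name: istio-ingressgateway-ilb            # placeholder name
  namespace: istio-ingress
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    istio: ingressgateway                   # matches the gateway pods from Step 1
  ports:
    - name: https
      port: 443
      targetPort: 8443
---
# Publish the ILB over Private Service Connect.
apiVersion: networking.gke.io/v1
kind: ServiceAttachment
metadata:
  name: istio-gateway-psc                   # placeholder name
  namespace: istio-ingress
spec:
  connectionPreference: ACCEPT_AUTOMATIC    # or ACCEPT_MANUAL with a consumer allow-list
  natSubnets:
    - psc-nat-subnet                        # placeholder: reserved PSC NAT subnet
  proxyProtocol: false
  resourceRef:
    kind: Service
    name: istio-ingressgateway-ilb
```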
Step 3 — Application Team Onboarding & Testing
Goal: Application teams create `Gateway` and `VirtualService` resources for their services, then run full regression, functional, and performance tests on both HTTP and TCP ports through the new entrance, with zero impact to current services.
Workflow for each application (see the YAML sketch after this list):
1. Create Gateway resources: the application team creates `Gateway` and `VirtualService` resources following Istio's networking API, reusing the hostnames and routing rules of the previous `Ingress` objects to reach the backend services.
2. Set up custom hosts for test pods: point the hostname at the Istio gateway IP (via internal DNS, or `hostAliases` for pod-level testing) on the pods that run the regression tests.
3. Run tests against the new gateway without affecting production DNS.
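A per-application sketch, reusing the `api-gateway.example.com` hostname from the diagram above; the namespace, TLS secret, backend service, and ports are placeholders:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: api-gateway
  namespace: app-team-ns                     # placeholder namespace
spec:
  selector:
    istio: ingressgateway                    # binds to the interim Istio gateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: api-gateway-cert     # placeholder TLS secret
      hosts:
        - api-gateway.example.com            # same hostname as the old Ingress
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-gateway-routes
  namespace: app-team-ns
spec:
  hosts:
    - api-gateway.example.com
  gateways:
    - api-gateway
  http:
    - route:
        - destination:
            host: backend-svc                # placeholder backend Service
            port:
              number: 8080
```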
- Test scope:
- ✅ HTTP/HTTPS regression tests
- ✅ TCP port connectivity tests (for the Denodo JDBC ports)
- ✅ Functional tests (end-to-end)
- ✅ Performance/load tests (compare latency, throughput vs. nginx-ingress baseline)
- ✅ mTLS verification (if enabled in Istio)
- No production impact: Production traffic continues through nginx-ingress. Test traffic uses separate hostnames pointing to the Istio gateway IP.
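For the pod-level option in step 2 above, `hostAliases` (standard Kubernetes) pins the hostname to the Istio gateway IP for the test pod only; the IP and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: regression-runner                    # placeholder test pod
spec:
  hostAliases:
    - ip: "10.10.0.50"                       # placeholder: Istio gateway ILB IP
      hostnames:
        - api-gateway.example.com            # resolves here only inside this pod
  containers:
    - name: tests
      image: registry.example.com/regression-suite:latest   # placeholder image
```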
Step 4 — (Optional) CoreDNS + ExternalDNS for Test DNS Automation
Goal: If feasible, set up a dedicated CoreDNS instance for pods running tests[1], and use ExternalDNS to manage those DNS records. This avoids manual hostAliases or CloudDNS setup for test records, and paves the way for automated DNS management in the final migration.
Why this helps:
- When the final migration happens, CloudDNS records are already managed by ExternalDNS
- No manual CloudDNS setup needed at cutover time — just switch the ExternalDNS provider if needed
Architecture:

```
Application Pods (test)
        ↓
CoreDNS (custom, for test zone)
        ↓
ExternalDNS (watches Istio Gateway resources)
        ↓
Google CloudDNS (test zone)
```
Note: This step is optional for the initial migration path, and the self-managed CoreDNS instance only works inside the GKE cluster. It provides DNS automation benefits but is not a blocking requirement.
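A sketch of the ExternalDNS wiring for the test zone, assuming the upstream external-dns image and its `istio-gateway` source; the project, zone filter, and owner ID are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: external-dns-test
  namespace: external-dns
spec:
  replicas: 1
  selector:
    matchLabels:
      app: external-dns-test
  template:
    metadata:
      labels:
        app: external-dns-test
    spec:
      serviceAccountName: external-dns       # needs CloudDNS IAM (e.g., via Workload Identity)
      containers:
        - name: external-dns
          image: registry.k8s.io/external-dns/external-dns:v0.14.0   # assumed upstream image
          args:
            - --source=istio-gateway           # sync hostnames from Istio Gateway resources
            - --provider=google
            - --google-project=my-test-project # placeholder project
            - --domain-filter=test.example.com # placeholder: restrict to the test zone
            - --policy=upsert-only             # never delete records it did not create
            - --registry=txt
            - --txt-owner-id=interim-istio-migration
```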
Step 5 — DNS Cutover
Goal: Switch production DNS to point to the new gateway IP after all tests pass.
Pre-cutover checklist:
- All application teams have validated their services through the Istio gateway
- Regression, functional, and performance tests pass
- No degradation in latency or error rates vs. nginx-ingress baseline
- Rollback plan validated (Step 6 below)
Cutover steps:
- Update DNS records to point to the Istio gateway IP via gcloud DNS commands or the Cloud Console
- Or, if using ExternalDNS with the CloudDNS provider: remove the previous DNS records first, then update the `Gateway` resource hosts to include the production hostnames, and let ExternalDNS sync automatically

Traffic splitting option (gradual cutover):
- Lower the DNS TTL ahead of the cutover so changes propagate quickly
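If the ExternalDNS path is used, the cutover can be expressed declaratively; the sketch below reuses the Step 3 `Gateway` example and ExternalDNS's standard TTL annotation, with the 60s value matching the rollback plan in Section 6:

```yaml
# Adding the production hostname to the Gateway lets ExternalDNS create the record;
# the short TTL keeps propagation (and rollback) fast. Names are placeholders.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: api-gateway
  namespace: app-team-ns
  annotations:
    external-dns.alpha.kubernetes.io/ttl: "60"
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: api-gateway-cert
      hosts:
        - api-gateway.example.com            # production hostname, now on the new gateway
```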
Step 6 — Observe & Decommission Old Solution
Goal: Monitor the new system in production for several days, then plan retirement of nginx-ingress.
Observation period (recommended: 7–14 days):
- Monitor error rates, latency (p50, p95, p99), throughput
- Compare against pre-migration nginx-ingress baseline
- Check Istio Proxy (Envoy) and control plane health
- Validate mTLS and security policies (if applicable)
Metrics to watch:
- Integrate Istio’s built-in observability addons (Prometheus, Grafana, Kiali) with our current observability tools to monitor metrics
Decommission plan (after observation period):
- Confirm no traffic is flowing through nginx-ingress (check LB metrics)
- Scale down the GKEIN proxy
- Disable nginx-ingress in the tooling module
- Remove the `Ingress` objects
- Release the proxy VMs and the internal load balancer (ILB)
- Update documentation
5. Istio as Interim vs. Final Solution Options
The following table compares the solutions under consideration for the final decision. Istio is being used as the interim placeholder regardless.
| Dimension | Self-Hosted Istio | GCP Cloud Service Mesh (CSM) | Envoy Gateway | Other Ingress Controllers |
|---|---|---|---|---|
| Type | Full service mesh | Managed service mesh (Envoy-based) | API Gateway (north-south only) | Varies |
| Control Plane | In-cluster (istiod) or none (Ambient) | Google-managed (Traffic Director) | In-cluster (Envoy Gateway controller) | Varies |
| Data Plane | Envoy (sidecar or Ambient ztunnel) | Envoy (sidecar or proxyless gRPC) | Envoy (gateway pods only) | Varies |
| East-West (service-to-service) | ✅ Full mesh | ✅ Full mesh (multi-cluster support) | ❌ North-south only | ❌/⚠️ Limited |
| mTLS | ✅ Automatic | ✅ Managed, automatic | ✅ Gateway-to-backend | Varies |
| Advanced Traffic Management | ✅ Full (splitting, mirroring, fault injection) | ✅ Full (global load balancing) | ✅ HTTPRoute, GRPCRoute | Limited |
| Observability | ✅ Kiali, Prometheus, Jaeger | ✅ Cloud Monitoring, Trace (built-in) | ✅ Via Envoy telemetry | Limited |
| Operational Overhead | High (self-managed control plane) | Low (Google-managed) | Medium | Low-Medium |
| Multi-Cluster | ✅ (Ambient multicluster, beta) | ✅ (fleet-wide, global control plane) | ❌ Single cluster | Varies |
| Gateway API Support | ✅ Full | ✅ Full (Kubernetes Gateway API) | ✅ Reference implementation | Partial |
| Resource Overhead | Medium-High (sidecar per pod) | Medium (managed control plane saves ops) | Low (no sidecars) | Low |
| Suitable If… | Need full mesh + custom Envoy filters | GCP-native, multi-cluster, low ops | API gateway only, no mesh needed | Simple ingress needs |
6. Risk Assessment & Rollback Plan
| Risk | Mitigation |
|---|---|
| Istio gateway misconfiguration | Keep nginx-ingress running; test with separate hostnames first |
| Performance degradation | Load test before cutover; compare against nginx-ingress baseline |
| mTLS policy blocks traffic | Start with PERMISSIVE mode; tighten gradually |
| DNS cutover issues | Short TTL (60s) before cutover; keep old LB running as fallback |
| Application team delay | Parallel onboarding; critical services first |
| ExternalDNS misconfiguration | Validate in test zone before touching production DNS |
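For the mTLS mitigation above, starting in PERMISSIVE mode is a one-resource change using Istio's standard `PeerAuthentication` API:

```yaml
# Mesh-wide policy: accept both mTLS and plaintext during migration,
# then tighten to STRICT namespace by namespace once traffic is verified.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system        # root namespace, so the policy applies mesh-wide
spec:
  mtls:
    mode: PERMISSIVE
```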
Rollback plan (if issues after cutover):
- Revert DNS to nginx-ingress IP (with short TTL, propagates quickly) via gcloud DNS commands
- nginx-ingress is still running and serving traffic — zero downtime rollback
7. Open Questions for Discussion
- Final solution decision timeline: When does the team expect to decide between Istio / CSM / Envoy Gateway / other?
- mTLS requirement: Do we need mutual TLS for service-to-service communication, or is north-south TLS termination sufficient?
- Cost consideration: GCP Cloud Service Mesh has no separate charge for the managed control plane, while Envoy Gateway avoids sidecar resource overhead entirely; how should we weigh managed-control-plane savings against compute overhead?
- Step 4 (CoreDNS + ExternalDNS): Should we invest in this automation now, or keep it simple with manual DNS for the interim period?
- Sidecar vs. Ambient mode: If we stay with Istio long-term, should we evaluate Ambient mode (sidecar-free) to reduce resource overhead?
8. References
1. Pod's DNS Config
2. Istio / Ingress Side-by-Side Migration Guide
3. GKE Private Service Connect Documentation
4. ExternalDNS Istio Gateway Source
5. Cloud Service Mesh Gateway API Overview
6. Envoy Gateway vs Istio Comparison
7. Migrating from Ingress NGINX to Gateway API with Istio
This document is a living draft. Updated as the migration progresses and decisions are made.