Skip to content

GEP-1426: HTTPRoute Mesh Binding

  • Issue: #1294
  • Status: Provisional

Overview

Similar to how HTTPRoutes bind to Gateways and manage North/South traffic flows in Gateway API’s ingress use-case, it would be natural to adopt a similar model for traffic routing concerns in service mesh deployments. The purpose of this GEP is to add a mechanism to the Gateway API spec for the purpose of associating HTTPRoutes to a service mesh and offering a model for service owners to manage traffic splitting configurations.

This GEP is intended to establish an implementable, but experimental, baseline for supporting basic service mesh traffic routing functionality through the Gateway API spec.

Personas

This GEP uses the roles and personas defined in the Gateway API security model, and the service "producer" and "consumer" roles defined in GEP-1324: Service Mesh in Gateway API.

Goals

  • MUST allow HTTPRoute traffic rules to be configurable for a mesh service by the application owner/producer.
  • SHOULD allow control by the cluster operator (mesh administrator) to grant permission for whether HTTPRoute resources in a given namespace are allowed to configure mesh traffic routing.
  • SHOULD NOT require downstream "consumer" services to update configuration or DNS addresses for traffic to follow "producer" mesh routing rules configured by upstream services.
  • SHOULD NOT require reconfiguring existing HTTPRoute resources for North/South Gateway configuration.

Non-Goals

  • Supporting "egress" use cases, which is currently a deferred goal, including:
    • Defining how "consumer" traffic rules which could override routing for service upstreams only within the local scope of a namespace or service might be configured.
    • Redirecting calls from arbitrary custom domains to an in-cluster service.
  • Defining how multiple Services or EndpointSlices representing instances of a single "logical" service should present an identity for AuthN/AuthZ or be associated with each other beyond routing rules.
  • Defining how AuthZ should be implemented to secure East/West traffic between services.
  • Defining how Policy Attachment would bind to HTTPRoute, services or a mesh.
  • Defining how HTTPRoutes configured for East/West service mesh traffic management might integrate with North/South Gateways.
  • Handling East/West traffic outside the cluster (VMs, etc).

Implementation Details and Constraints

  • MUST set a status field on HTTPRoute to show if the routing configuration has been applied to the mesh.
  • MUST only be allowed to configure "producer" traffic rules for a Service in the same namespace as the HTTPRoute.
    • Traffic routing configuration defined in this way SHOULD be respected by ALL consumer services in all namespaces in the mesh.
  • SHOULD only be allowed to direct traffic to backendRefs within the same namespace when used for configuration of mesh traffic.
    • This constraint may be revisited in the future, but is intended to help simplify an initial implementation.
    • Attempting to specify an HTTPBackendRef in a different namespace should set a ResolvedRefs status condition with status: "False" and reason RefNotPermitted on the HTTPRoute if this is attempted.
    • When a route rule contains a HTTPBackendRef that is invalid, 500 status codes MUST be returned for requests that would match that route rule.
  • MAY assume that a mesh implements "transparent proxy" functionality to redirect calls to the Kubernetes DNS address for a Service through mesh routing rules.

Introduction

It is proposed that an application owner should configure traffic rules for a mesh service by configuring an HTTPRoute with a Kubernetes Service resource as a parentRef.

This approach is dependent on both the "frontend" role of the Kubernetes Service resource as defined in GEP-1324: Service Mesh in Gateway API when used as a parentRef and the "backend" role of Service when used as a backendRef. It would use the Kubernetes service name to match traffic for meshes implementing "transparent proxy" functionality, but the backendRef endpoints would ultimately be used for the canonical IP address(es) to which traffic should be redirected by rules defined in this HTTPRoute. This approach leverages the existing points of extensibility within the Gateway API spec, and would not require introducing any API changes or new resources, only defining expected behavior.

API

metadata:
  name: foo_route
  namespace: store
spec:
  parentRefs:
  - kind: Service
    name: foo
  rules:
    backendRefs:
    - kind: Service
      name: foo
      weight: 90
    - kind: Service
      name: foo_v2
      weight: 10

In the example above, routing rules have been configured to direct 90% of traffic for the foo Service to the default "backend" endpoints specified by the foo Service selector field, and 10% to the foo_v2 Service. This is determined based on the ClusterIP (for Service) and ClusterSetIP (for ServiceImport) matching, and for "transparent proxy" mesh implementations would match all requests to foo.svc.cluster.local (or arbitrary custom suffix, as the hostname is not specified manually) from within the same namespace, all requests to foo.store.svc.cluster.local from other namespaces, and all requests to foo.store.svc.clusterset.local for multicluster services, within the scope of the service mesh.

Omitting backendRefs

Implementations SHOULD support a terse syntax which allows omitting backendRefs to avoid unnecessary redundancy for simple configurations. If no backendRefs are specified, implementations should direct traffic to the default endpoints of the parentRef service. The behavior should be exactly the same as if the parentRef service was explicitly defined as the only backendRef, so if the parentRef does not have any endpoints, implementations MUST return an HTTP 503 response code (see discussion in #1210).

This syntax has limited utility currently, but could become more relevant if additional functionality beyond traffic splitting is added to the HTTPRoute resource. HTTPRoutes configured in this way MAY be used to manually enroll services into a mesh (which could trigger behavior like injecting a sidecar proxy) if they have not already been enrolled by some other mechanism (as proposed in GEP-1291: Mesh Representation or similar).

metadata:
  name: foo_route
  namespace: store
spec:
  parentRefs:
  - kind: Service
    name: foo
  rules:
    matches:
    - path:
        value: "/bar"

The example above would drop all incoming traffic for HTTP paths other than /bar to the foo Service, following the existing spec to return HTTP 404 response codes for unmatched requests, and HTTP 500 response codes for requests excluded due to an HTTPRouteFilter.

When no HTTPRoute resources or AuthZ configuration are defined, all traffic should implicitly work - this is just how Kubernetes functions. When you create an HTTPRoute targeting a service as a parentRef you are replacing that implicit logic - not adding to it. Therefore, you may be reshaping or restricting traffic via an HTTPRoute configuration (which should be noted is distinct from disallowing traffic by AuthZ).

Allowed service types

Services valid to be selected as a parentRef SHOULD have a way to identify traffic to them - typically by one or more virtual IP(s), DNS hostname(s) or name(s).

Implementations SHOULD support the default ClusterIP Service type as a parentRef, with or without selectors.

"Headless" Services SHOULD NOT be supported as a parentRef, because they do not implement the "frontend" functionality of a service.

Service resource with type: NodePort or type: LoadBalancer MAY be allowed as parentRefs or backendRefs, as these do provision virtual IPs and are effectively ClusterIP services with additional functionality, but it should typically be preferred to expose services publicly through the North/South Gateway API interfaces instead.

Service resources with type: ExternalName SHOULD NOT be allowed as parentRefs or backendRefs due to security concerns, although might eventually play some role in configuring egress functionality.

Services supported as backendRefs SHOULD be consistent with expectations for North/South Gateway API implementations, and MUST have associated endpoints. ClusterIP Services with selectors SHOULD be supported as a backendRef.

Service without selectors

An alternate pattern additionally supported by this approach would be to target a Service without selectors as the parentRef. This could be a clean way to create a pure routing construct and abstract a logical frontend, as traffic would resolve to a backendRef Service with selectors defined on the HTTPRoute, or receive a 4xx/5xx error response if no matching path or valid backend was found.

Multicluster support with ServiceImport

ServiceImport resources allocate a virtual IP in the cluster, so MAY be allowed as a parentRef.

ServiceImport would remain a valid backend option for HTTPRoute resources (but not currently a requirement for core conformance), and could be specified alongside a Service backendRef to split traffic across clusters within a ClusterSet (as defined in the Multi-cluster Service (MCS) APIs project). This could be a way to solve the need described in A use case for using Gateway APIs in Multi-Cluster.

Ports

By default, a Service attachment applies to all ports in the service. Users may want to attach routes to only a specific port in a Service. To do so, the parentRef.port field should be used.

If port is set, the implementation MUST associate the route only with that port. If port is not set, the implementation MUST associate the route with all ports defined in the Service.

hostnames field

GAMMA implementations SHOULD NOT infer any functionality from the hostnames field on HTTPRoute due to current under-specification and reserved potential for future usage or API changes. For the use case of filtering incoming traffic from selected hostnames, it is recommended to guide users toward configuring HTTPHeaderMatch rules for the Host header. Functionality to be explored in future GEPs may include supporting concurrent usage of an HTTPRoute traffic configuration for multiple North/South Gateways and East/West mesh use cases or redirection of egress traffic to an in-cluster Service.

Limiting graph depth, preventing conflicts and cycles

A Service may only be specified as a parentRef of an HTTPRoute if it is:

  • NOT already specified as a parentRef of another HTTPRoute [in the same namespace]
    • Resolving conflicts from multiple HTTPRoutes attempting to target the same Service as a parentRef should follow the existing conflict guidelines.
    • When an existing HTTPRoute targeting the Service as a parentRef has already been accepted, any HTTPRoute with a newer creation timestamp should be rejected, because the match target considered for the specificity rules (the Service) is identical.
    • If creation timestamps are identical (from applying both routes simultaneously), only the HTTPRoute with a name appearing first in alphabetical order should be accepted.
    • Any rejected HTTPRoute should set an Accepted condition with status: "False".
    • This requirement may be relaxed later to follow the existing precedence rules for merging configuration, but is intended to simplify initial implementation.
    • The namespace colocation requirement effectively forecloses using this strategy to delegate route management, but there is service mesh user interest in merging separately-defined routes together to manage large configurations, multiple versions or canary deployments for a single mesh service.
  • NOT already specified as a backendRef of any other HTTPRoute with a Service parentRef
  • This restriction is intended to prevent route "inclusion/delegation" patterns or circular references from multiple tiers of routing rules within mesh configuration.
  • A Service MAY still be listed as a backendRef on a different HTTPRoute which only targets a Gateway as a parentRef. The specific behavior of how Gateways might interact with mesh routing rules will be defined in a future GEP.
    • A Service MAY still be listed as a backendRef on the same HTTPRoute where it is specified as a parentRef, e.g. to only split off some subset of traffic from the default backend endpoints.
    • This could be computationally expensive to validate.

This should be enforced by controller validation, which should set an Accepted: { status: False } condition with a corresponding new ParentRefConflict RouteConditionReason.

The logic described here is a bit verbose, but could be validated with conformance tests, and is intended to allow an HTTPRoute configuring mesh traffic rules to additionally specify a Gateway parentRef (or HTTPRoute parentRef if GEP-1058: Route Inclusion and Delegation moves forward) to apply the same routing rules when exposing a mesh service outside the cluster.

Drawbacks

  • The fact that this pattern is used for mesh configuration is implicit - this may benefit from some additional configuration to map the HTTPRoute to a particular mesh implementation rather than being picked up by any or all GAMMA meshes present in a cluster. Possible approaches include:
  • GEP-1282: Describing Backend Properties may be one path to associating a Service with a mesh, but likely wouldn't be able to handle the application of multiple HTTPRoutes for the same Service, but each intended for different mesh implementations
    • It's currently unclear how relevant this constraint may be, but associating an HTTPRoute with a mesh by this method would additionally require an extra graph traversal step.
  • Expecting a Mesh parentRef or similar reference as proposed in GEP-1291: Mesh Representation may be a preferred eventual path forward, but wouldn't be required initially, with the assumption that only one mesh should typically be present in a cluster.
  • No mechanism for egress redirection of traffic from arbitrary hostnames to a mesh service within this approach (but could still be implemented seperately).

Alternatives

New MeshService (or HttpService, VirtualService, ServiceBinding) resource as parentRef

Introduce a new resource to represent the "frontend" role of a service as defined in GEP-1291: Mesh Representation.

Controller manages new DNS hostname

A controller could create a matching selector-less Service (i.e. no endpoints), to create a .cluster.local name, or could interact with external-dns to create a DNS name in an owned domain.

Ownership/trust would remain based on naming pattern: serviceName.namespace.svc.[USER_DOMAIN]

Separate HttpService, TlsService and TcpService resources could have the benefit of allowing us to define protocol specific elements to the spec along with an embedded CommonServiceSpec, similar to CommonRouteSpec, and keep similar patterns as Service.

Drawbacks
  • May require reconfiguring existing applications to point to a new mesh service hostname - adoption wouldn't be "transparent".
  • The pattern of creating a new pure routing construct would still be implementable following the proposed approach, by manually creating and targeting a new Service without selectors as a parentRef, without the overhead of introducing a new resource.

Manage DNS by binding to an existing Service

A new ServiceBinding resource would directly reference an existing Service to determine which traffic should be intercepted and redirected following configured service mesh routing rules and facilitate "transparent proxy" functionality. This resource could possibly share similar responsibilites as the need identified in GEP-1282: Describing Backend Properties.

kind: ServiceBinding
metadata:
  name: foo_binding
spec:
  parentRefs:
  - kind: Service
    name: foo
---
spec:
  parentRefs:
  - kind: ServiceBinding
    name: foo_binding
  rules:
    backendRefs:
    - kind: Service
      name: foo
      weight: 90
    - kind: Service
      name: foo_v2
      weight: 10

While the HTTPRoute does not directly reference a particular mesh implementation in this approach, it would be possible to design the ServiceBinding resource to specify that.

Drawbacks
  • Introduces an extra layer of abstraction while still having several of the same fundamental drawbacks as a direct parentRef binding to Service.
  • May require reconfiguring Gateway HTTPRoutes to specify ServiceBindings as backendRefs.

Drawbacks

  • The split frontend/backend role of Service is fundamentally an issue with the Service resource, and while upstream changes may be quite slow, this would likely be best addressed through an upstream KEP - introducing a new resource to GAMMA now would likely result in API churn if we expect a similar proposal to be upstreamed eventually.
  • Adopting the proposed Service as parentRef approach wouldn't foreclose the possibility of migrating to a new frontend-only resource in the future, and wouldn't even require a breaking change to HTTPRoute, just adding support for a new parentRef Kind.
  • Would be less clear how to integrate with transparent proxy functionality - it may be possible to design some way to select a Service or hostname to intercept, but abstraction through a seprate resource would make configuration more complex.

Mesh resource as parentRef

This binds an HTTPRoute directly to a cluster-scoped Mesh object as defined in GEP-1291: Mesh Representation.

spec:
  parentRefs:
  - kind: Mesh
    name: cool-mesh

It is currently undefined how this approach may interact with either explicitly configured hostnames or implicit "transparent proxy" routing for Kubernetes Services to determine how traffic should be intercepted and redirected.

This approach is not entirely abandoned, as it could supplement the proposed approach if explicit attachment to a specific mesh is deemed necessary. Additionally, this approach may offer a future option for attaching an HTTPRoute to a mesh, but not a specific service (e.g. to implement mesh-wide egress functionality for all requests to a specific hostname).

Peer to Service resource parentRef

An HTTPRoute could specify a Mesh resource parentRef as a peer to a Service resource parentRef.

spec:
  parentRefs:
  - kind: Mesh
    name: cool-mesh
  - kind: Service
    name: foo
Drawbacks
  • Would require separate HTTPRoute resources to explicitly define different traffic routing rules for the same service on different meshes.

Nested services and hostnames fields in ParentReference

In core conformance, the services would only be valid for Mesh types, and hostnames field only for Gateway. Mesh implementations could still use a Host header match if they wanted limit rules to specific hostnames.

parentRefs:
- kind: Mesh
  name: coolmesh
  services:
  - name: foo
    kind: Service
- kind: Gateway
  name: staging
  hostnames: [staging.example.com]
- kind: Gateway
  name: prod
  hostnames: [prod.example.com]
# Top level hostnames field removed

Moving the hostnames field from HTTPRoute to ParentReference might introduce a clean path for concurrently using a route across North/South and mesh use cases, even without introducing the services field or a new Mesh resource, and even makes pure North/South implementations more flexible by allowing a hostname-per-Gateway scope.

Drawbacks
  • Substantial API change, impacting even North/South use cases
  • Extending this functionality to support mesh-wide egress or arbitrary redirection may still require some sort of bidirectional handshake with a Hostname resource to support configuration across namespaces and limit conflicting configuration.

Phantom parentRef

spec:
  parentRefs:
  - kind: Mesh
    name: istio

This is done by configuring the parentRef, to point to the istio Mesh. This resource does not actually exist in the cluster and is only used to signal that the Istio mesh should be used. In Istio's experimental implementation, the hostnames field on HTTPRoute is used to match mesh service traffic to the routing rules.

New field on HTTPRoute for Service binding

A new field serviceBinding would be added to HTTPRoute to attach to the Service. Alternatively, this could be a new field in HTTPRouteMatch. As with the proposed implementation, this approach could be combined with a Mesh resource or similar as the parentRef, which would just define that the route would be applied to a mesh.

spec:
  serviceBinding:
    name: my-service

OR

spec:
  matches:
    service:
      name: my-service

For either implementation, the type of the serviceBinding or service field should likely be a struct with Group (defaulting to the Kubernetes core API group when unspecified), Kind (defaulting to Service when unspecified) and Name fields, to allow for extensibility to ServiceImport or custom mesh service types.

Drawbacks

  • API addition required, which is somewhat awkwardly ignored for North/South use cases, and could complicate potential for concurrent use of an HTTPRoute across both North/South and mesh use cases.
  • Adding new fields to a relatively stable resource like HTTPRoute could be difficult to do in an experimental way.
  • Following this pattern may lead to subsequent additional fields to further clarify or extend behavior.

Gateway resource with class: mesh as parentRef

To support arbitrary DNS names (owned by a "domain owner persona") we would need a similar mechanism to what Gateway is using for delegating management of HTTPRoutes to namespaces. Instead of duplicating everything - we could use Gateway as is, with class: mesh (or matching the mesh implementation desired name).

kind: Gateway
spec:
  class: mesh
  listeners:
  - name: example
    hostname: "example.com"
---
kind: HTTPRoute
spec:
  parentRefs:
    name: foo_gateway
    sectionName: example
  hostnames: ["example.com", "foo.svc.cluster.local"]

Functionally such a mesh could be implemented using the existing gateway spec - a GAMMA implementation would only remove the extra hop through the Gateway, using sidecars, or it may use a specialized per-namespace gateway to isolate the mesh traffic (like Istio Ambient). Proxyless gRPC could also use this to route directly.

This solution could work well for both non-cluster.local names but also for egress, where a Gateway with class: egress could define names that are external to the mesh and need to either have policies applied or go to a dedicated egress gateway.

Drawbacks

  • Using the HTTPRoute hostnames field to match mesh traffic breaks from the typical Gateway API pattern of explicit Kubernetes resource references, is extremely implicit, and could reduce portability of configuration.
  • Potentially unclear translation between conceptual resource and concrete implementation, particularly for "proxyless" mesh implementations.
  • Service meshes may wish to express egress or other "in-mesh" gateways through an API like this, and it could be confusing to overload this resource too much or conflate different personas who may wish to manage mesh service traffic routing as an application owner separately from egress rules as a service consumer or cluster operator.

ServiceProjection resource as parentRef and backendRefs

This approach is similar to the above ServiceBinding proposal with a couple of major differences:

  • ServiceProjection encapsulates both "frontend" and "backend" roles of the Service resource
  • ServiceProjection could handle the full responsibilities described in GEP-1282: Describing Backend Properties
kind: ServiceProjection
metadata:
    name: foo
    namespace: store
spec:
    serviceRef:
        name: foo
        kind: Service|ServiceImport
    roles:
        frontend:
       backend:
            loadbalancerConfig:
                strategy: RoundRobin
             clientTLS:
                secretRef:
                    ...
---
kind: HTTPRoute
metadata:
  name: foo_route
  namespace: store
spec:
  parentRefs:
  - kind: ServiceProjection
    name: foo
    role: frontend
  rules:
    backendRefs:
    - kind: ServiceProjection
      name: foo
      role: backend
      weight: 90
    - kind: ServiceProjection
      role: backend
      name: foo_v2
      weight: 10

For convenience, ServiceProjection could have a meshRef field that, when set instead of serviceRef, makes all configuration within the ServiceProjection apply to all services in the mesh (the mesh control plane would need to read the Mesh resource). Pursuant to the changes to status semantics in GEP-1364: Status and Conditions Update, it is necessary for the route to attach to something; in this case, the route attaches to the specific role or profile of the ServiceProjection and the mesh control plane should update the route status to reflect that.

Drawbacks

  • May require reconfiguring Gateway HTTPRoutes to specify ServiceProjections as backendRefs.
  • Verbose boilerplate for each service.
Back to top