GEP-1426: xRoutes Mesh Binding¶
- Issue: #1294
- Status: Provisional
Similar to how xRoutes bind to Gateways and manage North/South traffic flows in Gateway API's ingress use case, it would be natural to adopt a similar model for traffic routing concerns in service mesh deployments. The purpose of this GEP is to add a mechanism to the Gateway API spec for associating the various xRoute types with a service mesh and offering a model for service owners to manage traffic splitting configurations.
This GEP is intended to establish an implementable, but experimental, baseline for supporting basic service mesh traffic routing functionality through the Gateway API spec.
This GEP uses the roles and personas defined in the Gateway API security model, and the service "producer" and "consumer" roles defined in GEP-1324: Service Mesh in Gateway API.
Goals¶
- MUST allow xRoute traffic rules to be configurable for a mesh service by the application owner/producer.
- SHOULD allow the cluster operator (mesh administrator) to control whether xRoute resources in a given namespace are permitted to configure mesh traffic routing.
- SHOULD NOT require downstream "consumer" services to update configuration or DNS addresses for traffic to follow "producer" mesh routing rules configured by upstream services.
- SHOULD NOT require reconfiguring existing xRoute resources for North/South Gateway configuration.
Non-Goals¶
- Supporting "egress" use cases, which is currently a deferred goal, including:
  - Defining how "consumer" traffic rules, which could override routing for service upstreams only within the local scope of a namespace or service, might be configured.
  - Redirecting calls from arbitrary custom domains to an in-cluster service.
- Defining how multiple EndpointSlices representing instances of a single "logical" service should present an identity for AuthN/AuthZ or be associated with each other beyond routing rules.
- Defining how AuthZ should be implemented to secure East/West traffic between services.
- Defining how Policy Attachment would bind to xRoute, services or a mesh.
- Defining how Routes configured for East/West service mesh traffic management might integrate with North/South Gateways.
  - This is a bit tricky in that it's effectively a form of delegation as described in GEP-1058: Route Inclusion and Delegation, and is planned to be explored in a future GEP.
- Handling East/West traffic outside the cluster (VMs, etc).
Implementation Details and Constraints¶
- MUST set a status field on xRoute to show if the routing configuration has been applied to the mesh.
- MUST only be allowed to configure "producer" traffic rules for a Service in the same namespace as the xRoute.
  - Traffic routing configuration defined in this way SHOULD be respected by ALL consumer services in all namespaces in the mesh.
- MAY assume that a mesh implements "transparent proxy" functionality to redirect calls to the Kubernetes DNS address for a Service through mesh routing rules.
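As a rough illustration of the status requirement above, a mesh controller might record acceptance per parent in the route's existing status.parents list. This is only a sketch: the controllerName value and the message are hypothetical placeholders, and it assumes the standard Accepted route condition is reused for Service parents.

```yaml
# Sketch of an HTTPRoute status after a mesh controller has applied the route.
# Fields such as lastTransitionTime are omitted for brevity.
status:
  parents:
  - parentRef:
      group: ""                # core API group, where Service lives
      kind: Service
      name: foo
    controllerName: example.com/mesh-controller   # hypothetical controller name
    conditions:
    - type: Accepted
      status: "True"
      reason: Accepted
      message: Route applied to the mesh          # illustrative message only
```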
It is proposed that an application owner should configure traffic rules for a mesh service by configuring an xRoute with a Kubernetes Service resource as a parentRef.
This approach is dependent on both the "frontend" role of the Kubernetes Service resource as defined in GEP-1324: Service Mesh in Gateway API when used as a parentRef and the "backend" role of Service when used as a backendRef. It would use the Kubernetes service name to match traffic for meshes implementing "transparent proxy" functionality, but the backendRef endpoints would ultimately be used for the canonical IP address(es) to which traffic should be redirected by rules defined in this xRoute. This approach leverages the existing points of extensibility within the Gateway API spec, and would not require introducing any API changes or new resources, only defining expected behavior.
    kind: HTTPRoute
    metadata:
      name: foo-route
      namespace: store
    spec:
      parentRefs:
      - kind: Service
        name: foo
      rules:
      - backendRefs:
        - kind: Service
          name: foo
          weight: 90
        - kind: Service
          name: foo-v2
          weight: 10
In the example above, routing rules have been configured to direct 90% of traffic for the foo Service to the default "backend" endpoints specified by the foo Service selector field, and 10% to the foo-v2 Service. This is determined based on ClusterIP (for Service) and ClusterSetIP (for ServiceImport) matching, and for "transparent proxy" mesh implementations would match all requests to foo.svc.cluster.local (or an arbitrary custom suffix, as the hostname is not specified manually) from within the same namespace, all requests to foo.store.svc.cluster.local from other namespaces, and all requests to foo.store.svc.clusterset.local for multicluster services, within the scope of the service mesh.
Implementations SHOULD support a terse syntax which allows omitting backendRefs to avoid unnecessary redundancy for simple configurations. If no backendRefs are specified, implementations should direct traffic to the default endpoints of the parentRef service. The behavior should be exactly the same as if the parentRef service were explicitly defined as the only backendRef, so if the parentRef does not have any endpoints, implementations MUST return an HTTP 503 response code (see discussion in #1210).
This syntax has limited utility currently, but could become more relevant if additional functionality beyond traffic splitting is added.
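To make the equivalence described above concrete, the two manifests below sketch a terse route and the explicit form it would be expected to behave like. They are alternatives, not meant to be applied together. The apiVersion, route names, and port 8080 are assumptions chosen for illustration.

```yaml
# Terse form: no backendRefs, so traffic goes to the endpoints of the parentRef Service.
apiVersion: gateway.networking.k8s.io/v1   # assumed API version
kind: HTTPRoute
metadata:
  name: foo-terse            # hypothetical name
  namespace: store
spec:
  parentRefs:
  - group: ""                # core API group, where Service lives
    kind: Service
    name: foo
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
---
# Explicit form: the parentRef Service is listed as the only backendRef.
apiVersion: gateway.networking.k8s.io/v1   # assumed API version
kind: HTTPRoute
metadata:
  name: foo-explicit         # hypothetical name
  namespace: store
spec:
  parentRefs:
  - group: ""
    kind: Service
    name: foo
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - kind: Service
      name: foo
      port: 8080             # hypothetical port
```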
Routes configured in this way MAY be used to manually enroll services into a mesh (which could trigger behavior like injecting a sidecar proxy) if they have not already been enrolled by some other mechanism (as proposed in GEP-1291: Mesh Representation or similar).
    kind: HTTPRoute
    metadata:
      name: foo-route
      namespace: store
    spec:
      parentRefs:
      - kind: Service
        name: foo
      rules:
      - matches:
        - path:
            value: "/bar"
The example above would drop all incoming traffic for HTTP paths other than /bar to the foo Service, following the existing spec to return HTTP 404 response codes for unmatched requests, and HTTP 500 response codes for requests excluded due to an
When no HTTPRoute resources or AuthZ configuration are defined, all traffic should implicitly work; this is just how Kubernetes functions. When you create an HTTPRoute targeting a service as a parentRef, you are replacing that implicit logic, not adding to it. Therefore, you may be reshaping or restricting traffic via an HTTPRoute configuration (which, it should be noted, is distinct from disallowing traffic by AuthZ).
Allowed service types¶
Services valid to be selected as a parentRef SHOULD have a way to identify traffic to them, typically by one or more virtual IP(s), DNS hostname(s), or name(s).
Implementations SHOULD support the default ClusterIP Service type as a parentRef, with or without selectors.
Headless Services SHOULD NOT be supported as a parentRef, because they do not implement the "frontend" functionality of a service.
Service resources with type: NodePort or type: LoadBalancer MAY be allowed as backendRefs, as these do provision virtual IPs and are effectively ClusterIP services with additional functionality, but it should typically be preferred to expose services publicly through the North/South Gateway API interfaces instead.
Service resources with type: ExternalName SHOULD NOT be allowed as backendRefs due to security concerns, although they might eventually play some role in configuring egress functionality.
Services supported as backendRefs SHOULD be consistent with expectations for North/South Gateway API implementations, and MUST have associated endpoints.
Services with selectors SHOULD be supported as a
Service without selectors¶
An alternate pattern additionally supported by this approach would be to target a Service without selectors as the parentRef. This could be a clean way to create a pure routing construct and abstract a logical frontend, as traffic would resolve to a Service with selectors defined on the HTTPRoute, or receive a 4xx/5xx error response if no matching path or valid backend was found.
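The pattern above might look roughly like the following sketch, in which a selector-less Service acts purely as a routing frontend. All resource names, paths, and ports here are hypothetical, and the HTTPRoute apiVersion is an assumption.

```yaml
# A Service with no selector: it provides a name and virtual IP, but no endpoints of its own.
apiVersion: v1
kind: Service
metadata:
  name: store-frontend       # hypothetical logical frontend
  namespace: store
spec:
  ports:
  - port: 80
---
# The route resolves frontend traffic to Services that do have selectors.
apiVersion: gateway.networking.k8s.io/v1   # assumed API version
kind: HTTPRoute
metadata:
  name: store-frontend-route
  namespace: store
spec:
  parentRefs:
  - group: ""                # core API group, where Service lives
    kind: Service
    name: store-frontend
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /cart
    backendRefs:
    - kind: Service
      name: cart             # hypothetical Service with selectors
      port: 8080
  - matches:
    - path:
        type: PathPrefix
        value: /catalog
    backendRefs:
    - kind: Service
      name: catalog          # hypothetical Service with selectors
      port: 8080
```

Requests to store-frontend on any other path would match no rule and would be expected to receive an error response, as described above.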
Multicluster support with ServiceImport¶
ServiceImport resources allocate a virtual IP in the cluster, so MAY be allowed as a parentRef.
ServiceImport would remain a valid backend option for xRoute resources (but not currently a requirement for core conformance), and could be specified alongside a backendRef to split traffic across clusters within a ClusterSet (as defined in the Multi-cluster Service (MCS) APIs project). This could be a way to solve the need described in A use case for using Gateway APIs in Multi-Cluster.
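A cross-cluster split of this kind could be sketched as below. The weights, route name, and port are illustrative assumptions, as is the HTTPRoute apiVersion; the ServiceImport group is the one used by the MCS API.

```yaml
apiVersion: gateway.networking.k8s.io/v1   # assumed API version
kind: HTTPRoute
metadata:
  name: foo-multicluster     # hypothetical name
  namespace: store
spec:
  parentRefs:
  - group: ""                # core API group, where Service lives
    kind: Service
    name: foo
  rules:
  - backendRefs:
    # Local endpoints of the foo Service in this cluster.
    - kind: Service
      name: foo
      port: 8080             # hypothetical port
      weight: 80
    # Endpoints exported from other clusters in the ClusterSet.
    - group: multicluster.x-k8s.io
      kind: ServiceImport
      name: foo
      port: 8080
      weight: 20
```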
All types currently defined in the gateway-api core (HTTP, GRPC, TCP, TLS, and UDP) are available for use in a Mesh implementation.
If multiple routes with different types bind to the same Service and Port pair, only a single route type should be applied. The rejected routes should be ignored and have the RouteConditionAccepted status set with a (new) reason indicating the conflict.
Route type specificity is defined in the following order (first one wins):
- GRPCRoute
- HTTPRoute
- TLSRoute
- TCPRoute
Because UDP is its own protocol, it is orthogonal to this precedence order. Since there is only one UDP-based route type, no conflicts are currently possible; if other UDP-based routes are added, a similar ordering will be defined.
Note: these conflicts only occur when multiple different route types apply to the same Service+Port pair. Multiple routes of the same type are valid, and merge according to the route-specific merging semantics.
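As an illustration of such a conflict, the sketch below attaches two different route types to the same Service and port. The names, port, and apiVersions are assumptions, and the outcome noted in the final comment assumes HTTPRoute is treated as more specific than TCPRoute.

```yaml
# Both routes attach to the same Service and port, so only one route type applies.
apiVersion: gateway.networking.k8s.io/v1         # assumed API version
kind: HTTPRoute
metadata:
  name: foo-http             # hypothetical name
  namespace: store
spec:
  parentRefs:
  - group: ""                # core API group, where Service lives
    kind: Service
    name: foo
    port: 8080               # hypothetical port
  rules:
  - backendRefs:
    - kind: Service
      name: foo
      port: 8080
---
apiVersion: gateway.networking.k8s.io/v1alpha2   # assumed API version
kind: TCPRoute
metadata:
  name: foo-tcp              # hypothetical name
  namespace: store
spec:
  parentRefs:
  - group: ""
    kind: Service
    name: foo
    port: 8080
  rules:
  - backendRefs:
    - kind: Service
      name: foo
      port: 8080
# Assuming HTTPRoute ranks as more specific than TCPRoute, foo-tcp would be
# ignored for foo:8080 and its Accepted condition would report the conflict.
```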
By default, a Service attachment applies to all ports in the service. Users may want to attach routes to only a specific port in a Service. To do so, the parentRef.port field should be used.
If port is set, the implementation MUST associate the route only with that port.
If port is not set, the implementation MUST associate the route with all ports defined in the Service.
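The two cases above can be sketched side by side as alternative parentRefs fragments. The port number is a hypothetical value.

```yaml
# Case 1: port is set, so the route is associated only with port 8080 of foo.
parentRefs:
- group: ""                  # core API group, where Service lives
  kind: Service
  name: foo
  port: 8080                 # hypothetical port
---
# Case 2: port is not set, so the route is associated with every port of foo.
parentRefs:
- group: ""
  kind: Service
  name: foo
```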
GAMMA implementations SHOULD NOT infer any functionality from the hostnames field on xRoute resources (HTTPRoute, TLSRoute and GRPCRoute have this field) due to current under-specification and reserved potential for future usage or API changes.
For the use case of filtering incoming traffic from selected HTTP hostnames, it is recommended to guide users toward configuring HTTPHeaderMatch rules for the Host header. Functionality to be explored in future GEPs may include supporting concurrent usage of an xRoute traffic configuration for multiple North/South Gateways and East/West mesh use cases or redirection of egress traffic to an in-cluster Service.
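A Host header match of the kind recommended above might be written as follows. This is a sketch under assumptions: the hostname, route name, and port are hypothetical, the apiVersion is assumed, and how an implementation treats the HTTP/2 :authority pseudo-header is not addressed here.

```yaml
apiVersion: gateway.networking.k8s.io/v1   # assumed API version
kind: HTTPRoute
metadata:
  name: foo-host-filter      # hypothetical name
  namespace: store
spec:
  parentRefs:
  - group: ""                # core API group, where Service lives
    kind: Service
    name: foo
  # Note: no top-level hostnames field is used for mesh matching.
  rules:
  - matches:
    - headers:
      - type: Exact
        name: Host
        value: foo.example.com   # hypothetical hostname
    backendRefs:
    - kind: Service
      name: foo
      port: 8080             # hypothetical port
```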
In a mesh, routes can be configured by two personas:
- Service producers, who want to modify behavior of inbound requests to their services.
- Service consumers, who want to modify behavior of outbound requests to other services.
While these concepts are not directly exposed in the API, a route is implicitly fulfilling one of these roles and behaves differently depending on the role.
A route is a producer route when the parentRef refers to a service in the same namespace. This route SHOULD apply to all incoming requests to the service, including from clients in other namespaces.
Note: Some implementations may only be able to apply routes on client-side proxies. As a result, these will likely only apply to requests from clients who are also in the mesh.
A route is a consumer route when the parentRef refers to a service in another namespace. Unlike producer routes, consumer routes are scoped only to the same namespace as the route. This ensures that for traffic between two namespaces, another unrelated namespace cannot modify their traffic.
Routes of either type can send traffic to backendRefs in any namespace. Unlike Gateway-bound routes, this is allowed without a ReferenceGrant. In Gateway-bound routes (North-South), routes are opt-in; by default, no Services are exposed (often to the public internet), and a service producer must explicitly opt in by creating a route themselves, or by allowing another namespace to do so via ReferenceGrant. For mesh, routes augment existing Services, rather than exposing them to a broader scope. As a result, a ReferenceGrant is not required in most mesh implementations. Access control, if desired, is handled by other mechanisms such as NetworkPolicy. While uncommon, if a mesh implementation does expose the ability to access a broader scope than would otherwise be reachable, then ReferenceGrant must be used for cross-namespace references.
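The producer and consumer cases described above differ only in where the route lives relative to its parentRef. The following sketch shows one of each; the cart namespace, route names, weights, and port are hypothetical, and the apiVersion is assumed.

```yaml
# Producer route: lives in the same namespace as the Service it attaches to,
# and is expected to apply to requests from clients in every namespace.
apiVersion: gateway.networking.k8s.io/v1   # assumed API version
kind: HTTPRoute
metadata:
  name: foo-producer         # hypothetical name
  namespace: store
spec:
  parentRefs:
  - group: ""                # core API group, where Service lives
    kind: Service
    name: foo
  rules:
  - backendRefs:
    - kind: Service
      name: foo
      port: 8080             # hypothetical port
      weight: 90
    - kind: Service
      name: foo-v2
      port: 8080
      weight: 10
---
# Consumer route: lives in a different namespace from the Service it attaches to,
# and is expected to affect only requests originating from the cart namespace.
apiVersion: gateway.networking.k8s.io/v1   # assumed API version
kind: HTTPRoute
metadata:
  name: foo-consumer         # hypothetical name
  namespace: cart            # hypothetical consumer namespace
spec:
  parentRefs:
  - group: ""
    kind: Service
    namespace: store
    name: foo
  rules:
  - backendRefs:
    # Cross-namespace backendRef; per the text above, most mesh
    # implementations would not require a ReferenceGrant for this.
    - kind: Service
      namespace: store
      name: foo-v2
      port: 8080
```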
Multiple routes for a Service¶
A service may be used as a parentRef (where we attach to the "Service Frontend") or as a backendRef (where we attach to the "Service Backend").
In general, when a request is sent to a Service frontend (ex: curl svc), it should utilize a Route bound to that Service. However, when sent to a Service backend (ex: curl pod-ip), it would not.
Similarly, if we have multiple "levels" of Routes defined, only the first will be used, as that is the only one that accesses the Service frontend.
Consider a cluster with routes for a Service in a Gateway, a consumer namespace, and a producer namespace:
- Requests from the Gateway will utilize the (possibly merged) set of routes attached to the Gateway
- Requests from a namespace with consumer routes will utilize the (possibly merged) set of routes in the consumer namespace
- Requests from other namespaces will utilize the (possibly merged) set of routes in the producer namespace
The merging of routes occurs only within groups of the same type of routes (Gateway bound, producer, or consumer), and follows the standard route merging behavior already defined.
Note: a possible future extension is to allow backendRefs to explicitly target a "frontend" or "backend". This could allow chaining multiple routes together. However, this is out of scope for the current GEP.
- The fact that this pattern is used for mesh configuration is implicit. This may benefit from some additional configuration to map the HTTPRoute to a particular mesh implementation rather than being picked up by any or all GAMMA meshes present in a cluster. Possible approaches include:
  - GEP-1282: Describing Backend Properties may be one path to associating a Service with a mesh, but likely wouldn't be able to handle the application of multiple HTTPRoutes for the same Service, each intended for different mesh implementations.
    - It's currently unclear how relevant this constraint may be, but associating an HTTPRoute with a mesh by this method would additionally require an extra graph traversal step.
  - Expecting a parentRef or similar reference as proposed in GEP-1291: Mesh Representation may be a preferred eventual path forward, but wouldn't be required initially, with the assumption that only one mesh should typically be present in a cluster.
- No mechanism for egress redirection of traffic from arbitrary hostnames to a mesh service within this approach (but this could still be implemented separately).
ServiceBinding) resource as
Introduce a new resource to represent the "frontend" role of a service as defined in GEP-1291: Mesh Representation.
Controller manages new DNS hostname¶
A controller could create a matching selector-less Service (i.e. no endpoints), to create a .cluster.local name, or could interact with external-dns to create a DNS name in an owned domain.
Ownership/trust would remain based on naming pattern:
TcpService resources could have the benefit of allowing us to define protocol specific elements to the spec along with an embedded
CommonServiceSpec, similar to
CommonRouteSpec, and keep similar patterns as
- May require reconfiguring existing applications to point to a new mesh service hostname - adoption wouldn't be "transparent".
- The pattern of creating a new pure routing construct would still be implementable following the proposed approach, by manually creating and targeting a new Service without selectors as a parentRef, without the overhead of introducing a new resource.
Manage DNS by binding to an existing Service¶
A ServiceBinding resource would directly reference an existing Service to determine which traffic should be intercepted and redirected following configured service mesh routing rules and facilitate "transparent proxy" functionality. This resource could possibly share similar responsibilities with the need identified in GEP-1282: Describing Backend Properties.
    kind: ServiceBinding
    metadata:
      name: foo_binding
    spec:
      parentRefs:
      - kind: Service
        name: foo
    ---
    kind: HTTPRoute
    spec:
      parentRefs:
      - kind: ServiceBinding
        name: foo_binding
      rules:
      - backendRefs:
        - kind: Service
          name: foo
          weight: 90
        - kind: Service
          name: foo_v2
          weight: 10
While the HTTPRoute does not directly reference a particular mesh implementation in this approach, it would be possible to design the ServiceBinding resource to specify that.
- Introduces an extra layer of abstraction while still having several of the same fundamental drawbacks as a direct
- May require reconfiguring
- The split frontend/backend role of Service is fundamentally an issue with the Service resource, and while upstream changes may be quite slow, this would likely be best addressed through an upstream KEP. Introducing a new resource to GAMMA now would likely result in API churn if we expect a similar proposal to be upstreamed eventually.
- Adopting the proposed parentRef approach wouldn't foreclose the possibility of migrating to a new frontend-only resource in the future, and wouldn't even require a breaking change to HTTPRoute, just adding support for a new parentRef kind.
- Would be less clear how to integrate with transparent proxy functionality. It may be possible to design some way to select a Service or hostname to intercept, but abstraction through a separate resource would make configuration more complex.
Mesh resource as parentRef¶
This binds an HTTPRoute directly to a cluster-scoped Mesh object as defined in GEP-1291: Mesh Representation.
    spec:
      parentRefs:
      - kind: Mesh
        name: cool-mesh
It is currently undefined how this approach may interact with either explicitly configured hostnames or implicit "transparent proxy" routing for Kubernetes Services to determine how traffic should be intercepted and redirected.
This approach is not entirely abandoned, as it could supplement the proposed approach if explicit attachment to a specific mesh is deemed necessary. Additionally, this approach may offer a future option for attaching an HTTPRoute to a mesh, but not a specific service (e.g. to implement mesh-wide egress functionality for all requests to a specific hostname).
An HTTPRoute could specify a Mesh parentRef as a peer to a Service parentRef.
    spec:
      parentRefs:
      - kind: Mesh
        name: cool-mesh
      - kind: Service
        name: foo
- Would require separate HTTPRoute resources to explicitly define different traffic routing rules for the same service on different meshes.
Nested services and hostnames fields in ParentReference¶
In core conformance, the services field would only be valid for Mesh types, and the hostnames field only for Gateway. Mesh implementations could still use a Host header match if they wanted to limit rules to specific hostnames.
    parentRefs:
    - kind: Mesh
      name: coolmesh
      services:
      - name: foo
        kind: Service
    - kind: Gateway
      name: staging
      hostnames: [staging.example.com]
    - kind: Gateway
      name: prod
      hostnames: [prod.example.com]
    # Top level hostnames field removed
The hostnames field from ParentReference might introduce a clean path for concurrently using a route across North/South and mesh use cases, even without introducing the services field or a new Mesh resource, and even makes pure North/South implementations more flexible by allowing a hostname per Gateway.
- Substantial API change, impacting even North/South use cases
- Extending this functionality to support mesh-wide egress or arbitrary redirection may still require some sort of bidirectional handshake with a Hostname resource to support configuration across namespaces and limit conflicting configuration.
    spec:
      parentRefs:
      - kind: Mesh
        name: istio
This is done by configuring the parentRef to point to the istio Mesh. This resource does not actually exist in the cluster and is only used to signal that the Istio mesh should be used. In Istio's experimental implementation, the hostnames field on HTTPRoute is used to match mesh service traffic to the routing rules.
New field on HTTPRoute¶
A new field serviceBinding would be added to HTTPRoute to attach to the Service. Alternatively, this could be a new field in HTTPRouteMatch. As with the proposed implementation, this approach could be combined with a Mesh resource or similar as the parentRef, which would just define that the route would be applied to a mesh.
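    # Option 1: new serviceBinding field on the HTTPRoute spec
    spec:
      serviceBinding:
        name: my-service

    # Option 2: new service field in HTTPRouteMatch
    spec:
      matches:
        service:
          name: my-service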
For either implementation, the type of the service field should likely be a struct with Group (defaulting to the Kubernetes core API group when unspecified), Kind (defaulting to Service when unspecified) and Name fields, to allow for extensibility to ServiceImport or custom mesh service types.
- API addition required, which is somewhat awkwardly ignored for North/South use cases, and could complicate potential for concurrent use of an HTTPRoute across both North/South and mesh use cases.
- Adding new fields to a relatively stable resource like HTTPRoute could be difficult to do in an experimental way.
- Following this pattern may lead to subsequent additional fields to further clarify or extend behavior.
Gateway resource with class: mesh as parentRef¶
To support arbitrary DNS names (owned by a "domain owner persona") we would need a mechanism similar to what Gateway uses for delegating management of HTTPRoutes to namespaces. Instead of duplicating everything, we could use Gateway as is, with class: mesh (or matching the desired mesh implementation name).
    kind: Gateway
    spec:
      class: mesh
      listeners:
      - name: example
        hostname: "example.com"
    ---
    kind: HTTPRoute
    spec:
      parentRefs:
      - name: foo_gateway
        sectionName: example
      hostnames: ["example.com", "foo.svc.cluster.local"]
Functionally, such a mesh could be implemented using the existing gateway spec. A GAMMA implementation would only remove the extra hop through the Gateway, using sidecars, or it may use a specialized per-namespace gateway to isolate the mesh traffic (like Istio Ambient). Proxyless gRPC could also use this to route directly.
This solution could work well not only for non-cluster.local names but also for egress, where a class: egress Gateway could define names that are external to the mesh and need to either have policies applied or go to a dedicated egress gateway.
- Using the hostnames field to match mesh traffic breaks from the typical Gateway API pattern of explicit Kubernetes resource references, is extremely implicit, and could reduce portability of configuration.
- Potentially unclear translation between conceptual resource and concrete implementation, particularly for "proxyless" mesh implementations.
- Service meshes may wish to express egress or other "in-mesh" gateways through an API like this, and it could be confusing to overload this resource too much or conflate different personas who may wish to manage mesh service traffic routing as an application owner separately from egress rules as a service consumer or cluster operator.
ServiceProjection resource as parentRef and backendRef¶
This approach is similar to the above ServiceBinding proposal with a couple of major differences:
- ServiceProjection encapsulates both "frontend" and "backend" roles of the Service.
- ServiceProjection could handle the full responsibilities described in GEP-1282: Describing Backend Properties.
    kind: ServiceProjection
    metadata:
      name: foo
      namespace: store
    spec:
      serviceRef:
        name: foo
        kind: Service|ServiceImport
      roles:
        frontend:
        backend:
          loadbalancerConfig:
            strategy: RoundRobin
          clientTLS:
            secretRef:
              ...
    ---
    kind: HTTPRoute
    metadata:
      name: foo_route
      namespace: store
    spec:
      parentRefs:
      - kind: ServiceProjection
        name: foo
        role: frontend
      rules:
      - backendRefs:
        - kind: ServiceProjection
          name: foo
          role: backend
          weight: 90
        - kind: ServiceProjection
          role: backend
          name: foo_v2
          weight: 10
A ServiceProjection could have a meshRef field that, when set instead of serviceRef, makes all configuration within the ServiceProjection apply to all services in the mesh (the mesh control plane would need to read the Mesh resource). Pursuant to the changes to status semantics in GEP-1364: Status and Conditions Update, it is necessary for the route to attach to something; in this case, the route attaches to the specific role or profile of the ServiceProjection and the mesh control plane should update the route status to reflect that.
- May require reconfiguring
- Verbose boilerplate for each service.