An introduction to Service Mesh
Microservices, or microservices-based application architectures, have become increasingly popular over the past few years because they provide inherent scalability and agility of deployment. At the same time, this architecture also introduces security challenges:
• A web of interconnections and communication links to be protected.
• Need for fine-grained control of traffic behavior.
• Need for secure service discovery mechanisms.
• Need to support access controls.
One way to address these challenges is a service mesh, which serves as a dedicated infrastructure layer for handling service-to-service communication.
What is a Service Mesh?
A service mesh is a dedicated infrastructure layer that facilitates service-to-service communication between microservices. Different architectural patterns can be used to implement a service mesh; however, it is most often implemented using a sidecar proxy. The sidecar is attached to a parent application and provides supporting features for it. The sidecar also shares the same life cycle as the parent application: it is created and retired alongside the parent.
Conceptually, a service mesh can provide infrastructure services for all applications based on a microservices architecture, in which there may be hundreds of services and tens of instances of each service.
A Service Mesh consists of two main architectural layers or components:
● Data plane: The interconnected set of proxies in a service mesh that handle inter-service communication represents its data plane.
● Control plane: A control plane is a set of APIs and tools used to control and configure data plane (proxy) behavior across the mesh.
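To make the split concrete, here is a minimal sketch in Go of a control plane that serves routing policy and a data-plane proxy that periodically pulls and applies it. The ports, the policy fields, and the pull-based refresh are illustrative assumptions; real meshes use richer APIs (for example, Envoy's xDS) and far more elaborate configuration models.

```go
// Hypothetical sketch: a minimal "control plane" that serves routing policy,
// and a "data plane" proxy that pulls that policy. All names are illustrative.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"time"
)

// RoutePolicy is an illustrative slice of the configuration a control plane
// might hand out to every proxy in the mesh.
type RoutePolicy struct {
	Service        string `json:"service"`
	Upstream       string `json:"upstream"`
	TimeoutSeconds int    `json:"timeout_seconds"`
	Retries        int    `json:"retries"`
}

func main() {
	// Control plane: a single endpoint that hands out the current policy.
	go func() {
		http.HandleFunc("/policy", func(w http.ResponseWriter, r *http.Request) {
			json.NewEncoder(w).Encode(RoutePolicy{
				Service: "orders", Upstream: "orders.internal:8080",
				TimeoutSeconds: 2, Retries: 3,
			})
		})
		log.Fatal(http.ListenAndServe(":9000", nil))
	}()

	// Data plane: each proxy periodically refreshes its policy from the
	// control plane and applies it to the traffic it forwards.
	for {
		resp, err := http.Get("http://localhost:9000/policy")
		if err != nil {
			log.Printf("control plane unreachable: %v", err)
		} else {
			var p RoutePolicy
			json.NewDecoder(resp.Body).Decode(&p)
			resp.Body.Close()
			fmt.Printf("proxy applying policy: %+v\n", p)
		}
		time.Sleep(10 * time.Second)
	}
}
```

The point of the sketch is the division of responsibility: the control plane only decides what the policy is, while the proxies in the data plane are the components that actually apply it to traffic.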
How does it work?
The proxy sits in front of each microservice and all communications are passed through it. The proxy is responsible for connection details, traffic management, error and failure handling, and collecting metrics for observability purposes. When proxies talk with other proxies — decoupled from each service — you get a service mesh.
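As a bare-bones illustration of that flow, the sketch below (in Go, with assumed local ports) is a sidecar-style reverse proxy: it listens on its own port, forwards every request to the application it sits next to, and records a simple latency metric. A production data-plane proxy such as Envoy does far more, but the traffic path is the same: the service never talks to the network directly; its sidecar does.

```go
// A minimal sketch of the sidecar idea, not a production proxy: a reverse
// proxy listening on a pod-local port, forwarding every request to the
// application next to it and recording basic metrics. Ports are illustrative.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"
)

func main() {
	// The "parent" application the sidecar fronts (assumed to listen locally).
	app, _ := url.Parse("http://127.0.0.1:8080")
	proxy := httputil.NewSingleHostReverseProxy(app)

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		// Connection handling, retries, mTLS, etc. would hook in here.
		proxy.ServeHTTP(w, r)
		log.Printf("method=%s path=%s duration=%s", r.Method, r.URL.Path, time.Since(start))
	})

	// All inbound traffic for the service is addressed to the sidecar.
	log.Fatal(http.ListenAndServe(":15001", handler))
}
```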
What does it help with?
Here are some of the core features a service mesh product provides:
● Authentication and authorization — Certificate generation, key management, whitelist and blacklist, Single sign-on (SSO) tokens, API keys
● Secure service discovery — Discovery of service endpoints through a dedicated service registry
● Secure communication — Mutual Transport Layer Security (mTLS), encryption, dynamic route generation, multiple protocol support, including protocol translation where required (e.g., HTTP/1.x, HTTP/2, gRPC); see the mTLS sketch after this list
● Resilience/stability features for communication — Circuit breakers, retries, timeouts, fault injection/handling, load balancing, failover, rate limiting, request shadowing
● Observability/monitoring features — Logging, metrics, distributed tracing
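To ground the secure communication item above, the hedged sketch below shows what mutual TLS looks like at the transport level in Go: the server side of a proxy presents its own certificate and requires a peer certificate signed by the mesh's certificate authority. The file names and CA layout are assumptions; in a real mesh the control plane issues, distributes, and rotates these certificates automatically.

```go
// A hedged sketch of mutual TLS between proxies: this side requires and
// verifies a client certificate, and also presents its own.
// File names and the CA layout are assumptions.
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net/http"
	"os"
)

func main() {
	// Trust bundle: the mesh's certificate authority.
	caPEM, err := os.ReadFile("mesh-ca.pem")
	if err != nil {
		log.Fatal(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	// This proxy's own identity certificate (in a real mesh, issued and
	// rotated by the control plane).
	cert, err := tls.LoadX509KeyPair("proxy-cert.pem", "proxy-key.pem")
	if err != nil {
		log.Fatal(err)
	}

	server := &http.Server{
		Addr: ":15006",
		TLSConfig: &tls.Config{
			Certificates: []tls.Certificate{cert},
			ClientCAs:    pool,
			// Mutual TLS: the peer must also present a certificate signed by
			// the mesh CA, so both ends of the connection are authenticated.
			ClientAuth: tls.RequireAndVerifyClientCert,
		},
	}
	log.Fatal(server.ListenAndServeTLS("", "")) // certificates come from TLSConfig
}
```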
Let’s look at a use case
There can be several use cases for a service mesh. To understand them better, let's go through a network reliability use case (referenced from Service Mesh for Dummies): handling network failures automatically and transparently by retrying failed requests within the parameters set up by the application owners.
Application owners can define timeout budgets for retries and jitter thresholds to limit the impact that the extra traffic caused by retries has on upstream services. If there is a 503 Service Unavailable error because the server is completely unavailable due to scheduled maintenance, the application operator still needs to make sure that the application has resiliency built in to handle the 503 returned by the proxy (for example, Envoy) when it stops retrying. One of the main benefits of offloading retries to the proxy is that resiliency settings between services can be defined independently of the programming language, because the configuration layer is language agnostic.
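The sketch below illustrates, in Go, the kind of retry behavior described above: an overall timeout budget carried in a context, a capped retry count, jittered backoff between attempts, and 503 treated as retryable until the budget runs out. The numbers, URL, and helper name are illustrative assumptions rather than any particular proxy's defaults.

```go
// An illustrative sketch of retries as a proxy might implement them: an
// overall timeout budget, a capped number of retries, and jittered backoff
// so retry storms don't synchronize. Values are assumptions, not defaults.
package main

import (
	"context"
	"errors"
	"log"
	"math/rand"
	"net/http"
	"time"
)

func fetchWithRetries(ctx context.Context, url string, retries int) (*http.Response, error) {
	var lastErr error
	for attempt := 0; attempt <= retries; attempt++ {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return nil, err
		}
		resp, err := http.DefaultClient.Do(req)
		if err == nil && resp.StatusCode != http.StatusServiceUnavailable {
			return resp, nil // success, or a non-retryable status
		}
		if err != nil {
			lastErr = err
		} else {
			resp.Body.Close()
			lastErr = errors.New("503 Service Unavailable")
		}
		// Jittered backoff: spread retries out so they don't pile onto an
		// already struggling upstream service.
		backoff := time.Duration(100*(attempt+1))*time.Millisecond +
			time.Duration(rand.Intn(100))*time.Millisecond
		select {
		case <-ctx.Done():
			return nil, ctx.Err() // the timeout budget is exhausted
		case <-time.After(backoff):
		}
	}
	// The application behind the proxy still has to handle this final error.
	return nil, lastErr
}

func main() {
	// Timeout budget for the whole call, including all retries.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	resp, err := fetchWithRetries(ctx, "http://orders.internal:8080/health", 3)
	if err != nil {
		log.Printf("giving up: %v", err)
		return
	}
	defer resp.Body.Close()
	log.Printf("status: %s", resp.Status)
}
```

Because this logic lives in the proxy rather than in each service, the same retry policy applies regardless of the language the upstream service is written in.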
What’s in the market?
Some of the open-source service mesh solutions are:
- Istio
- Linkerd
- Consul Connect
- Kuma
- Maesh
- ServiceComb-mesher
- Network Service Mesh
- OpenShift Service Mesh by Red Hat
Conclusion
The best way to decide whether or not to proceed with a service mesh implementation is to understand, as with any other technology, whether it fits as a solution to your problems.
A service mesh adds operational complexity to the technology stack, so it is typically deployed when an organization is having trouble scaling service-to-service communication, has a large number of microservices, or has a specific use case to resolve. A pilot project can be implemented on a subset of microservices, and the results and benefits analyzed, before moving to a full-scale implementation.
In either case, it’s important to develop a sound understanding of service mesh concepts first.