From Monolith to Services: REST, gRPC, Kafka, and Container Infrastructure
The decision to break a monolith apart is not primarily a technical one — it is an organizational one. Independent deployability, diverging scaling profiles, and clear team ownership are the real drivers. But once that decision is made, the questions become architectural: how should services communicate, how do they deploy reliably, and how do you keep the resulting distributed system observable?
When the split is worth it
Before reaching for decomposition, be clear about what problem you are solving. Modularization earns its cost when you have independent deployment needs — teams blocked waiting for each other's changes to ship — meaningfully different scaling profiles, true domain separation with distinct ownership boundaries, or multiple teams working on the same codebase causing coordination overhead that slows everyone down.
Without one of these concrete drivers, splitting adds operational complexity with no payoff. The most common failure mode is a distributed monolith: services that must always call each other synchronously to complete a request, sharing a database schema, and deployed together anyway. All the network latency of microservices with none of the independence.
Start by modeling domain boundaries first, infrastructure second. Extract modules within the same codebase before extracting separate services — in-process boundaries are free to refactor, cross-service boundaries are not. The strangler fig pattern — routing specific traffic to the new service while the monolith handles the rest — is the most reliable migration technique for HTTP-based systems.
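The strangler fig cutover is, at its core, a routing decision at the edge. A minimal sketch of that decision, assuming a path-prefix cutover list maintained at the gateway (the names and prefixes are illustrative, not from any particular system):

```javascript
// Prefixes already migrated to the new service; everything else still
// goes to the monolith. The list grows as extraction proceeds.
const extracted = ['/orders', '/inventory'];

function upstreamFor(path) {
  return extracted.some((p) => path === p || path.startsWith(p + '/'))
    ? 'new-service'
    : 'monolith';
}
```

The exact-match-or-slash check matters: `/ordersx` must not be caught by the `/orders` prefix, or unrelated traffic leaks to the new service.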
REST — the universal baseline
REST over HTTP/JSON is the default for good reason: it is universally understood, works in every language and runtime, is human-readable, and requires no client generation. Browser clients, external partners, and internal teams can all call a REST API without tooling setup.
GET /orders/12345
Authorization: Bearer <token>
200 OK
Content-Type: application/json
{ "id": "12345", "status": "shipped", "total": 149.90, "items": [...] }
The trade-offs are well-known. JSON is verbose — a 1 KB payload in JSON may be 200 bytes in binary. HTTP/1.1 connections are persistent by default, but each connection serves one request at a time, so concurrency means head-of-line blocking or extra connections. There is no built-in schema contract, so API drift between producer and consumer surfaces at runtime rather than compile time. Versioning requires discipline (/v1/ path prefixes, Accept-header media types, or custom version headers).
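The versioning discipline can be centralized in one resolution step at the edge. A hypothetical helper (the header format and default are assumptions, not a standard):

```javascript
// Resolve the requested API version from a /vN/ path prefix, falling back
// to a version parameter in the Accept header, then to a default of 1.
function resolveVersion(path, accept = '') {
  const fromPath = path.match(/^\/v(\d+)\//);
  if (fromPath) return Number(fromPath[1]);
  const fromHeader = accept.match(/version=(\d+)/);
  return fromHeader ? Number(fromHeader[1]) : 1;
}
```

Whatever scheme you pick, resolving it in one place keeps handlers from re-implementing the negotiation inconsistently.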
Use REST for: external-facing APIs, browser-accessible endpoints, ad-hoc integrations, and simple synchronous request/response where payload size and call frequency are not the bottleneck.
gRPC and Protocol Buffers — for internal high-throughput
gRPC runs over HTTP/2, uses Protocol Buffer binary serialization, and generates strongly-typed client and server code from a .proto definition. The proto file is the contract — both sides are generated from it, eliminating an entire class of interface mismatch bugs.
syntax = "proto3";
service OrderService {
rpc GetOrder (OrderRequest) returns (Order);
rpc WatchOrders (OrderFilter) returns (stream Order);
}
message OrderRequest { string order_id = 1; }
message Order {
string id = 1;
string customer_id = 2;
float total = 3;
Status status = 4;
enum Status {
PENDING = 0;
SHIPPED = 1;
DELIVERED = 2;
}
}
Payload sizes are typically 5–10x smaller than equivalent JSON. HTTP/2 multiplexing allows multiple concurrent requests over a single connection. Native bidirectional streaming supports patterns like live telemetry feeds and real-time order status — things that require polling or WebSockets with REST.
The trade-offs: gRPC is not natively supported in browsers (a grpc-web proxy is required). Binary encoding is harder to inspect manually — curl and browser devtools do not help. Proto schema evolution requires discipline: field numbers are permanent, removing fields is a breaking change without careful deprecation strategy.
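proto3 has a built-in mechanism for the deprecation discipline: the reserved keyword. A sketch of what removing a field safely looks like, assuming (hypothetically) that total were being retired from the Order message above:

```proto
message Order {
  // Reserve the number and name of the removed field so neither can
  // ever be reused with a different type by a future change.
  reserved 3;
  reserved "total";

  string id = 1;
  string customer_id = 2;
  Status status = 4;
}
```

The compiler then rejects any attempt to reintroduce field 3 or the name total, which is what makes the removal safe against old clients still sending or expecting it.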
Use gRPC for: internal service-to-service communication, high call frequency, bidirectional streaming, and teams that span multiple languages and need a single typed contract.
Kafka — for decoupled async workflows
Rather than services calling each other, producers emit events to named topics and consumers react independently. The event bus is the only shared dependency. Producer and consumer have no runtime knowledge of each other and can scale, deploy, and fail independently.
// Order service — produces an event
await producer.send({
topic: 'order.placed',
messages: [{
key: order.id,
headers: { correlationId: ctx.correlationId },
value: JSON.stringify({
orderId: order.id,
customerId: order.customerId,
items: order.items,
}),
}],
});
// Inventory service — independent consumer group
await consumer.subscribe({ topic: 'order.placed' });
consumer.run({
eachMessage: async ({ message }) => {
const order = JSON.parse(message.value.toString());
await inventory.reserve(order.items);
},
});
// Notification service — same event, separate consumer group, no coupling
await notifyConsumer.subscribe({ topic: 'order.placed' });
Fan-out is native: adding a third consumer to order.placed requires no changes to the producer or existing consumers. Events are retained on disk and can be replayed — useful for rebuilding a service's state after a migration or replaying failed messages after a bug fix. The consumer group mechanism provides backpressure and load distribution automatically.
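Replay in practice, sketched with the kafkajs client (the admin and consumer method names are from that library, used here on the assumption it fits your stack): fetch each partition's offset at a timestamp, then seek the consumer to it. buildSeeks is the pure mapping step.

```javascript
// Map fetchTopicOffsetsByTimestamp results into consumer.seek targets.
function buildSeeks(topic, partitionOffsets) {
  return partitionOffsets.map(({ partition, offset }) => ({ topic, partition, offset }));
}

// Rewind a consumer group to everything produced since `timestamp`.
// Note: in kafkajs, seek is called after consumer.run() has started.
async function replayFrom(admin, consumer, topic, timestamp) {
  const offsets = await admin.fetchTopicOffsetsByTimestamp(topic, timestamp);
  for (const target of buildSeeks(topic, offsets)) {
    consumer.seek(target);
  }
}
```

This is exactly the bug-fix scenario from above: deploy the fixed consumer, rewind to just before the bad deploy, and let it reprocess.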
The trade-offs are real: Kafka introduces eventual consistency. You cannot use it for operations that require an immediate response. End-to-end tracing requires propagating correlation IDs through message headers, and debugging a failed workflow means following events across topics and services. Running Kafka reliably requires operational attention — partition sizing, retention policies, consumer lag monitoring.
Use Kafka for: workflows that span multiple services, notifications, audit trails, event sourcing, data pipelines, and anywhere you need fan-out or the ability to replay.
Choosing the right protocol
The three are not mutually exclusive. A typical production architecture uses all of them:
| Pattern | Primary fit |
|---|---|
| REST | External API, browser clients, simple CRUD, ad-hoc integrations |
| gRPC | Internal services, high call volume, streaming, multi-language contract |
| Kafka | Async workflows, fan-out, event sourcing, decoupled pipelines |
REST at the external boundary. gRPC between internal services where performance matters. Kafka for anything asynchronous or that needs to fan out across multiple consumers.
Containers — standardizing the runtime
Docker packages the OS layer, runtime, dependencies, and application into an immutable image that runs identically in development, CI, and production. Multi-stage builds keep the final image lean:
FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
# the build needs dev dependencies; prune them before the runtime stage copies node_modules
RUN npm run build && npm prune --omit=dev
FROM node:22-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
EXPOSE 8080
USER node
CMD ["node", "dist/main.js"]
Pin base images to a specific digest for reproducibility. Run as a non-root user. Keep images stateless — all persistent state goes to external storage.
Kubernetes — orchestrating services at scale
Kubernetes schedules containers, manages health, and handles rolling deployments across a cluster. Three primitives matter most in practice:
Liveness and readiness probes separate "is the process running" from "is it ready to handle traffic":
livenessProbe:
httpGet: { path: /health, port: 8080 }
initialDelaySeconds: 10
periodSeconds: 15
readinessProbe:
httpGet: { path: /ready, port: 8080 }
initialDelaySeconds: 5
periodSeconds: 5
failureThreshold: 3
Resource requests and limits prevent a single misbehaving service from starving its neighbours:
resources:
requests: { memory: "128Mi", cpu: "100m" }
limits: { memory: "256Mi", cpu: "500m" }
Horizontal Pod Autoscaler scales replica count based on CPU, memory, or custom metrics — Kafka consumer lag is a particularly useful trigger for event-driven services:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
scaleTargetRef: { kind: Deployment, name: order-service }
minReplicas: 2
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target: { type: Utilization, averageUtilization: 70 }
Observability — the non-negotiable
Distribution makes debugging hard by default. The minimum viable setup:
- Correlation IDs — generate a UUID at every entry point (API gateway, Kafka message, scheduled job) and pass it as a header through every downstream call and log line
- Structured logging — JSON logs with consistent fields (service, correlationId, traceId, duration, statusCode) aggregate cleanly in any log platform
- Distributed tracing — OpenTelemetry spans propagated through REST headers, gRPC metadata, and Kafka message headers let you reconstruct the full path of a request across services
Without correlation IDs and structured logs in place before the first incident, debugging a production issue across ten services becomes guesswork.
The honest conclusion
Microservices distribute operational complexity. You earn independent deployability, isolated failure domains, and per-service scaling — and you pay for it with network calls that can fail, distributed state that is hard to keep consistent, and an observability stack that needs to work before everything else is useful. Start with a modular monolith. Extract services when the organizational pressure to do so is concrete, not aspirational. The architecture that ships most reliably is usually the right one.
Have thoughts on this? Reach out directly.