Building Scalable Microservices with Kubernetes: Lessons from Production
After running microservices on Kubernetes in production for three years, here are the hard lessons learned about service discovery, observability, scaling strategies, and the mistakes that kept us up at night.
Three years ago, our team migrated from a monolithic application to microservices running on Kubernetes. We thought we understood what we were getting into. We were wrong.
Kubernetes isn’t just a container orchestrator—it’s a platform that demands you rethink how applications are designed, deployed, and operated. Here are the lessons we learned the hard way.
1. Start with a Service Mesh
We initially used Kubernetes Services for inter-service communication. Simple, right? Except debugging network issues became a nightmare. We couldn’t trace requests, implement retries consistently, or encrypt service-to-service communication without application changes.
Enter Istio. A service mesh like Istio or Linkerd provides observability, traffic management, and security transparently. Yes, there’s overhead. Yes, it’s worth it. We cut our debugging time by 60% after implementation.
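The encryption piece, securing service-to-service traffic without application changes, can be sketched with an Istio PeerAuthentication policy. This is a minimal illustration under assumptions, not necessarily the configuration used here: the `payments` namespace is a hypothetical name, and every workload in it is assumed to already have a sidecar injected.

```yaml
# Sketch: require mutual TLS for all workloads in one namespace.
# "payments" is a hypothetical namespace name.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: payments
spec:
  mtls:
    mode: STRICT   # accept only mutual TLS connections; plaintext is rejected
```

Starting with `mode: PERMISSIVE` lets plaintext and mTLS traffic coexist while sidecars are still being rolled out. Retries are configured separately, per route, in a VirtualService: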
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payment-service
spec:
  hosts:
    - payment
  http:
    - route:
        - destination:
            host: payment
      # Retry policy applied by the sidecar, not by application code
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx   # retry on 5xx responses and on connection failures

2. Design for Observability First
“I’ll add metrics later”—famous last words. Without proper instrumentation from day one, you’re flying blind. Each service should expose:
Application metrics (request rate, latency, errors)
Business metrics (checkouts completed, payments processed)
Health endpoints that actually test dependencies
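For the health endpoints in particular, Kubernetes acts on the result through probes. The manifest below is a sketch under assumptions: the `/healthz` and `/ready` paths, port 8080, and the image name are hypothetical, and `/ready` is assumed to be the endpoint that actually checks dependencies such as the database.

```yaml
# Sketch of a Deployment with separate liveness and readiness probes.
# Image, port, and endpoint paths are hypothetical placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: payment
  template:
    metadata:
      labels:
        app: payment
    spec:
      containers:
        - name: payment
          image: registry.example.com/payment:1.0.0
          ports:
            - containerPort: 8080
          # Liveness: is the process itself healthy?
          # Repeated failures restart the container.
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 10
            failureThreshold: 3
          # Readiness: can the pod serve traffic right now (dependencies reachable)?
          # Failures remove the pod from Service endpoints without restarting it.
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            periodSeconds: 5
            failureThreshold: 2
```

Keeping dependency checks on the readiness side is a deliberate choice in this sketch: if a shared database goes down, pods stop receiving traffic instead of being restarted in a loop.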
We standardized on Prometheus for metrics and Grafana for visualization. More importantly, we made observability a requirement in our definition of done.
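As an illustration of what the application metrics make possible, here is a sketch of a Prometheus alerting rule on error ratio. It assumes a counter named `http_requests_total` with `job` and `status` labels, which is a common convention but not something stated above. The 5% threshold and the rule names are placeholders as well.

```yaml
# Sketch of a Prometheus rule file.
# Metric name, labels, and threshold are assumptions.
groups:
  - name: payment-service
    rules:
      - alert: PaymentHighErrorRatio
        # Share of requests answered with a 5xx status over the last 5 minutes.
        expr: |
          sum(rate(http_requests_total{job="payment", status=~"5.."}[5m]))
            /
          sum(rate(http_requests_total{job="payment"}[5m]))
            > 0.05
        for: 10m   # condition must hold for 10 minutes before the alert fires
        labels:
          severity: page
        annotations:
          summary: "More than 5% of payment requests are failing"
```

Alerting on a ratio rather than a raw error count keeps the rule meaningful as traffic grows or shrinks.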
3. The Database Problem
Distributing data is hard. We made the classic mistake of trying to share databases between services. Don’t do this. Each service should own its data entirely.
But that introduces new challenges. We implemented the Saga pattern for distributed transactions and embraced eventual consistency. Some queries that were simple JOINs became complex orchestrations. The tradeoff is worth it for service independence.
4. Resource Limits Are Not Optional
Kubernetes lets you specify resource requests and limits. At first, we didn’t bother. Then one service had a memory leak and took down an entire node, cascading into a cluster-wide outage.
Always set both requests and limits. Start conservative and adjust based on actual usage. The Vertical Pod Autoscaler can generate recommendations, but review them manually before applying them.
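One way to get recommendations without letting anything change automatically is to run the Vertical Pod Autoscaler with updates switched off. The sketch below assumes the VPA components are installed in the cluster (they are not part of core Kubernetes) and targets a hypothetical `payment` Deployment.

```yaml
# Sketch: VPA in recommendation-only mode for a hypothetical "payment" Deployment.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment
  updatePolicy:
    updateMode: "Off"   # compute recommendations only; never evict or resize pods
```

The recommendations appear in the object's status, for example via `kubectl describe vpa payment-vpa`. Once reviewed, the values go into the container spec as ordinary requests and limits: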
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"

5. GitOps Saves Sanity
Manually applying Kubernetes manifests is error-prone and non-reproducible. We switched to a GitOps workflow using ArgoCD, and it transformed our deployment process.
Now our Git repository is the single source of truth. Changes are peer-reviewed, automated tests run, and ArgoCD ensures the cluster state matches the desired state. Rollbacks are trivial—just revert the commit.
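A minimal sketch of what this looks like in ArgoCD terms is an Application resource that points at a Git path and keeps the cluster in sync with it. The repository URL, path, and namespace names below are hypothetical placeholders, and automated sync is one possible policy rather than the one necessarily used here.

```yaml
# Sketch of an ArgoCD Application.
# Repository URL, path, and namespaces are hypothetical.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payment
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/k8s-manifests.git
    targetRevision: main
    path: services/payment
  destination:
    server: https://kubernetes.default.svc   # the cluster ArgoCD itself runs in
    namespace: payments
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual changes made directly in the cluster
```

With `selfHeal` enabled, a manual `kubectl edit` is undone on the next sync, which is what makes Git the only place where a change can stick.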
6. Don’t Ignore the Cost
Kubernetes can be expensive if you’re not careful. We were shocked by our first cloud bill. Autoscaling is powerful but can lead to runaway costs if misconfigured.
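The text does not say which autoscaler was misconfigured. Assuming pod-level autoscaling with a HorizontalPodAutoscaler, the simplest guard against runaway scaling is an explicit ceiling. The names and numbers below are placeholders for a hypothetical `payment` Deployment.

```yaml
# Sketch: HPA with an explicit ceiling. Names and numbers are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment
  minReplicas: 2
  maxReplicas: 10        # hard upper bound on replicas, and therefore on cost
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

CPU utilization here is measured relative to the container's CPU request, so this only behaves sensibly when requests are set.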
We implemented:
Spot instances for non-critical workloads
Cluster autoscaling with sane minimums
Resource quotas per namespace
Regular cost audits using Kubecost
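Of these, the per-namespace quota is the easiest to show in a few lines. The following is a sketch with placeholder values for a hypothetical `payments` namespace, not the limits actually used.

```yaml
# Sketch: cap the total resources one namespace can claim.
# Namespace name and values are placeholders.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: payments
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "40"
```

Once a quota covers CPU and memory, pods in that namespace that omit requests or limits are rejected unless a LimitRange supplies defaults, which also reinforces the rule about always setting resource limits.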
7. The Human Factor
Technical challenges aside, the hardest part was organizational. Teams had to own their services end-to-end. Developers became responsible for operational concerns they’d previously ignored.
We invested heavily in training and documentation. We established SRE practices and error budgets. Most importantly, we fostered a blameless postmortem culture where incidents become learning opportunities.
Looking Forward
After three years, Kubernetes has become indispensable. It’s not perfect—the learning curve is steep, the ecosystem overwhelming. But the flexibility and resilience it provides are unmatched.
If you’re starting your microservices journey, don’t expect Kubernetes to solve your problems. It’s a powerful tool, but like all tools, it requires skill to wield effectively. Start small, automate everything, and never stop learning from production.