What is KEDA and does it close the gap between Kubernetes and serverless?

KEDA (Kubernetes Event-Driven Autoscaling) is an add-on that lets Kubernetes scale deployments down to zero and back up based on event sources such as queues, Kafka topics, or HTTP traffic. It narrows much of the cost and scalability gap with serverless while keeping workloads in Kubernetes. It does not close the gap completely: pods still need time to start when scaling from zero, and the operational overhead of running Kubernetes remains.
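As a rough illustration of what scale-to-zero on a Kafka topic looks like, the sketch below builds a KEDA `ScaledObject` manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts. The deployment name, topic, consumer group, broker address, and thresholds are all hypothetical placeholders, and the field set is a minimal subset rather than a recommended production configuration.

```python
import json


def build_scaled_object(name, deployment, topic, consumer_group,
                        bootstrap_servers, lag_threshold=50,
                        min_replicas=0, max_replicas=20):
    """Return a minimal KEDA ScaledObject manifest as a plain dict."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": name},
        "spec": {
            # The Deployment that KEDA scales up and down.
            "scaleTargetRef": {"name": deployment},
            # minReplicaCount of 0 is what enables scale-to-zero.
            "minReplicaCount": min_replicas,
            "maxReplicaCount": max_replicas,
            "triggers": [{
                "type": "kafka",
                "metadata": {
                    "bootstrapServers": bootstrap_servers,
                    "consumerGroup": consumer_group,
                    "topic": topic,
                    # Trigger metadata values are strings in the manifest.
                    "lagThreshold": str(lag_threshold),
                },
            }],
        },
    }


# All names below are hypothetical examples.
manifest = build_scaled_object(
    name="orders-consumer-scaler",
    deployment="orders-consumer",
    topic="orders",
    consumer_group="orders-group",
    bootstrap_servers="kafka.example.internal:9092",
)
print(json.dumps(manifest, indent=2))
```

With no consumer lag on the topic, KEDA can take the deployment to zero replicas. The first message after that still waits for a pod to start, which is the cold start behavior noted above.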

Is serverless suitable for latency-sensitive APIs?

It depends on your latency budget. Cold starts of 100ms–500ms are acceptable for many APIs but unacceptable for sub-50ms SLAs. Techniques like provisioned concurrency (Lambda) or minimum instance counts (Cloud Run) keep instances warm and so avoid most cold starts, at the cost of idle compute charges. In effect, they turn serverless into a partially reserved compute model.
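One way to make "it depends on your latency budget" concrete is a quick back-of-the-envelope check. The sketch below is a deliberately simplified model, not a measurement method: it assumes you know your warm latency, a worst-case cold start, and the fraction of requests that land on a cold instance. The function name and all numbers are illustrative assumptions.

```python
def cold_start_breaks_sla(sla_ms, warm_latency_ms, cold_start_ms,
                          cold_fraction, percentile=99.0):
    """Return True if cold starts push the chosen percentile past the SLA.

    Simplified model: a request is either warm (warm_latency_ms) or
    cold (warm_latency_ms + cold_start_ms).
    """
    # Cold requests only show up at this percentile if they are more
    # frequent than the tail the percentile ignores (1% for p99).
    tail = 1.0 - percentile / 100.0
    cold_visible = cold_fraction > tail
    return cold_visible and (warm_latency_ms + cold_start_ms > sla_ms)


# Sub-50ms SLA, 2% of requests cold: warm capacity is worth paying for.
strict_api = cold_start_breaks_sla(sla_ms=50, warm_latency_ms=20,
                                   cold_start_ms=100, cold_fraction=0.02)

# 1-second budget with the same traffic: cold starts fit comfortably.
relaxed_api = cold_start_breaks_sla(sla_ms=1000, warm_latency_ms=80,
                                    cold_start_ms=500, cold_fraction=0.02)

print(strict_api, relaxed_api)
```

If the check comes back true, provisioned concurrency or a minimum instance count is the usual remedy, with the idle cost trade-off described above.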

How do we decide which runtime to use for a new microservice?

Use this decision tree:

If the service is stateful or has persistent connections, use Kubernetes.
If execution exceeds 15 minutes, use Kubernetes.
If the service is event-triggered, stateless, and has variable traffic, use serverless.
If you need cloud portability, use Kubernetes.

When in doubt for a new API or background job, start serverless and migrate to Kubernetes if you hit functional or performance limits.
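The decision tree above can be sketched as a small function. This is only an illustration of the rules as written, evaluated in the order they are listed; the `ServiceProfile` fields and the `choose_runtime` name are hypothetical, and a real decision would weigh team skills and cost as well.

```python
from dataclasses import dataclass

EXECUTION_LIMIT_MINUTES = 15  # the limit cited in the decision tree


@dataclass
class ServiceProfile:
    """Hypothetical description of a new microservice."""
    stateful: bool = False
    persistent_connections: bool = False
    max_execution_minutes: float = 1.0
    event_triggered: bool = False
    variable_traffic: bool = False
    needs_cloud_portability: bool = False


def choose_runtime(p: ServiceProfile) -> tuple[str, str]:
    """Return (runtime, reason), applying the rules in their listed order."""
    if p.stateful or p.persistent_connections:
        return "kubernetes", "stateful or persistent connections"
    if p.max_execution_minutes > EXECUTION_LIMIT_MINUTES:
        return "kubernetes", "execution exceeds 15 minutes"
    # Statelessness is already guaranteed by the first rule.
    if p.event_triggered and p.variable_traffic:
        return "serverless", "event-triggered, stateless, variable traffic"
    if p.needs_cloud_portability:
        return "kubernetes", "cloud portability required"
    # "When in doubt": start serverless and migrate if limits are hit.
    return "serverless", "default for new APIs and background jobs"


webhook = ServiceProfile(event_triggered=True, variable_traffic=True)
batch_job = ServiceProfile(max_execution_minutes=45)
print(choose_runtime(webhook))
print(choose_runtime(batch_job))
```

Returning the reason alongside the runtime keeps the recommendation auditable when the profile is revisited later.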

Kubernetes vs Serverless: Which Runtime Is Right for Your App?

Kubernetes and serverless represent two very different philosophies for running cloud applications. Kubernetes gives you full control over container orchestration with powerful primitives for stateful and complex workloads. Serverless eliminates infrastructure management entirely, trading control for simplicity in event-driven, variable-load scenarios. Choosing between them — or combining them — is one of the most consequential architectural decisions a cloud team makes.

Halkwinds Verdict: Kubernetes is the better choice for stateful, long-running, or operationally complex workloads where control and portability matter. Serverless wins for event-driven functions, background jobs, and workloads with highly variable or unpredictable traffic.

Option A

Kubernetes

Full-control container orchestration for complex, stateful, and long-running workloads

Typical Cost

$500–$5,000+/month for managed clusters; node costs scale with workload; requires FinOps to avoid over-provisioning

Timeline

2–4 weeks to productive cluster; 3–6 months for team proficiency and production hardening

Pros

Full control over runtime environment, networking, storage, and resource allocation

Cloud-agnostic and portable — runs on AWS EKS, GKE, AKS, or on-premises without vendor lock-in

Supports stateful workloads natively via StatefulSets, Persistent Volumes, and Operators

Horizontal pod autoscaling and custom metrics scaling give fine-grained control over capacity

Rich ecosystem of tooling — Helm, Argo, Istio, Prometheus — for enterprise-grade operations

Cons

Significant operational overhead — cluster management, upgrades, and node maintenance require dedicated expertise

Steep learning curve for developers unfamiliar with container orchestration concepts

Running pods have no cold start, but idle nodes incur constant cost even at zero traffic

YAML-heavy configuration increases cognitive load and potential for misconfiguration

Scaling to zero is not native: without an add-on such as KEDA, workloads always consume a minimum amount of compute
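To see why horizontal pod autoscaling gives fine-grained control yet does not reach zero by itself, it helps to look at the scaling rule documented for the Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric), clamped between a minimum and maximum. The sketch below is a simplified rendering of that rule. The 10% tolerance default and the minimum of 1 reflect common defaults as I understand them, and the function name and sample values are illustrative.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA rule: scale proportionally to metric / target."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: leave the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The floor is what keeps a plain HPA from scaling to zero.
    return max(min_replicas, min(max_replicas, desired))


scale_out = desired_replicas(current_replicas=4, current_metric=90, target_metric=60)
idle = desired_replicas(current_replicas=4, current_metric=0, target_metric=60)
print(scale_out, idle)
```

Even with the metric at zero, the result stays at the minimum replica count, which is the idle cost listed in the cons above.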

Option B

Serverless

Zero-infrastructure event-driven computing that scales from zero to millions

Typical Cost

$0–$200/month for low traffic; can reach thousands at high invocation volumes — requires cost monitoring

Timeline

Hours to days for first function; production-ready in 1–2 weeks with proper observability

Pros

No infrastructure management — the cloud provider handles all patching, scaling, and availability

True scale-to-zero means zero cost when idle, making it extremely cost-effective for variable workloads

Faster time to production for simple functions — deploy in minutes without cluster setup

Automatic scaling handles traffic spikes without pre-provisioning or autoscaler tuning

Pay-per-invocation pricing aligns cost directly with business value delivered

Cons

Cold start latency can be 100ms–2s depending on runtime and package size, which is unacceptable for latency-critical use cases

Execution time limits (15 minutes on Lambda) make it unsuitable for long-running or stateful workloads

Vendor lock-in is high — AWS Lambda, GCP Cloud Run, and Azure Functions have different APIs and behaviors

Debugging and local development tooling are less mature than container-based workflows

Costs can spike unexpectedly with very high invocation volumes compared to reserved compute

Side-by-Side

Detailed Comparison

Dimension	Kubernetes	Serverless	Winner
Operational Overhead	High — cluster management, upgrades, node pools require dedicated ops	Near-zero — provider manages all infrastructure	Serverless
Scalability	Excellent horizontal scaling with HPA; requires tuning and pre-warming	Automatic, near-instant scale-to-zero and scale-out	Serverless
Stateful Workload Support	Native support for stateful services such as databases via StatefulSets, PVs, and Operators	Not suitable: functions are stateless by design	Kubernetes
Cold Start Latency	Pod startup ~1–5 seconds but running pods have no cold start	100ms–2s cold start depending on runtime and bundle size	Kubernetes
Cost at Low Traffic	Idle nodes incur constant cost regardless of traffic	True scale-to-zero — pay only for actual invocations	Serverless
Cost at High Sustained Traffic	Reserved node pools are cost-efficient for steady-state workloads	Per-invocation cost adds up quickly at high sustained load	Kubernetes
Vendor Portability	Cloud-agnostic — CNCF-standard API runs anywhere	High vendor lock-in — each provider has different function APIs	Kubernetes
Developer Experience	Powerful but complex — significant YAML and kubectl knowledge required	Simple deployment model — upload code and configure triggers	Serverless
Long-Running Workloads	Ideal — no execution time limits, supports persistent connections	Limited — max 15 min (Lambda) to 60 min (Cloud Run) execution	Kubernetes
Observability & Debugging	Rich ecosystem — Prometheus, Grafana, Jaeger, structured logging	Provider-native tools available but local debug workflows are immature	Kubernetes
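The two cost rows in the table imply a break-even point: below some monthly request volume serverless is cheaper, above it a fixed cluster wins. The sketch below estimates that point under loudly stated assumptions. The per-request and per-GB-second prices are illustrative placeholders in the style of pay-per-use function pricing, not quoted provider rates, the $500 cluster figure is the low end of the range given earlier, and free tiers, data transfer, and ops staffing are ignored.

```python
# Illustrative placeholder prices (assumptions, not quoted provider rates).
PRICE_PER_MILLION_REQUESTS = 0.20   # USD
PRICE_PER_GB_SECOND = 0.0000166667  # USD


def cost_per_request(avg_duration_ms, memory_gb):
    """Serverless cost of one invocation: request fee plus compute time."""
    request_fee = PRICE_PER_MILLION_REQUESTS / 1_000_000
    compute_fee = (avg_duration_ms / 1000.0) * memory_gb * PRICE_PER_GB_SECOND
    return request_fee + compute_fee


def serverless_monthly_cost(requests, avg_duration_ms, memory_gb):
    """Total monthly serverless bill for a given request volume."""
    return requests * cost_per_request(avg_duration_ms, memory_gb)


def break_even_requests(cluster_monthly_cost, avg_duration_ms, memory_gb):
    """Monthly request volume at which serverless matches the cluster cost."""
    return cluster_monthly_cost / cost_per_request(avg_duration_ms, memory_gb)


# Hypothetical workload: 100 ms per request at 0.5 GB memory.
low_traffic_bill = serverless_monthly_cost(1_000_000, 100, 0.5)
threshold = break_even_requests(500.0, 100, 0.5)
print(f"1M requests/month: ${low_traffic_bill:.2f}")
print(f"Break-even vs $500 cluster: {threshold:,.0f} requests/month")
```

Under these assumptions the break-even sits in the hundreds of millions of requests per month. Heavier or slower functions move it down sharply, which is why sustained high-throughput workloads favor reserved node pools.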

Decision Framework

When to Choose Each Option

Choose Kubernetes when...

Your application is stateful and requires persistent storage, session affinity, or long-running message queue consumers
You need workloads to run longer than 15 minutes without interruption
You require cloud portability or plan to run workloads across multiple clouds or on-premises
Your team has complex networking requirements like service meshes, mutual TLS, or custom ingress rules
Your workload runs at high sustained throughput where reserved node pools are more cost-efficient

Choose Serverless when...

Your workloads are event-driven: API requests, queue-triggered handlers, file upload processors, or webhooks
Traffic is highly variable or unpredictable and you want to avoid paying for idle infrastructure
You want to ship new features quickly without allocating engineering time to infrastructure operations
Your functions are short-lived, stateless, and don't require persistent connections or local state
You are building a prototype or MVP and need to validate the concept before investing in Kubernetes

Not sure which is right for your project?

Default to serverless for new event-driven microservices and APIs with variable load. Use Kubernetes for long-running services, stateful applications, ML inference, or workloads requiring specialized hardware, custom networking, or multi-cloud portability.

Common Questions

Frequently Asked Questions

Can Kubernetes and serverless be used together?

Yes. This is a very common pattern. Many production systems use Kubernetes for core long-running services (API servers, ML inference, databases) while using serverless functions for event processing, background jobs, and lightweight webhooks. Amazon EventBridge, Google Cloud Pub/Sub, and Azure Event Grid make it easy to connect the two.
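A minimal sketch of the hybrid pattern, assuming a queue-triggered function that forwards each message to a long-running service hosted on Kubernetes. The service URL is a hypothetical placeholder, the event shape follows the common "Records with a body" layout used by queue triggers, and the `send` parameter exists only so the handler can be exercised without a network.

```python
import json
import urllib.request

# Hypothetical address of a service running in the cluster.
SERVICE_URL = "http://orders.internal.example/api/events"


def post_json(url, payload):
    """POST a JSON payload and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


def handler(event, context=None, send=post_json):
    """Forward each queued message to the Kubernetes-hosted service."""
    forwarded = 0
    for record in event.get("Records", []):
        payload = json.loads(record["body"])
        status = send(SERVICE_URL, payload)
        if status >= 300:
            # Raising lets the queue trigger retry the batch.
            raise RuntimeError(f"service returned HTTP {status}")
        forwarded += 1
    return {"forwarded": forwarded}


# Local demo with a stub sender instead of a real network call.
sample_event = {"Records": [{"body": json.dumps({"order_id": 1})},
                            {"body": json.dumps({"order_id": 2})}]}
result = handler(sample_event, send=lambda url, payload: 202)
print(result)
```

The function stays short-lived and stateless, while the stateful processing lives in the cluster service it calls.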
