Running Kubernetes on bare-metal hardware is an operational commitment that most teams shouldn’t make. EKS, GKE, and AKS exist to abstract the control plane complexity that bare-metal Kubernetes exposes, and for most workloads they do it well enough to justify the premium. But there’s a specific set of conditions under which bare-metal Kubernetes is not just defensible — it’s the clearly correct choice.
At Figment, we ran Kubernetes across 1,300+ physical servers spanning 13 different bare-metal and cloud providers. The design was a multi-site active-active topology with geographic distribution for blockchain validator infrastructure. That experience shaped a clear-eyed view of when bare-metal Kubernetes is worth the investment and when it’s infrastructure masochism.
The Economics Argument
The clearest argument for bare-metal Kubernetes is cost at sustained scale. Cloud compute is sold at a significant premium over bare-metal hardware costs. That premium buys you a managed control plane, elastic scaling, and reduced operational overhead. For workloads with sustained, predictable load profiles, the premium is often not worth what it buys.
A rough benchmark: a dedicated bare-metal server from providers like Hetzner or OVHcloud, or colocated hardware at an Equinix facility, typically runs 40-60% cheaper per compute unit than equivalent EC2 or GCE instances at on-demand pricing. Even accounting for reserved instance discounts (which close the gap significantly), bare-metal remains cheaper for sustained workloads at meaningful scale.
At Figment’s scale, running 1,300+ servers on equivalent cloud compute would have been cost-prohibitive. The bare-metal investment — with the operational infrastructure to manage it — was the only economically viable model for the infrastructure surface we needed to cover.
The calculation changes at lower scale. Under 50-100 servers, the operational overhead of bare-metal Kubernetes — managing hardware failures, firmware updates, hardware provisioning cycles, data center relationships — often exceeds the cost savings. The break-even point depends on your team’s operational capacity and the specific workload, but it’s generally not worth evaluating until you’re at meaningful scale.
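The break-even intuition above can be made concrete with a back-of-the-envelope model. Every number here is an illustrative assumption, not a quote: a notional cloud cost per server-equivalent, the midpoint of the 40-60% discount range, and a fixed monthly figure for the extra operational overhead bare-metal carries.

```python
def monthly_cost(servers: int,
                 cloud_per_server: float = 400.0,   # assumed on-demand $/month
                 bare_metal_discount: float = 0.5,  # midpoint of the 40-60% range
                 ops_overhead: float = 25_000.0):   # assumed fixed ops cost, $/month
    """Return (cloud_total, bare_metal_total) for a given fleet size."""
    cloud = servers * cloud_per_server
    bare_metal = servers * cloud_per_server * (1 - bare_metal_discount) + ops_overhead
    return cloud, bare_metal

def break_even_servers(**kwargs) -> int:
    """Smallest fleet size at which bare-metal becomes cheaper than cloud."""
    n = 1
    while True:
        cloud, bare_metal = monthly_cost(n, **kwargs)
        if bare_metal < cloud:
            return n
        n += 1

# With these default assumptions the crossover is 126 servers:
# break_even_servers() -> 126
```

The point of the sketch is not the exact number but the shape: the fixed operational overhead dominates at small fleet sizes, which is why the evaluation rarely makes sense below the 50-100 server range.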
The Performance Argument
Cloud compute adds network layers and hypervisor overhead that bare-metal doesn’t have. For most workloads, this overhead is negligible — a few microseconds of latency, a few percent of CPU overhead. For specific workload categories, it matters:
Latency-sensitive workloads. Financial systems, game servers, real-time data processing — cases where consistent, low-latency networking is a competitive requirement. Bare-metal gives you direct access to hardware networking without hypervisor overhead and without noisy-neighbor effects from co-located cloud tenants.
High I/O workloads. NVMe SSDs on dedicated hardware deliver substantially better I/O performance than cloud volumes, especially at sustained throughput. If your database performance is I/O-bound, bare-metal storage may be a genuine bottleneck eliminator.
Blockchain and cryptographic workloads. Validator infrastructure for proof-of-stake blockchains requires consistent performance and often benefits from direct control over hardware security modules, which virtualized environments typically don't expose.
Intensive GPU workloads. Training large models on dedicated GPU hardware is significantly cheaper than cloud GPU instances for sustained workloads.
What Bare-Metal Kubernetes Actually Requires
The operational requirements that teams often underestimate:
Hardware provisioning and decommissioning. When a node fails in EKS, you terminate the instance and launch a new one. When a bare-metal node fails, you open a ticket with the data center, wait for hardware replacement (hours to days depending on provider), and re-provision the node. Your Kubernetes cluster must be sized to tolerate N node failures without degraded service — not just the typical cloud assumption of “instances are ephemeral, so node loss is handled gracefully.”
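That sizing requirement can be expressed as a simple capacity calculation. This is a sketch under assumed numbers (per-node core counts, a headroom factor for system daemons and scheduling slack); the function name and parameters are illustrative, not from any standard tool.

```python
import math

def nodes_required(workload_cores: float,
                   cores_per_node: int,
                   tolerated_failures: int,
                   headroom: float = 0.8) -> int:
    """Nodes needed so the cluster still fits the workload after
    `tolerated_failures` nodes are down awaiting hardware replacement.
    `headroom` caps usable capacity per node, leaving room for system
    daemons and scheduling slack."""
    usable_cores = cores_per_node * headroom
    base_nodes = math.ceil(workload_cores / usable_cores)
    return base_nodes + tolerated_failures

# A 1,000-core workload on 32-core nodes, tolerating 3 concurrent
# hardware failures:
# nodes_required(1000, 32, 3) -> 43
```

The key difference from cloud capacity planning is that `tolerated_failures` must cover the hardware replacement window (hours to days), not the minutes it takes to launch a replacement instance.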
Firmware and BIOS management. Keeping server firmware current across hundreds of servers requires tooling. Dell’s iDRAC lifecycle controller, HP’s iLO, and similar out-of-band management interfaces make this automatable, but the automation has to be built and maintained.
Network infrastructure ownership. In cloud environments, the network is managed. In bare-metal colocation, you own or manage the top-of-rack switching, uplink configuration, and network redundancy. If your switching infrastructure isn’t redundant, it’s a single point of failure for the entire cluster.
Multi-site topology for real redundancy. A single data center, regardless of how well-run, is a single point of failure. Hardware failures, power events, network outages, and physical incidents are site-level risks that cloud multi-AZ deployments mitigate by default. Equivalent bare-metal redundancy requires either multi-site colocation or a hybrid cloud topology for the tail cases.
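Once nodes span multiple sites, Kubernetes can enforce the spread at scheduling time with topology spread constraints. A minimal sketch, assuming each node is labeled `topology.kubernetes.io/zone=<site>` when it joins the cluster (the deployment name and image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: validator
spec:
  replicas: 6
  selector:
    matchLabels: { app: validator }
  template:
    metadata:
      labels: { app: validator }
    spec:
      topologySpreadConstraints:
        # Keep the replica count per site within 1 of every other site;
        # refuse to schedule rather than concentrate replicas in one site.
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels: { app: validator }
      containers:
        - name: validator
          image: example.com/validator:latest   # placeholder image
```

With `DoNotSchedule`, losing an entire site degrades capacity but never leaves the workload concentrated where the next failure takes it all down.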
The Kubernetes Control Plane
On cloud-managed Kubernetes, the control plane (API server, etcd, scheduler, controller manager) is operated by the cloud provider. On bare-metal, you operate it yourself. This is the single largest difference in operational responsibility between the two models.
A self-managed control plane requires:
- etcd running with proper replication (a minimum of 3 members to tolerate one failure, 5 to tolerate two) across physical hardware — ideally separate from the worker nodes
- Regular etcd backups and tested restore procedures
- Control plane node management that ensures updates don’t disrupt cluster availability
- Certificate rotation management for the PKI that secures cluster communication
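The backup requirement above is automatable in-cluster. A sketch of a nightly etcd snapshot as a CronJob pinned to a control-plane node — the image tag, certificate paths, and backup hostPath are assumptions that must match your cluster's PKI layout and storage:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
            - key: node-role.kubernetes.io/control-plane
              effect: NoSchedule
          hostNetwork: true
          restartPolicy: OnFailure
          containers:
            - name: snapshot
              image: registry.k8s.io/etcd:3.5.12-0   # assumed tag; match your etcd version
              command:
                - /bin/sh
                - -c
                - >
                  ETCDCTL_API=3 etcdctl snapshot save
                  /backup/etcd-$(date +%Y%m%d).db
                  --endpoints=https://127.0.0.1:2379
                  --cacert=/etc/kubernetes/pki/etcd/ca.crt
                  --cert=/etc/kubernetes/pki/etcd/server.crt
                  --key=/etc/kubernetes/pki/etcd/server.key
              volumeMounts:
                - { name: pki, mountPath: /etc/kubernetes/pki/etcd, readOnly: true }
                - { name: backup, mountPath: /backup }
          volumes:
            - name: pki
              hostPath: { path: /etc/kubernetes/pki/etcd }
            - name: backup
              hostPath: { path: /var/backups/etcd, type: DirectoryOrCreate }
```

A snapshot you have never restored is not a backup: exercise `etcdctl snapshot restore` against the saved files on a schedule, not just during incidents.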
kubeadm is the standard tool for bootstrapping self-managed clusters: it handles the initial control plane installation and provides tooling for control plane upgrades. For smaller production clusters, k3s (a lightweight Kubernetes distribution) is worth evaluating; RKE2 (from the Rancher team) is a hardened choice for larger deployments.
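A minimal HA kubeadm configuration sketch, assuming you provide a load-balanced endpoint (a VIP in front of the API servers, e.g. via kube-vip or haproxy with keepalived) — the addresses and version here are placeholders:

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.29.0                # assumed version
controlPlaneEndpoint: "10.0.0.100:6443"   # the VIP, never a single node's address
etcd:
  local:
    dataDir: /var/lib/etcd   # stacked etcd; use `external:` to point at a
                             # dedicated etcd cluster on separate hardware
networking:
  podSubnet: 192.168.0.0/16  # must match your CNI configuration
```

Initialize with `kubeadm init --config cluster.yaml --upload-certs`, then join the remaining control-plane nodes with `kubeadm join ... --control-plane`. Pointing `controlPlaneEndpoint` at a VIP from the start is what makes adding and replacing control-plane nodes non-disruptive later.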
CNI and Networking Decisions
The Container Network Interface (CNI) choice has significant consequences for performance and operational complexity on bare-metal. The main options:
Cilium — based on eBPF, excellent performance, supports BGP peering for direct hardware routing, good network policy implementation. The modern choice for performance-sensitive bare-metal deployments.
Calico — mature, good documentation, supports BGP, flexible routing modes. The established production choice with a longer track record than Cilium.
Flannel — simple, easy to operate, lower performance than Cilium/Calico. Appropriate for smaller, less performance-sensitive clusters.
On bare-metal, the CNI can peer over BGP directly with the physical network switches, which eliminates the encapsulation overhead of an overlay network and significantly improves throughput and latency. This is a bare-metal capability that cloud Kubernetes doesn't offer.
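A sketch of what that peering looks like with Cilium's BGP control plane — the ASNs, peer address, and rack label are placeholders for your network's actual values, and the exact resource schema should be checked against the Cilium version you deploy:

```yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: tor-peering
spec:
  nodeSelector:
    matchLabels:
      rack: rack-1                     # assumed per-rack node label
  virtualRouters:
    - localASN: 64512                  # private ASN for the cluster nodes
      exportPodCIDR: true              # advertise pod CIDRs to the fabric
      neighbors:
        - peerAddress: "10.0.0.1/32"   # top-of-rack switch address
          peerASN: 64513               # the switch's ASN
```

With pod CIDRs advertised into the fabric, pods are routable from the physical network with no overlay in the data path.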
The Hybrid Model That Usually Works Best
Pure bare-metal Kubernetes is appropriate for teams with deep operational expertise and specific workload requirements. For many teams, the more practical model is hybrid: bare-metal worker nodes for compute-intensive, sustained workloads with a cloud-managed control plane or small cloud cluster for the management layer.
This gives you the cost and performance benefits of bare-metal compute for the data plane while maintaining the operational simplicity of managed Kubernetes for the control plane. The complexity is real but bounded.
Our Kubernetes and containers practice has designed bare-metal Kubernetes deployments at production scale. The pattern we recommend most: start with managed Kubernetes, operate it well, understand your workload characteristics, and then evaluate bare-metal when the economics and performance requirements are clear. Related: the DevOps and automation tooling for managing bare-metal Kubernetes — automated provisioning, GitOps deployment, and upgrade management — is closely coupled with the platform design.