RESPONSIBILITIES:
- Design, develop, and operate WHOOP’s Kubernetes clusters running on AWS infrastructure
- Drive architectural decisions to improve scalability, resiliency, performance, and security across the build and deployment platform
- Build systems and tooling that increase deployment safety and accelerate release velocity to Kubernetes
- Advance CI/CD capabilities to support frequent, reliable production deployments
- Lead developer productivity improvements through tooling, automation, and platform integrations
- Partner with application, security, and data teams to embed secure-by-default infrastructure practices
- Participate in incident response, root cause analysis, and postmortems to continuously improve platform reliability
- Mentor and provide technical leadership to engineers on the Application Infrastructure team
- Help define and execute the long-term roadmap for infrastructure and Kubernetes management at WHOOP
QUALIFICATIONS:
- 5+ years of experience in DevOps, Platform, Site Reliability, Cloud Engineering, or Backend Software Engineering roles
- Deep understanding of Kubernetes architecture and core components
- Strong knowledge of container networking concepts, including overlay networking, service meshes, and network policies
- Experience with multi-cluster Kubernetes environments and inter-cluster communication patterns
- Hands-on experience operating cloud infrastructure, preferably in AWS (e.g., IAM, VPC, EC2, S3, RDS, CloudTrail, Organizations)
- Hands-on experience with Infrastructure as Code tools (e.g., Terraform)
- Experience developing backend or infrastructure-adjacent services using Java, C#, or Python
- Proven ability to evaluate system performance, identify bottlenecks, and use data to drive improvements
- Experience collaborating with multiple stakeholders and prioritizing work for maximum business impact
BONUS QUALIFICATIONS:
- Experience operating Kafka or other large-scale distributed systems
- Experience with Kubernetes security best practices, including RBAC, secrets management, and pod security standards
- Exposure to service reliability practices such as SLOs, SLIs, and error budgets
- Prior experience supporting compliance or security-focused infrastructure initiatives
ABOUT YOU:
- You bring a security-first mindset to everything you build and operate
- You enjoy working on infrastructure that enables hundreds of engineers to move faster
- You’re comfortable operating in complex, high-scale production environments
- You enjoy teaching, mentoring, and raising the technical bar for those around you
- You’re curious, adaptable, and excited to learn across a wide range of systems and technologies
