Technical Program Manager, Infrastructure

San Francisco, CA | New York City, NY | Seattle, WA | Full-Time | Manager | Product / Project

About Anthropic

  • Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the Role

  • Anthropic's Infrastructure organization is the engine that powers our mission. Every breakthrough in AI safety research and every interaction users have with Claude depends on the systems we build and operate: massive clusters for training frontier models, production infrastructure serving millions of users reliably, and developer platforms that help engineers move fast without breaking things.
  • As a Technical Program Manager for Infrastructure, you’ll work across multiple infrastructure domains to coordinate complex programs that have broad organizational impact. You’ll be solving novel scaling challenges at the frontier of what's possible, all while maintaining the security and reliability our mission demands.
  • This role is ideal for someone who thrives in ambiguity and believes their job is to make everyone around them more effective. You’ll partner closely with engineering leadership to drive strategic initiatives while ensuring seamless coordination between research, engineering, and product teams.

Developer Productivity & Tooling

  • Drive cross-functional programs to improve developer environments, CI/CD infrastructure, and release processes that enable rapid innovation while maintaining high security standards
  • Coordinate large-scale migrations and platform modernization efforts across engineering teams
  • Partner with teams to measure and improve developer productivity metrics, identifying bottlenecks and driving systematic improvements
  • Lead initiatives to integrate AI tools into development workflows, helping Anthropic be at the forefront of AI-assisted research and engineering

Infrastructure Reliability & Operations

  • Drive programs to establish and achieve reliability targets across training infrastructure and production services
  • Coordinate incident response improvements, post-mortem processes, and on-call rotations that help teams operate effectively
  • Establish metrics and dashboards to track infrastructure health, capacity utilization, and operational excellence

Cross-functional Coordination

  • Serve as the critical bridge between infrastructure teams, research, and product, translating technical complexities into clear updates for a variety of audiences
  • Consult with stakeholders to deeply understand infrastructure, data, and compute needs, identifying solutions to support frontier research and product development
  • Drive alignment on priorities and timelines across teams with competing constraints

You May Be a Good Fit If You

  • Have 5+ years of technical program management experience, with a track record of successfully delivering complex infrastructure programs in ML/AI systems or large-scale distributed systems
  • Have deep technical understanding of infrastructure systems—enough to engage substantively with engineers, identify technical risks, and add value beyond project tracking
  • Excel at creating structure and processes in ambiguous environments, bringing clarity to complex cross-team initiatives
  • Have strong stakeholder management skills and can build trust with both technical and non-technical partners
  • Are comfortable navigating competing priorities and using data to drive technical decisions
  • Have experience with developer productivity initiatives, CI/CD systems, or infrastructure scaling
  • Thrive in fast-paced environments and can balance strategic planning with tactical execution
  • Are obsessed with reliability, scalability, security, and continuous improvement
  • Have a passion for supporting internal partners, such as research teams, and for understanding their unique needs
  • Are passionate about AI infrastructure and understand the unique challenges of building and operating systems at frontier scale
  • Have experience with Kubernetes, cloud platforms (AWS, GCP, Azure), and ML infrastructure (GPU/TPU/Trainium clusters)
  • Have a background working with research teams and translating their needs into concrete technical requirements
  • Have experience driving adoption of AI tools to improve engineering productivity
  • Are familiar with observability tooling and practices

Deadline to Apply: None; applications are accepted on a rolling basis.

  • The annual compensation range for this role is listed below.
  • For sales roles, the range provided is the role’s On Target Earnings ("OTE") range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.

How we're different

  • We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts, and we value impact (advancing our long-term goals of steerable, trustworthy AI) over work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We're an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.
  • The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.

Job Summary

Company: Anthropic
Location: San Francisco, CA | New York City, NY | Seattle, WA
Type: Full-Time
Level: Manager
Domain: Product / Project