Engineering Manager - Observability

San Francisco, CA | New York City, NY | Full-Time | Manager | Software Engineering

About Anthropic

  • Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the role

  • Anthropic is looking for an Engineering Manager to help lead our Observability team — the group responsible for the metrics infrastructure that keeps Anthropic's most critical systems running. When metrics go down, the company can't tell how training runs are progressing or whether production inference is healthy. This is mission-critical infrastructure with real operational stakes.
  • You'll lead a growing software engineering team, partner with strong technical leads, and manage the internal and external relationships that make a platform team successful at scale. If you've led teams that built a metrics or observability system before and thrive in high-operational-tempo environments, this is a rare chance to do it at a company where the infrastructure genuinely matters.

Responsibilities

  • Help grow the Observability team, hiring exceptional software engineers and building a resilient, high-ownership culture
  • Own Anthropic's metrics platform end-to-end — design, reliability, roadmap, and operational excellence
  • Build strong partnerships with internal customers across infrastructure, training, and inference teams to understand needs and manage priorities
  • Partner with the team's technical leads to align on architecture, execution, and hiring
  • Drive operational rigor — making on-call and incident response sustainable and continuously improving

You may be a good fit if you

  • Have 2+ years of engineering management experience leading observability, monitoring, or metrics infrastructure teams
  • Bring domain expertise in metrics infrastructure — you've worked with Prometheus, Grafana, time series databases, or similar technologies
  • Have experience managing an internal platform team with many stakeholders — you know how to manage competing priorities and communicate tradeoffs clearly
  • Are operationally minded — you've led teams with significant on-call burden and know how to make reliability a first-class priority
  • Are a positive, high-energy leader who creates a "we can do this" environment even when things are hard. Life on the exponential is challenging!

Strong candidates may also have experience with

  • Running a metrics or observability system at a company with a large internal customer base
  • Managing external vendor partnerships for observability tooling
  • Observability for ML training or inference workloads
  • Building or operating metrics infrastructure at significant scale

How we're different

  • We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We're an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.
  • The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.

Job Summary

Company: Anthropic
Location: San Francisco, CA | New York City, NY
Type: Full-Time
Level: Manager
Domain: Software Engineering