Engineering Manager - Observability

San Francisco, CA | New York City, NY | Full-Time | Manager | Software Engineering


Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

Anthropic is looking for an Engineering Manager to help lead our Observability team — the group responsible for the metrics infrastructure that keeps Anthropic's most critical systems running. When metrics go down, the company can't tell how training runs are progressing or whether production inference is healthy. This is mission-critical infrastructure with real operational stakes.
You'll lead a growing software engineering team, partner with strong technical leads, and manage the internal and external relationships that make a platform team successful at scale. If you've led teams that built a metrics or observability system before and thrive in high-operational-tempo environments, this is a rare chance to do it at a company where the infrastructure genuinely matters.

Help grow the Observability team, hiring exceptional software engineers and building a resilient, high-ownership culture
Own Anthropic's metrics platform end-to-end — design, reliability, roadmap, and operational excellence
Build strong partnerships with internal customers across infrastructure, training, and inference teams to understand needs and manage priorities
Partner with the team's technical leads to align on architecture, execution, and hiring
Drive operational rigor — making on-call and incident response sustainable and continuously improving

Have 2+ years of engineering management experience leading observability, monitoring, or metrics infrastructure teams
Bring domain expertise in metrics infrastructure — you've worked with Prometheus, Grafana, time series databases, or similar technologies
Have experience managing an internal platform team with many stakeholders — you know how to manage competing priorities and communicate tradeoffs clearly
Are operationally minded — you've led teams with significant on-call burden and know how to make reliability a first-class priority
Are a positive, high-energy leader who creates a "we can do this" environment even when things are hard. Life on the exponential is challenging!

Running a metrics or observability system at a company with a large internal customer base
Managing external vendor partnerships for observability tooling
Observability for ML training or inference workloads
Building or operating metrics infrastructure at significant scale

The annual compensation range for this role is listed below.
For sales roles, the range provided is the role’s On Target Earnings ("OTE") range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.

We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We're an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.
The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.

Company: Anthropic

Location: San Francisco, CA | New York City, NY

Type: Full-Time

Level: Manager

Domain: Software Engineering
