Research Compute Operations

San Francisco, CA | New York City, NY | Full-Time | Mid-level | Operations


About Anthropic

  • Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the Role

  • Anthropic's researchers use internal tooling and infrastructure to run the experiments that advance AI safety and capability. This role owns the researcher experience with that tooling — both the day-to-day support and the longer-term product vision. You'll be the person researchers come to when they need help, and the person driving improvements and automation to make that manual help unnecessary over time.
  • This role sits on the Capacity Operations team at the intersection of research and infrastructure.

Responsibilities

  • Serve as a primary point of contact for researchers using internal compute infrastructure, including triaging access issues, resolving researcher requests, and handling real-time monitoring
  • Proactively monitor usage patterns and work with researchers to optimize their workloads
  • Help design the product roadmap for research inference tooling. You will gather user feedback, prioritize improvements, and drive execution
  • Prototype better tools: dashboards, automations, self-service workflows, and more intuitive interfaces for complex systems
  • Build automations (using Claude) for common operational workflows

You may be a good fit if you

  • Have an engineering background (or equivalent technical depth) and have transitioned into or are drawn to product management, technical operations, or systems design work
  • Can query data, understand infrastructure, debug issues, and build tools and scripts to prototype solutions quickly
  • Are a systems thinker: when a researcher hits a confusing error, you don't just fix it; you ask why the system produced it and how to prevent it for everyone
  • Are comfortable navigating ambiguity across teams and context-switching between tactical support and strategic design
  • Use Claude or other AI tools daily and are excited to teach others your best practices

Strong candidates may also have

  • An understanding of compute infrastructure and familiarity with concepts like rate limiting, autoscaling, and request prioritization
  • Background in ML infrastructure, ML engineering, or research engineering
  • Experience with large-scale accelerator clusters (TPUs, GPUs, or similar)
  • Familiarity with ML training pipelines and how they consume inference capacity
  • Track record of building internal tools or developer platforms that people actually love using
  • Experience in developer experience (DevEx) or platform engineering

How we're different

  • We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact (advancing our long-term goals of steerable, trustworthy AI) over work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We're an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.
  • The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.

Job Summary

Company: Anthropic
Location: San Francisco, CA | New York City, NY
Type: Full-Time
Level: Mid-level
Domain: Operations