Red Team Engineer, Safeguards

Remote-Friendly (Travel-Required) | San Francisco, CA | Washington, DC | Full-Time | Mid-level | Software Engineering


Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

Anthropic's Safeguards team is seeking a Red Team Engineer to help ensure the safety of our deployed AI systems and products. In this role, you'll take an adversarial approach to uncover vulnerabilities across our product ecosystem before they can be exploited by malicious actors. Your work will span from technical infrastructure vulnerabilities in our products to emergent risks from advanced AI capabilities.
While you'll take best practices from traditional security approaches, the focus is on broader safety implications and novel abuse unique to advanced AI systems and associated products. You'll investigate the full spectrum of potential abuse: from coordinated account manipulation and payment fraud to novel exploitation of product features. You'll simulate sophisticated threat actors who chain multiple attack vectors to achieve their objectives.

Conduct comprehensive adversarial testing across Anthropic’s product surfaces, developing creative attack scenarios that combine multiple exploitation techniques
Research and implement novel testing approaches for emerging capabilities, including agent systems, tool use, and new interaction paradigms
Design and execute 'full kill chain' attacks that emulate real-world threat actors attempting to achieve specific malicious objectives
Build and maintain systematic testing methodologies that evaluate every aspect of our systems
Develop automated testing frameworks to enable continuous assessment at scale
Collaborate with Product, Engineering, and Policy teams to translate findings into concrete improvements
Help establish metrics for measuring detection effectiveness of novel abuse

Demonstrated experience in penetration testing, red teaming, or application security
Strong technical skills in web application security, including hands-on expertise with security testing tools (Burp Suite, Metasploit, custom scripting frameworks, etc.)
A track record of discovering novel attack vectors and chaining vulnerabilities in creative ways
A public body of work such as CVEs, blog posts, or disclosed bug bounty reports
Experience with security testing tools and the ability to build custom automation
Adaptability to understand and build engagements around emerging threats outside of your direct area of expertise
Strong written and verbal communication skills, with the ability to explain technical concepts to varied audiences
Proven ability to think like an attacker

Experience with AI/ML security or adversarial machine learning
Experience testing API security and rate limiting systems
Background in testing business logic vulnerabilities and authorization bypass techniques
Background in anti-fraud, trust & safety, or abuse prevention systems
Familiarity with distributed systems and infrastructure security
Understanding of AI safety considerations beyond traditional security
Familiarity with abuse detection mechanisms and the ability to engineer novel bypasses

We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We're an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.
The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.

Company: Anthropic
Location: Remote-Friendly (Travel-Required) | San Francisco, CA | Washington, DC
Type: Full-Time
Level: Mid-level
Domain: Software Engineering
