
At DevRev, we’re building the future of work with Computer – your AI teammate.

  • Computer is not just another tool. It’s built on the belief that the future of work should be about genuine human connection and collaboration – not piling on more apps. Computer is the best kind of teammate: it amplifies your strengths, takes repetition and frustration out of your day, and gives you more time and energy to do your best work.
  • How?

Extensions for your teams and customers

Computer doesn’t make you choose between new software and old. Its AI-native platform lets you extend existing tools with sophisticated apps and agents, so your teams – and your customers – can take action, seamlessly. These agents work alongside you: updating workflows, coordinating across teams, and syncing back to your systems.

  • This isn’t just software. Computer brings people back together, breaking down silos and ushering in the future of teamwork through human-AI collaboration. Stop managing software. Stop wasting time. Start solving bigger problems, building better products, and making your customers happier.
  • We call this Team Intelligence. It’s why DevRev exists.
  • Trusted by global companies across multiple industries, DevRev is backed by Khosla Ventures and Mayfield, with $150M+ raised. We are 650+ people, across eight global offices.

Key Responsibilities

  • Design and implement comprehensive testing strategies for GenAI features, including conversational AI, agentic systems, and LLM-powered workflows
  • Develop automated test suites for prompt testing, including regression tests that detect unintended changes in model behaviour (a minimal sketch appears after this list)
  • Create evaluation frameworks to measure GenAI quality across multiple dimensions (accuracy, relevance, safety, consistency, latency)
  • Build and maintain test datasets and golden examples that represent diverse user scenarios and edge cases
  • Implement monitoring and alerting systems to detect quality degradation in production GenAI features
  • Perform adversarial testing to identify potential failures, hallucinations, biases, or security vulnerabilities in AI systems
  • Collaborate with engineers to define acceptance criteria and quality gates for AI feature releases
  • Develop tools and frameworks that make it easy for engineers to test their GenAI implementations
  • Conduct user acceptance testing and gather feedback on AI feature performance from internal users
  • Document testing procedures, known issues, and quality metrics in clear, accessible formats
  • Partner with Product and Design teams to ensure AI features meet user experience standards
  • Stay current with GenAI testing methodologies, tools, and industry best practices
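
To give a flavor of the prompt-regression work above, here is a minimal pytest sketch, not DevRev’s actual harness: it replays golden examples against a model call and asserts on stable properties of the output (required content, length budget) rather than exact strings, since LLM output is non-deterministic. The `generate` function and the golden examples are hypothetical stand-ins.

```python
# prompt_regression_test.py -- illustrative sketch only.
import pytest

GOLDEN_EXAMPLES = [
    # Each golden example pins an input plus the properties the answer must keep.
    {"prompt": "Summarize: The ticket was resolved by restarting the sync agent.",
     "must_contain": ["sync agent"], "max_words": 40},
    {"prompt": "Classify sentiment: 'The new dashboard is fantastic!'",
     "must_contain": ["positive"], "max_words": 5},
]

def generate(prompt: str) -> str:
    # Hypothetical stand-in: a real suite would call the production LLM endpoint.
    canned = {
        "Summarize": "The sync agent was restarted, resolving the ticket.",
        "Classify": "positive",
    }
    return next(v for k, v in canned.items() if prompt.startswith(k))

@pytest.mark.parametrize("example", GOLDEN_EXAMPLES, ids=lambda e: e["prompt"][:30])
def test_prompt_regression(example):
    output = generate(example["prompt"]).lower()
    # Property checks tolerate wording drift but flag behavioural regressions.
    for needle in example["must_contain"]:
        assert needle in output, f"missing expected content: {needle!r}"
    assert len(output.split()) <= example["max_words"], "output grew beyond budget"
```

Run with `pytest prompt_regression_test.py`; in practice the canned responses would be replaced by live model calls, and a failing property check signals that a prompt or model change shifted behaviour.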

Your Qualifications

  • PRE or test engineering experience, preferably with AI/ML systems
  • Strong understanding of GenAI technologies including LLMs, prompt engineering, and AI application patterns
  • Experience with test automation frameworks and scripting (Python, JavaScript, Selenium, Pytest)
  • Knowledge of software testing methodologies (functional, integration, regression, performance, security testing)
  • Ability to design test cases and evaluation criteria for non-deterministic systems
  • Strong analytical and problem-solving skills with attention to detail
  • Experience with API testing tools (Postman, REST Assured) and backend testing
  • Familiarity with CI/CD pipelines and automated testing integration
  • Excellent communication skills for documenting issues and collaboration

Preferred Qualifications

  • Experience testing conversational AI, chatbots, or agentic systems
  • Knowledge of ML model evaluation metrics and techniques
  • Familiarity with LLM evaluation frameworks (LangSmith, PromptFoo, Ragas) (see the sketch after this list)
  • Experience with performance testing and load testing AI APIs
  • Understanding of responsible AI principles, including fairness, transparency, and safety testing
  • Background in enterprise software or SaaS QA
  • Experience with test management tools (TestRail, Zephyr, Jira)
  • Knowledge of security testing methodologies for AI systems
  • Scripting experience with Python, including working with LLM APIs
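
To make the evaluation-framework bullets concrete, here is one hedged sketch of scoring a response on two of the quality dimensions named earlier (relevance and latency). Frameworks such as Ragas or PromptFoo provide far richer, validated metrics; the keyword-overlap relevance proxy and the `call_model` helper below are assumptions for illustration only.

```python
# eval_sketch.py -- a toy multi-dimension scorer, not any framework's real API.
import time
from dataclasses import dataclass

@dataclass
class EvalResult:
    relevance: float   # 0..1 keyword-overlap proxy for relevance
    latency_s: float   # wall-clock seconds for the model call

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a production LLM API call.
    return "Restarting the sync agent resolved the reported ticket."

def evaluate(prompt: str, reference_keywords: list[str]) -> EvalResult:
    start = time.perf_counter()
    answer = call_model(prompt).lower()
    latency = time.perf_counter() - start
    hits = sum(1 for kw in reference_keywords if kw in answer)
    return EvalResult(relevance=hits / len(reference_keywords), latency_s=latency)

if __name__ == "__main__":
    result = evaluate("How was the ticket fixed?", ["sync agent", "restart"])
    print(result)  # e.g. EvalResult(relevance=1.0, latency_s=...)
```

Scores like these, tracked per release over a fixed test dataset, are what turn “GenAI quality” from a subjective impression into a regression signal.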

What Makes This Role Exciting

  • Define quality practices for GenAI applications
  • Work on cutting-edge AI technologies and help ensure they're reliable and trustworthy
  • Shape quality standards that will impact millions of enterprise users
  • Collaborate closely with engineers, data scientists, and product teams
  • Grow expertise in a highly specialized and increasingly important domain
  • Influence the entire AI product development lifecycle from design to release
  • Join a team that values quality as a first-class concern, not an afterthought
  • Join us in innovating our testing processes and ensuring the delivery of high-quality software products through advanced automation techniques.

Culture

  • The foundation of DevRev is its culture: our commitment to those who are hungry, humble, honest, and who act with heart. Our vision is to help build the earth’s most customer-centric companies. Our mission is to leverage design, data engineering, and machine intelligence to empower engineers to embrace their customers.
  • That is DevRev!

Job Summary

Company: DevRev
Location: Cebu, Philippines
Type: Full-Time
Level: Lead
Domain: AI / Data Science