Site Reliability Engineer

Costa Rica | Full-Time | Mid-level | DevOps

Skills

Python, Shell scripting, AWS, Kubernetes, Docker, Terraform, Ansible, CI/CD, Jenkins, GitHub Actions, EC2, S3, Linux, Prometheus, Grafana, ELK, GitHub, GitLab, Security, Ownership, Kafka


Company Overview

At Zuora, we do Modern Business. We’re helping people subscribe to new ways of doing business that are better for people, companies and ultimately the planet. It’s an approach resulting from the shift to the Subscription Economy that puts customers first by building recurring relationships instead of one-time product sales and focuses on sustainable growth. Through our leading expertise and multi-product suite, we are transforming all industries and working with the world’s most innovative companies to monetize new business models, nurture subscriber relationships and optimize their digital experiences.

The Team & Role

Zuora’s Cloud Engineering teams are responsible for cloud infrastructure, performance and uptime monitoring, internal and external shared services, infrastructure services, and more for Zuora’s customer-facing SaaS products and platforms. Our technologists sit across the US, Beijing, India, Costa Rica, and remote locations, using a follow-the-sun model to provide 24x7x365 coverage for critical functions. They partner closely with our Engineering, Customer Support, Security, Global Services, and Sales teams on a daily basis to keep our customers front and center.

We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our infrastructure team. The ideal candidate will be focused on maximizing system uptime, efficiency, and reliability while building the tools and automation necessary to scale our services. This role requires a strong balance of operational experience and development skills, with deep expertise in cloud environments and modern CI/CD practices.

This is a location-specific position that requires you to come into the office regularly to be most effective.

What you’ll do

Reliability & Performance: Maintain and improve the reliability, scalability, and performance of our production systems, targeting a high-availability environment.
Automation: Design, implement, and maintain automation solutions for infrastructure provisioning, deployment, configuration management, and monitoring using Terraform and Jenkins.
Infrastructure Management: Administer, manage, and optimize our cloud infrastructure primarily hosted on AWS, focusing on cost efficiency and secure operations.
Configuration Management: Develop and maintain infrastructure-as-code using Puppet and/or Ansible to ensure consistent and reproducible environments.
Incident Response: Participate in the on-call rotation, troubleshoot and resolve critical production incidents, and conduct comprehensive post-mortems to prevent recurrence.
System Hardening: Apply strong Linux administration skills to manage, patch, and secure operating systems and underlying infrastructure.
Messaging & Data Streams: Manage and optimize distributed messaging systems, specifically Kafka, ensuring high throughput and data integrity.

Your experience

2–4 years of relevant experience in SRE, DevOps, or Cloud Engineering roles.
AWS Cloud Fundamentals: Hands-on experience with core AWS services such as EC2, S3, IAM, VPC, RDS, and CloudWatch. Exposure to EKS/ECS is a plus but not mandatory.
Infrastructure as Code: Practical experience using Terraform to provision and manage infrastructure. Able to read, modify, and create modules with guidance.
Configuration Management: Working knowledge of Ansible or similar tools. Able to maintain and troubleshoot existing configurations.
CI/CD Pipelines: Experience supporting or maintaining CI/CD pipelines (e.g., Jenkins, GitHub Actions, GitLab CI). Capable of creating basic pipelines with supervision.
Scripting Skills: Proficiency in Python or Shell scripting for automation tasks and operational tooling.
Linux Administration: Solid understanding of Linux fundamentals, including troubleshooting, package management, networking basics, and system performance monitoring.
Containerization: Basic experience with Docker and understanding of container concepts.
Monitoring & Observability: Familiarity with monitoring tools (e.g., CloudWatch, Prometheus, Grafana).
Incident Support: Experience participating in on-call rotations or production support environments.
Collaboration Skills: Ability to work in cross-functional teams and follow established operational processes and SRE practices.

Nice to haves

Experience with containerization technologies like Docker and Kubernetes (EKS).
Familiarity with logging and monitoring tools (e.g., Prometheus, Grafana, ELK stack).
Knowledge of networking (TCP/IP, Load Balancing, DNS).
Previous experience in a 24/7 high-availability production environment.

#ZEOLife at Zuora

As an industry pioneer, we find our work constantly evolving and challenging us in new ways that require us to think differently, iterate often, and learn constantly. It’s exciting. Our people, whom we refer to as “ZEOs,” are empowered to take on a mindset of ownership and make a bigger impact here. Our teams collaborate deeply and exchange different ideas openly, and together we’re making what’s next possible for our customers, community, and the world.
As part of our commitment to building an inclusive, high-performance culture where ZEOs feel inspired, connected and valued, we support ZEOs with:
Competitive compensation, variable bonus and performance reward opportunities, and retirement programs
Medical, dental and vision insurance
Generous, flexible time off
Paid holidays, “wellness” days, and a company-wide end-of-year break
Paid parental leave
Learning & Development stipend
Opportunities to volunteer and give back, including charitable donation match
Free resources and support for your mental wellbeing
Specific benefits offerings may vary by country and can be viewed in more detail during your interview process.

Location & Work Arrangements

Organizations and teams at Zuora are empowered to design efficient and flexible ways of working, being intentional about scheduling, communication, and collaboration strategies that help us achieve our best results. In our dynamic, globally distributed company, this means balancing flexibility and responsibility: flexibility to live our lives to the fullest, and responsibility to each other, to our customers, and to our shareholders. For most roles, we offer the flexibility to work both remotely and at Zuora offices.

Our Commitment to an Inclusive Workplace

Think, be and do you! At Zuora, different perspectives, experiences and contributions matter. Everyone counts. Zuora is proud to be an Equal Opportunity Employer committed to creating an inclusive environment for all.
Zuora does not discriminate on the basis of, and considers individuals seeking employment with Zuora without regard to, race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics.
We encourage candidates from all backgrounds to apply. Applicants in need of special assistance or accommodation during the interview process or in accessing our website may contact us by sending an email to assistance@zuora.com.

Job Summary

Company: Zuora
Location: Costa Rica
Type: Full-Time
Level: Mid-level
Domain: DevOps
