What You'll Do:

  • Architect & Scale Infrastructure: Design and implement multi-cluster, multi-region Kubernetes deployments using EKS, GKE, and AKS. Build infrastructure that scales across regions and cloud providers.
  • Own Production Systems: Take end-to-end ownership of production infrastructure. Drive incident response, postmortems, and improvements to prevent recurrence.
  • Infrastructure as Code at Scale: Build and maintain Terraform modules for complex infrastructure patterns. Manage thousands of configuration files across clusters, regions, and environments using GitOps principles.
  • GitOps & Deployment Excellence: Design and optimize ArgoCD ApplicationSets and Helm chart architectures. Build deployment pipelines that enable safe, automated releases across hundreds of microservices.
  • Performance & Reliability Engineering: Analyze system performance, identify bottlenecks, and implement optimizations. Improve SLOs through capacity planning, autoscaling, and architectural improvements.
  • Observability & Monitoring: Build and enhance monitoring, alerting, and observability using Prometheus, Grafana, Loki, and custom tooling. Drive visibility into complex distributed systems.
  • Security & Compliance: Implement security controls, compliance frameworks, and best practices across cloud infrastructure. Design secure multi-tenant architectures.
  • Technical Leadership: Mentor engineers, establish best practices, and drive technical decisions. Collaborate with platform, SRE, and product teams to deliver reliable infrastructure.

What We're Looking For:

  • 5+ years in cloud infrastructure engineering, with deep expertise in at least one major cloud provider (AWS preferred)
  • Strong Kubernetes experience: cluster design, operators, controllers, and multi-cluster management
  • Proficiency with Infrastructure as Code: Terraform, CloudFormation, or similar
  • GitOps expertise: ArgoCD, Flux, or similar; experience with ApplicationSets and complex deployment patterns
  • Deep Linux and networking knowledge
  • Experience with distributed systems: Elasticsearch, PostgreSQL, Redis, Kafka, RabbitMQ
  • Monitoring and observability: Prometheus, Grafana, ELK stack, or similar
  • Strong problem-solving skills and experience debugging complex distributed systems
  • Experience with cloud security, compliance (SOC 2, ISO 27001), and secure-by-design practices
  • Excellent communication skills for working across time zones and with distributed teams
  • Self-directed with a track record of owning problems end-to-end
  • Ability to participate in the team's on-call rotation

Nice to Have:

  • Experience with multi-cloud architectures and cloud-agnostic patterns
  • Contributions to open-source infrastructure projects
  • Experience with service mesh technologies (Istio, Linkerd)
  • Knowledge of chaos engineering and reliability testing
  • Experience with cost optimization and FinOps practices

Why This Role:

  • Work on infrastructure at scale: hundreds of clusters, thousands of services, global reach
  • Deep technical ownership: design, build, and operate critical systems
  • Modern stack: Kubernetes, GitOps, Infrastructure as Code, cloud-native tools
  • Impact: infrastructure decisions affect millions of users
  • Growth: work with experienced engineers and tackle complex challenges

Job Summary

Company: Extreme Networks
Location: Ireland
Type: Full-Time
Level: Staff
Domain: Operations