Principal Cloud Operations Engineer (10166)

San Jose, California, United States | Full-Time | Staff | Operations

Responsibilities:

  • Provide technical leadership in cloud architecture, operational excellence, reliability, and cost optimization across large-scale production environments.
  • Stay current with industry trends and best practices, and leverage AI technologies and cloud service provider platforms (AWS, Google Cloud, and Azure) to improve operational efficiency, scalability, security, and resiliency.
  • Design and ensure secure, reliable, and high-performance communication across multiple regions and cloud service providers.
  • Configure, tune, and operate middleware services, including SQL and NoSQL databases, messaging and streaming platforms, and related infrastructure components.
  • Evaluate, recommend, and lead the adoption of CloudOps and DevOps tools, platforms, and automation solutions.
  • Troubleshoot complex production infrastructure and application issues, providing deep technical expertise and hands-on support when required.
  • Drive root cause analysis (RCA), implement corrective actions, and establish preventive measures to avoid recurrence.
  • Collaborate closely with engineering teams and cloud architects in system design discussions, architecture reviews, and whiteboard sessions.
  • Partner with Development, QA, SRE, and external service providers or carriers to resolve issues and improve system reliability.
  • Design, implement, and evolve deployment automation platforms for Kubernetes-based microservices.
  • Improve service availability, performance, and scalability through automation, tooling, capacity planning, and process improvements.
  • Analyze system and service performance, identify bottlenecks, and deliver actionable recommendations to improve efficiency and resilience.

Qualifications:

  • BS-level technical degree required; Computer Science or Engineering background preferred.
  • 8+ years of experience in a CloudOps / DevOps role.
  • Hands-on experience with AWS or another public cloud (Azure, GCP, etc.).
  • Knowledge of Linux, security, and networking fundamentals.
  • Working knowledge of container-based architecture and deployment (Docker, Kubernetes).
  • Working knowledge of deployment automation development (Terraform, Helm, ArgoCD).
  • Experience in diagnosing and resolving complex application problems.
  • Working knowledge of Elasticsearch, PostgreSQL, Redis, Ignite, Flink, Kafka, and RabbitMQ.
  • Experience with monitoring tools (Nagios, Grafana, Prometheus).
  • Experience with cloud security and compliance implementation is a plus.
  • Strong follow-through and initiative to stay with issues until they are resolved.
  • Comfortable working within a distributed team located in multiple time zones.
  • Salary range of USD 160,000 - 200,000, based on region, qualifications, and experience.

Job Summary

Company: Extreme Networks
Location: San Jose, California, United States
Type: Full-Time
Level: Staff
Domain: Operations