Responsibilities:

  • Design, build, and enhance state-of-the-art machine learning infrastructure (cloud and on-premise), including core components, and architect platforms to create, train, and deploy ML models.
  • Build operational dashboards and charts to track system errors and performance and to enable root cause analysis.
  • Identify gaps and evaluate relevant tools and technologies as needed to improve processes and systems, leveraging open-source and cloud computing technologies to build effective solutions.
  • Collaborate with the AI team to drive ML projects from conception to completion and production monitoring.

Requirements:

  • Bachelor's degree or higher, with a strong academic background.
  • 2-4 years of meaningful work experience in DevOps handling complex services.
  • Strong troubleshooting skills to keep our services highly available.
  • Strong expertise and experience with Google Cloud Platform (GCP), Docker, Kubernetes, CI/CD, and Jenkins.
  • Extensive experience in designing, implementing, and maintaining infrastructure as code, preferably using Terraform.
  • Experience creating and maintaining deployment manifest files for microservices using Helm.
  • LLMOps or MLOps experience is a bonus.
  • Strong expertise in deploying at scale on Kubernetes clusters using the Horizontal Pod Autoscaler (HPA).
  • Broad technical background in the architecture, design, and operation of cloud solutions, including how to meet security compliance requirements.
  • Experience monitoring system health and ensuring security, scalability, and reliability.
  • Ability to design, implement, and maintain observability, monitoring, logging, and alerting using tools like Prometheus, Grafana, Promtail, Loki, and Datadog.

Job Summary

Company: LevelAI
Location: Noida
Type: Full-Time
Level: Mid-level
Domain: DevOps