Responsibilities

  • Architect and scale enterprise-grade AKS clusters built for high concurrency, performance, and real-time AI inference, ensuring the platform is globally distributed and highly available.
  • Leverage Crossplane to provision Azure services through a Kubernetes-native control plane, enabling rapid scaling of AI services.
  • Champion GitOps practices with Argo CD to standardize deployments across multiple environments and regions, enabling reliable, automated delivery of mission-critical SaaS workloads.
  • Engineer infrastructure that supports data-intensive AI/ML pipelines, integrating compute, storage, and messaging with Kubernetes to power real-time decision intelligence use cases.
  • Optimize scalability and concurrency with autoscaling, pod disruption budgets, and advanced workload scheduling, ensuring millions of daily requests are served with low latency.
  • Develop and maintain automation, tooling, and integrations using Python, Ruby, and Terraform, enabling teams to scale infrastructure and AI services efficiently.
  • Design and enforce secure, compliant, multi-tenant architectures with Azure AD SSO, managed identities, RBAC, and Key Vault integration.
  • Build resilient networking topologies with VNets, VNet peering, Private Link, service mesh technologies (e.g., Istio, Linkerd), and Emissary-ingress for advanced security and reliability.
  • Integrate observability frameworks at scale using Prometheus, Grafana, Azure Monitor, and OpenTelemetry, providing deep visibility into performance, availability, and latency.
  • Collaborate closely with AI/ML engineering teams to align infrastructure with real-time inference and streaming data requirements, enabling cutting-edge decision automation.
  • Mentor engineering and operations teams while documenting and evangelizing Kubernetes-native and Azure-native best practices, driving innovation across the organization.

About You

  • 10+ years of cloud infrastructure experience with expert-level skills in Kubernetes and Azure.
  • Proven experience designing and operating multi-tenant SaaS platforms where performance, scalability, and security are critical.
  • Hands-on expertise with Crossplane for Kubernetes-controlled Azure service provisioning.
  • Deep familiarity with Azure services: AKS, Azure Database for MySQL (Flexible Server), Blob Storage, Event Hubs, Key Vault, etc.
  • Strong coding and automation background with GitHub Actions, Python, and Terraform, plus experience with other high-level programming and scripting languages.
  • Skilled in Infrastructure as Code (Terraform, Crossplane, Helm) and GitOps (Argo CD).
  • In-depth knowledge of Kubernetes networking, autoscaling, and workload orchestration for AI/ML inference workloads.
  • Proficiency with observability tooling: Prometheus, Grafana, Azure Monitor, and OpenTelemetry.
  • A collaborative leader who thrives on mentoring and enabling teams, with excellent communication and documentation skills.
  • Motivated to build the core infrastructure behind AI-powered decision intelligence at global scale, driving meaningful impact for some of the world’s most recognized brands.

Nice to Have

  • Background with large-scale, real-time data streaming platforms.
  • Prior collaboration on AI/ML infrastructure platforms or decision intelligence systems.
  • Contributions to open-source projects, especially in Kubernetes or the cloud-native ecosystem.

Job Summary

Company: Aera Technology
Location: Remote US, USA
Type: Full-Time
Level: Staff
Domain: Other