Senior DevOps Engineer, Nutanix Kubernetes & AI Platform

Vision Unlimited

Posted 28 min ago

Experience

5 - 10 Years

Education

Bachelor of Science (Computers)

Nationality

Any Nationality

Gender

Not Mentioned

Vacancy

1 Vacancy

Job Description

Roles & Responsibilities

Key Responsibilities

  • End-to-End Kubernetes Platform Ownership: Design, deploy, manage, and maintain production-grade Kubernetes clusters on Nutanix Karbon (or native K8s on Nutanix AHV), ensuring high availability, performance, and security.
  • AI/ML Infrastructure Architecture: Architect and implement scalable, cost-efficient infrastructure tailored for AI workloads, including GPU orchestration, distributed training, model serving, and data-intensive pipelines.
  • Infrastructure as Code (IaC): Automate provisioning and configuration of Nutanix K8s environments using Terraform, Ansible, Helm, and GitOps workflows (e.g., ArgoCD/Flux).
  • CI/CD for AI Services: Build and maintain secure, efficient CI/CD pipelines for deploying AI microservices, model endpoints, and data processing jobs into K8s environments.
  • Observability & SRE Practices: Implement comprehensive monitoring, logging, and alerting (using Prometheus, Grafana, ELK, OpenTelemetry, etc.) with SLO/SLI tracking for AI platform reliability.
  • Security & Compliance: Enforce zero-trust networking, RBAC, pod security policies, image scanning, and secrets management (e.g., HashiCorp Vault) aligned with enterprise security standards.
  • Performance Optimization: Tune K8s scheduling, storage (Nutanix Files/Objects), networking (CNI), and resource allocation (CPU/GPU/memory) for AI/ML workloads.
  • Collaboration & Enablement: Partner with AI/ML engineers to onboard models and services onto the platform; document best practices and provide self-service tooling.
  • Disaster Recovery & Backup: Implement and test backup/recovery strategies for K8s workloads and persistent data using Nutanix-native or third-party tools (e.g., Velero).

Required Qualifications

  • 5+ years of DevOps/SRE experience, with 3+ years focused on Kubernetes in production environments.
  • Deep hands-on experience with Nutanix (AHV, Prism, Karbon, Files, Objects) and managing K8s on-prem or hybrid.
  • Proven track record designing and operating AI/ML infrastructure (e.g., Kubeflow, MLflow, Seldon, KServe, Ray).
  • Expertise in Infrastructure as Code: Terraform, Helm, Ansible, GitOps.
  • Strong scripting/automation skills (Python, Bash, Go).
  • Experience with GPU orchestration (NVIDIA device plugins, MIG, CUDA) in K8s.
  • Solid understanding of networking, storage, and security in K8s (CNI, CSI, RBAC, OPA/Gatekeeper).
  • Familiarity with CI/CD tools (GitLab CI, Jenkins, GitHub Actions) and artifact management (Harbor, JFrog).
  • Experience with observability stacks (Prometheus, Grafana, Loki, Tempo, OpenTelemetry).
  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.

Preferred Qualifications

  • Nutanix certifications (e.g., NCP-MCI, NCP-DS).
  • CNCF certifications (CKA, CKAD, CKS).
  • Experience with multi-cluster management (Rancher, Anthos, OpenShift).
  • Knowledge of MLOps practices and tools (MLflow, TFX, Kubeflow Pipelines).
  • Experience in regulated industries (finance, healthcare) with compliance needs (SOC2, HIPAA, GDPR).

Keywords

  • Senior DevOps Engineer Nutanix Kubernetes & AI Platform