Senior DevOps Engineer - Nutanix Kubernetes & AI Platform
Vision Unlimited
Experience
5 - 10 Years
Job Location
Education
Bachelor of Science (Computers)
Nationality
Any Nationality
Gender
Not Mentioned
Vacancy
1 Vacancy
Job Description
Roles & Responsibilities
Key Responsibilities
- End-to-End Kubernetes Platform Ownership: Design, deploy, manage, and maintain production-grade Kubernetes clusters on Nutanix Karbon (or native K8s on Nutanix AHV), ensuring high availability, performance, and security.
- AI/ML Infrastructure Architecture: Architect and implement scalable, cost-efficient infrastructure tailored for AI workloads, including GPU orchestration, distributed training, model serving, and data-intensive pipelines.
- Infrastructure as Code (IaC): Automate provisioning and configuration of Nutanix K8s environments using Terraform, Ansible, Helm, and GitOps workflows (e.g., ArgoCD/Flux).
- CI/CD for AI Services: Build and maintain secure, efficient CI/CD pipelines for deploying AI microservices, model endpoints, and data processing jobs into K8s environments.
- Observability & SRE Practices: Implement comprehensive monitoring, logging, and alerting (using Prometheus, Grafana, ELK, OpenTelemetry, etc.) with SLO/SLI tracking for AI platform reliability.
- Security & Compliance: Enforce zero-trust networking, RBAC, Pod Security Standards, image scanning, and secrets management (e.g., HashiCorp Vault) aligned with enterprise security standards.
- Performance Optimization: Tune K8s scheduling, storage (Nutanix Files/Objects), networking (CNI), and resource allocation (CPU/GPU/memory) for AI/ML workloads.
- Collaboration & Enablement: Partner with AI/ML engineers to onboard models and services onto the platform; document best practices and provide self-service tooling.
- Disaster Recovery & Backup: Implement and test backup/recovery strategies for K8s workloads and persistent data using Nutanix-native or third-party tools (e.g., Velero).
Required Qualifications
- 5+ years of DevOps/SRE experience with 3+ years focused on Kubernetes in production environments.
- Deep hands-on experience with Nutanix (AHV, Prism, Karbon, Files, Objects) and with managing K8s in on-prem or hybrid environments.
- Proven track record designing and operating AI/ML infrastructure (e.g., Kubeflow, MLflow, Seldon, KServe, Ray).
- Expertise in Infrastructure as Code: Terraform, Helm, Ansible, GitOps.
- Strong scripting/automation skills (Python, Bash, Go).
- Experience with GPU orchestration (NVIDIA device plugins, MIG, CUDA) in K8s.
- Solid understanding of networking, storage, and security in K8s (CNI, CSI, RBAC, OPA/Gatekeeper).
- Familiarity with CI/CD tools (GitLab CI, Jenkins, GitHub Actions) and artifact management (Harbor, JFrog).
- Experience with observability stacks (Prometheus, Grafana, Loki, Tempo, OpenTelemetry).
- Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
Preferred Qualifications
- Nutanix certifications (e.g., NCP-MCI, NCP-DS).
- CNCF certifications (CKA, CKAD, CKS).
- Experience with multi-cluster management (Rancher, Anthos, OpenShift).
- Knowledge of MLOps practices and tools (MLflow, TFX, Kubeflow Pipelines).
- Experience in regulated industries (finance, healthcare) with compliance needs (SOC 2, HIPAA, GDPR).
Company Industry
- IT - Software Services
Department / Functional Area
- IT Software
Keywords
- Senior DevOps Engineer Nutanix Kubernetes & AI Platform
Disclaimer: Naukrigulf.com is only a platform to bring jobseekers and employers together. Applicants are advised to research the bona fides of the prospective employer independently. We do NOT endorse any requests for money payments and strictly advise against sharing personal or bank-related information. We also recommend you visit Security Advice for more information. If you suspect any fraud or malpractice, email us at abuse@naukrigulf.com.