Site Reliability Engineer (SRE)
Prime Gate
Posted on 30 Mar
Nationality
Any Nationality
Gender
Not Mentioned
Vacancy
1 Vacancy
Job Description
Roles & Responsibilities
Responsibilities:
Operate and maintain production systems with a focus on reliability, availability, and performance.
Work with Docker and Kubernetes to deploy, update, and troubleshoot services.
Configure and optimize Kubernetes resources (pods, deployments, services, ingress, config maps, secrets, etc.).
Implement and maintain monitoring, logging, and alerting for applications and infrastructure.
Build and improve CI/CD pipelines in collaboration with development and DevOps teams.
Create and maintain dashboards for key service metrics (latency, error rate, throughput, resource usage).
Participate in incident response: investigate issues, identify root cause, and propose fixes and improvements.
Work closely with backend developers to improve service reliability, resilience, and observability.
Contribute to capacity planning and performance tuning of services and infrastructure.
Automate repetitive operational tasks using scripts or small tools.
Document runbooks, procedures, and best practices for operating services in production.
Desired Candidate Profile
Must-Have Qualifications:
3-5 years of professional experience in an SRE, DevOps, or infrastructure-focused engineering role.
Strong understanding of Linux systems (shell, processes, networking, permissions, logs).
Hands-on experience with Docker and Kubernetes in real environments.
Practical experience with:
o Kubernetes deployments, services, ingress, config maps, and secrets
o Basic troubleshooting inside a cluster (pods failing, crashes, restarts, resource issues)
Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK/EFK, Application Insights, or similar).
Experience with CI/CD pipelines (Azure DevOps, GitHub Actions, GitLab CI, Jenkins, or similar).
Ability to read and modify pipeline definitions and understand build, test, and deploy flows.
Basic programming/scripting skills in at least one language (e.g., Python, Bash, PowerShell, or Go).
Understanding of core reliability concepts such as SLIs, SLOs, uptime, latency, and availability.
Experience troubleshooting production issues using logs, metrics, and dashboards.
Good communication skills and ability to collaborate with developers, QA, and product teams.
Nice-to-Have:
Experience with at least one major cloud platform (Azure, AWS, Alibaba Cloud, or GCP).
Experience with infrastructure as code (Terraform, Bicep, Pulumi, Helm, etc.).
Experience with ingress controllers, API gateways, or service mesh.
Familiarity with security best practices (secrets management, TLS/certificates, RBAC on Kubernetes or cloud).
Experience participating in on-call rotations and using incident management tools (PagerDuty, Opsgenie, etc.).
Experience contributing to post-incident reviews and implementing follow-up improvements.
Company Industry
- IT - Software Services
Department / Functional Area
- IT Software
Keywords
- Site Reliability Engineer (SRE)