Senior Site Reliability Engineer (SRE) Salla

Posted 15 hrs ago

Experience: 5 - 7 Years

Education: Any Graduation

Nationality: Any Nationality

Gender: Not Mentioned

Vacancy: 1 Vacancy

Job Description

Roles & Responsibilities

Reliability & Incident Management

  • Lead high-severity incident response and drive post-incident reviews.
  • Troubleshoot complex issues across applications, infrastructure, and networks.
  • Improve MTTR through better monitoring, alerts, and diagnostic tooling.
  • Participate in the on-call rotation supporting production systems.

Performance & Scalability

  • Identify and resolve performance bottlenecks and scaling challenges.
  • Conduct load testing and capacity planning for high-traffic scenarios.

Infrastructure & Operations

  • Enhance cloud-native infrastructure, deployment processes, and automation.
  • Improve resilience, fault-tolerance, and recovery mechanisms across systems.

Observability

  • Build and refine dashboards, alerts, metrics, logs, and traces.
  • Define SLIs/SLOs and improve visibility into system behavior.

Tooling & Automation

  • Develop tools that reduce operational toil and increase reliability.
  • Contribute to infrastructure-as-code, CI/CD pipelines, and GitOps workflows.

Collaboration

  • Work closely with engineering teams to ensure services are robust and production-ready.
  • Mentor engineers on reliability, debugging, and operational best practices.

Required Skills

  • Strong experience with Kubernetes, service mesh technologies, and cloud platforms (AWS/GCP/Azure).
  • Deep understanding of Linux, networking, distributed systems, and load balancers.
  • Hands-on experience with Terraform or similar IaC tools.
  • Experience with Prometheus, Grafana, Loki, Mimir, Elastic, or similar observability tools.
  • Proficiency in scripting/programming (Bash, Python, Go).
  • Experience with CI/CD and GitOps.
  • Strong debugging, incident response, and performance analysis skills.

Bonus Skills

  • Background in large-scale, high-traffic systems.
  • Experience with fault-tolerant design, DR, and HA patterns.
  • Familiarity with SLOs, SLIs, and error budgets.

Keywords

  • Senior Site Reliability Engineer (SRE)
