Senior Site Reliability Engineer

Sana Commerce

Experience

8 - 13 Years

Job Location

Alexandria - Egypt

Education

Bachelor of Technology/Engineering

Nationality

Any Nationality

Gender

Not Mentioned

Vacancy

1 Vacancy

Job Description

Roles & Responsibilities

This SRE position focuses on engineering reliability in everything we do: automating repetitive tasks, improving monitoring signals, running deep root cause analysis, and shaping systems for scalability. You'll be the engineer others look to during critical incidents, and the one raising the bar on how we prevent them in the first place.

What you'll get:

  • The opportunity to make an impact at a fast-growing SaaS scale-up;
  • A global and customized onboarding program (rated 9.1/10 by previous hires);
  • A hybrid working model: 3 days from the office, 2 days from home.

What you bring

  • 8+ years of experience in SRE, DevOps, or Cloud Infrastructure, with demonstrated ownership of large-scale systems.
  • Strong hands-on knowledge of Microsoft Azure services and practical experience operating Azure Kubernetes clusters in production.
  • Expertise in Dynatrace, Honeycomb, ElasticSearch, Kibana/Grafana, Azure Monitor (KQL). Able to design actionable monitoring that leads to prevention, not just detection.
  • Proficient in at least one programming/scripting language (PowerShell, Bash, Python, or C#). Strong debugging and logging practices.
  • Hands-on experience with Infrastructure-as-Code (Terraform, Bicep, or ARM) to automate and manage cloud infrastructure.
  • Solid understanding of TCP/IP protocols and troubleshooting network issues in distributed systems.
  • Ability to go beyond surface fixes, identify patterns, and engineer permanent improvements.
  • Strong communicator who can work with cross-functional teams and explain complex issues simply.
  • Microsoft Certified: Azure Administrator Associate
  • CKA: Certified Kubernetes Administrator

What you'll be doing

  • Lead incident response and root cause analysis by driving deep investigations, educating the team, and delivering actionable post-incident insights that prevent recurrence.
  • Manage Kubernetes and Azure environments by owning cluster configurations and platform usage, and by ensuring availability, cost efficiency, and adherence to security best practices.
  • Develop observability and monitoring strategies with Dynatrace, Honeycomb, ElasticSearch, Kibana/Grafana, and Azure Monitor to measure performance and user impact, and continuously refine alerts and dashboards.
  • Implement and maintain edge and CDN integrations (Fastly WAF, bot management, CDN) to enhance performance, security, and reliability of customer-facing services.
  • Write and debug automation scripts in PowerShell, Bash, Python, or C#, ensuring logging, rollback, and versioning practices make the platform more resilient and self-healing.
  • Drive Infrastructure-as-Code adoption with Terraform, Bicep, and ARM to standardize environments, automate deployments, and reduce manual interventions.
  • Optimize system and application performance through deep monitoring, dump analysis, and right-sizing of resources to eliminate bottlenecks and maximize efficiency.
  • Collaborate across teams to break down complex problems, contribute to CI/CD and SDLC improvements, and embed reliability into development and release pipelines.
  • Participate in the on-call rotation by taking ownership of incidents, coordinating responses, and ensuring sustainable fixes rather than temporary workarounds.

Sana Commerce

At Sana Commerce, we're committed to creating an inclusive environment because we know our diverse workforce is one of our greatest strengths.

What started in 2007 with a pizza and a plan has grown into a fast-moving SaaS company that helps manufacturers, distributors, and wholesalers thrive in B2B commerce complexity.

Our mission? To transform the way businesses buy and sell, so they can grow, build stronger relationships, and make the most of digital commerce. Join us and take ownership of your career in a dynamic, fast-moving environment.

At Sana Commerce, we're looking for a Senior Site Reliability Engineer to strengthen our reliability, observability, and automation capabilities across our Azure and Kubernetes-based platforms.

https://jobs.smartrecruiters.com/SanaCommerce/744000083871225-senior-site-reliability-engineer