Senior Site Reliability Engineer
Sana Commerce
Posted 7 hrs ago
Nationality
Any Nationality
Gender
Not Mentioned
Vacancy
1 Vacancy
Job Description
Roles & Responsibilities
This SRE position focuses on engineering reliability in everything we do: automating repetitive tasks, improving monitoring signals, running deep root cause analysis, and shaping systems for scalability. You'll be the engineer others look to during critical incidents, and the one raising the bar on how we prevent them in the first place.
What you'll get
- The opportunity to make an impact at a fast-growing SaaS scale-up;
- A global and customized onboarding program (rated 9.1/10 by previous hires);
- A hybrid working model: 3 days in the office, 2 days from home.
What you bring
- 8+ years of experience in SRE, DevOps, or Cloud Infrastructure, with demonstrated ownership of large-scale systems.
- Strong hands-on knowledge of Microsoft Azure services and practical experience operating Azure Kubernetes clusters in production.
- Expertise in Dynatrace, Honeycomb, Elasticsearch, Kibana/Grafana, and Azure Monitor (KQL), with the ability to design actionable monitoring that leads to prevention, not just detection.
- Proficient in at least one programming/scripting language (PowerShell, Bash, Python, or C#). Strong debugging and logging practices.
- Hands-on experience with Infrastructure-as-Code (Terraform, Bicep, or ARM) to automate and manage cloud infrastructure.
- Solid understanding of TCP/IP protocols and experience troubleshooting network issues in distributed systems.
- Ability to go beyond surface fixes, identify patterns, and engineer permanent improvements.
- Strong communicator who can work with cross-functional teams and explain complex issues simply.
- Microsoft Certified: Azure Administrator Associate
- CKA: Certified Kubernetes Administrator
Desired Candidate Profile
What you'll be doing
- Lead incident response and root cause analysis by driving deep investigations, educating the team, and delivering actionable post-incident insights that prevent recurrence.
- Manage Kubernetes and Azure environments by owning cluster configurations and platform usage, and by ensuring availability, cost efficiency, and adherence to security best practices.
- Develop observability and monitoring strategies with Dynatrace, Honeycomb, Elasticsearch, Kibana/Grafana, and Azure Monitor to measure performance and user impact, and continuously refine alerts and dashboards.
- Implement and maintain edge and CDN integrations (Fastly WAF, bot management, CDN) to enhance performance, security, and reliability of customer-facing services.
- Write and debug automation scripts in PowerShell, Bash, Python, or C#, ensuring logging, rollback, and versioning practices make the platform more resilient and self-healing.
- Drive Infrastructure-as-Code adoption with Terraform, Bicep, and ARM to standardize environments, automate deployments, and reduce manual interventions.
- Optimize system and application performance through deep monitoring, dump analysis, and right-sizing of resources to eliminate bottlenecks and maximize efficiency.
- Collaborate across teams to break down complex problems, contribute to CI/CD and SDLC improvements, and embed reliability into development and release pipelines.
- Participate in the on-call rotation by taking ownership of incidents, coordinating responses, and ensuring sustainable fixes rather than temporary workarounds.
Company Industry
- IT - Software Services
Department / Functional Area
- IT Software
Keywords
- Senior Site Reliability Engineer
Sana Commerce
At Sana Commerce, we're committed to creating an inclusive environment because we know our diverse workforce is one of our greatest strengths. What started in 2007 with a pizza and a plan has grown into a fast-moving SaaS company that helps manufacturers, distributors, and wholesalers thrive in B2B commerce complexity. Our mission? To transform the way businesses buy and sell, so they can grow, build stronger relationships, and make the most of digital commerce. Join us and take ownership of your career in a dynamic, fast-moving environment. At Sana Commerce, we're looking for a Senior Site Reliability Engineer to strengthen our reliability, observability, and automation capabilities across our Azure and Kubernetes-based platforms.
https://jobs.smartrecruiters.com/SanaCommerce/744000083871225-senior-site-reliability-engineer