About the Role

We are looking for a proactive and skilled Site Reliability Engineer (SRE) to join our infrastructure and platform engineering team. In this role, you will focus on building resilient systems, automating infrastructure, and improving the reliability, scalability, and performance of our production environments.

You will partner closely with software engineering, DevOps, and product teams to build a culture of operational excellence and rapid, safe delivery.

Key Responsibilities

Design and implement reliable, scalable, and secure infrastructure
Build and maintain monitoring, alerting, and incident response systems
Define and track SLIs and SLOs, manage error budgets, and support SLA commitments
Automate infrastructure provisioning using Infrastructure as Code (Terraform, Pulumi, or CloudFormation)
Create and manage CI/CD pipelines to support fast, stable releases
Lead incident response and postmortems, and drive root cause analysis
Monitor system performance and proactively address reliability risks
Optimize cloud costs, resource usage, and deployment strategies
Collaborate with engineering teams to embed reliability and observability into the development lifecycle

Required Skills and Experience

3+ years of experience in SRE, DevOps, or systems engineering
Strong experience with cloud platforms (AWS, GCP, or Azure)
Proficiency in Linux systems administration and networking
Hands-on experience with containers (Docker) and container orchestration (Kubernetes)
Experience with observability tools such as Prometheus, Grafana, ELK, Datadog, or New Relic
Proficiency in scripting or programming languages (e.g., Python, Go, Bash)
Strong knowledge of CI/CD pipelines, GitOps workflows, and deployment automation
Understanding of resilience engineering, distributed systems, and fault tolerance

Nice to Have

Experience implementing a service mesh (Istio, Linkerd) and zero-trust security models
Familiarity with incident management platforms (PagerDuty, Opsgenie)
Exposure to chaos engineering, game days, or reliability testing
Relevant certifications (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator (CKA))

Soft Skills

Strong ownership mindset and ability to work independently
Excellent communication and documentation skills
Passion for automation, reliability, and building scalable systems
A calm, structured approach to incidents and on-call responsibilities

Why Join Us?

Work in a culture that values reliability, automation, and continuous improvement
Influence platform architecture and engineering best practices
Flexible work options, competitive salary, and benefits
Opportunity to lead and mentor in an evolving cloud-native environment

Site Reliability Engineer (SRE)
