About the Role
We are looking for a proactive and skilled Site Reliability Engineer (SRE) to join our infrastructure and platform engineering team. In this role, you will focus on building resilient systems, automating infrastructure, and improving the reliability, scalability, and performance of our production environments.
You will partner closely with software engineering, DevOps, and product teams to build a culture of operational excellence and rapid, safe delivery.
Key Responsibilities
- Design and implement reliable, scalable, and secure infrastructure
- Build and maintain monitoring, alerting, and incident response systems
- Define, track, and enforce SLAs, SLOs, and error budgets
- Automate infrastructure provisioning using Infrastructure as Code (Terraform, Pulumi, or CloudFormation)
- Create and manage CI/CD pipelines to support fast, stable releases
- Lead incident response and postmortems, and drive root cause analysis
- Monitor system performance and proactively address reliability risks
- Optimize cloud cost, resource usage, and deployment strategies
- Collaborate with engineering teams to embed reliability and observability into development cycles
Required Skills and Experience
- 3 or more years of experience in SRE, DevOps, or systems engineering
- Strong experience with cloud platforms (AWS, GCP, or Azure)
- Proficiency with Linux systems administration and networking
- Hands-on experience with containers and container orchestration (Docker, Kubernetes)
- Experience with observability tools such as Prometheus, Grafana, ELK, Datadog, or New Relic
- Proficiency in scripting or programming (e.g., Python, Go, Bash)
- Strong knowledge of CI/CD pipelines, GitOps workflows, and deployment automation
- Understanding of resilience engineering, distributed systems, and fault tolerance
Nice to Have
- Experience implementing service mesh (Istio, Linkerd) and zero-trust security models
- Familiarity with incident management platforms (PagerDuty, Opsgenie)
- Exposure to chaos engineering, game days, or reliability testing
- Relevant certifications (e.g., AWS Certified DevOps Engineer, CKA)
Soft Skills
- Strong ownership mindset and ability to work independently
- Excellent communication and documentation skills
- Passion for automation, reliability, and building scalable systems
- A calm, structured approach to incidents and on-call responsibilities
Why Join Us?
- Work in a culture that values reliability, automation, and continuous improvement
- Influence platform architecture and engineering best practices
- Flexible work options, competitive salary, and benefits
- Opportunity to lead and mentor in an evolving cloud-native environment

