Key Responsibilities

Lead end-to-end infrastructure automation and platform reliability initiatives in high-performance environments.
Design, implement, and manage CI/CD pipelines, IaC frameworks, and container orchestration systems.
Apply SRE principles to ensure platform resilience, high availability, and continuous improvement.
Develop and implement strategies for monitoring, observability, and incident response using open-source tools.
Mentor and guide a small engineering team, fostering collaboration and technical excellence.
Ensure systems adhere to security, compliance, scalability, and cost optimization best practices.
Collaborate across product, development, and DevOps teams to define architectural standards and promote automation-first practices.

Required Skills

Proven hands-on expertise with IaC, cloud platforms, CI/CD pipelines, containerization, orchestration, and SRE principles.
Strong experience with IaC tools such as Ansible, Terraform, CloudFormation, or Pulumi.
Deep understanding of resource management frameworks like Kubernetes, Apache Mesos, or YARN.
Proficient in Linux administration, with experience in monitoring, logging, and observability using Prometheus, Grafana, and ELK.
Programming proficiency in Python, Java, or Golang, with strong architectural and system design skills focused on scalability and resilience.
Practical knowledge of multi-cloud and hybrid-cloud architectures.

Preferred Skills

Experience in network and infrastructure operations engineering.
Understanding of network protocols (TCP/IP, UDP, HTTP/HTTPS, DNS, BGP, OSPF, VXLAN, IPSec, etc.).
Familiarity with network security and automation, including zero-trust frameworks, TLS/SSL, and modern automation protocols such as gNMI/gRPC and RESTCONF.
Experience with Agile methodologies (Scrum/Kanban) and SRE performance metrics (MTTR, SLO, SLI, deployment frequency).
Strong Python scripting expertise for network automation (API integrations, structured data, parsing, error handling, packaging).
Proven hands-on experience with Terraform and Ansible in production environments.
Practical experience with NETCONF and YANG for model-driven network automation.
Strong expertise in Jinja templating for configuration generation and standardization.

Soft Skills

Strong leadership and mentoring abilities.
Excellent problem-solving and analytical thinking.
Effective communicator across technical and non-technical teams.
Ability to thrive in fast-paced, evolving technology environments.
Collaborative and automation-driven mindset.

Qualifications

Bachelor’s or Master’s degree in Computer Science, Information Technology, or related field.
5–10 years of experience in Infrastructure, DevOps, or SRE roles within product-focused organizations.
Hands-on experience in cloud-native platforms (AWS, Azure, GCP).

Preferred Certifications

DevOps or SRE Certification.
Kubernetes Certification (CKA / CKAD).
Network or Security Certifications (CCNA, CompTIA, or equivalent).