OCBC

Site Reliability (Infrastructure) Engineer (AVP/VP)

Business | OCBC Singapore | Full-time | 1 month ago

About the role

This is a senior Site Reliability Engineer role at a global bank, responsible for ensuring the reliability, scalability, and performance of critical infrastructure and services. The role involves designing and implementing automation, monitoring, and incident response processes, as well as leading a team of SREs.

Key Responsibilities

  • Design, build, and maintain scalable and reliable infrastructure to support critical banking applications.
  • Implement and manage monitoring, alerting, and logging solutions to ensure system health and performance.
  • Automate operational tasks and workflows using scripting and infrastructure-as-code tools.
  • Lead incident response and root cause analysis for production outages, ensuring timely resolution.
  • Collaborate with development teams to improve system architecture and deployment processes.
  • Conduct capacity planning and performance tuning to meet growth demands.
  • Develop and maintain disaster recovery plans and conduct regular drills.
  • Mentor and guide junior SRE team members, fostering a culture of reliability and continuous improvement.
  • Participate in on-call rotations to provide 24/7 support for critical systems.
  • Evaluate and recommend new tools and technologies to enhance infrastructure reliability.

Requirements

  • Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
  • 5-10 years of experience in site reliability engineering, DevOps, or infrastructure engineering.
  • Strong proficiency in Docker for containerization and Kubernetes for container orchestration.
  • Hands-on experience with cloud platforms such as AWS, Azure, or GCP.
  • Expertise in infrastructure-as-code tools like Terraform or Ansible.
  • Experience with monitoring and observability tools such as Prometheus, Grafana, and the ELK stack (Elasticsearch, Logstash, Kibana).
  • Proficiency in Python, Go, or Bash for scripting and automation.
  • Solid understanding of Linux systems administration and networking.
  • Experience with CI/CD pipelines using Jenkins or GitLab CI.
  • Excellent problem-solving skills and ability to work under pressure during incidents.
  • Strong communication and leadership skills, with experience mentoring team members.
  • Preferred: AWS Certified Solutions Architect or Certified Kubernetes Administrator.
  • Experience in the banking or financial services industry is a plus.