About the role
This role is for an AVP, Platform SRE Engineer within the SRE & Governance team of Group Technology at a bank. The engineer will be responsible for the reliability, scalability, and performance of platform services, and will work on automation, monitoring, and incident management.
Business, Full-time, General
Key Responsibilities
- Design, build, and maintain scalable and reliable platform infrastructure.
- Implement and manage monitoring, alerting, and logging solutions.
- Automate operational tasks to improve efficiency and reduce manual intervention.
- Participate in incident response and root cause analysis.
- Collaborate with development teams to ensure reliability is built into the software development lifecycle.
- Perform capacity planning and performance tuning of platform services.
- Document system configurations, procedures, and runbooks.
- Contribute to the continuous improvement of SRE practices and tools.
Requirements
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in site reliability engineering, DevOps, or platform engineering.
- Strong experience with cloud platforms such as AWS, Azure, or GCP.
- Proficiency in scripting or programming languages such as Python, Bash, or Go.
- Experience with containerization and orchestration tools like Docker and Kubernetes.
- Knowledge of infrastructure-as-code tools such as Terraform or Ansible.
- Experience with monitoring and observability tools such as Prometheus, Grafana, or the ELK stack.
- Strong understanding of networking, security, and system administration.
- Excellent problem-solving and communication skills.
- Ability to work in a fast-paced, collaborative environment.