UOB

Assistant VP, Cloud Infrastructure Engineer (AWS / Azure / OpenShift)

Business · Central Region (City Area) · Full-time · 1 month ago

About the role

The Assistant VP, Cloud Infrastructure Engineer manages multi-cloud environments (AWS, Azure, OpenShift) at a leading bank in Asia, with a focus on high availability, performance, resilience, and security. The role involves incident management, disaster recovery, and participation in an on-call rotation.

Key Responsibilities

  • Manage and support AWS production environments, ensuring high availability, reliability, and strict compliance.
  • Perform incident management, RCA, and problem resolution for production issues.
  • Monitor AWS resources using tools such as CloudWatch, CloudTrail, and third‑party monitoring platforms.
  • Operate and support AWS core services: EC2, EBS, ELB/ALB, Auto Scaling, VPC, Subnets, Security Groups, NACLs, IAM, S3, RDS (basic operational support).
  • Implement backup, restore, and disaster recovery (DR) strategies.
  • Provide basic support for Azure components: Azure VMs, VNets, NSGs, Storage, Load Balancers.
  • Monitor Azure services using Azure Monitor and Alerts.
  • Support hybrid cloud or multi‑cloud integration and connectivity scenarios.
  • Manage and support OpenShift clusters in production environments.
  • Perform cluster health checks, upgrades, lifecycle management, and troubleshooting.
  • Support containerized workloads: pods, deployments, services, ingresses/routes.
  • Collaborate with application teams to troubleshoot deployment and runtime issues.

Requirements

  • Hands‑on experience in managing multi‑cloud environments, specifically AWS, Azure, and OpenShift infrastructure.
  • Strong expertise in administration and troubleshooting across distributed platforms (Unix/Linux/Windows).
  • Solid understanding of high availability, performance tuning, resilience engineering, and security implementation.
  • Experience in service recovery, including disaster recovery (DR) testing, environment hardening, and root cause analysis (RCA) of production incidents.
  • Willingness to participate in a 24×7 on‑call rotation.
  • Solid understanding of container technologies (Docker/OCI) and Kubernetes fundamentals.