Apple

Sr. Site Reliability Engineer - ASE / iCloud Edge

Technology | Singapore | Onsite | Posted 3 months ago

About the role

AI summarised

The Sr. Site Reliability Engineer will design, build, and maintain scalable, secure, and highly available infrastructure for Apple's iCloud Edge services, which support platforms such as iCloud, iMessage, and FaceTime. The role spans the full stack, from Linux systems and networking to cloud environments, and involves using automation tools and collaborating with development teams to ensure service reliability. Candidates must combine strong software development skills with deep systems, network, and cloud expertise.

Technology | Onsite | Software and Services

Key Responsibilities

  • Build and run services used by hundreds of millions of customers daily
  • Support foundational platforms for iCloud, iMessage, FaceTime, and other Apple services
  • Own the full infrastructure stack from device driver performance to CDN traffic management
  • Work with Linux and cloud-based systems using open source and internal tools
  • Manage system and configuration management, provisioning, software deployment, and monitoring
  • Learn and improve internal tools for infrastructure operations
  • Collaborate with development teams to deliver optimal service results
  • Balance technical excellence with timely delivery in engineering challenges
  • Contribute to a culture where good ideas are heard and results are rewarded

Requirements

  • BS in Computer Science or related field, or equivalent experience
  • Experience in network engineering or related role, building solutions for network provisioning, configuration, and management
  • Experience building and scaling distributed systems in public, private, or hybrid cloud environments
  • Experience deploying, supporting, and monitoring new and existing services, platforms, and application stacks
  • Strong understanding of networking protocols and concepts: HTTP, DNS, TCP/IP, ICMP, ECMP, the OSI model, subnetting, and load balancing
  • Proven ability to write programs using Python, Go, or Java
  • Experience managing large numbers of diverse systems with configuration management tools such as Puppet, Chef, Ansible, or Salt
  • Understanding of Linux OS internals: kernel, memory, processes, threads, libraries, IPC, and signals
  • Excellent troubleshooting and problem-solving skills
  • Experience with scale testing, disaster recovery, and capacity planning (preferred)
  • A proclivity for efficient programming informed by complexity analysis (preferred)
  • Experience implementing and maintaining network security policies and procedures (preferred)