Meta

Data Center Production Operations Engineer

Business | Singapore, Singapore | Full-time | 4 months ago

About the role

Meta is seeking a senior Data Center Production Operations Engineer to join its team and keep its rapidly scaling data center infrastructure running efficiently. The role involves hands-on technical work with server hardware and Linux, troubleshooting complex issues, and leading technical projects in a fast-paced environment.

Key Responsibilities

  • Work in a fast-paced, technical environment where adaptability and flexibility are key to success
  • Apply hands-on technical skills in server hardware and Linux in a data center environment
  • Perform server administration and complex projects in a large-scale distributed data center environment
  • Triage, debug, and troubleshoot complex, systemic issues in Linux
  • Manage multiple technical issues concurrently, driving each to its root cause
  • Participate in or lead technical projects related to process improvement, technology, and/or automation
  • Bring peers, partners and other resources into the project where additional expertise is needed
  • Provide technical guidance to external vendors
  • Use data and metrics to drive decisions
  • Foster growth in others and drive influence across all organizational levels

Requirements

  • BS, BA or BEng in technical field or commensurate experience
  • 7+ years of technical IT experience within an infrastructure environment, in a role such as Systems Administrator, DevOps Engineer, or Site Reliability Engineer
  • Expert in Linux (or equivalent OS) in a complex IT environment with the ability to triage, debug, and troubleshoot complex, systemic issues
  • Hands-on experience and knowledge of server hardware and components, including storage
  • Understanding of the interdependencies of data center functions and technologies, including electrical, cooling, structured cabling, security, and network
  • Experience managing multiple technical issues concurrently, driving each to its root cause
  • Experience participating in or leading technical projects related to areas such as process improvement, technology, and/or automation
  • Ability to communicate effectively, in a clear and concise manner, appropriately tailoring messages to the audience
  • Extensive technical knowledge of technologies such as HTTP, DNS, RAID, and DHCP
  • Experience in providing technical guidance to external vendors
  • Experience debugging, modifying, and developing code in at least one commonly used scripting or programming language: Bash, PHP, Python, SQL, Rust, Go, or Perl
  • Knowledge of out-of-band/lights-out server communication methods, such as IPMI and serial console
  • Experience using data and metrics to drive decisions
  • Proven experience in fostering growth in others, and driving influence across all organizational levels
  • Experience in a large-scale data center environment
  • Experience with large-scale AI implementations
  • Six Sigma knowledge/certification
  • Demonstrated ability to integrate AI tools to optimize/redesign workflows and drive measurable impact (e.g., efficiency gains, quality improvements)