Meta

Data Center Production Operations Engineer

Business · Singapore, Singapore · Full-time · Posted 4 months ago

About the role


This is a senior-level Data Center Production Operations Engineer role at Meta. The position requires hands-on technical skills in server hardware and Linux. It also calls for experience managing large-scale data center environments and driving automation and process improvements.


Key Responsibilities

  • Triage, debug, and troubleshoot server issues in a complex IT environment
  • Manage technical issues and drive them to root cause
  • Participate in technical projects related to process improvement, technology, and/or automation
  • Provide technical guidance to external vendors
  • Use data and metrics to drive decisions
  • Foster growth in others and drive influence across all organizational levels
  • Integrate AI tools to optimize/redesign workflows and drive measurable impact
  • Adhere to and implement responsible, ethical AI practices
  • Demonstrate ongoing AI skill development and stay current with emerging AI technologies

Requirements

  • BS, BA, or BEng in a technical field, or commensurate experience
  • 5+ years of technical IT experience within an infrastructure environment, in a role such as Systems Administrator, DevOps Engineer, or Site Reliability Engineer
  • Intermediate-level understanding of Linux (or an equivalent OS) in a complex IT environment, with the ability to triage, debug, and troubleshoot server issues
  • Hands-on experience and knowledge of server hardware and components, including storage
  • Intermediate-level knowledge of the interdependencies of data center functions and technologies including electrical, cooling, structured cabling, security, and network
  • Experience managing technical issues and driving to the root cause
  • Experience participating in technical projects related to areas such as process improvement, technology, and/or automation
  • Ability to communicate effectively, in a clear and concise manner, appropriately tailoring messages to the audience
  • Intermediate-level knowledge of technologies such as HTTP, DNS, RAID, and DHCP
  • Experience in providing technical guidance to external vendors
  • Experience debugging, modifying, and developing code in at least one commonly used scripting or programming language: Bash, PHP, Python, SQL, Rust, Go, or Perl
  • Knowledge of out-of-band/lights-out server communication methods, such as IPMI and serial console
  • Experience using data and metrics to drive decisions
  • Experience fostering growth in others and driving influence across all organizational levels
  • Experience in a large-scale data center environment
  • Experience with large-scale AI implementations
  • Six Sigma knowledge/certification
  • Demonstrated ability to integrate AI tools to optimize/redesign workflows and drive measurable impact (e.g., efficiency gains, quality improvements)