About the role
Meta is seeking a senior Data Center Production Operations Engineer to join its team and help ensure the efficient operation of its rapidly scaling data center infrastructure. The role involves hands-on technical work with server hardware and Linux, troubleshooting complex issues, and leading technical projects in a fast-paced environment.
Business | Full-time | General
Key Responsibilities
- Work in a fast-paced, technical environment where adaptability and flexibility are key to success
- Apply hands-on technical skills in server hardware and Linux in a data center environment
- Perform server administration and complex projects in a large-scale distributed data center environment
- Triage, debug, and troubleshoot complex, systemic issues in Linux
- Manage multiple technical issues concurrently, driving each to its root cause
- Participate in or lead technical projects related to process improvement, technology, and/or automation
- Bring peers, partners and other resources into the project where additional expertise is needed
- Provide technical guidance to external vendors
- Use data and metrics to drive decisions
- Foster growth in others and drive influence across all organizational levels
Requirements
- BS, BA, or BEng in a technical field, or commensurate experience
- 7+ years of technical IT experience within an infrastructure environment, in a role such as Systems Administrator, DevOps Engineer, or Site Reliability Engineer
- Expert in Linux (or equivalent OS) in a complex IT environment with the ability to triage, debug, and troubleshoot complex, systemic issues
- Hands-on experience and knowledge of server hardware and components, including storage
- Experience with the interdependencies of data center functions and technologies, including electrical, cooling, structured cabling, security, and network
- Experience managing multiple technical issues concurrently, driving each to its root cause
- Experience participating in or leading technical projects related to areas such as process improvement, technology, and/or automation
- Ability to communicate effectively, in a clear and concise manner, appropriately tailoring messages to the audience
- Extensive technical knowledge of technologies such as HTTP, DNS, RAID, and DHCP
- Experience in providing technical guidance to external vendors
- Experience debugging, modifying, and developing code in at least one commonly used scripting or programming language: Bash, PHP, Python, SQL, Rust, Go, or Perl
- Knowledge of out-of-band/lights-out server communication methods, such as IPMI and serial console
- Experience using data and metrics to drive decisions
- Proven experience fostering growth in others and driving influence across all organizational levels
- Experience in a large-scale data center environment
- Experience with large-scale AI implementations
- Six Sigma knowledge/certification
- Demonstrated ability to integrate AI tools to optimize/redesign workflows and drive measurable impact (e.g., efficiency gains, quality improvements)