About the role
This is a senior-level Data Center Production Operations Engineer role at a technology company. The position requires hands-on technical skills in server hardware and Linux, with experience managing large-scale data center environments and driving automation and process improvements.
Business | Full-time | General
Key Responsibilities
- Triage, debug, and troubleshoot server issues in a complex IT environment
- Manage technical issues and drive them to root cause
- Participate in technical projects related to process improvement, technology, and/or automation
- Provide technical guidance to external vendors
- Use data and metrics to drive decisions
- Foster growth in others and drive influence across all organizational levels
- Integrate AI tools to optimize/redesign workflows and drive measurable impact
- Adhere to and implement responsible, ethical AI practices
- Demonstrate ongoing AI skill development and stay current with emerging AI technologies
Requirements
- BS, BA, or BEng in a technical field, or commensurate experience
- 5+ years of technical IT experience within an infrastructure environment, in a role such as Systems Administrator, DevOps Engineer, or Site Reliability Engineer
- Intermediate-level understanding of Linux (or an equivalent OS) in a complex IT environment, with the ability to triage, debug, and troubleshoot server issues
- Hands-on experience and knowledge of server hardware and components, including storage
- Intermediate-level knowledge of the interdependencies of data center functions and technologies including electrical, cooling, structured cabling, security, and network
- Experience managing technical issues and driving them to root cause
- Experience participating in technical projects related to areas such as process improvement, technology, and/or automation
- Ability to communicate effectively, in a clear and concise manner, appropriately tailoring messages to the audience
- Intermediate-level knowledge of technologies such as HTTP, DNS, RAID, and DHCP
- Experience in providing technical guidance to external vendors
- Experience debugging, modifying, and developing code in at least one commonly used scripting or programming language: Bash, PHP, Python, SQL, Rust, Go, or Perl
- Knowledge of out-of-band/lights-out server communication methods, such as IPMI and serial console
- Experience using data and metrics to drive decisions
- Experience in fostering growth in others, and driving influence across all organizational levels
- Experience in a large-scale data center environment
- Experience with large-scale AI implementations
- Six Sigma knowledge/certification
- Demonstrated ability to integrate AI tools to optimize/redesign workflows and drive measurable impact (e.g., efficiency gains, quality improvements)