A*STAR

HPC Storage Engineer (System), NSCC

Research | Singapore | Onsite | Posted 4 weeks ago

About the role

The HPC Storage Engineer manages the storage infrastructure in High-Performance Computing (HPC) environments, with a focus on performance monitoring, optimization, and troubleshooting to ensure high availability and reliability.

Research | Onsite | National Supercomputing Centre

Key Responsibilities

  • Administer and support HPC storage infrastructure in collaboration with Managed Services teams.
  • Ensure high availability and reliability of all storage systems.
  • Provide technical support and resolve complex, storage-related issues.
  • Implement best practices for monitoring, alerting, and performance reporting.
  • Track utilization and allocation trends to support capacity planning initiatives.
  • Conduct performance testing and in-depth analysis of storage systems.
  • Enhance system performance and scalability through collaboration with cross-functional teams.
  • Maintain comprehensive documentation of infrastructure setup and operational processes.
  • Optimize data placement strategies for maximum performance and efficiency.
  • Support future storage expansion planning and HPC system design.

Requirements

  • Degree in Computer Science, Engineering, IT, or a related field.
  • At least 2 years of experience managing parallel file systems (Lustre, GPFS, BeeGFS, or similar).
  • Strong proficiency in Linux and comfort with the command-line interface.
  • Solid understanding of various Linux file systems (local: ext4, XFS; shared: NFS; parallel: Lustre, GPFS, BeeGFS).
  • Proficiency in scripting using Bash and/or Python.
  • Familiarity with RDMA-based interconnects such as InfiniBand or RoCE.
  • Strong problem-solving abilities for troubleshooting complex storage issues.