A*STAR

HPC Storage Engineer (System), NSCC

Research | Singapore | Full-time | Posted 1 month ago

About the role

(AI-summarised)

The HPC Storage Engineer at NSCC is responsible for administering and optimizing high-performance computing storage infrastructure, ensuring high availability and reliability, and supporting capacity planning and performance analysis. The role requires strong Linux skills, experience with parallel file systems like Lustre or GPFS, and scripting proficiency in Bash or Python.

Research | Full-time | National Supercomputing Centre

Key Responsibilities

  • Storage administration and optimization:
      • Collaborate with Managed Services teams to administer and support HPC storage infrastructure.
      • Ensure high availability and reliability of storage systems.
      • Provide technical support and resolve storage-related issues.
      • Implement best practices for monitoring, alerting, and reporting.
      • Track utilization and allocation trends to support capacity planning.
      • Conduct performance testing and analysis of storage systems.
      • Work with cross-functional teams to improve storage performance and scalability.
      • Maintain comprehensive documentation of infrastructure and operational processes.
  • Data management: Optimize data placement strategies for performance and efficiency.
  • Design and planning: Support future storage expansion and HPC system design.

Requirements

  • Degree in Computer Science, Engineering, IT, or a related field.
  • Strong Linux skills and comfort with the command-line interface.
  • Solid understanding of Linux file systems, including local (e.g., ext4, XFS), shared (e.g., NFS), and parallel (e.g., Lustre, GPFS, BeeGFS) file systems.
  • At least 2 years of experience managing parallel file systems such as Lustre, GPFS, BeeGFS, or similar technologies.
  • Familiarity with RDMA-based interconnects such as InfiniBand or RoCE.
  • Proficient in scripting with Bash and/or Python.
  • Strong problem-solving skills and ability to troubleshoot complex issues.