A*STAR

HPC System Administrator, System, NSCC

Research | Singapore | Onsite | Posted 4 weeks ago

About the role

The HPC System Administrator will manage the day-to-day operations of high-performance computing (HPC) systems, ensuring optimal stability, security, and performance for NSCC's supercomputing environment.

Research | Onsite | National Supercomputing Centre

Key Responsibilities

  • Administer HPC compute nodes, storage systems, and internal networks.
  • Monitor system health using tools like Grafana, Prometheus, and custom scripts.
  • Apply patches, updates, and configuration changes to ensure system stability.
  • Manage user accounts, access controls, and authentication mechanisms.
  • Monitor job queues and assist users with job submission and scheduling issues.
  • Respond to system alerts and user-reported incidents, documenting resolutions.
  • Perform regular security checks and vulnerability assessments to ensure compliance.
  • Maintain system operation logs and configuration documentation.

Requirements

  • Degree in Computer Science, Engineering, IT or related field.
  • Minimum 2 years of experience in Linux system administration, preferably in HPC environments.
  • Proficiency in scripting using Python or Bash.
  • Familiarity with cluster management tools (xCAT, BCM, HPCM).
  • Experience with job schedulers such as PBS Pro or Slurm.
  • Basic understanding of parallel file systems (Lustre, GPFS, BeeGFS).
  • Understanding of basic network protocols (DHCP, DNS, TFTP, SMTP).