About the role
Design and develop high-performance compute cluster configurations, selecting and validating hardware components to ensure optimal performance, reliability, and scalability within KLA systems.
Equipment, Onsite
Key Responsibilities
- Design and develop compute cluster configurations optimized for performance, reliability, and scalability in KLA systems.
- Select and validate hardware components including CPUs, memory, storage, networking, and specialized accelerators.
- Collaborate with hardware, software, and systems engineering teams to ensure seamless integration of compute clusters into broader system architectures.
- Document hardware design decisions, integration procedures, and diagnostic workflows for internal and cross-team use.
- Participate in design reviews, integration planning, and collaborative problem-solving sessions with cross-functional teams.
Requirements
- Strong experience in computer hardware design, particularly in compute cluster or server environments.
- Experience in networking design, including InfiniBand and Ethernet switches, with expertise in port mapping and configuration.
- Familiarity with modern memory technologies and form factors (e.g., DDR4/DDR5, LPDDR, HBM, DIMM).
- Familiarity with Linux system administration and OS customization (preferably SUSE Linux).
- Understanding of system-level performance tuning and hardware-software interaction.
- Excellent documentation and communication skills.