About the role
Senior Staff Software Development Engineer role at AMD, serving as SoC Diagnostics Technical Lead for Data Center GPU products. Responsible for driving diagnostics strategy, pre-silicon emulation, system-level validation, and cross-team collaboration to ensure quality and coverage of diagnostics solutions. Requires strong SoC architecture knowledge, system debugging skills, and technical leadership.
Fabless, Full-time, Engineering
Key Responsibilities
- Serve as the SoC Diagnostics Technical Lead for DCGPU programs, providing primary local ownership and global technical leadership for silicon and manufacturing quality issues across Singapore/Tai and other sites, with end‑to‑end accountability for the quality, coverage, and completeness of diagnostics solutions.
- Work closely with the Diagnostics PM to define and drive end‑to‑end diagnostics strategy by translating program and customer requirements into clear priorities and execution plans across pre‑silicon and post‑silicon phases. Proactively articulate diagnostics objectives, strategic direction, risks, and tooling/framework requirements to PMs, managers, IP and framework architects to influence test coverage strategy, planning, and cross‑team alignment.
- Own the diagnostics pre‑silicon emulation strategy and planning across software‑based and FPGA‑based emulation models, including RTL coverage requirements before silicon tape‑out and diagnostics verification requirements before silicon back.
- Own the SoC system‑level feature validation methodology and planning for diagnostics.
- Drive the technical requirements needed to achieve feature coverage and hardware bug capture targets, ensuring that diagnostics content supports both engineering debug and manufacturing/field health checks.
- Lead and coordinate complex SoC/system‑level investigations (e.g., SLT/Board Production failures, field issues), analyze logs and symptoms, form hypotheses, and work with IP, platform, firmware and software teams to converge on root cause and corrective action.
- Exercise horizontal leadership and collaboration with cross‑functional teams such as platform validation, ROCm/SW, HW architects, product engineering, manufacturing, and other stakeholders to achieve key program milestones (bring‑up, feature enablement, performance profiling, production support) with the desired coverage metrics from diagnostics.
- Collaborate with the Product Engineering Organization to enable the product with high quality to customers; debug defects and help improve yield, coverage, and test time during NPI and volume production.
- Provide diagnostics support to contract manufacturers and board engineering teams, particularly for SLT/BP and system‑level test flows, and ensure that diagnostics content is usable and effective in manufacturing environments.
Requirements
- Proven experience with IP and SoC validation, diagnostics, and system bring-up, with the ability to interact closely with hardware design, validation, manufacturing and software teams.
- Excellent understanding of SoC architecture, including processor, GPU compute, system IO and memory/HBM, and security blocks, to identify critical areas for SoC & IP verification and diagnostics focus.
- Strong system‑level debugging and testing skills, with the capability to quickly identify problems, perform structured root‑cause analysis, and provide robust solutions.
- Excellent communication and interpersonal skills, with the ability to collaborate effectively across global teams and to clearly explain complex technical issues to both technical and non‑technical stakeholders.
- Demonstrated ability to work under pressure and manage competing priorities in tight project timelines while maintaining professionalism and quality.
- Knowledge and experience in developing or enabling applications on industry compute platforms such as ROCm, OpenCL, or CUDA is an asset.
- Familiarity with Linux; knowledge and experience of device driver or software development is preferred.
- Knowledge and experience with Manufacturing ATE/Wafer Sort Test and System Level Test is a bonus.
- Experience with source control systems such as Perforce and Git.
- Hands‑on experience with SoC bring-up and working in lab environments is a plus.
- Prior experience in software development (e.g., object‑oriented C++, modern C++, system software or drivers) and with the software development lifecycle; able to read and review code, understand architecture, and guide engineers in debug. Experience developing machine learning, HPC or general‑purpose GPU compute applications is a bonus.
- BS or MS in Computer Science, Computer Engineering or Electrical Engineering preferred.