About the role
The Member of Technical Staff (MTS) in Machine Learning at Micron Technology's Smart Manufacturing and AI team will design, develop, and deploy scalable AI/ML solutions, including large language models and autonomous AI agents, to drive insights and automation in semiconductor manufacturing. The role involves optimizing distributed training, building data pipelines, implementing CI/CD for ML systems, and collaborating with cross-functional teams to enhance manufacturing processes through advanced AI technologies.
IDM | Onsite | Smart MFG/AI
Key Responsibilities
- Architect and execute large-scale custom model training and fine-tuning jobs (SFT, RLHF) on multi-node, multi-GPU clusters
- Optimize training throughput and memory efficiency using distributed training strategies (FSDP, DeepSpeed, Megatron-LM) and mixed-precision techniques (FP16/BF16)
- Design and develop autonomous AI Agents capable of multi-step reasoning, planning, and tool execution to automate complex manufacturing workflows
- Implement Agentic frameworks (e.g., LangChain, LangGraph, CrewAI) to orchestrate LLM interactions with internal APIs, databases, and software tools
- Profile and debug GPU performance bottlenecks using tools like Nsight Systems or PyTorch Profiler to maximize hardware utilization
- Build and maintain data/solution pipelines that feed machine learning models and GenAI applications
- Design and optimize data structures in data management systems (Snowflake and Google Cloud platforms) to enable AI/ML and Agentic solutions
- Create and maintain CI/CD pipelines for machine learning and AI Agent solutions in the cloud
Requirements
- Technical Degree required. Computer Science or Statistics background highly desired
- Deep understanding of GPU architecture (memory hierarchy, tensor cores, interconnects like NVLink) and experience managing GPU resources in both cloud and on-prem environments
- Hands-on experience with Distributed Data Parallel (DDP), Fully Sharded Data Parallel (FSDP), and model parallelism techniques
- Proficiency in fine-tuning Large Language Models using PE
