A*STAR

Research Officer (Large Language Models & Data Engineering), A*STAR BII

Research, Singapore, Onsite. Posted 3 weeks ago

About the role

Join a multidisciplinary project team from SingHealth, A*STAR, and Synapxe to develop a Large Language Model (LLM) platform. The platform aims to improve the management and treatment of patients with lower respiratory tract infections (LRTIs) by using clinical data to reduce unnecessary antibiotic prescriptions and combat antimicrobial resistance.

Research, Onsite, Bioinformatics Institute

Key Responsibilities

  • Contribute to LLM development, focusing on enhancing unstructured data processing for clinical and biomedical applications.
  • Fine-tune and train LLMs (e.g., LLaMA, Mistral, Phi, GPT family) using supervised and instruction-based datasets.
  • Design and implement pipelines for data cleaning, preprocessing, and tokenisation of large-scale text corpora.
  • Evaluate model performance using BLEU, ROUGE, BERTScore, and factual consistency metrics.
  • Develop parameter-efficient fine-tuning (PEFT) workflows, such as LoRA and QLoRA, optimised for GPU clusters.
  • Collaborate with researchers to design experiments, interpret results, and publish findings.
  • Maintain reproducible codebases, documentation, and experiment logs.

Requirements

  • Master's or Bachelor's degree in Computer Science, Data Science, AI, Computational Linguistics, or a related discipline.
  • Strong experience in Natural Language Processing (NLP), transformer-based models, and text generation.
  • Proficiency in Python, PyTorch, Hugging Face Transformers, and LLM fine-tuning libraries (PEFT, DeepSpeed, bitsandbytes).
  • Experience with text and data processing, including annotation, tokenisation, and augmentation.
  • Familiarity with vector databases (FAISS, Qdrant) and RAG pipelines.
  • Understanding of GPU-based training, distributed model optimisation, and experiment tracking (MLflow, W&B).
  • Strong analytical, communication, and collaborative skills.