EY

Data Engineer (Graph, Vector & Data Platform) - Senior Associate, AI & Data, Technology Consulting

Singapore, 048583 · Full-time · Posted 5 days ago

About the role


EY is seeking a Senior Associate Data Engineer for its AI & Data practice in Asia-Pacific. The role focuses on designing, building, and operating scalable data solutions using Neo4j graph databases, vector databases (pgvector, Milvus, Weaviate), and Apache Iceberg on Nutanix infrastructure to support advanced analytics and AI use cases.


Key Responsibilities

  • Design and implement graph data models and entity networks using Neo4j
  • Develop and optimize Cypher queries for relationship and network analysis
  • Build and maintain vector databases using Postgres (pgvector), Milvus, or Weaviate
  • Implement embedding ingestion pipelines for similarity and semantic search use cases
  • Design and manage data repositories / lakehouse layers using Apache Iceberg
  • Develop data ingestion and transformation pipelines from multiple source systems
  • Deploy and operate databases and data platforms on Nutanix infrastructure
  • Ensure performance tuning, scalability, availability, and fault tolerance
  • Implement data quality checks, monitoring, and error handling
  • Collaborate with AI/ML, analytics, and application teams
  • Document data models, architectures, and operational procedures

Requirements

  • Bachelor's degree in Computer Science, Information Systems, or a related field
  • Minimum of 4 years of hands-on experience in data engineering
  • Strong hands-on experience with Neo4j (Graph DB)
  • Proven experience in entity network analysis and relationship-based modeling
  • Hands-on experience with vector databases: Postgres (pgvector), Milvus, and/or Weaviate
  • Strong SQL skills and experience with complex data transformations
  • Experience designing data lakes / lakehouse architectures
  • Hands-on experience with Apache Iceberg or similar table formats
  • Experience operating data platforms on Nutanix or comparable on-prem / hybrid infrastructure
  • Solid understanding of distributed systems and data storage concepts
  • Proficiency in Python or Java for data processing and integration
  • Experience with Spark, Kafka, or Flink
  • Experience with LLM / AI pipelines and embedding generation
  • Knowledge of Elasticsearch or hybrid search architectures
  • Experience with data governance, security, and access control
  • Exposure to DevOps / CI-CD for data platforms