Thales

Data Architect

Thales
Aerospace & Defense · Singapore · Full-time · 1 month ago

About the role


Thales is seeking a seasoned Data Architect to lead the design and implementation of a Data Warehouse platform for regulatory requirements and advanced analytics. The role involves architecting scalable data solutions combining data lakes and warehouses, enabling AI/ML applications and secure data access. The ideal candidate has strong experience with MinIO, distributed data processing technologies, and data security.


Key Responsibilities

  • Own the architecture of the Data Warehouse; ensure that systems architecture blueprints (e.g., architecture diagrams, tooling, technologies) remain current.
  • Define data modeling standards for raw, curated, and serving layers.
  • Develop strategies for data ingestion, data storage, data cataloging, data governance, and secure data access (e.g., data at rest, data in transit).
  • Develop strategies for data backups and data recovery based on SLA/RPO targets.
  • Develop strategies for data consistency, data security (i.e., data at rest and data in transit), and data redundancy.
  • Lead the design and implementation of ingestion pipelines for structured and unstructured data, ensuring that they meet the functional and non-functional requirements.
  • Lead the design of storage layers, metadata management, and data cataloging to ensure that they meet the functional and non-functional requirements.
  • Define ETL/ELT workflows and support their automation with distributed processing and orchestration frameworks (e.g., Apache Spark, Apache Flink).
  • Oversee deployment pipelines using CI/CD best practices for data-oriented infrastructure.
  • Ensure high availability, scalability, and performance of the technical implementation (e.g., RMA analysis).
  • Implement robust data security frameworks (i.e., encryption, data masking, and fine-grained access controls).
  • Establish IAM policies and secure data perimeters using cloud-native and host-based tools.

Requirements

  • Bachelor's degree in Computer Science or Information Technology.
  • Master's degree in Computer Science or Data Science, if applicable.
  • Strong experience in designing and building highly available data platforms using MinIO as the data storage infrastructure, including data replication and partitioning strategies.
  • Strong expertise in SQL, data modeling (SCD Types 1, 2, and 6), ETL/ELT design, and query performance tuning.
  • Strong expertise in containerization and orchestration (e.g., Docker, Kaniko, Kubernetes).
  • Strong expertise in the design and development of ETL and ELT data pipelines (with structured or unstructured data), including Change Data Capture.
  • Strong expertise in distributed data processing technologies (e.g., Apache Spark 3.0, Apache Flink 2.0, Apache Iceberg, Trino, Apache Kafka).
  • Hands-on experience with scalable data storage platforms (e.g., Azure Data Lake Storage, MinIO S3).
  • Proficiency (past or present) in implementing ETL/ELT pipelines that store and retrieve data from object stores (e.g., MinIO) and relational data stores (e.g., PostgreSQL).
  • Deep understanding of data security, encryption, IAM and compliance standards.
  • Proficiency in integrating OpenTelemetry (OTel) into data infrastructures.
  • Proficiency in programming languages: Java 8+ (e.g., Java 23.x) and Kotlin 2.x.
  • Proficiency with Continuous Integration using Git-based platforms (e.g., GitLab, Gitea).
  • Proficiency with distributed source code management tools using Git-based protocols (e.g., GitLab, Gitea).
  • Proficiency with the Linux command line (e.g., the Linux filesystem, Linux processes).
  • Good communication skills in English.