About the role
Thales is seeking a seasoned Data Architect to lead the design and implementation of a Data Warehouse platform for regulatory requirements and advanced analytics. The role involves architecting scalable data solutions combining data lakes and warehouses, enabling AI/ML applications and secure data access. The ideal candidate has strong experience with MinIO, distributed data processing technologies, and data security.
Aerospace & Defense · Full-time · General
Key Responsibilities
- Own the architecture of the Data Warehouse; ensure that systems architecture blueprints (e.g., architecture diagrams, tooling, technologies) remain current.
- Define data modeling standards for raw, curated, and serving layers.
- Develop strategies for data ingestion, data storage, data cataloging, data governance, and secured data access (e.g., data at rest, data in transit).
- Develop strategies for data backups and data recovery based on SLA/RPO targets.
- Develop strategies for data consistency, data security (i.e., data at rest, data in transit), and data redundancy.
- Lead the design and implementation of ingestion pipelines for structured and unstructured data, ensuring that they meet the functional and non-functional requirements.
- Lead the design of storage layers, metadata management, and data cataloging to ensure that they meet the functional and non-functional requirements.
- Define ETL/ELT workflows and support their automation with distributed processing and orchestration tooling (e.g., Apache Spark, Apache Flink).
- Oversee deployment pipelines using CI/CD best practices for data-oriented infrastructure.
- Ensure high availability, scalability and performance of the technical implementation (e.g., RMA analysis).
- Implement robust data security frameworks (e.g., encryption, data masking, and fine-grained access controls).
- Establish IAM policies and secure data perimeters using cloud-native and host-based tools.
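The access-control responsibilities above (fine-grained access controls, secure data perimeters) can be sketched as a minimal policy check over layer prefixes. This is an illustrative sketch only; the roles, prefixes, and function names below are hypothetical and not part of the role description.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    role: str      # principal (e.g., a service account)
    action: str    # "read" or "write"
    prefix: str    # object-store path prefix the role may touch

# Hypothetical policies for a lake with raw/curated/serving layers.
POLICIES = [
    Policy("ingest-svc", "write", "raw/"),
    Policy("analyst", "read", "curated/"),
    Policy("ml-svc", "read", "serving/"),
]

def is_allowed(role: str, action: str, path: str) -> bool:
    """Fine-grained check: a role may perform an action only on
    objects under a prefix that a policy explicitly grants."""
    return any(
        p.role == role and p.action == action and path.startswith(p.prefix)
        for p in POLICIES
    )
```

In production, equivalent rules would typically live in the object store's own policy engine (e.g., bucket policies) rather than application code; the sketch only shows the deny-by-default shape such policies take.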
Requirements
- Bachelor's degree in Computer Science or Information Technology.
- Master's degree in Computer Science or Data Science is an advantage.
- Strong experience in designing and building highly available data platforms using MinIO as the data storage infrastructure, including data replication and partitioning strategies.
- Strong expertise in SQL, data modeling (SCD Types 1, 2, and 6), ETL/ELT design, and query performance tuning.
- Strong expertise in containerization and orchestration (e.g., Docker, Kaniko, Kubernetes).
- Strong expertise in the design and development of ETL and ELT data pipelines (with structured or unstructured data), including Change Data Capture (CDC).
- Strong expertise in distributed data processing technologies (Apache Spark 3.0, Apache Flink 2.0, Apache Iceberg, Trino, Apache Kafka).
- Hands-on experience with scalable data storage platforms (e.g., Azure Data Lake Storage, MinIO S3).
- Proficiency (past or present) in implementing ETL/ELT pipelines that store and retrieve data from object-based data stores (e.g., MinIO) and relational data stores (e.g., PostgreSQL).
- Deep understanding of data security, encryption, IAM and compliance standards.
- Proficiency with integrating OpenTelemetry (OTel) into data infrastructures.
- Proficiency in JVM programming languages: Java 8+ (e.g., Java 23.x) and Kotlin 2.x.
- Proficiency with Continuous Integration on Git-based platforms (e.g., GitLab, Gitea).
- Proficiency with distributed source code management tools using Git-based protocols (e.g., GitLab, Gitea).
- Proficiency with the Linux command line (e.g., Linux filesystem, Linux processes).
- Good communication skills in English.
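As a concrete illustration of the SCD Type 2 modeling named in the requirements, here is a minimal, dependency-free sketch of a Type 2 upsert: close the currently active version of a record and append a new versioned row. The column names, sample data, and helper function are hypothetical, not part of the posting.

```python
from datetime import date

def scd2_upsert(dim_rows, key, new_attrs, effective_date):
    """SCD Type 2: close the currently active version of `key`
    (set end_date, current=False) and append the new version."""
    out = []
    for row in dim_rows:
        if row["key"] == key and row["current"]:
            out.append(dict(row, end_date=effective_date, current=False))
        else:
            out.append(row)
    out.append({"key": key, **new_attrs,
                "start_date": effective_date, "end_date": None,
                "current": True})
    return out

# Hypothetical customer dimension: one active row for customer C1.
dim = [{"key": "C1", "city": "Paris",
        "start_date": date(2023, 1, 1), "end_date": None, "current": True}]
dim = scd2_upsert(dim, "C1", {"city": "Lyon"}, date(2024, 6, 1))
```

Unlike Type 1 (overwrite in place), Type 2 preserves full history, which is what makes it useful for the regulatory and audit requirements a warehouse like this must satisfy.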