About the role
Thales is seeking a seasoned Data Architect to lead the design and implementation of a Data Warehouse platform for regulatory requirements and advanced analytics. The role involves architecting scalable data solutions combining data lakes and warehouses, enabling AI/ML applications and secure data access. The ideal candidate has strong experience with MinIO, distributed data processing technologies, and data security.
Aerospace & Defense · Full-time · General
Key Responsibilities
- Own the architecture of the Data Warehouse; ensure that systems architecture blueprints (e.g., architecture diagrams, tooling, technologies) remain current.
- Define data modeling standards for raw, curated, and serving layers.
- Develop strategies for data ingestion, data storage, data cataloging, data governance, and secured data access (e.g., data at rest, data in transit).
- Develop strategies for data backups and data recovery based on SLA/RPO targets.
- Develop strategies for data consistency, data security (i.e., data at rest, data in transit), and data redundancy.
- Lead the design and implementation of ingestion pipelines for structured and unstructured data, ensuring that they meet the functional and non-functional requirements.
- Lead the design of storage layers, metadata management, and data cataloging to ensure that they meet the functional and non-functional requirements.
- Define ETL/ELT workflows and support their automation with distributed processing and orchestration tooling (e.g., Apache Spark, Apache Flink).
- Oversee deployment pipelines using CI/CD best practices for data-oriented infrastructure.
- Ensure high availability, scalability and performance of the technical implementation (e.g., RMA analysis).
- Implement robust data security frameworks (e.g., encryption, data masking, and fine-grained access controls).
- Establish IAM policies and secure data perimeters using cloud-native and host-based tools.
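The access-control responsibilities above (fine-grained access controls, secure data perimeters) can be sketched as a minimal policy check over layer prefixes. This is an illustrative sketch only; the roles, prefixes, and function names below are hypothetical and not part of the role description.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    role: str      # principal (e.g., a service account)
    action: str    # "read" or "write"
    prefix: str    # object-store path prefix the role may touch

# Hypothetical policies for a lake with raw/curated/serving layers.
POLICIES = [
    Policy("ingest-svc", "write", "raw/"),
    Policy("analyst", "read", "curated/"),
    Policy("ml-svc", "read", "serving/"),
]

def is_allowed(role: str, action: str, path: str) -> bool:
    """Fine-grained check: a role may perform an action only on
    objects under a prefix that a policy explicitly grants."""
    return any(
        p.role == role and p.action == action and path.startswith(p.prefix)
        for p in POLICIES
    )
```

In production, equivalent rules would typically live in the object store's own policy engine (e.g., bucket policies) rather than application code; the sketch only shows the deny-by-default shape such policies take.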
Requirements
- Bachelor's degree in Computer Science or Information Technology.
- Master's degree in Computer Science or Data Science is an advantage.
- Strong experience in designing and building highly available data platforms using MinIO as the data storage infrastructure, including data replication and partitioning strategies.
- Strong expertise in SQL, data modeling (SCD Types 1, 2, and 6), ETL/ELT design, and query performance tuning.
- Strong expertise in containerization and orchestration (e.g., Docker, Kaniko, Kubernetes).
- Strong expertise in the design and development of ETL and ELT data pipelines (with structured or unstructured data), including Change Data Capture (CDC).
- Strong expertise in distributed data processing technologies (Apache Spark 3.0, Apache Flink 2.0, Apache Iceberg, Trino, Apache Kafka).
- Hands-on experience with scalable data storage platforms (e.g., Azure Data Lake Storage, MinIO S3).
- Proficiency (past or present) in implementing ETL/ELT pipelines that store and retrieve data from object-based data stores (e.g., MinIO) and relational data stores (e.g., PostgreSQL).
- Deep understanding of data security, encryption, IAM and compliance standards.
- Proficiency with integrating OpenTelemetry (OTel) into data infrastructures.
- Proficiency in JVM programming languages: Java 8+ (e.g., Java 23.x) and Kotlin 2.x.
- Proficiency with Continuous Integration on Git-based platforms (e.g., GitLab, Gitea).
- Proficiency with distributed source code management tools using Git-based protocols (e.g., GitLab, Gitea).
- Proficiency with the Linux command line (e.g., Linux filesystem, Linux processes).
- Good communication skills in English.
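As a concrete illustration of the SCD Type 2 modeling named in the requirements, here is a minimal, dependency-free sketch of a Type 2 upsert: close the currently active version of a record and append a new versioned row. The column names, sample data, and helper function are hypothetical, not part of the posting.

```python
from datetime import date

def scd2_upsert(dim_rows, key, new_attrs, effective_date):
    """SCD Type 2: close the currently active version of `key`
    (set end_date, current=False) and append the new version."""
    out = []
    for row in dim_rows:
        if row["key"] == key and row["current"]:
            out.append(dict(row, end_date=effective_date, current=False))
        else:
            out.append(row)
    out.append({"key": key, **new_attrs,
                "start_date": effective_date, "end_date": None,
                "current": True})
    return out

# Hypothetical customer dimension: one active row for customer C1.
dim = [{"key": "C1", "city": "Paris",
        "start_date": date(2023, 1, 1), "end_date": None, "current": True}]
dim = scd2_upsert(dim, "C1", {"city": "Lyon"}, date(2024, 6, 1))
```

Unlike Type 1 (overwrite in place), Type 2 preserves full history, which is what makes it useful for the regulatory and audit requirements a warehouse like this must satisfy.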