About the role
Design and maintain scalable data pipelines and processing frameworks for Thales’ next-generation Data Warehouse and Data Lakehouse solutions, integrating diverse structured and unstructured sources while ensuring cybersecurity and regulatory compliance.
Aerospace & Defense · Onsite
Key Responsibilities
- Integrate data from a variety of structured and unstructured sources (e.g., APIs, streaming platforms, databases, external data feeds)
- Architect, implement and maintain robust, scalable and efficient ETL/ELT pipelines to collect, ingest, transform and load data from diverse sources, ensuring these pipelines meet performance requirements
- Ensure all data pipelines, data transformations and data retrieval comply with cybersecurity and regulatory requirements
- Develop and optimize data models (relational and non-relational) to support analytics and operational needs
- Develop data generation and tracing capabilities via data models, audit/change models and dashboard visualizations
- Participate in designing and evolving Data Warehouse architecture
- Implement data validation, sanitization and monitoring processes to ensure data integrity, accuracy and consistency; proactively identify data quality issues and develop solutions
- Facilitate data synchronization between storage and processing systems
- Automate recurring data engineering tasks to improve efficiency and reliability
- Apply data lifecycle policies, authorization and access controls, and enable encryption
Requirements
- Bachelor's degree in Computer Science or Information Technology
- Master's degree in Computer Science or Data Science is a plus
- Proficient in selecting data processing algorithms that balance latency and throughput (e.g., Spark Structured Streaming versus Flink DataStream)
- Proficiency in implementing ETL/ELT data pipelines (with structured or unstructured data) on Apache Kafka, Apache Spark 3.0 and/or Apache Flink 2.0, using row- and columnar-data models
- Proficiency in deploying ETL/ELT pipelines to Kubernetes clusters on Azure, either as virtual machines or as containerized workloads
- Proficiency in implementing ETL/ELT pipelines that store and retrieve data from object-based data stores (e.g., MinIO) and relational data stores (e.g., PostgreSQL)
- Proficiency in using Grafana, Prometheus, Elasticsearch and Kibana
- Proficiency in programming in Java 8+ (e.g., Java 23.x) and Kotlin 2.x
- Proficiency in developing performant abstract data structures (e.g., deterministic versus heuristic data lookups)
- Proficiency with Continuous Integration and distributed source code management using Git-based tools (e.g., GitLab, Gitea)
- Proficiency with the Linux command line (e.g., the Linux filesystem, Linux processes)
- Proficiency with integrating OpenTelemetry (OTel)
- Good communication skills in English
- Working experience with Python 2/3 and Scala 2/3
- Working experience with event-driven architectures
- Familiarity with data serialization and data exchange protocols/technologies (e.g., Apache Avro, FlatBuffers, Protocol Buffers)
- Familiarity with cloud-native deployment strategies on cloud service providers (e.g., Azure, AWS, GCP)