About the role
Thales is seeking a Data Engineer to design, build, and optimize data pipelines and processing frameworks for next-generation Data Warehouse and Data Lakehouse solutions. The role involves integrating data from various sources, ensuring compliance, and automating tasks. Ideal candidates have strong technical skills in streaming/batch processing, data quality, and modern architectural challenges.
Aerospace & Defense · Full-time · General
Key Responsibilities
- Integrate data from a variety of structured and unstructured sources (e.g., APIs, streaming platforms, databases, external data feeds).
- Architect, implement and maintain robust, scalable and efficient ETL/ELT pipelines to collect, ingest, transform and load data from diverse sources, ensuring these pipelines meet performance requirements (a minimal pipeline sketch follows this list).
- Ensure all data pipelines, data transformations and data retrieval comply with cybersecurity and regulatory requirements.
- Develop and optimize data models (relational and non-relational) to support analytics and operational needs.
- Develop data generation/tracing capabilities via data models, audit/change models and dashboard visualizations.
- Participate in designing and evolving Data Warehouse architecture.
- Implement data validation, sanitization and monitoring processes to ensure the integrity, accuracy and consistency of data; proactively identify data quality issues and develop solutions.
- Facilitate data synchronization between storage and processing systems.
- Automate recurring data engineering tasks to improve efficiency and reliability.
- Apply data lifecycle policies, authorization and access controls, and enable encryption.
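
For illustration only, here is a minimal sketch of the kind of streaming ETL pipeline described above, assuming a hypothetical Kafka topic `raw-events`, an S3-compatible bucket `curated` (as a MinIO deployment would expose), and Spark Structured Streaming in Java; all endpoints, topic and path names are placeholders, not Thales systems.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import static org.apache.spark.sql.functions.current_timestamp;

public class KafkaToLakehouseEtl {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("kafka-to-lakehouse-etl")
                .getOrCreate();

        // Ingest: subscribe to a raw event topic on Kafka (placeholder broker address).
        Dataset<Row> raw = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "kafka:9092")
                .option("subscribe", "raw-events")
                .load();

        // Transform: decode the payload and stamp the ingestion time.
        Dataset<Row> curated = raw
                .selectExpr("CAST(key AS STRING) AS event_key",
                            "CAST(value AS STRING) AS payload")
                .withColumn("ingested_at", current_timestamp());

        // Load: append Parquet files to an S3-compatible object store (e.g., MinIO),
        // with a checkpoint location so the pipeline resumes cleanly after restarts.
        StreamingQuery query = curated.writeStream()
                .format("parquet")
                .option("path", "s3a://curated/raw-events/")
                .option("checkpointLocation", "s3a://curated/_checkpoints/raw-events/")
                .outputMode("append")
                .start();

        query.awaitTermination();
    }
}
```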
Requirements
- Bachelor's degree in Computer Science or Information Technology
- Master's degree in Computer Science or Data Science is an advantage
- Proficient in selecting data processing algorithms that balance latency and throughput (e.g., Spark Structured Streaming versus Flink DataStream).
- Proficiency in implementing ETL/ELT data pipelines (with structured or unstructured data) using Apache Kafka, Apache Spark 3.0 and/or Apache Flink 2.0, with row- and columnar-data models.
- Proficiency in deploying ETL/ELT pipelines to Kubernetes clusters in the Azure cloud, either as virtual machines or containerized workloads.
- Proficiency in implementing ETL/ELT pipelines that store and retrieve data from object-based data stores (e.g., MinIO) and relational data stores (e.g., PostgreSQL); see the batch sketch after this list.
- Proficiency in using Grafana, Prometheus, ElasticSearch, Kibana
- Proficiency in programming languages including Java 8+ (e.g., Java 23) and Kotlin 2.x
- Proficiency in developing performant abstract data structures (e.g., deterministic versus heuristic data lookups).
- Proficiency with Continuous Integration on Git-based platforms (e.g., GitLab, Gitea).
- Proficiency with distributed source code management using Git-based tools (e.g., GitLab, Gitea).
- Proficiency with the Linux command line (e.g., filesystem navigation, process management).
- Proficiency with integrating OpenTelemetry (OTEL) instrumentation; see the tracing sketch after this list.
- Good communication skills in English
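
As a concrete illustration of the object-store and relational-store requirement, below is a minimal batch sketch in Java that reads Parquet from an S3-compatible endpoint (standing in for MinIO) and writes the result to PostgreSQL over JDBC; the endpoint, credentials, bucket and table names are assumptions for the example only.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class LakehouseToWarehouseSync {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("minio-to-postgres-sync")
                // S3A settings pointing at a MinIO-style endpoint (placeholder values).
                .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
                .config("spark.hadoop.fs.s3a.path.style.access", "true")
                .getOrCreate();

        // Extract: read curated Parquet files from the object store.
        Dataset<Row> curated = spark.read().parquet("s3a://curated/raw-events/");

        // Load: push the same dataset into a relational table for analytics.
        curated.write()
                .mode(SaveMode.Overwrite)
                .format("jdbc")
                .option("url", "jdbc:postgresql://postgres:5432/warehouse")
                .option("dbtable", "analytics.events")
                .option("user", "etl")
                .option("password", "etl")
                .save();

        spark.stop();
    }
}
```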
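For the OpenTelemetry requirement, here is a minimal manual-instrumentation sketch in Java, assuming the OpenTelemetry SDK autoconfigure artifact is on the classpath and that exporter settings come from the standard OTEL_* environment variables; the tracer name, span name and attribute are illustrative, not prescribed by the role.

```java
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk;

public class EtlTracingExample {
    public static void main(String[] args) {
        // Build an SDK from the standard OTEL_* environment variables (exporter, endpoint, etc.).
        OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.initialize().getOpenTelemetrySdk();
        Tracer tracer = otel.getTracer("etl-pipeline");

        // Wrap one ETL step in a span so its latency and failures show up in traces.
        Span span = tracer.spanBuilder("transform-batch").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            // ... the actual batch transformation would run here ...
            span.setAttribute("records.processed", 42L);
        } catch (RuntimeException e) {
            span.recordException(e);
            throw e;
        } finally {
            span.end();
        }
    }
}
```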