About the role
Thales is seeking a Data Engineer to design, build, and optimize data pipelines and processing frameworks for next-generation Data Warehouse and Data Lakehouse solutions. The role involves integrating data from various sources, ensuring compliance, and automating tasks. Ideal candidates have strong technical skills in streaming/batch processing, data quality, and modern architectural challenges.
Aerospace & Defense · Full-time · General
Key Responsibilities
- Integrate data from a variety of structured and unstructured sources (e.g., APIs, streaming platforms, databases, external data feeds).
- Architect, implement and maintain robust, scalable and efficient ETL/ELT pipelines to collect, ingest, transform and load data from diverse sources, ensuring these pipelines meet performance requirements (a minimal pipeline sketch follows this list).
- Ensure all data pipelines, data transformations and data retrieval comply with cybersecurity and regulatory requirements.
- Develop and optimize data models (relational and non-relational) to support analytics and operational needs.
- Develop data generation/tracing capabilities via data models, audit/change models and dashboard visualizations.
- Participate in designing and evolving Data Warehouse architecture.
- Implement data validation, sanitization and monitoring processes to ensure the integrity, accuracy and consistency of data; proactively identify data quality issues and develop solutions.
- Facilitate data synchronization between storage and processing systems.
- Automate recurring data engineering tasks to improve efficiency and reliability.
- Apply data lifecycle policies, authorization and access controls, and enable encryption.
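
For illustration only, here is a minimal sketch of the kind of streaming ETL pipeline described above, assuming a hypothetical Kafka topic `raw-events`, an S3-compatible bucket `curated` (as a MinIO deployment would expose), and Spark Structured Streaming in Java; all endpoints, topic and path names are placeholders, not Thales systems.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import static org.apache.spark.sql.functions.current_timestamp;

public class KafkaToLakehouseEtl {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("kafka-to-lakehouse-etl")
                .getOrCreate();

        // Ingest: subscribe to a raw event topic on Kafka (placeholder broker address).
        Dataset<Row> raw = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "kafka:9092")
                .option("subscribe", "raw-events")
                .load();

        // Transform: decode the payload and stamp the ingestion time.
        Dataset<Row> curated = raw
                .selectExpr("CAST(key AS STRING) AS event_key",
                            "CAST(value AS STRING) AS payload")
                .withColumn("ingested_at", current_timestamp());

        // Load: append Parquet files to an S3-compatible object store (e.g., MinIO),
        // with a checkpoint location so the pipeline resumes cleanly after restarts.
        StreamingQuery query = curated.writeStream()
                .format("parquet")
                .option("path", "s3a://curated/raw-events/")
                .option("checkpointLocation", "s3a://curated/_checkpoints/raw-events/")
                .outputMode("append")
                .start();

        query.awaitTermination();
    }
}
```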
Requirements
- Bachelor's degree in Computer Science or Information Technology
- Master's degree in Computer Science or Data Science is an advantage
- Proficient in selecting data processing algorithms that balance latency and throughput (e.g., Spark Structured Streaming versus Flink DataStream).
- Proficiency in implementing ETL/ELT data pipelines (with structured or unstructured data) using Apache Kafka, Apache Spark 3.0 and/or Apache Flink 2.0, with row- and columnar-data models.
- Proficiency in deploying ETL/ELT pipelines to Kubernetes clusters in the Azure cloud, either as virtual machines or containerized workloads.
- Proficiency in implementing ETL/ELT pipelines that store and retrieve data from object-based data stores (e.g., MinIO) and relational data stores (e.g., PostgreSQL); see the batch sketch after this list.
- Proficiency in using Grafana, Prometheus, ElasticSearch, Kibana
- Proficiency in programming languages including Java 8+ (e.g., Java 23) and Kotlin 2.x
- Proficiency in developing performant abstract data structures (e.g., deterministic versus heuristic data lookups).
- Proficiency with Continuous Integration on Git-based platforms (e.g., GitLab, Gitea).
- Proficiency with distributed source code management using Git-based tools (e.g., GitLab, Gitea).
- Proficiency with the Linux command line (e.g., filesystem navigation, process management).
- Proficiency with integrating OpenTelemetry (OTEL) instrumentation; see the tracing sketch after this list.
- Good communication skills in English
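
As a concrete illustration of the object-store and relational-store requirement, below is a minimal batch sketch in Java that reads Parquet from an S3-compatible endpoint (standing in for MinIO) and writes the result to PostgreSQL over JDBC; the endpoint, credentials, bucket and table names are assumptions for the example only.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class LakehouseToWarehouseSync {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("minio-to-postgres-sync")
                // S3A settings pointing at a MinIO-style endpoint (placeholder values).
                .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
                .config("spark.hadoop.fs.s3a.path.style.access", "true")
                .getOrCreate();

        // Extract: read curated Parquet files from the object store.
        Dataset<Row> curated = spark.read().parquet("s3a://curated/raw-events/");

        // Load: push the same dataset into a relational table for analytics.
        curated.write()
                .mode(SaveMode.Overwrite)
                .format("jdbc")
                .option("url", "jdbc:postgresql://postgres:5432/warehouse")
                .option("dbtable", "analytics.events")
                .option("user", "etl")
                .option("password", "etl")
                .save();

        spark.stop();
    }
}
```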
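For the OpenTelemetry requirement, here is a minimal manual-instrumentation sketch in Java, assuming the OpenTelemetry SDK autoconfigure artifact is on the classpath and that exporter settings come from the standard OTEL_* environment variables; the tracer name, span name and attribute are illustrative, not prescribed by the role.

```java
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk;

public class EtlTracingExample {
    public static void main(String[] args) {
        // Build an SDK from the standard OTEL_* environment variables (exporter, endpoint, etc.).
        OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.initialize().getOpenTelemetrySdk();
        Tracer tracer = otel.getTracer("etl-pipeline");

        // Wrap one ETL step in a span so its latency and failures show up in traces.
        Span span = tracer.spanBuilder("transform-batch").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            // ... the actual batch transformation would run here ...
            span.setAttribute("records.processed", 42L);
        } catch (RuntimeException e) {
            span.recordException(e);
            throw e;
        } finally {
            span.end();
        }
    }
}
```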