About the role
Design and maintain scalable data pipelines and processing frameworks for Thales’ next-generation Data Warehouse and Data Lakehouse solutions, integrating diverse structured and unstructured sources while ensuring cybersecurity and regulatory compliance.
Aerospace & Defense · Onsite
Key Responsibilities
- Integrate data from a variety of structured and unstructured sources (e.g., APIs, streaming platforms, databases, external data feeds)
- Architect, implement and maintain robust, scalable and efficient ETL/ELT pipelines to collect, ingest, transform and load data from diverse sources, ensuring these pipelines meet performance requirements
- Ensure all data pipelines, data transformations and data retrieval comply with cybersecurity and regulatory requirements
- Develop and optimize data models (relational and non-relational) to support analytics and operational needs
- Develop data generation and tracing capabilities via data models, audit/change models and dashboard visualizations
- Participate in designing and evolving Data Warehouse architecture
- Implement data validation, sanitization and monitoring processes to ensure data integrity, accuracy and consistency; proactively identify data quality issues and develop solutions
- Facilitate data synchronization between storage and processing systems
- Automate recurring data engineering tasks to improve efficiency and reliability
- Apply data lifecycle policies, authorization and access controls, and enable encryption
Requirements
- Bachelor's degree in Computer Science or Information Technology
- Master's degree in Computer Science or Data Science is a plus
- Proficient in selecting data processing algorithms that balance latency and throughput (e.g., Spark Structured Streaming versus Flink DataStream)
- Proficiency in implementing ETL/ELT data pipelines (with structured or unstructured data) on Apache Kafka, Apache Spark 3.0 and/or Apache Flink 2.0, using row- and columnar-data models
- Proficiency in deploying ETL/ELT pipelines to Kubernetes clusters on Azure, either as virtual machines or as containerized workloads
- Proficiency in implementing ETL/ELT pipelines that store and retrieve data from object-based data stores (e.g., MinIO) and relational data stores (e.g., PostgreSQL)
- Proficiency in using Grafana, Prometheus, Elasticsearch and Kibana
- Proficiency in programming in Java 8+ (e.g., Java 23.x) and Kotlin 2.x
- Proficiency in developing performant abstract data structures (e.g., deterministic versus heuristic data lookups)
- Proficiency with Continuous Integration and distributed source code management using Git-based tools (e.g., GitLab, Gitea)
- Proficiency with the Linux command line (e.g., the Linux filesystem, Linux processes)
- Proficiency with integrating OpenTelemetry (OTel)
- Good communication skills in English
- Working experience with Python 2/3 and Scala 2/3
- Working experience with event-driven architectures
- Familiarity with data serialization and data exchange protocols/technologies (e.g., Apache Avro, FlatBuffers, Protocol Buffers)
- Familiarity with cloud-native deployment strategies on cloud service providers (e.g., Azure, AWS, GCP)