Thales

Data Engineer

Aerospace & Defense · Singapore · Onsite · Posted 4 weeks ago

About the role

Design and maintain scalable data pipelines and processing frameworks for Thales’ next-generation Data Warehouse and Data Lakehouse solutions, integrating diverse structured and unstructured sources while ensuring cybersecurity and regulatory compliance.

Key Responsibilities

  • Integrate data from a variety of structured and unstructured sources (e.g., APIs, streaming platforms, databases, external data feeds)
  • Architect, implement and maintain robust, scalable and efficient ETL/ELT pipelines to collect, ingest, transform and load data from diverse sources, ensuring the pipelines meet performance requirements
  • Ensure all data pipelines, data transformations and data retrieval comply with cybersecurity and regulatory requirements
  • Develop and optimize data models (relational and non-relational) to support analytics and operational needs
  • Develop data generation and tracing capabilities through data models, audit/change models and dashboard visualizations
  • Participate in designing and evolving Data Warehouse architecture
  • Implement data validation, sanitization and monitoring processes to ensure data integrity, accuracy and consistency; proactively identify data quality issues and develop solutions
  • Facilitate data synchronization between storage and processing systems
  • Automate recurring data engineering tasks to improve efficiency and reliability
  • Apply data lifecycle policies, authorization, access controls and encryption enablement

Requirements

  • Bachelor's degree in Computer Science or Information Technology
  • Master's degree in Computer Science or Data Science, where applicable
  • Proficient in selecting data processing algorithms that balance latency and throughput (e.g., Spark Structured Streaming versus Flink DataStream)
  • Proficiency in implementing ETL/ELT data pipelines (over structured or unstructured data) with Apache Kafka, Apache Spark 3.0 and/or Apache Flink 2.0, using row- and columnar-oriented data models
  • Proficiency in deploying ETL/ELT pipelines to Kubernetes clusters on Azure, either as virtual machines or containerized workloads
  • Proficiency in implementing ETL/ELT pipelines that store and retrieve data from object stores (e.g., MinIO) and relational data stores (e.g., PostgreSQL)
  • Proficiency in using Grafana, Prometheus, Elasticsearch and Kibana
  • Proficiency in Java 8+ (e.g., Java 23) and Kotlin 2.x
  • Proficiency in developing performant abstract data structures (e.g., deterministic versus heuristic data lookups)
  • Proficiency with Continuous Integration on Git-based platforms (e.g., GitLab, Gitea)
  • Proficiency with distributed source code management tools using Git (e.g., GitLab, Gitea)
  • Proficiency with the Linux command line (e.g., Linux filesystem, Linux processes)
  • Proficiency with integrating OpenTelemetry (OTel)
  • Good communication skills in English
  • Working experience with Python 2/3 and Scala 2/3
  • Working experience with event-driven architectures
  • Familiarity with data serialization and data exchange protocols/technologies (e.g., Apache Avro, FlatBuffers, Protocol Buffers)
  • Familiarity with cloud-native deployment strategies for cloud service providers (e.g., Azure, AWS, GCP)