Thales

Data Engineer

Aerospace & Defense · Singapore · Full-time · 1 month ago

About the role

Thales is seeking a Data Engineer to design, build, and optimize data pipelines and processing frameworks for next-generation Data Warehouse and Data Lakehouse solutions. The role involves integrating data from various sources, ensuring compliance, and automating tasks. Ideal candidates have strong technical skills in streaming/batch processing, data quality, and modern architectural challenges.

Key Responsibilities

  • Integrate data from a variety of structured and unstructured sources (e.g., APIs, streaming platforms, databases, external data feeds).
  • Architect, implement and maintain robust, scalable and efficient ETL/ELT pipelines to collect, ingest, transform and load data from diverse sources, and ensure these pipelines meet performance requirements (a minimal sketch in Java follows this list).
  • Ensure all data pipelines, data transformations and data retrieval are compliant with cybersecurity and regulatory requirements.
  • Develop and optimize data models (relational and non-relational) to support analytics and operational needs.
  • Develop data generation and tracing capabilities via data models, audit/change models and dashboard visualizations.
  • Participate in designing and evolving Data Warehouse architecture.
  • Implement data validation, sanitization and monitoring processes to ensure the integrity, accuracy and consistency of data; proactively identify data quality issues and develop solutions.
  • Facilitate data synchronization between storage and processing systems.
  • Automate recurring data engineering tasks to improve efficiency and reliability.
  • Apply data lifecycle policies, authorization and access controls, and enable encryption.
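
As a concrete illustration of the ETL/ELT responsibility above, here is a minimal sketch of a streaming ingest job in Java using Spark Structured Streaming with a Kafka source and a columnar Parquet sink. The class name, broker address, topic and object-store paths are illustrative assumptions, not details from this posting.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;

    public final class EventIngestJob {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                    .appName("event-ingest") // hypothetical job name
                    .getOrCreate();

            // Extract: subscribe to a Kafka topic (broker and topic are assumptions).
            Dataset<Row> raw = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "kafka:9092")
                    .option("subscribe", "events.raw")
                    .load();

            // Transform: Kafka delivers key/value as binary, so cast them to strings.
            Dataset<Row> events = raw.selectExpr(
                    "CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp");

            // Load: append to columnar Parquet files in an object store, with a
            // checkpoint location so the stream can recover after a failure.
            StreamingQuery query = events.writeStream()
                    .format("parquet")
                    .option("path", "s3a://lakehouse/events/")
                    .option("checkpointLocation", "s3a://lakehouse/checkpoints/events/")
                    .start();

            query.awaitTermination();
        }
    }

The checkpoint location is what lets the stream restart from where it left off, which is central to the reliability and performance expectations listed above.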

Requirements

  • Bachelor's degree in Computer Science or Information Technology.
  • Master's degree in Computer Science or Data Science is an advantage.
  • Proficient in selecting data processing algorithms that balance latency and throughput (e.g., Spark Structured Streaming versus the Flink DataStream API).
  • Proficiency in implementing ETL and ELT data pipelines (with structured or unstructured data) using Apache Kafka, Apache Spark 3.0 and/or Apache Flink 2.0, with row- and columnar-oriented data models.
  • Proficiency in deploying ETL/ELT pipelines to Kubernetes clusters in the Azure cloud, either as virtual machines or containerized workloads.
  • Proficiency in implementing ETL/ELT pipelines that store and retrieve data from object stores (e.g., MinIO) and relational data stores (e.g., PostgreSQL).
  • Proficiency in using Grafana, Prometheus, Elasticsearch and Kibana.
  • Proficiency in Java 8+ (e.g., Java 23) and Kotlin 2.x.
  • Proficiency in developing performant abstract data structures (e.g., deterministic versus heuristic data lookups; a sketch contrasting the two follows this list).
  • Proficiency with Continuous Integration on Git-based platforms (e.g., GitLab, Gitea).
  • Proficiency with distributed source code management tools using Git (e.g., GitLab, Gitea).
  • Proficiency with the Linux command line (e.g., navigating the Linux filesystem, managing Linux processes).
  • Proficiency with integrating OpenTelemetry (OTel); an instrumentation sketch also follows this list.
  • Good communication skills in English.
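
On the deterministic-versus-heuristic lookup point above, the sketch below contrasts an exact hash-set lookup with a Guava Bloom filter, which answers "definitely absent" or "possibly present" in constant space at the cost of a tunable false-positive rate. It assumes Guava is on the classpath; the class name, keys and sizing parameters are hypothetical.

    import com.google.common.hash.BloomFilter;
    import com.google.common.hash.Funnels;

    import java.nio.charset.StandardCharsets;
    import java.util.HashSet;
    import java.util.Set;

    public final class LookupComparison {
        public static void main(String[] args) {
            // Deterministic lookup: a hash set answers membership exactly,
            // at the cost of storing every key.
            Set<String> exact = new HashSet<>();
            exact.add("order-42");

            // Heuristic lookup: a Bloom filter never misses a key it has seen
            // (no false negatives), but may report absent keys as present;
            // sized here for one million entries at a 1% false-positive rate.
            BloomFilter<String> approximate = BloomFilter.create(
                    Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);
            approximate.put("order-42");

            System.out.println(exact.contains("order-42"));           // always true
            System.out.println(approximate.mightContain("order-42")); // true (no false negatives)
            System.out.println(approximate.mightContain("order-43")); // usually false, occasionally true
        }
    }

A common pipeline use is placing the Bloom filter in front of an expensive store lookup: a negative answer skips the query entirely, and a false positive only costs one redundant query.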
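
For the OpenTelemetry requirement, here is a minimal sketch of instrumenting one pipeline stage with a span via the OpenTelemetry Java API. It assumes io.opentelemetry:opentelemetry-api is on the classpath and an SDK is configured at runtime (e.g., by the OpenTelemetry Java agent); the tracer, span and attribute names are hypothetical.

    import io.opentelemetry.api.GlobalOpenTelemetry;
    import io.opentelemetry.api.trace.Span;
    import io.opentelemetry.api.trace.Tracer;
    import io.opentelemetry.context.Scope;

    public final class TracedTransform {
        // The tracer comes from whatever SDK the runtime has configured;
        // the instrumentation scope name is an assumption.
        private static final Tracer TRACER =
                GlobalOpenTelemetry.getTracer("etl-pipeline");

        static void transformBatch(Iterable<String> records) {
            // Wrap one pipeline stage in a span so it shows up in traces.
            Span span = TRACER.spanBuilder("transform-batch").startSpan();
            try (Scope ignored = span.makeCurrent()) {
                long count = 0;
                for (String rec : records) {
                    // ... per-record transformation would go here ...
                    count++;
                }
                span.setAttribute("records.processed", count);
            } finally {
                span.end();
            }
        }
    }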