About the role
Thales is seeking a Data Engineer to design, build, and optimize data pipelines and processing frameworks powering next-generation Data Warehouse and Data Lakehouse solutions. This role involves tackling complex data challenges, including streaming and batch processing, data quality assurance, and real-time data integration, within a collaborative, high-tech environment.
Aerospace & Defense | Onsite
Key Responsibilities
- Architect, implement, and maintain robust, scalable, and efficient ETL/ELT pipelines to collect, ingest, transform, and load data from diverse sources.
- Integrate data from various structured and unstructured sources, including APIs, streaming platforms, databases, and external feeds.
- Develop and optimize data models (relational and non-relational) to support advanced analytics and operational needs.
- Implement data validation, sanitization, and monitoring processes to ensure high data integrity, accuracy, and consistency.
- Develop data generation and tracing capabilities through data models, audit change models, and dashboard visualizations.
- Participate in the design and evolution of Data Warehouse architecture.
- Automate recurring data engineering tasks to enhance overall efficiency and reliability.
Requirements
- Bachelor's degree in Computer Science or Information Technology (a Master's degree in CS/Data Science is also applicable).
- Proficiency in implementing ETL & ELT pipelines using Apache Kafka, Apache Spark 3.0, and/or Apache Flink 2.0.
- Proficiency in programming languages such as Java 8+ or Kotlin 2.x.
- Proficiency in deploying ETL/ELT pipelines to Kubernetes clusters within Azure cloud environments.
- Proficiency in using monitoring tools including Grafana, Prometheus, Elasticsearch, and Kibana.
- Proficiency in implementing ETL/ELT pipelines that interact with object stores (e.g., MinIO) and relational databases (e.g., PostgreSQL).
- Proficiency with distributed source code management using Git-based platforms (e.g., GitLab, Gitea).
- Proficiency with the Linux command line and an understanding of Linux filesystems and processes.