Data Engineer – Spark Specialist

32 minutes ago

Paris, Île-de-France Dataiku Full-time

Dataiku is The Universal AI Platform, giving organizations control over their AI talent, processes, and technologies to unleash the creation of analytics, models, and agents. Providing no-, low-, and full-code capabilities, Dataiku meets teams where they are today, allowing them to begin building with AI using their existing skills and knowledge.

About the Role

Dataiku is looking for a Data Engineer specialized in Spark (PySpark) to join our Field Engineering team. In this role, you will work closely with our clients to troubleshoot and optimize complex data pipelines within the Dataiku platform. This includes both reactive support (advanced issues reported via the support portal) and proactive services (performance reviews and architecture advisory missions we propose to clients).

You will serve as a technical expert in data processing, leveraging SQL and Python frameworks, with a specialization in Spark-based distributed data processing and lakehouse architecture. You will help our clients succeed, whether they are working with SQL-based workflows or processing data on Kubernetes, Databricks, or other modern data platforms.

What You'll Do

Help customers design, build, and optimize Flows in Dataiku, improving overall project performance and maintainability.
Debug and enhance complex Spark code and data pipelines for better performance and reliability.
Guide clients in tuning and scaling Spark environments, for example on Kubernetes and Databricks, providing architectural guidance and best practices to improve performance and reliability.
Optimize SQL-based data pipelines to ensure efficient and robust data workflows within Dataiku.
Advise clients on integrating different data pipelines (Spark, SQL, Python) into optimized solutions.
Collaborate with internal teams to resolve technical issues and contribute to the knowledge base.

Who You Are

You have deep hands-on experience building, debugging, and tuning Spark pipelines in production environments. Specifically, you have:

Spark & PySpark Expertise

Proficiency in writing and debugging PySpark code for large-scale data processing.
Experience with Parquet, Delta Lake, and other columnar file formats.
Understanding of Spark's interaction with metastores (e.g., Hive, Unity Catalog).
Deep understanding of resource management: Spark executors, cores, memory, and relevant configurations.
Expertise in tuning Spark jobs: partitioning, caching, broadcast joins, and avoiding unnecessary shuffles.

Lakehouse & Orchestration

Familiarity with lakehouse architectures and ACID-compliant data layers (Delta Lake, Iceberg, Hudi).
Experience working with Databricks, including Databricks Connect and Databricks Workflows.
Experience automating and scheduling Spark jobs using tools like Apache Airflow or native orchestration tools.

Core Data Engineering Skills

Proven experience developing, optimizing, and troubleshooting SQL-based data pipelines for efficient ETL and data transformation processes.
Proficiency in building and managing data transformation workflows in Python, leveraging frameworks such as pandas.
Familiarity with data modeling concepts and data quality best practices.
Experience integrating data from a variety of sources, including databases, APIs, and cloud storage.
Ability to communicate technical concepts effectively to both technical and non-technical stakeholders.

What does the hiring process look like?

Initial call with a member of our Technical Recruiting team
Video call with the Field Engineer Hiring Manager
Technical Assessment to show your skills (Home Test)
Debrief of your Tech Assessment with FE Team members
Final Interview with the VP Field Engineering

What are you waiting for?
At Dataiku, you'll be part of a journey to shape the ever-evolving world of AI. We're not just building a product; we're crafting the future of AI. If you're ready to make a significant impact in a company that values innovation, collaboration, and your personal growth, we can't wait to welcome you to Dataiku! And if you'd like to learn even more about working here, you can visit our Dataiku LinkedIn page.

Our practices are rooted in the idea that everyone should be treated with dignity, decency and fairness. Dataiku also believes that a diverse identity is a source of strength and allows us to optimize across the many dimensions that are needed for our success. Therefore, we are proud to be an equal opportunity employer. All employment practices are based on business needs, without regard to race, ethnicity, gender identity or expression, sexual orientation, religion, age, neurodiversity, disability status, citizenship, veteran status or any other aspect which makes an individual unique or protected by laws and regulations in the locations where we operate. This applies to all policies and procedures related to recruitment and hiring, compensation, benefits, performance, promotion and termination and all other conditions and terms of employment. If you need assistance or an accommodation, please contact us at: reasonable-

Protect yourself from fraudulent recruitment activity
Dataiku will never ask you for payment of any type during the interview or hiring process. Other than our video-conference application, Zoom, we will also never ask you to make purchases or download third-party applications during the process. If you experience something out of the ordinary or suspect fraudulent activity, please review our page on identifying and reporting fraudulent activity here.

