Site Reliability Engineer

3 days ago


Paris, Île-de-France STATION F Full-time

About
At Welcome to the Jungle, we believe working is good. But thriving with the right people is better. We provide a suite of tools, content, and experiences that make recruitment more transparent, authentic, and human.

  • We help companies build their recruitment strategy by sharing their story through employer branding, enabling them to attract, engage, and retain talent who share their values.
  • We guide candidates to their future teams through immersive job listings and support them throughout their job search with a personalized candidate experience.

Job Description
As our Site Reliability Engineer, you are responsible for implementing and maintaining scalable infrastructure and systems that ensure the reliability, performance, and security of our production environments.
This hands-on position bridges the gap between development and operations, applying software engineering principles to infrastructure and operational challenges. This role involves close collaboration with Development teams, Security teams, and other stakeholders to build and maintain robust systems, implement automation, and support operational excellence through SLOs (Service Level Objectives) and observability. Additionally, you will contribute to incident management, capacity planning, and implementing infrastructure as code practices across the organization.

You will report to the Platform Engineering Manager and be part of the Platform Team.

Key Responsibilities

Technical Leadership & System Design

  • Collaborate with Development teams on infrastructure architecture, deployment strategies, and operational requirements.
  • Design and implement monitoring, alerting, and observability solutions.
  • Contribute to infrastructure as code initiatives and maintain deployment automation pipelines.
  • Implement security best practices suited to our context and maintain compliance with applicable requirements.
  • Design and maintain disaster recovery and backup strategies.

Operational Excellence & Process Implementation

  • Contribute to incident response efforts and drive resolution of technical issues.
  • Develop and maintain runbooks and documentation for operational procedures.
  • Ensure proper logging and monitoring across all systems.
  • Expand automation to reduce manual operations.
  • Maintain and improve SRE practices across the organization.

Cross-team Collaboration & Knowledge Sharing

  • Work with development teams to implement operational readiness requirements.
  • Collaborate with Security teams on infrastructure security measures.
  • Provide technical mentorship to developers on operational practices.
  • Lead knowledge sharing sessions and documentation efforts.
  • Partner with Engineering Managers to improve development workflows and tools.

Preferred Experience
You

  • You have at least 4 years of infrastructure/systems engineering experience and want to maintain a strong hands-on technical focus.
  • You're comfortable:
      • Building and maintaining large-scale distributed systems.
      • Managing incident response according to SLA.
      • Implementing automation and self-healing systems.
      • Developing utility scripts and functions.
      • Working in both French and English, in a remote context.

  • It's not required, but having experience with our tech stack (Ruby, Elixir, ) is a significant advantage.

  • You have strong problem-solving skills and can troubleshoot complex systems issues.
  • You're reliability-focused: passionate about building resilient systems, measuring and improving reliability through data-driven approaches, and establishing sustainable operational practices.
  • You demonstrate excellent communication skills and can effectively collaborate with various technical and non-technical stakeholders.

Learn more about our stack:

  • Our main cloud provider is AWS;
  • We use Kubernetes as our container orchestrator;
  • Our Infrastructure-as-Code is managed with Terraform and Terragrunt;
  • We use ArgoCD and CircleCI as our integration and deployment tools;
  • We use OpenTelemetry and Datadog to monitor our platforms;
  • Our applications run on GNU/Linux systems, such as Debian.

Additional Information

  • Contract Type: Full-Time
  • Location: Paris
  • Full remote possible

