Site Reliability Engineer

21 hours ago

Toulouse, France Scaleway Full-time

OUR STORY 🇪🇺

Join Scaleway and shape the sovereign cloud of tomorrow

Since 1999, we have been designing secure, sustainable infrastructure to support the most ambitious companies. Historically known for our dedicated servers (Dedibox), we made a strategic shift to cloud computing in 2015. Staying true to our principles of simplicity, flexibility, and technical excellence, we have become one of Europe's leading players in the sector. With the rise of artificial intelligence, we have strengthened our commitment, supported by the Iliad Group, which is investing €3 billion to develop a serious, sovereign AI alternative to American and Asian giants.

Every day, thanks to our fast-growing portfolio of cloud and AI products (bare metal, containerization, serverless, AI, etc.), Scaleway proudly serves thousands of customers across the private and public sectors, from corporations like France Télévisions or Hachette Livre, to fast-growing startups like Photoroom and Biolevate, to institutions like the City of Copenhagen.

WHY WE NEED YOU

Our growth is driving us to strengthen our SRE team to support and scale our production environments. Your mission will be to build and maintain reliable, observable, and secure infrastructure in order to ensure optimal service availability for our customers around the world.

#HPC #AI #GPU #CLUSTERS

YOUR FUTURE TEAM

We work in a collaborative and international environment where the diversity of Scalers, combined with a spirit of sharing, helps bring new projects to life every day, advancing our ambitions together. You will join a newly formed team dedicated to building and operating Scaleway's future AI infrastructure. As part of this group, you will design, maintain, and scale core systems and observability tools, partner with product teams, and ensure the reliability and performance of AI services across Scaleway.

YOUR DAILY ROUTINE

- Build a large AI infrastructure with monitoring, diagnosis, and remediation of production incidents
- Participate in an on-call rotation to handle incidents and ensure service continuity
- Implement and maintain observability solutions to monitor AI infrastructure and application health
- Contribute to AI infrastructure lifecycle management across different environments and countries
- Promote and apply best practices in terms of stability, resiliency, scalability, and security
- Maintain clear technical documentation for tools and procedures
- Contribute to system and tool evolution based on production feedback
- Collaborate closely with development teams to ensure infrastructure readiness
- Participate in team rituals and knowledge-sharing initiatives

ABOUT YOU

SOFT SKILLS

- Proactive and solution-oriented mindset
- Passion for automation and continuous improvement
- Strong collaboration and communication skills
- Ability to work independently and in a team
- Willingness to mentor and share knowledge

HARD SKILLS

- Experience with Go, Python or Rust
- Strong scripting skills (Bash, Python)
- Hands-on experience with Linux systems (Ubuntu/Debian)
- Hands-on experience with GPU & HPC infrastructure
- Knowledge of networking (TCP/IP, DNS, BGP, load-balancing, IPv6, etc.)
- Familiarity with monitoring and logging tools (Prometheus, Grafana, Elastic, etc.)
- Comfortable with Infrastructure-as-Code (Ansible, Salt, AWX, etc.)
- Experience managing relational databases (PostgreSQL)
- Understanding of CI/CD pipelines (GitLab)
- Comfortable with English (written and spoken)

WHAT YOU WILL FIND AT SCALEWAY

- Hybrid work: We offer up to 3 days of remote work per week.
- Offices: Our offices are spacious, dynamic workspaces with bold design, conveniently located near public transport. Most of our offices feature outdoor spaces (terraces) and bike parking facilities.
- Dining: Our chef provides a healthy meal service at the headquarters, and breakfast is available across all our sites year-round. Scalers working from regional sites enjoy a Swile card for lunches.
- Well-being commitments: Whether it's access to a gym, daycare places, or discounted care services, Scaleway is committed to supporting Scalers in maintaining a balanced life.
- International environment: With dozens of nationalities, Scaleway offers a stimulating environment where English is as widely spoken as French.
- Career & Mobility: Our managers value internal mobility, and opportunities to transition to other entities within the Iliad Group are accessible to all Scalers.

WHY JOIN THE SCALEWAY ADVENTURE

✔ A rich and diverse product offering: Scaleway offers over 100 public cloud products in IaaS, PaaS, and AI.
✔ A cutting-edge technical environment: Scaleway provides modern infrastructures, including high-performance bare metal servers, to tackle exciting technical challenges.
✔ Commitment to responsible cloud: Scaleway is dedicated to a more responsible cloud, with data centers powered solely by renewable energy since 2017, minimizing our ecological footprint and holding top-level certification.

THE NEXT STEPS

- Discovery call with a recruiter (30 min)
- Interview with the manager to understand your technical skills and approach to the role (45 min)
- Technical interview to validate your expertise (1 h)
- Interview with the Head of the Tribe to deepen your discussions and assess your fit with the team (45 min)
- HR interview to tour our offices and meet your future colleagues

At Scaleway, we are committed to building an inclusive and respectful workplace where everyone has a fair opportunity to thrive. All applications are considered with care, regardless of age, gender, sexual orientation, ethnic or social background, religion, disability, or any other characteristic. We believe great ideas come from everywhere and everyone, which is why you should definitely apply.

Site Reliability Engineer

3 days ago

Toulouse, France OVHcloud Full-time

Within your #OneTeam team - You will join the multidisciplinary AI Core team, which is responsible for developing OVHcloud's artificial intelligence products and ensuring their continuity of service. - For the AI products, you will maintain and support changes to the infrastructure for the integration of new hardware, the...
Senior Site Reliability

22 hours ago

Toulouse, France Canonical Full-time

Senior Site Reliability / GitOps Engineer. Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's...
Site Reliability Engineer

1 week ago

Paris / Bordeaux / Lille / Lyon / Toulouse / Rennes / Rouen, France Scaleway Full-time

OUR STORY: Join Scaleway and shape the sovereign cloud of tomorrow Since 1999, we have been designing secure, sustainable infrastructures aimed at supporting the most ambitious companies. Historically known for our dedicated servers (Dedibox), we made a strategic shift to cloud computing in 2015. Staying true to our principles of simplicity, flexibility,...
Site Reliability Engineer

2 weeks ago

Toulouse, Occitanie, France Scaleway Full-time

Founded in 1999, Scaleway is the cloud subsidiary of the Iliad Group, one of Europe's leading telecommunications companies. Our mission is to foster a more responsible digital industry by helping developers and businesses build, deploy, and scale applications on any infrastructure. From our offices located in Paris and...
Network SRE Engineer — Hybrid Cloud Reliability

21 hours ago

Toulouse, France Scaleway Full-time

A leading cloud service provider in Toulouse is seeking a Site Reliability Engineer to enhance the performance and reliability of their infrastructure. This role involves developing automation tools, maintaining CI/CD pipelines, and collaborating with diverse teams to improve system resilience. Candidates should have experience with Infrastructure as Code,...
Internship - Site Reliability Engineer (M/F)

6 days ago

Rue d'Alsace-Lorraine, Toulouse, France OpenAirlines Full-time

Internship context: We are looking for a motivated, technically curious SRE (Site Reliability Engineer) intern to help us study, design, and implement a first iteration of an internal platform for developers (Internal Developer Platform, IDP). This internship is an excellent opportunity to gain hands-on experience in...
Site Reliability Engineer

2 weeks ago

Toulouse, France MyUnisoft Full-time

**Do you dream of a company that will let your passion express itself? JOIN THE TEAM!** **Who are we?** MyUnisoft is a software publisher serving accounting firms and their clients. We are on track to become a major player in the market because we offer innovative, unique solutions designed...
Remote Site Reliability Engineering Manager Lead Global Ops

19 hours ago

Toulouse, France Canonical Full-time

A leading open-source technology company is seeking a Site Reliability Engineering Manager in Toulouse, France. This role combines operations management, software engineering, and team leadership. The ideal candidate has experience with devops teams and infrastructure as code. Key responsibilities include leading daily practices, mentoring engineers, and...
Site Reliability Engineer

1 week ago

Paris / Lille / Toulouse / Bordeaux / Lyon, France Scaleway Full-time

OUR STORY: Join Scaleway and shape the sovereign cloud of tomorrow Since 1999, we have been designing secure, sustainable infrastructures aimed at supporting the most ambitious companies. Historically known for our dedicated servers (Dedibox), we made a strategic shift to cloud computing in 2015. Staying true to our principles of simplicity, flexibility,...
Site Reliability Engineer

2 weeks ago

Toulouse, France ESTREEM Full-time

**Company description**: **Estreem**: the new benchmark for payment processing in Europe. Born from a strategic partnership between two major French banking groups, BNP Paribas and BPCE, Estreem is an independent fintech launched in February 2025. Its ambition: to become the leading sovereign payment processor in France, capable...
