Site Reliability Engineer

1 week ago


Eu, France · FluidStack · Full-time

**About Fluidstack**:
Fluidstack is building GPU supercomputers for top AI labs, governments, and enterprises. Our customers include Mistral, Poolside, Black Forest Labs, Meta, and more.

Our team is small, highly motivated, and focused on providing a world-class supercomputing experience. We put our customers first in everything we do, working hard not just to win the sale, but to win repeat business and customer referrals.

We hold ourselves and each other to high standards. We expect you to care deeply about the work you do, the products you build, and the experience our customers have in every interaction with us.

You must work hard, take ownership from inception to delivery, and approach every problem with an open mind and a positive attitude. We value effectiveness, competence, and a growth mindset.

**About the Role**:
SREs at Fluidstack sit at the core of our infrastructure, working across software, hardware, and operations to ensure the reliability and performance of our global GPU cloud.

They partner closely with teams including networking, platform engineering, and data center operations to build systems that scale with the demands of AI workloads.

SREs are hands-on and possess deep systems knowledge and strong communication skills. You’ll be responsible for tackling complex production issues, deploying resilient infrastructure, and continuously improving the stability and observability of our platform as we grow.

A typical day may involve:

- Deploying clusters of 1,000+ GPUs using custom-written playbooks, and modifying these tools as necessary to provide the right solution for each customer.
- Validating correctness and performance of underlying compute, storage, and networking infrastructure, and working with providers to optimize these subsystems.
- Migrating petabytes of data from public cloud platforms to local storage as quickly and cost-effectively as possible.
- Debugging issues anywhere in the stack, from “this server’s fan is blocked by a plastic bag” to “optimizing S3 dataloaders from buckets in different regions”.
- Building internal tooling to decrease deployment time and increase cluster reliability, including automation where the customer benefits clearly outweigh the implementation overhead.

This role will involve being part of an on-call rotation up to one week per month.

**Focus**:

- A customer-centric attitude, an accountability mindset, and a bias to action.
- A track record of shipping clean, well-documented code in complex environments.
- An ability to create structure from chaos, navigate ambiguity, and adapt to the dynamic nature of the AI ecosystem.
- Strong technical and interpersonal communication skills, a low ego, and a positive mental attitude.
- 2+ years of SRE, DevOps, Sysadmin, and/or HPC engineering experience.
- Great verbal and written communication skills in English.
- Experience deploying and operating Kubernetes and/or SLURM clusters.
- Experience writing Go, Python, and Bash.
- Experience using Ansible, Terraform, and other automation or IaC tools.
- Strong engineering background, preferably in Computer Science, Software Engineering, Math, Computer Engineering, or similar fields.
- You have built and operated an AI workload at 1,000+ GPU scale.
- You have built multi-tenant, hyperscale Kubernetes-based services.
- You have physically deployed infrastructure in a data center and managed bare-metal hardware via tools such as MaaS or Netbox.
- You have deployed and managed multi-tenant InfiniBand or RoCE networks.
- You have deployed and managed petabyte-scale all-flash storage systems, including DDN, VAST, and/or Weka, or open-source tools such as Ceph, Lustre, or similar.

**Interview Process**:
Our goal is to finish the main process within one week. All interviews will be conducted virtually.

**Benefits**:

- Competitive total compensation package (cash + equity).
- Retirement or pension plan, in line with local norms.
- Health, dental, and vision insurance.
- Generous PTO policy, in line with local norms.
- Fluidstack is remote first, but has offices in London, New York, and SF. For all other locations, we provide access to WeWork.
