Web Crawling

4 months ago


Paris, France · Mistral AI · Full-time
About Mistral

- At Mistral AI, we are a tight-knit, nimble team dedicated to bringing our cutting-edge AI technology to the world.

- Our mission is to make AI ubiquitous and open.

- We are creative, low-ego, team-spirited, and have been passionate about AI for years.

- We hire people who thrive in competitive environments, because they find them more fun to work in.

- We hire passionate women and men from all over the world.

- Our teams are distributed across France, the UK, and the USA.

Role Summary

- We are seeking a skilled and motivated Web Crawling and Data Indexing Engineer to join our dynamic engineering team.

- The ideal candidate will have a strong background in web scraping, data extraction, and indexing, with a focus on leveraging advanced tools and technologies to gather and process large-scale data from various web sources.

- The role is based in Paris or London.

Key Responsibilities

- Develop and maintain web crawlers using Python libraries such as Beautiful Soup to extract data from target websites.

- Utilize headless browsing techniques, such as Chrome DevTools, to automate and optimize data collection processes.

- Collaborate with cross-functional teams to identify, scrape, and integrate data from APIs to support business objectives.

- Create and implement efficient parsing patterns using regular expressions, XPaths, and CSS selectors to ensure accurate data extraction.

- Design and manage distributed job queues using technologies such as Redis, Kubernetes, and Postgres to handle large-scale data processing tasks.

- Develop strategies to monitor and ensure data quality, accuracy, and integrity throughout the crawling and indexing process.

- Continuously improve and optimize existing web crawling infrastructure to maximize efficiency and adapt to new challenges.

Qualifications & Profile

- Bachelor's or master's degree in computer science, information systems, or information technology.

- Strong understanding of web technologies, data structures, and algorithms.

- Knowledge of database management systems and data warehousing.

- Programming Languages: Proficiency in programming languages such as Python, Java, or C++ is essential.

- Mastery of Web Technologies: Understanding of HTML, CSS, and JavaScript is crucial to navigate and scrape data from websites.

- Knowledge of the HTTP and HTTPS protocols.

- A good understanding of data structures (like queues, stacks, and hash maps) and algorithms is necessary.

- Knowledge of databases (SQL or NoSQL) is important to store and manage the crawled data.

- Understanding of distributed systems and technologies like Hadoop or Spark.

- Experience using web scraping libraries and frameworks like Scrapy, BeautifulSoup, Selenium, or MechanicalSoup.

- Understanding of how search engines work and how to optimize web crawling.

- Experience in machine learning to improve the efficiency and accuracy of web crawling.

- Familiarity with tools such as Pandas, NumPy, and Matplotlib to analyze and visualize data.

Benefits

- Daily lunch vouchers.

- Contribution to a Gympass subscription.

- Monthly contribution to a mobility pass.

- Full health insurance for you and your family.

- Generous parental leave policy.