David Cairuz

Software Engineer in Brazil, he/him

Projects

2024
Aya 23

This model builds on the recent release of the Aya model (Üstün et al., 2024), pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, extending state-of-the-art language modeling capabilities to approximately half of the world's population.

2024
Command R+

Command R+ is Cohere's newest 100B-parameter large language model, optimized for conversational interaction and long-context tasks. It is designed to be highly performant, enabling companies to move beyond proof of concept and into production.

This model was trained on a massive corpus of diverse texts in multiple languages and can perform a wide array of text-generation tasks. Moreover, Command R+ was trained with a particular focus on excelling at some of the most critical business use cases.

2024
Command R

Command R is a 35B-parameter large language model optimized for conversational interaction and long-context tasks. It targets the “scalable” category of models that balance high performance with strong accuracy, enabling companies to move beyond proof of concept and into production.

Command R offers high precision on retrieval-augmented generation (RAG) and tool-use tasks, low latency and high throughput, a long 128,000-token context length, and strong capabilities across 10 key languages.

2022
Social Media Data Extraction

Developed tools to extract data from social media, such as profile information and email addresses, posts by hashtag, conversation threads, and the comments and likes on a post.

The tools were used in projects for over 20 clients, mainly for lead generation and marketing campaign monitoring, and helped find and extract over 100k qualified leads across different niches.

Work Experience

2022 — Now
Toronto, CA (remote)

Cohere aims to build the latest generation of large language models and give people access to them. Our platform can be used to generate or analyze text for tasks such as copywriting, content moderation, data classification, and information extraction.

I work on research and engineering for data quality and safety, developing and maintaining PySpark and Apache Beam pipelines that ingest, process, and filter the large volumes of data needed to train large language models.

2019 — 2022
Sweden (remote)

Scope is a CRM tool that aims to make it easier for brands to find relevant Instagram influencers to work with. It is currently used by hundreds of clients, mostly from Europe.

I maintained the Scope platform, the company's main application, and developed new features for it using Python, Flask, Elasticsearch, JavaScript, PostgreSQL, and AWS.

I also developed web scraping tools used primarily for lead generation and to find good influencers for the platform.

2019 — 2020
São Paulo, Brazil

ClickBus is the largest platform for buying bus tickets online in Brazil. It partners with 130 bus companies that offer tickets on more than 100,000 routes, and it had sold over 10 million tickets by the end of 2018.

At ClickBus, I used Python and SQL to develop a route-connection system that lets users reach any destination optimally, even when no direct route is available. The system created over 20,000 new routes that are now available on the company's website.
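One way to picture such a connection system is as a shortest-path problem over a graph of direct routes. The sketch below is a hypothetical illustration, not the system built at ClickBus: it assumes each direct route is a static `(origin, destination, travel_hours)` tuple, ignores timetables, layover times, and fares, and uses Dijkstra's algorithm to find the fastest multi-leg connection for every city pair without a direct route. The function names and sample cities are invented for the example.

```python
import heapq
from collections import defaultdict

# Hypothetical input format: (origin, destination, travel_hours) per direct route.
# Timetables, layover times and fares are deliberately ignored in this sketch.
Route = tuple[str, str, float]
Graph = dict[str, list[tuple[str, float]]]


def build_graph(routes: list[Route]) -> Graph:
    """Adjacency list mapping each origin city to its (next_city, hours) legs."""
    graph: Graph = defaultdict(list)
    for origin, destination, hours in routes:
        graph[origin].append((destination, hours))
    return graph


def shortest_paths(graph: Graph, source: str) -> dict[str, tuple[float, list[str]]]:
    """Dijkstra's algorithm: city -> (total_hours, path) for every reachable city."""
    best: dict[str, tuple[float, list[str]]] = {source: (0.0, [source])}
    heap: list[tuple[float, str]] = [(0.0, source)]
    while heap:
        hours, city = heapq.heappop(heap)
        if hours > best[city][0]:
            continue  # stale entry: a faster path to this city was already found
        for next_city, leg_hours in graph.get(city, []):
            candidate = hours + leg_hours
            if next_city not in best or candidate < best[next_city][0]:
                best[next_city] = (candidate, best[city][1] + [next_city])
                heapq.heappush(heap, (candidate, next_city))
    return best


def connecting_routes(routes: list[Route]) -> list[tuple[str, str, float, list[str]]]:
    """Fastest multi-leg connection for each city pair that lacks a direct route."""
    graph = build_graph(routes)
    direct = {(origin, destination) for origin, destination, _ in routes}
    new_routes = []
    for origin in list(graph):
        for destination, (hours, path) in shortest_paths(graph, origin).items():
            if destination != origin and (origin, destination) not in direct:
                new_routes.append((origin, destination, hours, path))
    return new_routes


if __name__ == "__main__":
    sample = [
        ("São Paulo", "Campinas", 1.5),
        ("Campinas", "Ribeirão Preto", 3.0),
        ("São Paulo", "Santos", 1.0),
    ]
    for origin, destination, hours, path in connecting_routes(sample):
        print(f"{origin} -> {destination}: {hours} h via {' -> '.join(path)}")
    # São Paulo -> Ribeirão Preto: 4.5 h via São Paulo -> Campinas -> Ribeirão Preto
```

Travel time is used as the cost here only for illustration; optimizing for a different criterion, such as price, would only change the edge weights.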

Education

2018 — Now
Bachelor of Science in Computer Science at University of São Paulo
São Paulo, Brazil

CGPA: 9.0/10, ranked 19th out of 118

Awards

2020

Silver medal, 157th out of 6351 (Top 3%)

2020

Bronze medal, 179th out of 2038

Contact

LinkedIn