Leonardo Vida

Full Stack Data Engineer biased toward building products that use (a lot of) data

About

As a Data Engineer, I have single-handedly built data platforms for enterprise clients. Recently, I led a team of 5 for ~6 months at a large corporation, successfully delivering the product we were building, and I aim to do so again in the future. Currently, I work as a consultant for companies in the Netherlands, and I mostly use Python, HCL and TypeScript.

Work Experience

DBC

Dec 2023 - Present

Senior Data Engineer Consultant

For client A, deployed data platform using IaC on Azure (Databricks, dbt) to centralize data from a growing number of subsidiaries

For client A, created custom API wrappers for ELT across 10+ sources alongside custom libraries for pipelines and data tests

For client B, developed and productionized advanced custom RAG solution over 1M+ files on Azure used by 700+ employees

For client B, deployed self-hosted LLM that reduced cost by 40x, along with trace monitoring and system telemetry (LangSmith, Azure)

For client C, deployed data platform using IaC on AWS (Airbyte, Databricks and Dagster), provided infra to backend team

Developed internal terraform blueprints and modules for a range of data platform solutions and client sizes

Led the development of an internal recruitment co-pilot project, currently being tested at a partner company

Mentored junior team members, led trainings and client workshops on the application and productionization of LLMs

Prima assicurazioni
Remote

Jun 2023 - Nov 2023

Senior Data Engineer

Supported the development of self-service data platform based on data mesh principles on AWS (dbt, Airbyte, Argo)

Refactored old in-house config-driven ETL package and developed supporting libraries for self-service data platform

Engineered petabyte-scale ETL processes in PySpark focusing on data quality and data pipeline efficiency

Implemented agile product management, boosting efficiency and team morale

Defined data product, contract and permission specifications across the company and supported team's roadmap definition

Brenntag

Oct 2022 - May 2023

Senior Data Engineer

Led core Data Engineering team with a total of 6 developers in newly created Data department

Architected and developed AWS-native data platform with focus on security and data quality (Glue, Iceberg, MWAA)

Developed all core pipelines for EMEA and NA subsidiaries and main libraries for data processing and quality monitoring

Deployed SLAs monitoring for critical data sets, ensuring high data availability and integrity

Managed team planning and rituals, and translated business needs into prioritized technical requirements

Beerwulf / Heineken

Aug 2021 - Sep 2022

Data Engineer

Transitioned all core ETL pipelines from batch to micro-batch structured streaming and then DLT on Databricks

Refactored and enhanced data observability library and introduced automated data tests across all medallion layers

Deployed MLOps platform on MLflow and integrated new B2B2C marketplace and D2C product into data infrastructure

Created ML models to forecast churn, LTV and demand, improving demand forecasting accuracy by 40%

Utrecht University

Aug 2020 - Jul 2021

Research Engineer

Led research project engineering, fine-tuned transformer models with $100k+ GCP grant

Developed custom pipeline to extract the entire collection of the Dutch national library, then process, OCR and score the texts

Developed back-end security for an OSS tool that automates systematic reviews (asreview), with more than 150,000 downloads on PyPI

Developed OSS spatial data package (osmenrich) in R for sensitive data enrichment

Education

Utrecht University

2020 - 2022

M.Sc. in Computational Science: Applied Data Science; GPA: 8.1/10 (Cum laude)

Maastricht University

2014 - 2017

B.Sc. in Economics: International Economics; GPA: 8.2/10.0

Skills

Python

TypeScript

Next.js

SQL

Terraform HCL/Docker/Kubernetes

GraphQL

LangChain, LlamaIndex, open source LLMs

Projects

Talent Copilot

An LLM-based autopilot that helps companies reduce the time needed to find the best talent for a given position. It uses a custom LLM pipeline to understand the job description, resume and company, then scores candidates, writes rejection/confirmation emails and much more, all under human supervision.

Side Project

TypeScript

Next.js

Python

FastAPI

Openrouter

LLM

Nearit

A website to find lists of places to visit in a city, recommended by locals, that can be used directly in Google Maps.

Side Project

Python

Webflow

GMaps API

Younico

A university platform to match students who have an idea with other students who want to work on a project.

Side Project

React
