Leonardo Vida

Full Stack Data Engineer focused on building products that use (a lot of) data

Amsterdam, the Netherlands, CET


About

As a Data Engineer, I have single-handedly built data platforms for enterprise clients. Recently, I led a team of 5 for ~6 months at a large corporation, successfully delivering the product we were building, and I aim to return to a similar role in the future. Currently, I work as a consultant for companies in the Netherlands, mostly using Python, HCL and TypeScript.

Work Experience

DBC

Dec 2023 - Present

Senior Data Engineer Consultant

  • For client A, developed data platform (Azure and Terraform) to centralize data of 8 subsidiary companies
  • For client A, created API wrapper templates and used them across 10+ sources alongside custom libraries for ETL pipelines
  • For client B, developed and productionized advanced custom RAG solution over 1M+ files on Azure, used by 700+ employees
  • For client B, fine-tuned embedding model, reduced LLM cost by 40x, deployed system monitoring and telemetry
  • Mentored junior team members, led trainings and workshops on the application and productionization of LLMs

Prima Assicurazioni
Remote

Jun 2023 - Nov 2023

Senior Data Engineer

  • Architected and supported the development of self-service data platform based on data mesh principles on AWS
  • Refactored old in-house config-driven ETL package and developed supporting libraries for self-service data platform
  • Engineered petabyte-scale ETL processes in PySpark focusing on data quality and data pipeline efficiency
  • Implemented agile product management, boosting efficiency and team morale
  • Defined data product, contract and permission specifications across the company and supported team's roadmap definition

Brenntag

Oct 2022 - May 2023

Senior Data Engineer

  • Led core Data Engineering team with a total of 6 developers in newly created Data department
  • Architected AWS-based performance data platform with focus on security and data quality
  • Developed all core pipelines for EMEA and NA subsidiaries and main libraries for data processing and quality monitoring
  • Deployed SLAs monitoring for critical data sets, ensuring high data availability and integrity
  • Managed team planning and rituals, and translated business needs into prioritized technical requirements

Beerwulf / Heineken

Aug 2021 - Sep 2022

Data Engineer

  • Transitioned all core ETL pipelines from batch to real-time streaming
  • Refactored and enhanced data observability library and introduced automated data tests for bronze and silver layers
  • Deployed MLOps platform on MLflow and integrated new B2B2C marketplace and D2C product into the data infrastructure
  • Created ML models to forecast churn, LTV and demand, improving demand forecasting accuracy by 40%

Utrecht University

Aug 2020 - Jul 2021

Research Engineer

  • Led research project engineering, fine-tuned transformer models with $100k+ GCP grant
  • Developed custom pipeline to extract the entire collection of the Dutch national library and to process, OCR and score its texts
  • Developed back-end security of asreview, an OSS tool for automating systematic reviews, with more than 150,000 downloads on PyPI
  • Developed OSS spatial data package (osmenrich) in R for sensitive data enrichment

Education

Utrecht University

2020 - 2022
M.Sc. in Computational Science: Applied Data Science; GPA: 8.1/10 (Cum laude)

Maastricht University

2014 - 2017
B.Sc. in Economics: International Economics; GPA: 8.2/10

Skills

Python
TypeScript
Next.js
SQL
Terraform HCL/Docker/Kubernetes
GraphQL
R
