A list of (some of) my posts and personal projects.
This repository gathers my main posts and projects in a single page. I prioritize posts written in English (and the ones I'm proud of).
I mainly write about Machine Learning and Data Science on Medium. You can visit my Medium profile to view all my posts.
Title | Link | Tags | Code |
---|---|---|---|
Creating a Text Preprocessing Microservice with FastAPI | 🔗 | | 🔗 |
Brazilian Laws analysis with TF-IDF and K-Means | 🔗 | | 🔗 |
Understanding Topic Coherence Measures | 🔗 | | - |
How to ensemble Clustering Algorithms | 🔗 | | 🔗 |
Improve Your Data Preprocessing with ColumnTransformer and Pipelines | 🔗 | | - |
Creating a Simple ETL Pipeline With Apache Spark | 🔗 | | 🔗 |
Machine Learning Streaming with Kafka, Debezium, and BentoML. | 🔗 | | 🔗 |
Stream Processing and Data Analysis with ksqlDB | 🔗 | | 🔗 |
A Fast Look at Spark Structured Streaming + Kafka | 🔗 | | 🔗 |
First Steps in Machine Learning with Apache Spark | 🔗 | | 🔗 |
Temporal and Geo-referenced Traffic Management with Python+Streamlit | 🔗 | | 🔗 |
Hands-On Introduction to Delta Lake with (py)Spark | 🔗 | | 🔗 |
Creating a Data Pipeline with Spark, Google Cloud Storage and Big Query | 🔗 | | 🔗 |
Data Pipeline with Airflow and AWS Tools (S3, Lambda & Glue) | 🔗 | | 🔗 |
Automatically Managing Data Pipeline Infrastructures With Terraform | 🔗 | | 🔗 |
Automatically Detecting Label Errors in Datasets with CleanLab | 🔗 | | 🔗 |
My First Billion (of Rows) in DuckDB | 🔗 | | 🔗 |
Anatomy of Windows Functions | 🔗 | | 🔗 |