jaumpedro214/posts

Posts

A list of (some of) my posts and personal projects.

This repository gathers my main posts and projects on a single page. I prioritize posts written in English (and that I'm proud of 😁).

I mainly write about Machine Learning and Data Science on Medium. You can visit my Medium profile to view all my posts.

The list

| Title | Link | Tags | Code |
| --- | --- | --- | --- |
| Creating a Text Preprocessing Microservice with FastAPI | 🔗 | | 🔗 |
| Brazilian Laws analysis with TF-IDF and K-Means | 🔗 | | 🔗 |
| Understanding Topic Coherence Measures | 🔗 | | - |
| How to ensemble Clustering Algorithms | 🔗 | | 🔗 |
| Improve Your Data Preprocessing with ColumnTransformer and Pipelines | 🔗 | | - |
| Creating a Simple ETL Pipeline With Apache Spark | 🔗 | | 🔗 |
| Machine Learning Streaming with Kafka, Debezium, and BentoML. | 🔗 | | 🔗 |
| Stream Processing and Data Analysis with ksqlDB | 🔗 | | 🔗 |
| A Fast Look at Spark Structured Streaming + Kafka | 🔗 | | 🔗 |
| First Steps in Machine Learning with Apache Spark | 🔗 | | 🔗 |
| Temporal and Geo-referenced Traffic Management with Python+Streamlit | 🔗 | | 🔗 |
| Hands-On Introduction to Delta Lake with (py)Spark | 🔗 | | 🔗 |
| Creating a Data Pipeline with Spark, Google Cloud Storage and Big Query | 🔗 | | 🔗 |
| Data Pipeline with Airflow and AWS Tools (S3, Lambda & Glue) | 🔗 | | 🔗 |
| Automatically Managing Data Pipeline Infrastructures With Terraform | 🔗 | | 🔗 |
| Automatically Detecting Label Errors in Datasets with CleanLab | 🔗 | | 🔗 |
| My First Billion (of Rows) in DuckDB | 🔗 | | 🔗 |
| Anatomy of Windows Functions | 🔗 | | 🔗 |

* Is used in almost every project