DEV Community

# dataengineering

Posts

AWS Lambda and AWS Glue Python Shell in the Context of Lightweight ETL

3
Comments
7 min read
SQL: Doing GROUP BY in CsvPath

Comments
5 min read
🔥 Day 3: RDDs - The Foundation of Spark

Comments
2 min read
🔥 Day 4: RDD Internals - Partitions, Shuffles & Repartitioning Demystified

Comments
2 min read
The Developer's Guide to Normalizing Historical Airline Flight Data for Machine Learning

Comments
6 min read
Overview of Real-Time Data Synchronization from MySQL to VeloDB

5
Comments
5 min read
Stop Writing df.describe(): Automate EDA with D-Tale (The Lazy Engineer's Way)

Comments
3 min read
Build a Local Lead Gen Machine: Scraping Google Maps with n8n (Reliably)

Comments
3 min read
CHW Monthly Activity Aggregation: Turning Visit Logs into Insight

Comments
5 min read
🔥 Day 2: Understanding Spark Architecture - How Spark Executes Your Code Internally

Comments
2 min read
RAG Isn’t a Modeling Problem. It’s a Data Engineering Problem.

1
Comments
6 min read
Apache Data Lakehouse Weekly: December 30, 2025 – January 5, 2026

1
Comments
4 min read
Marmot: Data catalog without the complex infrastructure

1
Comments
3 min read
TDD for dbt: unit testing the way it should be

2
Comments
12 min read
Building a Medical-Grade Knowledge Graph: Mapping Drug Interactions with Neo4j and LlamaIndex 🩺💻

Comments 1
3 min read