27 seconds: what a UC metastore blip taught us about streaming resilience
Three streaming jobs died within 32 seconds of each other. The data plane was healthy the whole time. Here's what actually happened and what we changed.
I asked an LLM agent to get a Databricks job ID at runtime. It confidently proposed four approaches. All four were wrong. The fix was a 30-line Python script I could have written in ten minutes.
We know what the real answer is. We tested it. The code is ready. We're just waiting for the right moment, and that's a completely legitimate engineering decision.
Splitting into multiple tasks feels like the obvious fix after a multi-query partial failure. It isn't — not on a shared cluster. There's still one driver.
Most Databricks streaming failures don't look dramatic. No cluster termination, no red wall of errors. Just a job that says RUNNING while your customers report nonsense.
Learn how to inspect the Delta transaction log to understand your partition size distribution and make informed partitioning decisions.
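As a rough illustration of that idea, here is a minimal pure-Python sketch that tallies file counts and bytes per partition from the `add` actions in a table's `_delta_log` commit files. The function name is hypothetical, and for simplicity it ignores `remove` actions and Parquet checkpoints, so it over-counts on tables with deletes or compaction; it is not the article's code, just a sketch of reading the log directly:

```python
import json
import os
from collections import defaultdict

def partition_size_stats(delta_log_dir):
    """Aggregate file count and total bytes per partition from Delta 'add' actions.

    Scans every JSON commit in _delta_log. Simplification: 'remove' actions
    and checkpoint files are ignored, so results over-count on tables that
    have seen deletes or OPTIMIZE; a real tool would net removes against adds.
    """
    stats = defaultdict(lambda: {"files": 0, "bytes": 0})
    for name in sorted(os.listdir(delta_log_dir)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(delta_log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                add = action.get("add")
                if add:
                    # Partition values arrive as a string->string map; use a
                    # sorted tuple of items as a hashable, stable key.
                    key = tuple(sorted(add.get("partitionValues", {}).items()))
                    stats[key]["files"] += 1
                    stats[key]["bytes"] += add.get("size", 0)
    return dict(stats)
```

From the returned map you can spot skew directly, e.g. a partition with thousands of tiny files versus one with a handful of large ones.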
Deep dive into Z-ordering, data skipping, and compaction strategies to maximize Delta Lake performance.
Proven strategies for optimizing Databricks cluster configurations and reducing cloud infrastructure costs.
Exploring medallion architecture, data mesh, and other patterns for building scalable lakehouse platforms.