Resources

Tools I Use Daily for Data Engineering

These are the tools and platforms I actually use in my day-to-day work. I only recommend things I’ve personally tested in production. Some links below are affiliate links — they cost you nothing extra and help support this blog.

 

Data Platform & Processing

•       Databricks — My primary workspace for everything. Notebooks, Delta tables, scheduled jobs, Unity Catalog. If you’re doing data engineering at any scale, this is the platform.

•       Apache Spark — The engine under the hood. Most of my ETL runs on Spark SQL or PySpark.

•       dbt — For SQL-first transformation workflows. Great for teams that want version-controlled, testable SQL.
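To give a flavor of the Spark SQL ETL I'm talking about: a common incremental pattern is upserting a raw landing table into a curated Delta table with MERGE. This is just a sketch — the schema and table names here are made up for illustration:

```sql
-- Upsert new and changed rows from a raw landing table into a
-- curated Delta table (table and column names are illustrative).
MERGE INTO silver.orders AS target
USING bronze.orders_raw AS source
  ON target.order_id = source.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```

Runs the same way whether you schedule it as a Databricks job or wrap it in a dbt model.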

 

SQL & Database Tools

•       DBeaver — Free universal database client. I use it for ad-hoc queries and exploring schemas across multiple connections.

•       DataGrip — JetBrains’ database IDE. Paid but worth it if you write SQL all day. Smart autocomplete saves hours.

 

Data Quality & Testing

•       Great Expectations / Soda — For automated data quality checks. I run these as part of pipeline validation before downstream tables consume the data.

•       dbt tests — Built-in unique, not_null, accepted_values, and relationships tests. Simple but effective for catching issues early.
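For anyone who hasn't seen dbt tests in practice, they're declared alongside the model in a schema.yml file. A minimal sketch (model and column names are hypothetical; recent dbt versions also accept `data_tests:` as the key):

```yaml
version: 2

models:
  - name: orders            # hypothetical model
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ["placed", "shipped", "returned"]
      - name: customer_id
        tests:
          - relationships:   # every customer_id must exist in customers
              to: ref('customers')
              field: id
```

`dbt test` then compiles each of these into a SQL query that fails if any rows violate the rule.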

 

AI & Productivity

•       Claude (Anthropic) — My go-to AI for writing SQL, debugging pipeline logic, generating documentation, and drafting articles. Claude Code is particularly powerful for autonomous development tasks.

•       Cursor / GitHub Copilot — AI coding assistants for writing Python and SQL faster.

 

Pipeline Orchestration

•       Databricks Workflows — Native scheduling and orchestration. I use this for most production pipelines.

•       Apache Airflow — The industry standard for complex DAG-based orchestration. Steeper learning curve but incredibly flexible.

•       Prefect / Dagster — Modern Python-native alternatives to Airflow. Worth evaluating if you’re starting fresh.
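To make "DAG-based orchestration" concrete: at its core, an orchestrator resolves task dependencies into an execution order. Here's a toy sketch using only the Python standard library — the task names are invented, and this is deliberately not Airflow's API, just the ordering problem underneath it:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Each key depends on the tasks in its value set (illustrative names).
pipeline = {
    "transform": {"extract"},
    "load_warehouse": {"transform"},
    "run_quality_checks": {"load_warehouse"},
}

# A real orchestrator (Airflow, Prefect, Dagster) layers scheduling,
# retries, and parallelism on top of exactly this ordering problem.
run_order = list(TopologicalSorter(pipeline).static_order())
print(run_order)
```

Because this example is a linear chain, the order is fully determined: extract, transform, load, then checks.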

 

Monitoring & Observability

•       Monte Carlo / Metaplane — Data observability platforms that detect anomalies, schema changes, and freshness issues automatically.