Resources & Tools
These are the tools, platforms, and resources I actually use for data engineering work and for running PipelinePulse. Some links below are affiliate links; if you use them, I earn a small commission at no extra cost to you. I only recommend tools I personally use or have thoroughly evaluated.
PipelinePulse Cheat Sheets & Guides
Printable quick references you can keep next to your notebook. All available on Gumroad and Etsy.
Delta Lake & Databricks:
- Delta Table Troubleshooting Checklist ($9) — Step-by-step diagnosis for common Delta table failures
- Schema Evolution Quick Reference ($4.99) — mergeSchema, column mapping, type widening, migration patterns
- Databricks SQL Cheat Sheet ($4.99) — Essential SQL patterns for Databricks
- Databricks Debugging Kit ($4.99) — Error decision trees, log reading guide, cluster config checklist
- Databricks Cost Optimization Checklist ($4.99) — Cluster sizing, spot instances, DBU monitoring queries
PySpark:
- PySpark Window Functions Cheat Sheet ($4.99) — row_number, rank, lag, lead, running totals, sessionization
- PySpark Null Handling Cheat Sheet ($4.99) — Every null-handling pattern for Spark DataFrames
Pipeline & Architecture:
- Pipeline Architecture Templates ($4.99) — Medallion, batch ETL, streaming, orchestration patterns
- Structured Streaming Starter Kit ($4.99) — readStream, writeStream, triggers, foreachBatch, Auto Loader
- Data Quality Monitoring Playbook ($4.99) — 6 check categories, PySpark framework, alerting patterns
- Data Observability Toolkit ($4.99) — Monitoring templates, freshness tracking, drift detection
Cloud & Infrastructure
DigitalOcean — My go-to for deploying data tools, running ETL pipelines, and hosting side projects. Simple pricing, no surprise bills. $6/month gets you a production-ready VPS. New users get $200 in free credits through my link.
Databricks — Where I spend most of my working hours: a unified analytics platform for Delta Lake, Spark, and SQL. If your company is evaluating data platforms, Databricks is a strong choice for lakehouse architecture.
Cloudflare — Free DNS and CDN. I use it for the pipelinepulse.dev domain. The free tier is genuinely generous — SSL, DDoS protection, and caching included.
Data & Analytics Learning
DataCamp — Interactive courses for SQL, Python, and data engineering. Great for beginners who want structured learning paths. Their data engineering track covers the fundamentals well.
Pluralsight — Deeper technical courses for experienced engineers. Strong Databricks, Spark, and cloud platform content. Worth it if you're preparing for certifications or want to go deep on a specific tool.
Blogging & Content
Ghost — The platform this blog runs on. Clean, fast, built for writers. No bloat, no plugins to manage. The Publisher plan includes hosting, SSL, custom domain, and newsletter features.
GetResponse — Full-featured email marketing with automation workflows. Good for building drip campaigns, landing pages, and managing subscriber segments. I use it for email automation beyond what Ghost provides natively.
Domains & Hosting
Kinsta — Premium managed hosting. Excellent for WordPress, application hosting, and database hosting. Fast support and great dashboard.
GoDaddy — The largest domain registrar. Reliable for domain registration.
Hostinger — Budget-friendly hosting starting at a few dollars per month. Good option if you want WordPress instead of Ghost.
Security & Privacy
NordVPN — Fast, reliable VPN for securing connections to remote servers and databases. A useful extra layer of protection if you're SSHing into production VPS instances from public networks.
Surfshark — Budget-friendly VPN alternative with unlimited device connections. Good if you're running multiple machines and want to keep costs down.
Books I Recommend
These are the books that shaped how I think about data engineering:
- Designing Data-Intensive Applications by Martin Kleppmann — The bible of distributed systems and data architecture
- Fundamentals of Data Engineering by Joe Reis & Matt Housley — The best modern intro to the field
- The Data Warehouse Toolkit by Ralph Kimball & Margy Ross — Essential for dimensional modeling
- SQL Performance Explained by Markus Winand — Short, practical, immediately useful
Free Tools
- DBeaver — Free universal database client. Works with PostgreSQL, MySQL, Spark, and dozens more
- VS Code — Free code editor with excellent Python and SQL extensions
- Great Expectations — Open-source data quality framework
- dbt Core — Free data transformation tool for SQL-first workflows
- Apache Airflow — Open-source workflow orchestration
This page is updated regularly as I discover new tools worth recommending. Last updated: March 2026.