Nobody plans to become a data engineer. Or at least I didn’t. I didn’t even know the job existed when I started university in Turin. Here’s how it happened, with the honest parts left in.
The university years
I studied at the Università degli Studi di Torino. My degree wasn’t in computer science — I landed in a field adjacent enough to touch databases and statistics but not so technical that I was writing compilers. The important thing, looking back, wasn’t the specific degree but the fact that it forced me to think in structured, analytical ways and gave me a foundation in data.
The biggest lesson from those years had nothing to do with exams: I learned that I liked making data move. Cleaning a messy dataset and turning it into something useful gave me the same satisfaction that other people get from building a piece of furniture. There’s a thing that wasn’t there before, and now it works.
The skills that actually mattered
When I started working, nobody cared about my grades. They cared about:
- SQL. Not textbook SQL — the messy, production-grade kind where you’re joining six tables with left joins and null handling and the query has to run in under 5 seconds on 200 million rows.
- Python. Not for web apps or machine learning (yet), but for writing ETL scripts, data validation, and gluing systems together. Python is the duct tape of data engineering.
- Spark / PySpark. This is where my career really accelerated. Once you can write Spark jobs that process terabytes reliably, you solve a class of problems that most people can’t, and companies notice.
- Cloud platforms. Databricks, Snowflake, Azure Data Factory — the modern data stack. I invested heavily in Databricks and it paid off. Certifications aren’t magic, but they force you to learn the platform properly instead of just copying Stack Overflow answers.
- Communication. I know, everyone says this. But the reality is that a data engineer spends half their time talking to people who don’t know (or care) what a DAG is. Explaining your pipeline to a product manager in two sentences is a real skill, and it’s the one that gets you promoted.
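To make the SQL point concrete, here is a minimal sketch of left-join null handling. The `customers` and `orders` tables and their columns are hypothetical, and Python's built-in `sqlite3` stands in for a real warehouse so the example runs anywhere. It illustrates the pattern only, not a production query.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Anna'), (2, 'Marco');
INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.5);
""")

# LEFT JOIN keeps customers that have no orders. For those rows every
# orders column is NULL, so COUNT(o.id) counts 0 and SUM returns NULL,
# which COALESCE turns into 0.0.
rows = conn.execute("""
SELECT c.name,
       COUNT(o.id) AS order_count,
       COALESCE(SUM(o.amount), 0.0) AS total
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY c.id
""").fetchall()

for name, order_count, total in rows:
    print(name, order_count, total)
```

Marco has no orders, yet he still appears in the result with a count of 0 and a total of 0.0 instead of dropping out or showing up as NULL.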
The things I wish I’d done differently
Started building projects earlier. In my first year of work, I learned on the job but didn’t build anything on my own time. Every side project I eventually built taught me more per hour than any work task, because side projects force you to make all the decisions yourself.
Learned testing earlier. I used to think tests were overhead. Now I think untested data pipelines are time bombs. The first time a pipeline silently produces wrong data for two weeks before anyone notices, you learn this lesson. I learned it the hard way.
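As a rough illustration of the kind of check that catches silent bad data, here is a small sketch in plain Python. The function name `validate_rows` and the `order_id` and `amount` columns are made up for the example. A real pipeline would more likely run checks like these inside a test framework or a data quality tool.

```python
def validate_rows(rows, key, required):
    """Return a list of human-readable problems found in rows (a list of dicts)."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        # Flag required columns that are absent or None.
        for col in required:
            if row.get(col) is None:
                problems.append(f"row {i}: missing {col}")
        # Flag repeated key values.
        k = row.get(key)
        if k is not None:
            if k in seen:
                problems.append(f"row {i}: duplicate {key}={k}")
            seen.add(k)
    return problems


good = [{"order_id": 1, "amount": 10.0}, {"order_id": 2, "amount": 3.5}]
bad = [{"order_id": 1, "amount": None}, {"order_id": 1, "amount": 2.0}]

good_problems = validate_rows(good, "order_id", ["order_id", "amount"])
bad_problems = validate_rows(bad, "order_id", ["order_id", "amount"])
print(good_problems)
print(bad_problems)
```

Failing the run when the list of problems is non-empty is what turns two weeks of silently wrong data into an alert on day one.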
Written more. This website exists partly because I now believe that explaining something is the best way to understand it. If you can write a clear explanation of how partitioning works in Spark, you actually understand partitioning. If you can’t, you understand it less than you think.
What I’d tell someone starting today
- Learn SQL properly. Not just SELECT-FROM-WHERE. Window functions, CTEs, execution plans. SQL is the lingua franca of data, and it’s not going away.
- Pick one language and go deep. Python is the safe choice. Learn it well enough to write clean, testable code, not just scripts that work once.
- Get comfortable with the cloud. Pick one platform (Databricks, Snowflake, BigQuery; it doesn’t matter which) and learn it end to end. Build a project, deploy it, make it run on a schedule.
- Don’t skip the fundamentals. Star schemas, slowly changing dimensions, data quality, idempotent pipelines: the concepts haven’t changed in 20 years even though the tools change every 2.
- Build in public. A GitHub repo with a working pipeline is worth more than a certification. A blog post explaining what you built is worth more than both.
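Of the fundamentals listed above, idempotency is the easiest to show in a few lines. The sketch below uses one common approach, replacing a whole day's partition inside a single transaction, so that rerunning a load leaves the table unchanged. The `daily_sales` table, the `load_day` helper, and the sample values are all hypothetical, and `sqlite3` is used only to keep the example self-contained.

```python
import sqlite3


def load_day(conn, day, rows):
    """Replace all rows for `day` in one transaction so reruns do not duplicate data."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM daily_sales WHERE day = ?", (day,))
        conn.executemany(
            "INSERT INTO daily_sales (day, store, revenue) VALUES (?, ?, ?)",
            [(day, store, revenue) for store, revenue in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day TEXT, store TEXT, revenue REAL)")

batch = [("torino", 120.0), ("milano", 95.5)]
load_day(conn, "2024-01-01", batch)
load_day(conn, "2024-01-02", [("torino", 80.0)])
load_day(conn, "2024-01-01", batch)  # rerun of the first day

row_count = conn.execute("SELECT COUNT(*) FROM daily_sales").fetchone()[0]
print(row_count)
```

An append-only load would leave five rows after the rerun. Here the table still holds three, which is what makes it safe to retry a failed job.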
The data engineering job market in Europe is strong, especially in Italy, where companies are still catching up on the modern data stack. The path doesn’t require a CS degree. It requires curiosity, persistence, and a tolerance for debugging things at 2 AM that worked fine at 5 PM.
That last part isn’t in any job description. But it’s the real prerequisite.