Why SQL is Forever followup
Yesterday's email had a broken link to the paper. Here's the correct link:
What Goes Around Comes Around... And Around... 👈 this PDF has my annotations. The paper is super approachable and you'll learn a lot about different database technologies. Great for any system design interview or conversation.
And if you're in a sharing mood, here's a link to the web version of yesterday's article. Forgot to include it. 👉 https://swizec.com/blog/why-sql-is-forever/
Thanks to everyone who wrote in to let me know I'm bad at copypaste ❤️ Keep reading for some questions from readers.
Is SQL just a better fit for our brains?
Jacek writes in suggesting "maybe SQL is just how our brains think about data"
Hard to say about brains. They’re all so different. I think SQL’s strenght is its permissiveness.
You can really butcher the query and still get a result. And the query optimizer will sigh, roll its eyes, then make it work fine enough 90% of the time. I have for example written awful queries with subqueries of subqueries (couldn’t figure out how to group) on tables with millions of rows … and it still ran in 3 seconds because B-tree indexes are that good.
Guess Jacek is right: SQL fits our brains because the query optimizer can make almost anything work.
Relational good, SQL bad?
Damien writes in arguing that yesterday's article made a great case for relational databases, but not for SQL itself. He suggests that SQL is full of footguns (true) and that you could have relational databases without SQL.
Easy to get A result
I agree with Damien that it's very easy to write SQL that produces the wrong result, but looks correct and doesn't complain. SQL will figure out how to return a result. And that's a threat!
You, the engineer / data scientist, need to have a deep understanding of the underlying data to make sure you're getting sensible results. Super easy to skip a "zero month" in your data and not notice.
OLAP databases help. The data engineer slurping up production data makes sure to fill in the blanks and do all the right joins before squashing data into columnar tables for data scientists to consume.
The syntax is wonky
A forever debate in SQL is whether you're "selecting socks from a drawer" or "from a drawer selecting socks". Damien is in the 2nd camp, I ... eh.
Yes if we started with from, the tooling could help us autocomplete columns and be more useful. But SQL's original intent was to "sound english" and be easy for non-engineers to write.
And it's criminal that you can't use an alias inside the HAVING clause. Life would be a lot easier if you could.
Lack of linting support
This is a big one. SQL is too flexible for its own good. It can check for basic type mismatches (comparing string to integer), but it doesn't understand your data as much as it could.
A user_id is an integer. An article_id is an integer. SQL will let you compare those and not even blink. This leads to queries that work but produce the wrong data. And you won't even notice.
But SQL is here to stay
Despite all these obvious flaws, databases that start without SQL keep evolving their own versions of SQL. ¯\(ツ)/¯
Cheers,
~Swizec