Code-First vs. SQL-First / by Chris Shaffer

I don't have official statistics, but it seems like the engineers of the world are evenly divided between the two approaches.

After reviewing the pros and cons of each (and working hands-on with both), it seems like database-first should be the default when there's a question - code-first is a niche solution for prototypes, throwaway apps, SQL-phobes, and of course no-SQL databases, where database-first is gibberish.

The approaches

  • Code-first: You write your objects in a higher-level language, and then some framework generates migration scripts for you to create/update tables
  • Database-first: You write your migration scripts first to create/update tables, and then some framework turns them into classes in your higher-level language
  • Neither:
    1. You write your migration scripts by hand, and then write your classes by hand, looking back and forth between two monitors
    2. You run your migration scripts manually with no tracking of what's been run and what hasn't, so they'd all better be re-runnable

First, we'll tackle "neither"

For Neither-1, a lot of us have a gut revulsion to typing almost-the-same-thing twice, but in the grand scheme of "things that are bad" this ranks pretty low. I wouldn't say it's a good approach, but it's hard to imagine it wasting more than 1% or 2% of your time, so if you decide "I don't want to figure out how this thing works; just lemme type it myself" I'll get off your lawn.

Neither-2 can be pretty messy - I've seen large companies pull it off, but it can easily consume 10% or 20% of everyone's time for the extra effort and troubleshooting it entails. I'd avoid this, and if you're doing this, no matter what stage you're at, it's probably worth it to reform the process.

In favor of database-first

Let's start with an exercise: Flavors of varchar/collation/nvarchar are super different in different SQL databases. But, on the code-side, in whatever language you're using, they're probably all just strings.

A key function any ORM can perform is making your higher-level code not care about these differences. If you write SQL first, and then generate models based on it, your higher-level code never has to care about those distinctions. If you're leaning heavily on your ORM, an enterprise can write millions of lines of code and have literally one place in the higher-level code that cares about whether the underlying database is Oracle or MS SQL Server. That's incredibly powerful when it comes to major version upgrades and interoperability.

If you go code-first, your code may have to care about that from day one. You're bringing a lot of SQL baggage into your higher-level language.

But you don't have to ...

I'm going to go code-first, but I'm not going to bother with any details of my SQL flavor that'll break my abstraction. I'll just create columns as strings an accept the wisdom of my ORM's defaults.

There's one argument against this that goes "How dare you store Unicode when a vendor is sending ASCII - why don't we just order two hard drives and throw one out the window?" It's true that disk space isn't nearly as dear as it used to be, but in databases size can equal speed (this is especially true if doubling the size of that column forces a table that could've been cached in RAM to page the file system).

While we're on the topic of performance, it's also worth noting that SQL-first makes it easy to just delete fields on a model, or have two versions of a model for one table, such that you don't have to pull every column into your web server's RAM with every query.

There is, though, some wisdom to not caring about this stuff for less data-intensive projects, so let's continue down this road... What if I want to make one of my reference tables into an enum? At this point I do care whether the primary key is an int or UUID.

And this is where you arrive at the thing that's most important to consider: the trade-offs are asymmetric. If you've got one situation where you do care about the "SQL stuff" and don't want to accept the defaults, you need to add a coupling between your higher-level code and that particular flavor and version of SQL. In order to maintain the same level of abstraction that you could have with SQL-first, you'd have to accept defaults everywhere.

Why write SQL if you don't have to?

This is often spoken with the implications that's it's a harder language; as if it's comparable to assembly because it's "closer to the metal." This is untrue.

A lot of what we as programmers think is easy is going to be based on what we're used to, but when you try teaching both to non-programmers, most will find SQL easier. Or, try the challenge yourself - create a table in both SQL and a code-first ORM. Add a foreign key and an index. The SQL probably is about 50% shorter.

If you don't want to learn about the complicated stuff that your ORM was abstracting away, then don't. Just make everything nvarchar(max) and come back to it later or never. The nice thing about the SQL-first approach is that you're not locked into not caring; you can change these without changing code or breaking your abstraction.

It's also worth noting that SQL knowledge is more portable. It's not terribly hard to jump from PostgreSQL to MySQL, and the fundamental syntax hasn't changed since before some of the people writing the fashionable ORM were born.

How'd we get here?

My best guess is that it comes largely from no-SQL habits. With a schema-less database, there's no such thing as SQL-first; everything is code-first. While there's wisdom in applying the same techniques to multiple problems rather than developing or learning new ones, maybe the best time to do that isn't porting techniques developed for databases which are defined by the absence of SQL over to a SQL database.

Lastly,  let's address audience bias. If you ask "what's the best way to build a new app?" you're going to get more answers from people who have a lot of experience ... building new apps. Draw on that wealth of experience, but be aware that, by definition, more information is coming from people the who haven't had to be responsible for the ramifications of their choices day-in-and-day-out years later.