Your Grandpappy's Cross-Database SQL Copy / by Chris Shaffer

If you’ve worked a decent amount with databases in the past, you’ll know that “one does not simply move data from one database to another.”

Before we get to your grandpappy, let’s review, in broad strokes, the reasons you’d want to move data from one database to another:

  1. One time - you have a legacy database system that you want to migrate wholesale to a new environment, once

  2. Ongoing - processing data from an external provider on an recurring basis

  3. A team member has created a large amount of data on a local database on their workstation and would like to deploy it to production

  4. QA - you want to run code on a copy of your production data to test code changes (particularly migrations) before deploying them live

  5. One time - you have a spreadsheet that you want to pull into the database, once (for example, someone has typed up a list of reference data in Excel)

  6. Debugging - you have a specific use case that you need to reproduce in your development environment, but you need a few hundred rows of data to do so

And the general approaches you’d take to each:

  1. Use a tool. Microsoft provides one called SQL Server Import and Export Wizard, for example, that also is compatible with Oracle, MySQL, and PostgreSQL.

  2. The provider has developed, or will develop, either a data feed or an API. You’ll have to write code to pick up their file or call their API, make sense of it, and insert it into your own database.

  3. If the data is created by a script, save that script in source control, review it, and then it can run again in production as part of a schedule or your deployment process.

  4. Do a wholesale backup and restore, and then point your scripts and code at the new database.

  5. and 6. - Now it’s time to talk to grandpappy…


Some asides:

#1 and #2 are both larger projects - they’re likely going to require an exercise in mapping schemas, to convert the source’s business and storage concepts to yours.

#2 will require additional work and thought to ensure that you’re updating and deleting as appropriate - more if the provider doesn’t have consistent primary keys.

If they’re overlapping with your own existing entities and data structures, then there’s a potentially huge exercise to disambiguate and deduplicate those entities (is my “Chris Shaffer” the same person as their “Christopher Shaffer”?), find conflicts, and resolve them.

With #3, if you assumed that data can simply be “dragged and dropped” between databases and lost the script (and the data is too large to wrangle manually by copy/pasting into Excel), you can learn those valuable lessons by redoing the project. This process will also help catch the bugs which certainly exist in code that’s outside of source control.


Here’s our spreadsheet:

spreadsheet.PNG

Now, we can get Excel to write our SQL for us…

="select '" & A2 & "' as ID, '" & B2 & "' as Name, " & C2 & " as SortOrder union all"

spreadsheet2.PNG

And then we can paste it into SQL!

sql5.PNG

Boom!

I use a merge template as a matter of habit - especially for reference data / enums; it allows you to store it in source control as if it’s code (because your enums are code). It’s also re-runnable, and can be used to instantly rectify any “the reference data is messed up because someone forgot a where clause” issues.

But, for a one time deal, there’s no reason you can’t go even simpler …

spreadsheet4.PNG

There’s also an equivalent for going straight from SQL to SQL, you just have to convert everything to strings:

sql3.PNG