| Refactoring a live database from the Highly Agile Relational Databases
At the Highly Agile Relational Database workshop, one of the questions was "How do you refactor a database that has to be live all the time?".
So, what's the answer then?
Hmm, difficult... I dont know. I don't think it's the right question, though.
Ah, that's very helpful! And pray tell us, what is the "right" question?
Look at it from the customer's point of view. Imagine you're a customer. You're a customer of a bank. Do you care about refactoring or about their database? Often they don't even have a database. Well, not what I would call a database anyway :-)
You don't want a database that's available 24/24h. You want to be able to see the amount in your account, to get money from the ATM, to transfer money to another bank account... 24/24h. You want an application, whose functionality is available 24/24.
"How do you refactor a (database based) application whose functions have to be available all the time?". Now, there's an interesting question! And I think it's easier than the previous question.
I still won't give you the solution, but a few tips and meta-tips should get you on your way to finding it yourself.
Tip 1: Let them read a copy (found at the workshop)
Before performing the refactoring, make a copy of the data and let the queries work on the copy for the (short) duration of the refactoring.
Problem: the data read might be out of date. Yes, at worst as out of date as the length of the refactoring. But, you should assume anyway that any data you read is out of date as soon as you've read it. Imagine two commands come in at the same time: one to read the amount in my account, one to deposit my paycheck. Which one is executed first? No way to tell. If the read happens before the update, the data I get back is out of date. I have to deal with it. Let's minimize the length of time the original database is unavailable, though.
Tip 2: queue transactions
Important transactions are often queued to deal with technical and functional failures. For example, if I perform a bank transfer, the transaction is put on a queue to be executed. If the database is not available, some communications system is down or if there's not enough money in my account, the transaction is scheduled to be retried a short while later. This gives us the time to bring the database back, plug the network cable back in or put some money into my account....
This method is typically used with transactions of the form: IF THEN . If there's no precondition, we can use the next tip.
Problem: what if, after "a while", the transaction still can't go through? Then we have to take corrective action, remove the transaction from the queue and warn someone who can make the precondition true ("Hey Pascal, put some money into your account!").
Tip 3: log transactions and replay after the refactoring (found at workshop)
Make a copy of the database. Work on the copy while the original is refactored, but log all transactions. When the refactoring is done, replay all transactions logged into the original database.
Tip 4: perform small, backwards compatible refactorings
It's possible to perform some changes to the schema of a database, without having to bring the database down or impacting the applications working on it. These changes have to be small (so that the database remains available) and have to be backwards compatible (so that the applications don't notice the change). For example: adding a column to a table is pretty easy to do on a live database. Adding a view to a database is painless, adding a table to a database can always be done...
If you can reduce a large refactoring to a number of small, backwards compatible refactorings, no one will notice. If you have a small refactoring you can't apply without taking the database down, use one of the previous tips to deal with the short downtime.
Tip 5: use what encapsulation tools your database provides
Databases provide some encapsulation tools like computed columns, views and stored procedures which allow you to maintain a consistent interface with your applications, while allowing the underlying implementation to change. Use them! For example, you need to make non-backwards compatible change to a table. Rename the table and create a view which exposes the data as the table did before. You can now change the table and no one will be the wiser.
Meta-tip 1: Don't repeat yourself (DRY as the Pragmatic Programming book calls it)
If you have knowledge of the details of the database repeated and scattered all over the place, you deserve everything you suffer. There's no reason I know of to have the knowledge of table/view/column names duplicated in several places or several applications. There's no reason I know of to rely on the position or number of columns in a result set/table/view.
Don't forget DBMS are pretty powerful systems, don't duplicate data, business rules or processing over lots of applications if you can have them in the single, shared database.
Meta-tip 2: SMALL STEPS!
Do everything in small steps, small refactorings, test each step. I'm pretty confident that I can break down a large refactoring into a number of small, testable refactoring steps. I have no proof that this is universally true, I just haven't encountered any situations in which this was not true. Maybe I should get out more...
Meta-tip 3: If it hurts, do it more often
So, you're scared as hell of database refactorings, of database testing? Every new release to the users is a nightmare? It's always so painful? It takes so much time? There's so much stress... You're tempted to postpone doing that, aren't you?
Well, don't! Instead do it more often. On my projects the database is tested from day 1, when it's small and easily testable. The database is refactored from day 2, when it's small and easily refactored and we haven't amassed a large amount of data to preserve.
And we keep on testing and refactoring, every day. And we get good at it. And as complexity, amount of data and criticality slowly rise, we cope with it. We find cunning ways to speed up our tests; we find ingenious ways of breaking refactorings into small, compatible changes; we evolve clever architectures and code that are able to deal gracefully and correctly with change, with failures and performance issues.
|