Apr
15
This is the first post from a hopefully long series to come, about Database Sharding. 
The best way I can think of to define the concept is to associate it with ice fragments (build them up and you can sculpt anything, but failing to provide the right temperature collapses it all).
The idea is to split tables in a database in what are called shards, or fragments, or pieces. As your application increases in size, you need a way to scale cheap, efficiently and limitless. Furthermore, minor changes to the already existing code are required (buying more hardware is usually cheaper than re-programming).
There are several ways of dealing with database sharding, each with its pros and cons:
- application layer
- proxy
- database layer
Of course, many other methods exist, but they are only implementations at some extent of the above.
Before breaking it down and seeing what it is all about, let’s stop for a while on the main concept and take some real examples to play with. Let’s take a simple database (for a forum) with 3 tables: users, categories and posts.
The definitions are straight forward:
CREATE TABLE `mydatabase`.`users` ( `uid` INT( 10 ) UNSIGNED NOT NULL AUTO_INCREMENT , `name` VARCHAR( 50 ) NOT NULL , PRIMARY KEY ( `uid` ) ) ENGINE = MYISAM CREATE TABLE `mydatabase`.`categories` ( `cid` INT( 10 ) UNSIGNED NOT NULL AUTO_INCREMENT , `name` VARCHAR( 40 ) NOT NULL , PRIMARY KEY ( `cid` ) ) ENGINE = MYISAM CREATE TABLE `mydatabase`.`posts` ( `pid` INT( 10 ) UNSIGNED NOT NULL AUTO_INCREMENT , `cid` INT( 10 ) UNSIGNED NOT NULL , `uid` INT( 10 ) UNSIGNED NOT NULL , `date` timestamp NOT NULL default CURRENT_TIMESTAMP, `value` TEXT NOT NULL , PRIMARY KEY ( `pid` ) ) ENGINE = MYISAM
Ok, we’ve setup the database, now it’s time to split the tables. In order to do that, the big resource eating table will be analyzed. So, the posts table (by far the biggest) should be divided in smaller parts and each of those parts must be moved on a different hdd or machine. After giving it some thought, 3 ways of achieving our goal emerge:
- split the posts by the post id(pid),
- by the category id(cid),
- or by the user id(uid).
Which of them to choose? The answer is simple -> checkout your application navigation logic. In this case, the visitor would see a list of categories, click on one of them and read the posts. It would be of no help to split the posts by user id, as only 1 percent of your visitors would click a username to see its posts. Nor would it be to split the posts by the post id - this is the worst method of them all.
What I’m trying to say is that no matter the logic you chose to split a table, always keep in mind that you want 0 join, order by or limit clauses which would require more than one table shards.
Think about it for a minute - your visitor clicks a category name and is shown all the posts under it. The query would look like (for simplicity’s sake, I’ll use PHP variable notations):
SELECT * FROM posts WHERE cid=$cid ORDER BY date DESC LIMIT 0,20
Given one of the 2 “bad” methods of dividing your data from above, what can you do? You would have x tables posts_0, posts_1, … and soon, either with “users rule” for dividing (something like: all the users from 1 to 1000 have the posts in posts_0, all the users from 1001 to 2000 have the posts in posts_1, etc.) or with “posts rule” for dividing (something like: all the posts from 1 to 1000 go to table posts_0…).
The bad thing about the query is the ORDER BY clause - how can you extract data from all the tables posts_0, posts_1, … order it and then get only x results? - you can’t “join” or “glue” posts_0, posts_1… The answer is there’s no way! (did I just there’s no way? Of course there is, but more on that in another post ;)).
On the other hand, if we would have split the posts by the category id, than the problem would have been solved. So, craft a rule of your choosing (I’ll explain it later) and make sure posts with category id’s following your rule go to table posts_x or posts_y or whatever. A simple rule would be odd and even category ids, thus resulting in posts_odd (for posts with category id an odd number) and posts_even. When clicking on the category link, you pass the query the category id. The query is next analyzed, the category id extracted and checked for parity and the correct table to interrogate is used and because all the posts which belong to that category are in the same table, …. problem solved!
2 short examples:
SELECT * FROM posts_odd WHERE cid=123 ORDER BY date DESC LIMIT 0,20; SELECT * FROM posts_even WHERE cid=1466 ORDER BY date DESC LIMIT 0,20'
Comments
9 Responses to “Database sharding unraveled - part I”
Leave a Reply








Just use db partitioning instead of trying to hand code this sort of thing.
Well, db partitioning doesn’t do it for many reasons, at least MySQL wise, because is has only simple rules of splitting the data. There are also some other things to consider, like trying to use stateless interactions and relying on the code instead of the DB for data picking. You can check out this excellent article, if you’d like more insight on the benefits of sharding (sharding, at the essence, is partitioning, but at app level, not at DB level): http://www.infoq.com/articles/ebay-scalability-best-practices
Range partitioning in DB2 does the trick for me. I hate to do anything from the application layer since DB engine is always known to be faster in hashing and distribution than using application code.
As you say, Arun, the job gets done quickly if all you want is range partitioning, but what I am talking about is full separation in self-manageable entities, each, at their turn further split to enhance performance. There are many cases in which range partitioning isn’t a solution. What I need is deep object location manipulation plus object inner dwellings separation so as to allow as much as possible stateless interactions, to avoid using transactions. It’s not that easy to obtain that only using your database and it’s a nightmare to manage afterwards.
Let’s not forget that DB2 isn’t cheap and MySQL’s built in partitioning is not that good.
Talking strictly about parsing/hashing/distribution speed - the speed gain only appears when building a custom parser just before the query comes out from the DB abstraction layer, or somewhere in between. But if you build your app from scratch with partitioning in mind, things change dramatically - as you will build your queries already targeted to the right DB/Table.
So, to take advantage of this I could extend our ORM framework and have something like this:
ITable shardDecorator = table.shardBy(categoryId);
ResultSet results = shardDecorator.findBy(categoryId);
Should I implement the findBy method using fork-join to run all queries simultaneously? How do you handle error cases?
Thanks for these articles on sharding. It is very useful and much appreciated!
Regards
Willem
After writing this long comment, the power went down and stupid me didn’t have the PC in the UPS. So, here’s a shorter version:
1. The 2 lines of code are correct
2. Yes, use fork-joins, but these must be personalized for each data structure you may use with your ORM. So, my advice would be to make findby a dispatcher to specialized data scanning methods by using a criteria as parameter.
Criteria c = new Criteria(…)
ITable shardDecorator = table.shardBy(c);
ResultSet results = shardDecorator.findBy.(c);
When findBy receives the Criteria (which is an object describing the columns needed and the rules by which to get the data (the stuff following “where …”) it than dispatches accordingly. If it sees a full data-range scan is needed, it calls for a fork/join (or any kind of divide and conquer method you may use), or as last resort uses a merged table and leaves the data gathering logic to the DB (not recommended but does the work).
But these are only special cases and should be avoided at all cost, instead intelligent data splitting must be used so that only the needed table is queried.
Always start your app with scalability in mind, otherwise it can be a headache to deal with later.
There are 2 things to scale: the transactions and the data itself (although, the key is to use stateless objects so that transactions only become a mirage to the user).
The sharding must be done by prioritizing data and at the same time refactoring the business-logic relationships.
You must think of the multiple ways your data will be splitted into shards (eg. users are split by the last digit in their id in 10 tables, products are split using a lookup table and so on…). You must be able to cope with them all, and that is why findby must be a dispatcher. The error handling is done in the same place where data consistency assuring is done.
The best part is that scaling big by using shards means that transactions are not needed, because by cleverly crafting the splitting rules, immediate consistency is not needed. That is why the error handling mechanism lies deep in the belly of the monster.
The consistency is made by ordering your DB operations carefully, asynchronous recovery events, and reconciliation or settlement batches(for the logic behind settlement batches, read this http://www.devguru.com/products/dgCharge/doc/dgCharge_index.asp).
Google search for asynchronous recovery events and read more if you’d like - although these types of computer literature are not free.
All that is too much however. For a simple system, you can write some asynch methods of recovery very easy and fast.
All I am trying to say is that sharding is not that difficult if you take the problems each at a turn and logically resolve them.
Thank you Willem for taking your time to read them. As the time allows it, I’ll post some real case scenario and how I resolved it using sharding on an Invision Power Board forum - the speed gain was impressive.
[...] understanding how to pick the correct dividing logic we continue our journey into database sharding. Many say that sharding is partitioning and they are [...]