Before continuing, please read the first parts of the database sharding adventure:
Database sharding unraveled - part I
Database sharding unraveled - part II

Chapter 1. The small guys

Before really diving into high scalability principles, I want to take a moment to talk about why database sharding has an important role even in small startups or medium sized web-sites (5 - 30k unique visitors/day).

It is equally important and benefic for a smaller web business to prepare itself from the beginning to tackle large amounts of users cheap. If it’s not obvious enough, think about what happens to a web-page that gets some plain old Digg attention. The server quickly collapses and the user experience immediately turns from positive to mega negative.
As I’ve explained before, the whole purpose of sharding is to be able to use an unlimited number of cheap machines topped by an open-source database. As experience taught me, the web server will rarely die. Instead, the DB server will choke easily when having to deal with many simultaneous connections.
The database doesn’t even have to be very big.


I’ll give you a practical example. It involves a forum with about 150k registered users, 600k posts and about 10k unique visitors/day (peaks reaching 50k unique visitors/day). The DB server is a 8 dual-core XEON processors with 8gb/ram. The whole DB is about 1GB in size and the server is MySQL.
When the number of simultaneous DB connections reaches a critical level (it’s variable), MySQL will freeze. That’s not all, as the sessions and posts tables will often get corrupted.
The first solution would be to put the tables in memory, so as not to stress the HDD. This actually turns out to be the worst idea possible.
So what to do?
You could say that DB replication could do the trick, but that would mean redundant data and other issues.
Let’s say that instead, inside the app, some sort of sharding is made. Then, the app would just need to know were the specific pieces of the DB are. So, instead of using one big server, we would use 3, 7, 1000000 or whatever smaller, less fast DB servers. They would be almost as easy to maintain, but the bottleneck nightmare would be over. As soon as you get more money, you buy another cheap machine and just plug it in where it’s needed.

Chapter 2. Principles involved

When talking Database Sharding, the logic reflects on all of the components of the app. That is why, a general high scalability background knowledge is needed. I really like the e-bay architecture and that’s why the principles will roughly be the same.Einstein Was Wrong

Principle 1. Partition by Function

I think the best keyword to describe it would be modularity. The idea behind it is to decompose the business logic into non-interacting components. The general effects are more economical BL representation, as well as, more importantly, the insight it provides into the entire BL structure. This provides valuable information on how the different sub-structures exist, interact and the way in which they can be predicted and manipulated. For example, if you know that x1 depends directly only on x2, this means that for the purpose of predicting x1, is suffice to know only x2. Moreover, in order to influence x1, one could directly interact with x2.

The maths behind this is very complex and I’m not going to get into details. Long story short, functional decomposition is never fully achievable because of the complexity of the systems it involves (instead, the matter becomes one of probability and statistics). The term “noise” symbolizes all the unwanted and untraceable influences in observing such systems.
In order to deal with the noise, Bayesian network methods are a good source of theoretical info. The essential motivation behind these methods is again that within most systems (natural or artificial), relatively few components/events interact with one another directly on equal footing(Simon 1963).

Back to the practical stuff - the database tier.

Split the database into smaller, specialized db’s. Use a DB for users, one for messanging functionalities, one for product orders, etc. Each of these databases must exist independently, that is, the splitting must be made so that I could, for example, create a new functionality (implicitly a new database), let’s say discussion board, and just take the users DB as a whole and use it, without
affecting anything else. The beauty behind it is that I can scale only what is needed. In this example, the users will be used in many Business Logic actions, and it’s only normal to give a greater priority on scaling it.

If you still think that partitioning by function is not important, well, you’re in for a big surprise. Not only that smaller, autonomous components are easier to maintain and scale, but this also allows the app to isolate and reduce the resource dependencies. Furthermore, this is the base for a SOA architecture. The concept was invented by IBM, but it’s basically a modular, easily interrogateable application building logic. The key is to easily allow a component to expose it’s api for comprehensive questioning.

In the end, the cherry on top of the cake: functional partitioning is the basis for asynchronous DB operations. Asynchronous means no more server clogging. There are some apparent “drawbacks”, but they can be overcomed with clever coding :).

Stay tuned for the next episode…

Save and share: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • bodytext
  • Sphinn
  • del.icio.us
  • Mixx
  • Google
  • StumbleUpon
  • Technorati
  • YahooMyWeb

Comments

15 Responses to “Database sharding unraveled - part III”

  1. Mark on June 17th, 2008 1:26 am

    So, I have a question about splitting functionally.

    Take the two examples above: a users database and a discussion board database. If the use cases of the discussion board system requires data that exists in the users database (e.g., you want each post to include author info that’s in the user DB), does that mean that they really can’t be split functionally because they are actually coupled?

    Does that make sense?

  2. mrp on June 17th, 2008 9:06 am

    Provided you have a limit set for the number of posts in the discussion, it may be trivial to issue a request for the author information after retrieving a list of author ids from the discussion board. This is definitely not best practices, but it would work for the skill level this article was written in. Good day. :-)

  3. Bogdan on June 17th, 2008 10:25 am

    Hi Mark,

    Of course you split them. After that, when you need full info, you call your services asynchronously. Service A deals with getting post info. Service B deals with getting user info. Call A and wait for it to finish, then call B. If A fails, B will never get called and your server resources will be saved. For the end user, it’s not such a big deal, things will be fast enough.

    However functional partitioning can be extended such that a certain (higher) degree of denormalization inside the data could occur. That is, if you find that it’s more probable to need certain info both from the posts and the authors, than make a DB/service which deals with that only. Don’t be scared of redundant data, sometimes this is the key to fast reads. You will always have to compromise between reads and writes.

  4. Bogdan on June 17th, 2008 10:30 am

    Dear mrp,

    Thank you for being involved in the discussion. I’m not sure what you mean in the last sentence.
    What skill level would satisfy you? All I am trying to do is offer simple explanations to somewhat complex problems. I have never said I know everything about the subject, nor am I some kind of guru. If you are looking for complex mathematical explanation, I’m not sure I can give them (I can, but up to a certain extent).

    If you’d like, you could point me to directions you consider to be more thorough and I would be happy to read and learn from them. Or maybe you could post your own opinion on the matter - just sent me an e-mail with the article and I will publish it.

    My best,
    Bogdan

  5. Mark on June 18th, 2008 3:20 am

    Thanks for the reply, Bogdan.

    I understand what you mean, and actually that’s been the solution that I’ve ended up with in the past when dealing with data that is split between sources.

    Do you think it’s a useful analogy to compare sharded databases as being similar to normalized tables within a single database, with the main difference being that any joins require a separate service call to fetch the data from another shard? In other words, if users and posts live in the same database, the data from both tables is retrieved using a join. However, if the data lives in two different databases, then the data must be retrieved via two separate calls, unless the duplicate data it is denormalized and stored in both databases.

    What do you think, is that a useful comparison?

  6. Bogdan on June 18th, 2008 8:58 am

    Yes and no. It depends on the structure of your data. For example, functionally speaking, you’ve got 2 tables - users and posts. It doesn’t matter so much if they are in the same DB for your comparison, but the structure of the tables themselves. For example, the posts table can be organized in shards (let’s take a simple rule, like odd post id’s go in table posts_odd and even post id’s in posts_even). Then, a simple join would not be enough. Or, if it’s more functionally convenient, you could make a table containing only what is needed from the users and posts data then split this table using a rule of your choice…
    Your comparison makes more sense from the functional side.
    The last idea is quite an extreme one and should be dealt with very carefully (oh, it’s not duplicate data, I think you meant common data). It’s better if you create a separate table with that mixture and store it in a third DB.

  7. Wawrzek on June 26th, 2008 9:10 pm

    Sitting at home and searching almost randomly internet I found this article - in my opinion very good one.
    I have a question. What would you suggest if posts should be searched not only simply by user but also by users’ characteristics. For example searching posts has to be done by color of hair, by height, by sex, by color of eyes, by place of birth, etc. Searching will base either on single characteristic or on more than one. Is it a solution for this case to create tables with redundant data for each combination of characteristics?

  8. Bogdan on June 27th, 2008 11:47 am

    Hi Wawrzek!
    I depends on the splitting criteria. If you only have functional partitioning, then just query the users for the desired characteristics, than use that list to retrieve the posts. If you have, however, split the posts table into shards, using a criteria of your choice, then you would need a look up table or a very smart in-app method of fetching the data (for a simple solution, use a merged table only for this kind of queries).
    If you are referring to speed gains for hardcore searching for only some data, it would be simpler to just make a separate table with keywords and the COMPLETE, already formatted results. This is used in some major sites, which have a separate DB only for searches, but which contains fully formatted results (refreshed every x hours) for the most common y searches. In your specific case, I’d just recommend using simple data retrieving (users first, posts afterward) or devise simple look up tables (but from what you write, there are too many criteria, therefore too many look up tables to create). Redundant data is only good for a very-very small number of combinations (you could do only for height, for example).

  9. LockeVN on August 8th, 2008 1:09 pm

    Hi Bogdan,

    I have a same point of view with you about functional partitioning. Very nice to meet you at this

    article. I will talk about pros and cons. I love to discuss this topic a lot so my text will be

    long, and so again ;) thanks first, for reading.

    Me and some guys make a ORM (object relation mapping) framework on .NET. It’s DB independence. We

    also build a framework for web development (an portal) and framework for winform. In portal, it

    allows to have several module, each module can fetch data from seperate datasource. Datasource point

    to seperate machines, databases (of course, all datasource can point to the same machine if needed).

    We gain a lot good things.
    - DB is not stress,
    - application business is isolate and service orient,

    - data is isolate,
    - module is easy to deploy.

    It’s the pros, and now the cons, I think we all have to deal when implement these techniques.
    - when there are some common data (Mark said) like: User, Role, Right, Permission (and in Portal, there are Page, PageConfig, PageletConfig, …)
    SOLUTION:
    + make database link (because of use SQL Server, we can query remote machine.database.table) or replicate common data. But, when User number grow, it’s rather hurt. :S . IMO, it brings more effect if we still replicate data, because, in most case, common data ussually -not to change- day by day.
    + use service orient call, we will lose performance. Eg: UserService calling return some UserData, we use UserData to feed BlogService, fetch BlogPostData, BlogCommentData. It’s slower than the normal SQL Join query. Use service call is we do “mashup” data in app layer, instead of let DB does by join.

    - we will have trouble if or module work together. Eg: Forum use Blog, both Forum and BLog and News use RateReview… .
    SOLUTION:
    + use service orient call.

    - split logic data entity into small pieces and distribute physically, like you call, is our aspect partitioning (IMO). It’s good at situation “we want it to be good” , at the aspect it is design to. In any other aspect, it’s not good. In your example, you do Category aspect split to your Post. If we need to fetch post by category, it’s ok. If we need to fetch 10 lastest posts, it’s hurt. If we need to fetch top 10 read posts, it’s hurt again.
    - we can not do thing on all data (whatever aspect it is) like: search, make report, make data analysis …
    SOLUTION (I found it like SQL Server does): make a DB view (Microsoft name it Distribute Partition View - DPV). View Post will create by “select * from Post_odd UNION select * from Post_even”. All read query will be fetch from this view. But this solution only works in SQL Server (where you can link databases from different machines). I can not do in MySQL (my MySQL knowledge is too poor). It is really hurt, so I still do not intergrate AspectPartitioning into our framework. If your app make a lot of search, report, don’t use AspectPartitioning.

    - aspectPartitioning is push stress to DBA, maintain, rolling our database. We have to keep one logic entity’s structure be the same at all its partitions, all its shards.
    - when we upgrade some application from not-scalarable to scalarable, common query do not work (join, union, nested query)
    - as mention before, it moves data mashup from DB to App (and IMO, DB does better and faster)

    - repartitioning is really a pain. Eg: our server work well when Post table has under 1Million. We can live well when we have 2M records (because of split in to Post_Odd and Post_Even). But what should we do when it is 3M records? and more and more?. I only hope, in that case, we will have enough money to buy more expensive machine (scale up), or have enough time to make MixAspectPartitioning (add Year aspect, divide Post into Post_odd_XXXX, Post_even_XXXX, where XXXX = 2007 2008 2009)

    I agree with you:
    - buying more hardware is usually cheaper than re-programming
    - AspectPartitioning reduce the data size, reduce load, It is really good, in speed and performance, data size.
    - FunctionalPartitioning is good, it balances the load, increase throughtput.
    - Some of my opinion is post here

    http://codebetter.com/blogs/karlseguin/archive/2008/06/30/scale-cheaply-sharding.aspx , almost the

    cons. :( (I’ve seen that article first and feel not good)
    - we should make some replicate, mirror for high availability

    I dig into partitioning, cluster and database for several recent weeks (in opensource DBMS, aka: MySQL). And also eager to hear from you.

    I think, best case in future is: we implement functionalPartitioning in our app, let DBMS vendor do AspectPartitioning. They can split data table into partitions, save each partition in each machine, harddisk. They do automatically after we config, they help us repartitioning. And with our app, they a unique Database, seamlessly, no change.

    I know Oracle and DB2 have these features. I hope these features will exist in MySQL or some free Open DBMS. Microsoft does the cluster with their SQL Server (they depend on DPV), but I saw in all review, it is really bad product.

    We should not sharding if we need transaction. This time, I know only Oracle and DB2 can do ACID transaction in their cluster architecture.

    Sorry for my poor English.

    Regards,
    T.

  10. Bogdan on August 13th, 2008 1:58 pm

    Hi T,
    Sorry for my late answer. For the past 2 months I`ve been through the process of moving to Norway and didn’t really had time to check on the blog. Thanks for your comment and will get back when the time allows it.

  11. LockeVN on August 16th, 2008 4:51 am

    OK, ;) I’ll wait

  12. kyaw kyaw naing on September 14th, 2008 6:18 pm

    Thanks a lot.

    I enjoyed all the three parts.
    You make use see that even small setups like ours, need to pay attention to sharding options.

    Thanks again.

  13. Bogdan on September 16th, 2008 1:04 pm

    Hi Kyaw,

    Thanks for your thoughts :).
    Right now I’m waaaaaay over my head in working in upgrading some PHP systems, but starting next month, the blog will be back in business.

  14. James on September 19th, 2008 7:56 am

    Hi, I found your blog on this new directory of WordPress Blogs at blackhatbootcamp.com/listofwordpressblogs. I dont know how your blog came up, must have been a typo, i duno. Anyways, I just clicked it and here I am. Your blog looks good. Have a nice day. James.

  15. Bogdan on September 19th, 2008 10:33 am

    Hi!
    I guess it’s just a list of wordpress blogs.

Leave a Reply




Advertisements