Continuing the series about Database Sharding, I’m going to talk about the software/hardware architecture. This post started from an excellent read, MySQL Database Scale-out and Replication for High Growth Businesses.


The first order of business is MySQL replication. Replication is needed to offer redundancy and to distribute the load on the system even further. In a typical sharded environment, the database is split among multiple servers, with data being unique to each server. If one of the servers goes down, all the data it holds becomes unavailable, and even though the system as a whole keeps working, any scenario that needs that data will fail. This is where replication comes into play.
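To make the redundancy argument concrete, here is a minimal sketch of picking a host for reads when every shard has a master and one replica. The shard names, host names and the `is_up` callback are hypothetical; promoting a replica so that it can accept writes is a separate problem that this sketch does not cover.

```python
# Hypothetical layout: every shard lists its master first, then its replica.
SHARD_HOSTS = {
    "shard_a": ["a-master", "a-replica"],
    "shard_b": ["b-master", "b-replica"],
}


def pick_read_host(shard, is_up):
    """Return the first reachable host for a shard, or None if all are down.

    `is_up` is any callable that takes a host name and returns True or False;
    how the health check is done is left open here.
    """
    for host in SHARD_HOSTS[shard]:
        if is_up(host):
            return host
    return None


# Simulate the master of shard_a going down.
down = {"a-master"}
print(pick_read_host("shard_a", lambda host: host not in down))
print(pick_read_host("shard_b", lambda host: host not in down))
```

Without the replica entry, losing `a-master` would leave `shard_a` with no host at all, which is exactly the failure described above.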
Read more

There’s a nice tool called iostat for checking disk-related statistics, like the number of reads/writes, the amount of data processed, etc. It’s a must-have for any good sysadmin, as it allows you to identify some of the bottlenecks in the DB.

Together with the vmstat tool, which shows statistics for virtual memory usage, it forms a powerful duo, especially when your DB is running very slowly but the processors are not fully used.

Both tools are explained well enough in their man pages.

So, the only thing remaining is to start them up:

Open 2 terminal windows. The first one would run something like iostat -dx 10 (this displays the extended device report, refreshed every 10 seconds; you can increase or decrease this number to suit your needs, but very small values are not ideal, as it’s better to have stats averaged over a longer period). The second one should run vmstat 10.
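If you would rather not watch the second window by hand, the vmstat output can be parsed. The sketch below assumes the usual Linux vmstat column layout (a header line starting with "r" and ending with the CPU columns us, sy, id, wa). The sample text is made up for illustration, and the thresholds in `looks_io_bound` are arbitrary assumptions, not recommended values.

```python
def parse_vmstat(text):
    """Turn vmstat output into a list of dicts, one per sample row."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # The column-name line; vmstat repeats it every so often.
            header = fields
        elif header and fields[0].isdigit() and len(fields) == len(header):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def looks_io_bound(row, wa_threshold=20, busy_threshold=50):
    """Heuristic: much time waiting on I/O while the CPU itself is mostly idle."""
    return row["wa"] >= wa_threshold and row["us"] + row["sy"] < busy_threshold


# Made-up sample, shaped like the output of `vmstat 10`.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  4      0 120000  30000 900000    0    0  5200  3100  800 1500  8  4 43 45  0
 0  0      0 121000  30000 900000    0    0    10    20  200  300  2  1 97  0  0
"""

rows = parse_vmstat(SAMPLE)
print([looks_io_bound(row) for row in rows])
```

The first sample row is the "slow DB, idle processors" case: the wa column is high while us and sy stay low.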

Last but not least, to get iostat you need to install the sysstat package (vmstat is in the procps package, installed by default). For Ubuntu, type: sudo apt-get install sysstat.

Before continuing, please read the first parts of the database sharding adventure:
Database sharding unraveled - part I
Database sharding unraveled - part II

Chapter 1. The small guys

Before really diving into high scalability principles, I want to take a moment to talk about why database sharding has an important role even in small startups or medium-sized websites (5 - 30k unique visitors/day).

It is equally important and beneficial for a smaller web business to prepare itself from the beginning to handle large numbers of users cheaply. If it’s not obvious enough, think about what happens to a web page that gets some plain old Digg attention. The server quickly collapses and the user experience immediately turns from positive to very negative.
As I’ve explained before, the whole purpose of sharding is to be able to use an unlimited number of cheap machines topped by an open-source database. As experience taught me, the web server will rarely die. Instead, the DB server will choke easily when having to deal with many simultaneous connections.
The database doesn’t even have to be very big.

Read more

After understanding how to pick the correct dividing logic, we continue our journey into database sharding. Many say that sharding is partitioning and they are right, but keep in mind that it’s the most complex form of all. In order to better grasp the concept, think about a field of flowers. In a normal situation (database), the flowers are all together.

What if you want to pick only the red flowers? In this case you would have to check every flower to see which one has the desired color, then pick it, but that would take too long.

Instead, why not plant all the flowers based on their color? Then, if you’d like to get only the red ones, it would be easy as pie.
The only problem which could appear would be if you wanted only the flowers which had 5 petals. That is why you must carefully think things over before starting to split your data.
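The flower field can be sketched in a few lines. The flowers and their attributes below are made up for illustration; the point is only to show that a lookup by the partitioning key (color) touches one bed, while a lookup by anything else (petals) still has to walk every bed.

```python
from collections import defaultdict

# Hypothetical flowers: (name, color, petals)
FLOWERS = [
    ("rose", "red", 5),
    ("tulip", "red", 6),
    ("daisy", "white", 13),
    ("buttercup", "yellow", 5),
]


def plant_by_color(flowers):
    """Partition the field: one bed (list) per color."""
    beds = defaultdict(list)
    for flower in flowers:
        beds[flower[1]].append(flower)
    return beds


beds = plant_by_color(FLOWERS)

# Query on the partitioning key: go straight to a single bed.
red = beds["red"]

# Query on another attribute: every bed has to be checked.
five_petals = [f for bed in beds.values() for f in bed if f[2] == 5]

print([f[0] for f in red])
print(sorted(f[0] for f in five_petals))
```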

Alright then, we’ve set up the logic, what next? It’s time to implement it. Now, the implementation is the tricky part.

Read more

This is the first post from a hopefully long series to come, about Database Sharding.
The best way I can think of to define the concept is to associate it with ice fragments (build them up and you can sculpt anything, but failing to provide the right temperature collapses it all).
The idea is to split the tables in a database into what are called shards, or fragments, or pieces. As your application increases in size, you need a way to scale cheaply, efficiently and without limits. Furthermore, only minor changes to the already existing code should be required (buying more hardware is usually cheaper than re-programming).

There are several ways of dealing with database sharding, each with its pros and cons:

  • application layer
  • proxy
  • database layer

Of course, many other methods exist, but they are to some extent only implementations of the above.
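As an illustration of the first option, application-layer sharding can be as small as a function that maps a key to a server. This is a minimal sketch under assumptions of my own: the host names are hypothetical, the key is a user id, and the mapping is a simple hash modulo the number of shards.

```python
import zlib

# Hypothetical shard hosts; in a real setup these would come from configuration.
SHARDS = [
    "db1.example.internal",
    "db2.example.internal",
    "db3.example.internal",
]


def shard_for(user_id, shards=SHARDS):
    """Map a user id to one shard host using a stable hash.

    zlib.crc32 is used instead of the built-in hash() because it gives the
    same result in every process, so all application servers agree.
    """
    key = str(user_id).encode("utf-8")
    return shards[zlib.crc32(key) % len(shards)]


for user_id in (1, 2, 42):
    print(user_id, "->", shard_for(user_id))
```

Keep in mind that with a plain modulo, changing the number of shards moves most keys to a different server, so the dividing logic has to be chosen with growth in mind.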

Read more
