Database scaling is the most difficult means to scale the application, but it also provides most benefits. It’s the database, which most often becomes an e-commerce or social network application bottleneck. In case of e-commerce applications the reason is a large number of transactions, and a high level of data normalization. In case of social networks, the amount of data is hard do predict, and such applications feature lots of write operations.

Sites such as Facebook and similar utilize comprehensive MySQL database partitioning strategies, implemented on application level. User data may be split between different physical servers, and all write and read operations may be conducted on those particular servers.
Currently databases such as MongoDB implement horizontal partitioning strategies (so-called auto-sharding), but they require assuming an object-oriented approach to the database, and don’t support SQL, which might cause difficulties.

Various e-commerce software platforms such as Magento enable vertical database scaling through a selection of servers from Master (write operations) and Slave (read operations) pools. It’s a very simple method, which brings measurable efficiency-related benefits. However, it’s difficult to accomplish horizontal scaling in such fashion, since it requires particular programming techniques, such as avoiding database-level joins.

Some time in the past we attempted to implement the Ganglib library here at Divante. It was supposed to act as an abstraction layer between physical SQL database servers, and the application. Out-of-the-box Ganglib enables scaling using several strategies, both horizontally and vertically, in a fashion entirely transparent to the developer.

How Does It Work
Ganglib establishes an abstraction layer between queries and data sources. Queries are written in an object-oriented fashion, just like with LINQ to SQL or Doctrine. Queries are built as objects, using a notation similar to SQL and compatible with it. This provides an extra abstraction layer separating the physical database dialect.
Every query is then sent to the Div_Db_Distributor class. The data distributor, based on analyzing the query using interchangeable strategies, selects which physical server or servers the query is to be directed to.

Then the distributor merges data collected from the servers, and forms a Data Set. In case of write operation queries, they’re simply sent to suitable physical databases.
The distributor also implements a data-level cache using Memcache.
The following is a code snippet with a reference to the database:

$distributor = Div_Db_Distributor::getInstance();
$distributor->setConfig($config)->setStrategy(‘PDO_Vertical’)->setCache($cacheManager->get(‘default’));

$query = new Div_Db_Query();
$q = $query->from(‘users’)->where(array(‘id’ => 1));
$this->view->users = $distributor->query($q);

$query = new Div_Db_Query();
$q = $query->from(‘images’)->where(array(‘id’ => 1));
$this->view->images = $distributor->query($q);
The two queries will be executed on separate databases (vertical partitioning, table-dependent). The following is an example of a horizontal partitioning query (different data scopes on different servers — note that there’s no difference!):

 

$timer = microtime(1);
$distributor = Div_Db_Distributor::getInstance();
$distributor->setConfig($config)->setStrategy(‘PDO_Horizontal’)->setCache($cacheManager->get(‘default’));

$query = new Div_Db_Query();
$q = $query->from(‘users’)->where(array(‘id >=’ => 1))->where(array(‘id <=’ => 2));
$this->view->users = $distributor->query($q);

$query = new Div_Db_Query();
$q = $query->from(‘users’)->where(array(‘id >=’ => 3))->where(array(‘id <‘ => 5));

And here’s something even more surprising. A data query which is based not on a database, but on RSS feeds:

$distributor = Div_Db_Distributor::getInstance();
$distributor->setStrategy(‘RSS_Horizontal’)->setCache($cacheManager->get(‘default’));
Div_Db_Distributor_Strategy_RSS_Horizontal::addField(‘pubDate’);

$query = new Div_Db_Query();

$from = array(
‘http://potw.news.yahoo.com/rss’,
‘http://rss.news.yahoo.com/rss/oddlyenough’,
‘http://rss.news.yahoo.com/rss/politics’,
‘http://rss.news.yahoo.com/rss/mostviewed’,
);
$q = $query->from($from)->like(‘title’ , ‘obama’);

Full example source code is downloadable from code.google.com/p/ganglib.

The Power of Strategies
All data source specific operations are implemented as strategies. The developer is able to select the right strategy at configuration level, as required by the growth of the application. Since Ganglib forces the developer to use a particular API to create object-oriented queries, there is no chance that queries which work with a single database won’t work with multiple databases.
A relational database may actually be migrated to e.g. Amazon SimpleDB or MongoDB with no apparent differences for the programmer. This is the power of strategies.
Ganglib examples include strategies operating with RSS feeds and the file system.

Example Strategies
In order to present the idea behind Ganglib, we’ve implemented a couple of out-of-the-box strategies. A short description with code examples follows.

Simple
A simple strategy utilizing the PDO driver, and enabling a single database server connection — an excellent initial choice.

Vertical
A strategy which separates queries on the basis of the tables they refer to. For example queries related to the Users table will be directed to a different database than those which refer to the Products table.

Horizontal
At the time Ganglib was in development, there was no other engine which would implement such a strategy. Certain elements of this strategy are implemented by MySQL 5, or as an add-on to HSCALE, but they only work with a single server. If you’d like to use 15 SQL servers in write mode, there’s no way to do it.

Unless, of course, you use this strategy. The strategy splits queries between physical SQL servers using an algorithm selected during configuration — RANGE, HASH, LIST, or another.
Strategies may easily be added, and the following code snippet shows an example of one.

class Div_Db_Distributor_Strategy_PDO_Vertical extends Div_Db_Distributor_Strategy_PDO_Abstract
{
/**
* Query database
*
* @param Div_Db_Query $query
* @return PDO_Statement
*/
public function distributeQuery(Div_Db_Query $query)
{
$modelArray = $query->toModelArray();
if(isset($this->_config->vertical->from->{$modelArray[‘from’][0]}))
$schema = $this->_config->vertical->from->{$modelArray[‘from’][0]};
else
$schema = $this->_config->vertical->default;

$connection = $this->connect($schema);
return $connection->query($query);
}

/**
* Execute query
*
* @param Div_Db_Query $query
* @return mixed
* @access public
*/
public function execute(Div_Db_Query $query)
{
$result = $this->distributeQuery($query);
$this->isCacheable = false;

if($this->queryIsSelect($query))
{
$result = $result->fetchAll();
$this->isCacheable = false;
}

return $result;
}
}
This strategy is based on a configuration file (listed below), and selects a particular physical server depending on tables which the query refers to.

[schemas]

db1.adapter                     = mysql
db1.host                        = localhost
db1.username                        = root
db1.password                        =
db1.dbname                      = ganglib_db1

db2.adapter                     = mysql
db2.host                        = localhost
db2.username                        = root
db2.password                        =
db2.dbname                      = ganglib_db2

[strategies]
simple.default                      = db1

vertical.default                    = db1
vertical.from.users                 = db1
vertical.from.images                    = db2

Is It Stable?
We’ve used Ganglib to implement various sites, e.g. Znam.to for Agora (the biggest publisher in Poland), the Active GPS (GIS) system, and more.