SQL: Column Oriented Databases

http://www.nearinfinity.com/blogs/aaron_mccurry/column_oriented_databases.html

Column Oriented Databases

By Aaron Mccurry

Apr 30, 2008

What is it?

A column oriented database (or a column store database) is a database that stores it’s information in a column oriented manor, instead of a row oriented manor.

Take for example this simple table of information.

Id FirstName LastName Gender BirthDate
1 John Doe M 1/1/1980
2 Jane Doe F 2/2/1981
3 John Smith M 3/3/1979

It is very natural for a row oriented database to store the following information as sequential appending rows on disk. This makes them very good for inserting and updating data (depending on implementation).

1,John,Doe,M,1/1/1980~2,Jane,Doe,F,2/2/1981~3,John,Smith,M,3/3/1979

A column oriented database might store the data like this:

1,2,3~John,Jane,John~Doe,Doe,Smith~M,F,M~1/1/1980,2/2/1981,3/3/1979

As you can imagine this type of structure would not perform well if you were making lots of changes to the data. But if you are scanning through a column looking for values, the column oriented database groups all the values of a column together as opposed to spreading them out like the row store database would do.

Why do I care?

Database vendors would argue that indexing the columns solves all the performance problems of a row store database. And for most applications they are right, but one benefit that most column store databases provide that may not be apparent, is their ability to compress the data. Take my example:

1,2,3~John,Jane,John~Doe,Doe,Smith~M,F,M~1/1/1980,2/2/1981,3/3/1979

Because the type of each column is consistent, compression of the entire column can be a lot greater than compressing each row in a row store database (some newer RDBMs are adding compression features). Vertica, a commercial column store database boasts up to 90% reduction in storage compared with a traditional row store databases.

Query speed can also be greatly improved on a column store database versus a row store database, although that feature is more dependent on implementation then on the structure of the data.

Conclusion

For problems where you are writing and updating a lot (OLTP in database talk) stick with a row store database. But if you are reading more than you are writing (OLAP) a column store database might be something worth considering.

Here are some column store databases that I have used or heard of:

Open source:

Commercial:

  • Vertica – Grid based column store database
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s