Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2009/05/13 15:59:58 UTC

[Hadoop Wiki] Update of "Hbase/PoweredBy" by CosminLehene

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by CosminLehene:
http://wiki.apache.org/hadoop/Hbase/PoweredBy

The comment on the change is:
Updating information related to Adobe HBase clusters

------------------------------------------------------------------------------
- [http://www.adobe.com Adobe] - We use a 5 node cluster running HDFS, Hadoop and HBase as a storage and processing backend for some of our social services. Data is regularly aggregated using mapreduce jobs and stored back in HBase. Currently an evaluation experiment, the storage is designed to store around 20-40M rows of structured data. The production cluster has been running since Oct 2008.
+ [http://www.adobe.com Adobe] - We currently have about 30 nodes running HDFS, Hadoop and HBase in clusters ranging from 5 to 14 nodes, across both production and development. In two months we'll be deploying an 80-node cluster. We use HBase in several areas, from social services to structured data and processing for internal use. We constantly write data to HBase and run MapReduce jobs to process it, then store the results back in HBase or in external systems. Our production cluster has been running since Oct 2008.
  
  [http://www.mahalo.com Mahalo], "...the world's first human-powered search engine". All the markup that powers the wiki is stored in HBase. It's been in use for a few months now. !MediaWiki - the same software that powers Wikipedia - has version/revision control. Mahalo's in-house editors produce a lot of revisions per day, which was not working well in an RDBMS. An HBase-based solution for this was built and tested, and the data was migrated out of MySQL and into HBase. Right now it's at something like 6 million items in HBase. The upload tool runs every hour from a shell script to back up that data, and on 6 nodes takes about 5-10 minutes to run - and does not slow down production at all.