Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2008/10/01 22:41:57 UTC

[Hadoop Wiki] Update of "Hbase/PoweredBy" by stack

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by stack:
http://wiki.apache.org/hadoop/Hbase/PoweredBy

The comment on the change is:
Added videosurf

------------------------------------------------------------------------------
  
  [http://www.tokenizer.org Shopping Engine at Tokenizer] is a web crawler; it uses HBase to store more than a billion URLs and Outlinks (AnchorText + LinkedURL). It was initially designed as a Nutch-Hadoop extension, then (due to a very specific 'shopping' scenario) moved to SOLR + MySQL (InnoDB) (tens of thousands of queries per second), and now to HBase. HBase is significantly faster because there is no need for huge transaction logs, the column-oriented design exactly matches 'lazy' business logic, and it supports data compression and MapReduce. The number of mutable 'indexes' (a term from RDBMS) is significantly reduced because each 'row::column' structure is physically sorted by 'row'. The MySQL InnoDB engine is the best DB choice for highly concurrent updates. However, having to flush a whole block of data to the hard drive even when only a few bytes have changed is an obvious bottleneck. HBase helps greatly here: the 'delete-insert', 'mutable primary key', and 'natural primary key' patterns, which are not popular in modern DBMSs, become a big advantage with HBase.
  
+ [http://www.videosurf.com/ VideoSurf] - "The video search engine that has taught computers to see". We're using HBase to persist various large graphs of data and other statistics. HBase was a real win for us because it let us store substantially larger datasets without having to partition the data manually, and its column-oriented nature allowed us to create schemas that were substantially more efficient for storing and retrieving data.
+