Posted to user@hbase.apache.org by jeremy p <at...@gmail.com> on 2014/12/05 22:37:46 UTC

What companies are using HBase to serve a customer-facing product?

Hey all,

So, I'm currently evaluating HBase as a solution for querying a very large
data set (think 60+ TB). We'd like to use it to directly power a
customer-facing product. My question is fourfold:

1) What companies use HBase to serve a customer-facing product? I'm not
interested in evaluations, experiments, or POC.  I'm also not interested in
offline BI or analytics.  I'm specifically interested in cases where HBase
serves as the data store for a customer-facing product.

2) Of the companies that use HBase to serve a customer-facing product,
which ones use it to query data sets of 60TB or more?

3) Of the companies that use HBase to query 60+ TB data sets and serve a
customer-facing product, how many employees are required to support their
HBase installation?  In other words, if I were to start a team tomorrow
whose purpose was to maintain a 60+ TB HBase installation for a
customer-facing product, how many people should I hire?

4) Of the companies that use HBase to query 60+ TB data sets and serve a
customer-facing product, what kind of measures do they take for disaster
recovery?

If you can, please point me to articles, videos, and other materials.
Obviously, the larger the company, the better case it will make for HBase.

Thank you!

Re: What companies are using HBase to serve a customer-facing product?

Posted by Esteban Gutierrez <es...@cloudera.com>.
The folks from Gap have a really nice use case:

http://www.slideshare.net/cloudera/1-serving-apparel-catalog-from-h-base-suraj-varma-gap-inc-finalupdated-last-minute


--
Cloudera, Inc.


On Fri, Dec 5, 2014 at 2:01 PM, Ted Yu <yu...@gmail.com> wrote:


Re: What companies are using HBase to serve a customer-facing product?

Posted by Ted Yu <yu...@gmail.com>.
Please see the following:

Facebook messages:
https://www.facebook.com/notes/facebook-engineering/the-underlying-technology-of-messages/454991608919
https://www.facebook.com/UsingHbase
https://www.facebook.com/download/499785426741400/Storage%20Infrastructure%20Behind%20Facebook%20Messages%20.pdf

Cassini @ eBay:
http://www.slideshare.net/Hadoop_Summit/ma-june27-140pmroom212v2
http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-hbase-the-use-case-in-ebay-cassini.html

Yahoo:
https://developer.yahoo.com/blogs/ydn/apache-hbase-yahoo-multi-tenancy-helm-again-203911418.html

There are many more...

On Fri, Dec 5, 2014 at 1:37 PM, jeremy p <at...@gmail.com>
wrote:


Re: What companies are using HBase to serve a customer-facing product?

Posted by Bryan Beaudreault <bb...@hubspot.com>.
At HubSpot we have five customer-facing production clusters of 30-60+ TB
each. Our Data Ops team has ranged from two to three people (including me),
but we support much more than just HBase. We have an in-house nightly backup
system and persist all HLogs on an ongoing basis, so in 2-3 hours we can
restore to within about a minute of the point of any disaster. We are also
looking at replication for the future.
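A nightly full backup plus continuously archived HLogs (write-ahead logs) is what makes point-in-time recovery possible: restore the newest backup taken at or before the target time, then replay the archived log segments written after it, up to the target. The sketch below covers only that selection step, assuming hypothetical `Backup` and `WalSegment` records with Unix-second timestamps; it is not HubSpot's in-house system and not an HBase API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Backup:
    """Hypothetical record of a nightly full backup (times are Unix seconds)."""
    name: str
    taken_at: int


@dataclass(frozen=True)
class WalSegment:
    """Hypothetical archived HLog segment holding edits from start up to end."""
    name: str
    start: int
    end: int


def plan_recovery(
    backups: List[Backup], wals: List[WalSegment], target: int
) -> Tuple[Backup, List[WalSegment]]:
    """Choose the base backup and the log segments to replay to reach `target`.

    Edits newer than `target` inside the last segment must still be skipped
    by whatever performs the replay.
    """
    candidates = [b for b in backups if b.taken_at <= target]
    if not candidates:
        raise ValueError("no backup taken at or before the target time")
    base = max(candidates, key=lambda b: b.taken_at)

    # Keep segments overlapping the window between the backup and the target.
    replay = sorted(
        (w for w in wals if w.end > base.taken_at and w.start <= target),
        key=lambda w: w.start,
    )

    # A gap in the archive means edits are missing, so refuse to plan.
    covered_until = base.taken_at
    for w in replay:
        if w.start > covered_until:
            raise ValueError(f"missing log data before segment {w.name}")
        covered_until = max(covered_until, w.end)
    if covered_until < target:
        raise ValueError("archived logs do not reach the target time")
    return base, replay


backups = [Backup("mon", 0), Backup("tue", 86_400)]
wals = [
    WalSegment("wal-1", 80_000, 90_000),
    WalSegment("wal-2", 90_000, 100_000),
    WalSegment("wal-3", 100_000, 110_000),
]
base, replay = plan_recovery(backups, wals, target=95_000)
print(base.name, [w.name for w in replay])  # tue ['wal-1', 'wal-2']
```

The gap check is the part that matters operationally: a backup is only as recoverable as the unbroken log archive after it, and the "about a minute" window corresponds to whatever log tail had not yet been archived when disaster struck.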

Our clusters serve mixed workloads, from APIs serving thousands of req/s to
Hadoop and other batch jobs. Each of these has a different SLA, but the
APIs drive customer interactions, so they mostly must respond in the
100-500 ms range or lower.
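Throughput and latency together determine how many requests are open at once. Little's law gives the long-run average: in-flight requests = arrival rate × average time per request. The 2,000 req/s rate below is an assumed example in the spirit of "thousands of req/s", not a number from HubSpot.

```python
def in_flight_requests(requests_per_second: float, latency_seconds: float) -> float:
    """Little's law: average concurrency L = arrival rate (lambda) * time in system (W)."""
    if requests_per_second < 0 or latency_seconds < 0:
        raise ValueError("rate and latency must be non-negative")
    return requests_per_second * latency_seconds


# Assumed workload: 2,000 req/s at both ends of a 100-500 ms latency budget.
for latency_ms in (100, 500):
    concurrency = in_flight_requests(2000, latency_ms / 1000)
    print(f"{latency_ms} ms -> about {concurrency:.0f} requests in flight")
```

At the same throughput, a looser latency budget means more requests held open simultaneously, so server-side thread pools and client connection pools need to be sized with both numbers in mind.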

On Saturday, December 6, 2014, lars hofhansl <la...@apache.org> wrote:


Re: What companies are using HBase to serve a customer-facing product?

Posted by Pradeep Gollakota <pr...@gmail.com>.
Lithium (Klout) powers www.klout.com with HBase. The operations team is two
full-time engineers plus the manager (who also does hands-on operations work
with the team). This operations team is responsible for our entire Hadoop
stack, including the HBase clusters. We have one 165-node Hive cluster for
data science and five HBase clusters (of varying sizes), two of which are
used to power klout.com.

We have strict SLA requirements for klout.com, as it is a user-facing
product. I don't remember the sizing of our HBase clusters offhand, but
they are substantial enough to load user profile data and Klout scores for
approximately 600 million users on a daily basis. I believe the data set is
on the order of several terabytes.

On Sat Dec 06 2014 at 8:49:37 PM lars hofhansl <la...@apache.org> wrote:


Re: What companies are using HBase to serve a customer-facing product?

Posted by lars hofhansl <la...@apache.org>.
For expected latency, read this: http://hadoop-hbase.blogspot.com/2014/08/hbase-client-response-times.html
For cluster/machine sizing, this might be helpful: http://hadoop-hbase.blogspot.com/2013/01/hbase-region-server-memory-sizing.html
Disclaimer: I wrote these two posts.
-- Lars

From: jeremy p <at...@gmail.com>
To: user@hbase.apache.org
Sent: Friday, December 5, 2014 1:37 PM
Subject: What companies are using HBase to serve a customer-facing product?

Re: What companies are using HBase to serve a customer-facing product?

Posted by Jack Levin <ma...@gmail.com>.
We at ImageShack use HBase to store all of our images: currently about
2 billion rows and 350+ TB.
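As a rough check on what those figures imply, dividing stored bytes by row count gives the average size per row. This sketch assumes decimal units (1 TB = 10^12 bytes) and reads "~2bl" as about 2 billion; the post does not say whether the 350+ TB counts HDFS replicas, so both readings are shown.

```python
def average_row_bytes(stored_bytes: float, rows: int, replicas: int = 1) -> float:
    """Average logical bytes per row, optionally discounting HDFS replicas."""
    if rows <= 0 or replicas <= 0:
        raise ValueError("rows and replicas must be positive")
    return stored_bytes / replicas / rows


TB = 10**12         # decimal terabyte (assumption)
ROWS = 2 * 10**9    # "~2bl" read as roughly 2 billion rows

print(average_row_bytes(350 * TB, ROWS) / 1000)              # 175.0 (KB, replicas not counted)
print(average_row_bytes(350 * TB, ROWS, replicas=3) / 1000)  # about 58.3 (KB, if 3 replicas are included)
```

Either way the rows are image-sized blobs in the tens to hundreds of kilobytes, a very different profile from small-record workloads like user profiles or feeds.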

Jack

On Friday, December 5, 2014, iain wright <ia...@gmail.com> wrote:


Re: What companies are using HBase to serve a customer-facing product?

Posted by iain wright <ia...@gmail.com>.
Hi Jeremy,

Pinterest is using it for their feeds:
http://www.slideshare.net/cloudera/case-studies-session-3a
http://www.slideshare.net/cloudera/operations-session-1

I'm not sure of their dataset size; they use cluster-level replication for
DR. We based our architecture on their success: a cluster in each AZ,
multi-master replication between them for DR, and Flume and our APIs watch
ZooKeeper znodes to learn which cluster to talk to. We talk to one cluster
at a time and control the flips between them for maintenance/DR. Our use
case is retrieving social data ingested from Twitter/Facebook/etc. when
customer-facing applications hit our social API.
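The switching pattern described above, where clients watch a znode that names the active cluster and an operator flips it for maintenance or DR, can be sketched in-process. `ClusterPointer` and `ApiClient` are hypothetical stand-ins for the znode and its watchers, not the ZooKeeper or HBase client API; a real deployment would use a ZooKeeper client library's watch mechanism instead, keeping in mind that classic ZooKeeper watches fire once and must be re-registered.

```python
import threading
from typing import Callable, List, Optional


class ClusterPointer:
    """Hypothetical in-memory stand-in for a znode naming the active cluster."""

    def __init__(self, active: str) -> None:
        self._active = active
        self._lock = threading.Lock()
        self._watchers: List[Callable[[str], None]] = []

    def get(self) -> str:
        with self._lock:
            return self._active

    def watch(self, callback: Callable[[str], None]) -> None:
        # Register a callback and immediately deliver the current value,
        # loosely mirroring a "read the node and set a watch" call.
        with self._lock:
            self._watchers.append(callback)
            current = self._active
        callback(current)

    def flip(self, new_active: str) -> None:
        # Operator-controlled switch, e.g. for maintenance or DR.
        with self._lock:
            if new_active == self._active:
                return
            self._active = new_active
            watchers = list(self._watchers)
        # Notify outside the lock so callbacks cannot deadlock the pointer.
        for callback in watchers:
            callback(new_active)


class ApiClient:
    """Talks to exactly one cluster at a time, re-pointing when notified."""

    def __init__(self, pointer: ClusterPointer) -> None:
        self.target: Optional[str] = None
        pointer.watch(self._on_change)

    def _on_change(self, cluster: str) -> None:
        # A real client would drain in-flight requests and reconnect here.
        self.target = cluster


pointer = ClusterPointer("hbase-az-a")   # hypothetical cluster names
client = ApiClient(pointer)
pointer.flip("hbase-az-b")
print(client.target)  # hbase-az-b
```

Routing every client to a single cluster at a time presumably keeps replication flowing one way at any moment, which avoids conflicting writes to the same row on both sides even though the replication itself is multi-master.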

In terms of team size, there are many variables:
- If you are running your own metal, there will be more work around
networking/rack+stack+cabling/provisioning OS/etc., unless another
department already provides this.
- Do you have an HBase expert or DBA in house already? Or are your
developers going to take on learning schema design and tuning the cluster?
- Do you have sysadmins/devops available to write Puppet/Chef/Ansible for
provisioning this cluster (and dev/QA environments) and performing
upgrades/etc. moving forward?
- Do you have a NOC and monitoring already in place for other pieces of infra
that will take on monitoring cluster health and responding to alerts/failed
disks/regionservers/etc.?

You may want to check out previous HBaseCon and Hadoop Summit videos; many
presentations discuss, or at least mention, their dataset size and use
case:
- https://www.youtube.com/user/HadoopSummit
- http://hbasecon.com/archive.html

All the best,

-- 
Iain Wright

This email message is confidential, intended only for the recipient(s)
named above and may contain information that is privileged, exempt from
disclosure under applicable law. If you are not the intended recipient, do
not disclose or disseminate the message to anyone except the intended
recipient. If you have received this message in error, or are not the named
recipient(s), please immediately notify the sender by return email, and
delete all copies of this message.

On Fri, Dec 5, 2014 at 1:37 PM, jeremy p <at...@gmail.com>
wrote:


Re: What companies are using HBase to serve a customer-facing product?

Posted by Sean Busbey <bu...@cloudera.com>.
Hi Jeremy!

We'll probably need more information to answer your questions.

In particular, what kind of read or write SLA are you looking to meet? At
what scale of concurrent users? What size of retrievals?

Normally, a "customer-facing application" means something on the human
interactive time scale, but how tight that bound needs to be varies widely
(e.g. 99% in < 1s, 99% in < 500ms, 99% in < 5ms, 99% in < 1ms).
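A bound like "99% in < 500ms" is a statement about the 99th-percentile latency. A minimal nearest-rank check, assuming latencies have already been collected in milliseconds (the sample values are made up):

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_sla(samples: Sequence[float], pct: float, bound_ms: float) -> bool:
    """True if the pct-th percentile latency is strictly under bound_ms."""
    return percentile(samples, pct) < bound_ms


# Made-up sample: 98 fast requests plus two slow outliers.
latencies_ms = [20.0] * 98 + [450.0, 1200.0]
print(percentile(latencies_ms, 99))       # 450.0
print(meets_sla(latencies_ms, 99, 500))   # True
print(meets_sla(latencies_ms, 99, 5))     # False
```

The same 100 samples pass a 500 ms p99 bound and fail a 5 ms one, which is why the specific bound matters far more for cluster design than the phrase "customer-facing" alone.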

The combination of your latency needs and the expected concurrent workload
will probably end up driving your cluster needs more so than the data set
size. (For reference, 60 TB of raw data will probably fit in an HBase
cluster with only 1-8 worker nodes, depending on HDD choice and compression.)
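The storage side of that estimate is simple arithmetic: raw size divided by the compression ratio, multiplied by the HDFS replication factor, divided by usable disk per worker. Every parameter below (3x replication, the compression ratios, 12 × 4 TB disks per node, filling disks to 70%) is an illustrative assumption rather than a figure from this thread, and the result is a floor that ignores CPU, RAM, and the latency/concurrency factors above.

```python
import math


def worker_nodes_for_storage(
    raw_tb: float,
    compression_ratio: float,
    replication: int,
    disks_per_node: int,
    disk_tb: float,
    max_fill: float,
) -> int:
    """Minimum worker nodes needed just to hold the data on disk."""
    stored_tb = raw_tb / compression_ratio * replication
    usable_per_node_tb = disks_per_node * disk_tb * max_fill
    return max(1, math.ceil(stored_tb / usable_per_node_tb))


# 60 TB raw, 3x HDFS replication, 12 x 4 TB disks per node, fill to 70% (all assumed).
for ratio in (1.0, 4.0):
    nodes = worker_nodes_for_storage(60, ratio, 3, 12, 4.0, 0.7)
    print(f"compression {ratio:.0f}x -> {nodes} node(s)")
```

Under these assumptions the answer moves from 6 nodes uncompressed to 2 at 4x compression, and larger disks push it lower still, which lands inside the 1-8 node range quoted above.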

Your questions about the number of deployments and supportability should
then be driven by the needed cluster size rather than the data set size.

On Fri, Dec 5, 2014 at 3:37 PM, jeremy p <at...@gmail.com>
wrote:




-- 
Sean