Posted to user@hbase.apache.org by suresh babu <bi...@gmail.com> on 2014/02/04 14:22:22 UTC

Regarding Hardware configuration for HBase cluster

Hi folks,

We are trying to set up an HBase cluster for the following requirement:

We have to maintain data of around 800 TB in size.

For the above requirement, please suggest the best hardware configuration
details, such as:

1) How many disks per machine, and what capacity per disk? For example, 16/24
disks per node with 1/2 TB capacity per disk.

2) Which compression method is suited for a production environment? Space is
not a major limitation, but speed is of prime concern for my use case.

3) How many CPU cores should be configured for each node/machine? Or what is
the ideal ratio of cores to disks, for example 1 core per disk?

Regards,
Kaushik

Re: Regarding Hardware configuration for HBase cluster

Posted by Nick Xie <ni...@gmail.com>.
Ramu,

I think Kaushik wants to set up an HBase cluster. 24 TB on a single region
server sounds too large to handle anyway.

Nick



Re: Regarding Hardware configuration for HBase cluster

Posted by Enis Söztutar <en...@hortonworks.com>.
We've also recently updated http://hbase.apache.org/book/ops.capacity.html
which contains similar numbers, and some more details on the items to
consider for sizing.

Enis




Re: Regarding Hardware configuration for HBase cluster

Posted by Ramu M S <ra...@gmail.com>.
Thanks Lars.

We are in the process of building our HBase cluster, though at a much smaller
size. This discussion helped us a lot as well.

Regards,
Ramu

Re: Regarding Hardware configuration for HBase cluster

Posted by lars hofhansl <la...@apache.org>.
In a year or two you won't be able to buy 1 TB or even 2 TB disks cheaply.
More spindles are good, and more cores are good too. This is a fuzzy art.

A hard fact is that HBase cannot (at the moment) handle more than 8-10 TB per server; beyond that you'd just have extra disks for IOPS.
You won't be happy if you expect each server to store 24 TB.

I would go with more, smaller servers. Some people run two RegionServers on a single machine, but that is not a well-explored option at this point (until recently it needed an HBase patch to work).

You *definitely* have to do some benchmarking with your use case. You might be able to get away with fewer servers, but you need to test for that.
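
A rough storage-side sketch of the 8-10 TB limit (a back-of-envelope only; it assumes the 800 TB figure from the original question and an HDFS replication factor of 3, which is an assumption, not something stated in the thread):

# Servers needed for 800 TB of HBase data at different amounts of data
# served per RegionServer, plus the raw disk each node would hold once
# 3x HDFS replication is included (sketch only; no compaction headroom).
import math

total_tb = 800          # user data, from the original question
replication = 3         # assumed HDFS replication factor

for tb_per_server in (8, 10, 24):
    servers = math.ceil(total_tb / tb_per_server)
    raw_tb_per_node = tb_per_server * replication
    print(f"{tb_per_server:>2} TB/server -> {servers:>3} servers, "
          f"~{raw_tb_per_node} TB raw disk per node")

# 8-10 TB/server lands at ~80-100 servers; 24 TB/server would need only
# ~34 servers but is well past the 8-10 TB guidance, and would call for
# ~72 TB of raw disk per node before any headroom.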

-- Lars





Re: Regarding Hardware configuration for HBase cluster

Posted by Ramu M S <ra...@gmail.com>.
Lars,

What about high-density storage servers that have a capacity of up to 24
drives? There were also some recommendations in a few blogs about having 1
core per disk.

1 TB disks carry only a slight price difference compared to 600 GB ones; with
negotiation it'll be as low as $50. The price difference between 8-core
and 12-core processors is also small, $200-300.

Do you think having 20-24 cores and 24 x 1 TB disks would also be an option?
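
For what it's worth, a quick sketch of the per-node delta those prices imply (a back-of-envelope; it takes the ~$50/disk and ~$200-300/CPU figures above and assumes two sockets per node to reach 20-24 cores):

# Per-node cost/capacity delta: 24 x 1 TB vs 24 x 600 GB drives, and
# 12-core vs 8-core CPUs on two sockets (sketch; prices from this mail).
disks_per_node = 24
disk_delta = 50      # per disk, 1 TB vs 600 GB (negotiated)
cpu_delta = 250      # per processor, 12-core vs 8-core (midpoint of 200-300)
sockets = 2          # assumed, to reach 20-24 cores with 8/12-core parts

extra_raw_tb = disks_per_node * (1.0 - 0.6)           # 24 x 1 TB vs 24 x 600 GB
extra_cost = disks_per_node * disk_delta + sockets * cpu_delta

print(f"+{extra_raw_tb:.1f} TB raw and +{sockets * 4} cores per node "
      f"for about ${extra_cost} extra")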

Regards,
Ramu

Re: Regarding Hardware configuration for HBase cluster

Posted by lars hofhansl <la...@apache.org>.
Let's not refer to our users in the third person. It's not polite :)

Suresh,

I wrote something up about RegionServer sizing here: http://hadoop-hbase.blogspot.com/2013/01/hbase-region-server-memory-sizing.html

For your load I would guess that you'd need about 100 servers.

That would mean:
1. 8 TB/server
2. 30M rows/day/server
3. 30 GB/day/server

You should not expect a single server to be able to absorb more than 10,000 rows/s or 40 MB/s, whichever is less.

The machines I'd size as follows:
12-16 cores, HT, 1.8-2.4 GHz (more is better)
32-96 GB RAM
6-12 drives (more spindles are better to absorb the write load)
10 GbE NICs and top-of-rack switches

Now, this is only a *rough guideline*; obviously you'd have to perform your own tests, and this only scales across the machines if your keys are sufficiently distributed.
The details also depend on how compressible your data is and your exact access patterns (read patterns, spiky write load, etc.).
Start with 10 data nodes and an appropriately scaled-down load and see how it works.
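
A back-of-envelope check of those per-server numbers (a sketch; it assumes 100 servers, the ~3 billion puts/day of ~1 KB each from the requirements quoted elsewhere in the thread, and the 10,000 rows/s / 40 MB/s ceiling above):

# Per-server load implied by ~3B x 1 KB puts/day spread over 100 servers,
# compared with the per-server write ceiling (sketch only).
servers = 100
total_tb = 800
rows_per_day = 3_000_000_000
kv_bytes = 1024
secs_per_day = 86_400

tb_per_server = total_tb / servers                        # ~8 TB/server
rows_per_day_per_server = rows_per_day / servers          # ~30M rows/day/server
gb_per_day_per_server = rows_per_day_per_server * kv_bytes / 1e9   # ~30 GB/day/server

avg_rows_per_sec = rows_per_day_per_server / secs_per_day    # ~350 rows/s
avg_mb_per_sec = gb_per_day_per_server * 1e3 / secs_per_day  # ~0.36 MB/s

print(f"{tb_per_server:.0f} TB/server, "
      f"{rows_per_day_per_server / 1e6:.0f}M rows/day/server, "
      f"{gb_per_day_per_server:.0f} GB/day/server")
print(f"average write rate per server: {avg_rows_per_sec:.0f} rows/s, "
      f"{avg_mb_per_sec:.2f} MB/s (vs the 10,000 rows/s / 40 MB/s ceiling, "
      f"leaving room for peaks, replication and compactions)")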

Vladimir is right here, you probably want to seek professional help.

-- Lars





RE: Regarding Hardware configuration for HBase cluster

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
This guy is building a system at the scale of Yahoo and asking the user group how to size the cluster.
Few people here can give him advice based on their experience, and I am not one of them. I can
only speculate on "how many nodes will we need to consume 3 TB / 3 billion records daily".

For a system of this scale it's better to go to Cloudera/IBM/HW, and not to try to build it yourself,
especially when you ask questions on the user group (rather than answer them).
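
For reference, the "3 TB / 3 billion records daily" figure follows directly from the stated requirements; a quick sketch, assuming the ~1 KB KV size quoted elsewhere in the thread:

# Where the 3 TB/day figure comes from: ~3 billion ~1 KB puts per day.
puts_per_day = 3_000_000_000
kv_bytes = 1024

tb_per_day = puts_per_day * kv_bytes / 1e12     # ~3 TB/day before compression
avg_puts_per_sec = puts_per_day / 86_400        # ~35,000 puts/s cluster-wide

print(f"~{tb_per_day:.1f} TB/day ingest, ~{avg_puts_per_sec:,.0f} puts/s on average")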

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com


Re: Regarding Hardware configuration for HBase cluster

Posted by Ted Yu <yu...@gmail.com>.
Have you read http://www.slideshare.net/larsgeorge/hbase-sizing-notes ?

Cheers

On Feb 6, 2014, at 8:47 PM, suresh babu <bi...@gmail.com> wrote:

> Hi Stana,
> 
> We are trying to find out how many data nodes (including hardware
> configuration detail)should be configured or setup for this requirement
> 
> -suresh
> 
> On Friday, February 7, 2014, stana <st...@is-land.com.tw> wrote:
> 
>> HI suresh babu :
>> 
>> how many data nodes do you have?
>> 
>> 
>> 2014-02-07 suresh babu <bigdatacslt@gmail.com>:
>> 
>>> refreshing the thread,
>>> 
>>> Can you please  suggest any inputs for the hardware configuration(for the
>>> below mentioned use case).
>>> 
>>> 
>>> 
>>> 
>>> On Wed, Feb 5, 2014 at 10:31 AM, suresh babu <bi...@gmail.com>
>>> wrote:
>>> 
>>>> Please find the data requirements for our use case below :
>>>> 
>>>> Raw data processing
>>>> ----------------------------------
>>>> 1. Data is populated into hdfs , after etl around 3 billion puts per
>> day
>>>> in to hbase
>>>> 
>>>> 2. Oldest data after X days to be deleted from hbase
>>>> 
>>>> Aggregates processing
>>>> ----------------------------------
>>>> 3 billion reads per day ... Large scan or reads
>>>> 
>>>> KV size around 1 KB Daily Processing, raw and aggregates, via M/R jobs
>>>> Hive queries in future, but not of immediate focus
>>>> On Feb 5, 2014 12:48 AM, "Vladimir Rodionov" <vr...@carrieriq.com>
>>>> wrote:
>>>> 
>>>>> Yes,
>>>>> 
>>>>> 1. What is the expected avg and peak load in
>>> writes/updates/deletes/reads?
>>>>> 2. What is the average size of a KV?
>>>>> 3. Reads/small scans/medium/large scan %%
>>>>> 4. Do you plan M/R jobs, Hive query?
>>>>> 
>>>>> 
>>>>> Best regards,
>>>>> Vladimir Rodionov
>>>>> Principal Platform Engineer
>>>>> Carrier IQ, www.carrieriq.com
>>>>> e-mail: vrodionov@carrieriq.com
>>>>> 
>>>>> ________________________________________
>>>>> From: Nick Xie [nick.xie.hadoop@gmail.com]
>>>>> Sent: Tuesday, February 04, 2014 10:02 AM
>>>>> To: user@hbase.apache.org
>>>>> Subject: Re: Regarding Hardware configuration for HBase cluster
>>>>> 
>>>>> I guess you'd better describe a little bit more about your
>> applications.
>>>>> Does the data increase over the time at all?
>>>>> 
>>>>> Nick
>>>>> 
>>>>> 
>>>>> On Tue, Feb 4, 2014 at 5:22 AM, suresh babu <bi...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Hi folks,
>>>>>> 
>>>>>> We are trying to setup HBase cluster for the following requirement:
>>>>>> 
>>>>>> We have to maintain data of size around 800TB,
>>>>>> 
>>>>>> For the above requirement,please suggest me the best hardware
>>>>> configuration
>>>>>> details like
>>>>>> 
>>>>>> 1)how many disks to consider for machine and the  capacity of disks
>>> ,for
>>>>>> example, 16/24 disks per node with 1/2TB capacity per each disk
>>>>>> 
>>>>>> 2) which compression method is suited for production environment ,
>>>>> space is
>>>>>> not a major limitation , but speed is of prime concern for my use
>> case
>>>>>> 
>>>>>> 3) how many CPU Cores should be configured for each node/machine ?
>> Or
>>>>>> ideal ratio of number of cores to the number of disks,for example
>>>>>> 1core/1disk ?
>>>>>> 
>>>>>> Regards,
>>>>>> Kaushik
>>>>> 
>>>>> Confidentiality Notice:  The information contained in this message,
>>>>> including any attachments hereto, may be confidential and is intended
>>> to be
>>>>> read only by the individual or entity to whom this message is
>>> addressed. If
>>>>> the reader of this message is not the intended recipient or an agent
>> or
>>>>> designee of the intended recipient, please note that any review, use,
>>>>> disclosure or distribution of this message or its attachments, in any
>>> form,
>>>>> is strictly prohibited.  If you have received this message in error,
>>> please
>>>>> immediat--
>> Best Regards
>> 
>> 亦思科技  is-land Systems Inc.
>> Tel:03-5630345 Ext.14
>> Fax:03-5631345
>> e-MAIL:stana@is-land.com.tw
>> 
>> 何永安 Yung An He
>> 

Re: Regarding Hardware configuration for HBase cluster

Posted by suresh babu <bi...@gmail.com>.
Hi Stana,

We are trying to find out how many data nodes (and what hardware
configuration per node) should be set up for this requirement.

-suresh

On Friday, February 7, 2014, stana <st...@is-land.com.tw> wrote:

> HI suresh babu :
>
> how many data nodes do you have?
>
>
> 2014-02-07 suresh babu <bigdatacslt@gmail.com>:
>
> > refreshing the thread,
> >
> > Can you please  suggest any inputs for the hardware configuration(for the
> > below mentioned use case).
> >
> >
> >
> >
> > On Wed, Feb 5, 2014 at 10:31 AM, suresh babu <bi...@gmail.com>
> > wrote:
> >
> > > Please find the data requirements for our use case below :
> > >
> > > Raw data processing
> > > ----------------------------------
> > > 1. Data is populated into hdfs , after etl around 3 billion puts per
> day
> > > in to hbase
> > >
> > > 2. Oldest data after X days to be deleted from hbase
> > >
> > > Aggregates processing
> > > ----------------------------------
> > > 3 billion reads per day ... Large scan or reads
> > >
> > > KV size around 1 KB Daily Processing, raw and aggregates, via M/R jobs
> > > Hive queries in future, but not of immediate focus
> > > On Feb 5, 2014 12:48 AM, "Vladimir Rodionov" <vr...@carrieriq.com>
> > > wrote:
> > >
> > >> Yes,
> > >>
> > >> 1. What is the expected avg and peak load in
> > writes/updates/deletes/reads?
> > >> 2. What is the average size of a KV?
> > >> 3. Reads/small scans/medium/large scan %%
> > >> 4. Do you plan M/R jobs, Hive query?
> > >>
> > >>
> > >> Best regards,
> > >> Vladimir Rodionov
> > >> Principal Platform Engineer
> > >> Carrier IQ, www.carrieriq.com
> > >> e-mail: vrodionov@carrieriq.com
> > >>
> > >> ________________________________________
> > >> From: Nick Xie [nick.xie.hadoop@gmail.com]
> > >> Sent: Tuesday, February 04, 2014 10:02 AM
> > >> To: user@hbase.apache.org
> > >> Subject: Re: Regarding Hardware configuration for HBase cluster
> > >>
> > >> I guess you'd better describe a little bit more about your
> applications.
> > >> Does the data increase over the time at all?
> > >>
> > >> Nick
> > >>
> > >>
> > >> On Tue, Feb 4, 2014 at 5:22 AM, suresh babu <bi...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi folks,
> > >> >
> > >> > We are trying to setup HBase cluster for the following requirement:
> > >> >
> > >> > We have to maintain data of size around 800TB,
> > >> >
> > >> > For the above requirement,please suggest me the best hardware
> > >> configuration
> > >> > details like
> > >> >
> > >> > 1)how many disks to consider for machine and the  capacity of disks
> > ,for
> > >> > example, 16/24 disks per node with 1/2TB capacity per each disk
> > >> >
> > >> > 2) which compression method is suited for production environment ,
> > >> space is
> > >> > not a major limitation , but speed is of prime concern for my use
> case
> > >> >
> > >> > 3) how many CPU Cores should be configured for each node/machine ?
>  Or
> > >> > ideal ratio of number of cores to the number of disks,for example
> > >> > 1core/1disk ?
> > >> >
> > >> > Regards,
> > >> > Kaushik
> > >> >
> > >>
> > >> Confidentiality Notice:  The information contained in this message,
> > >> including any attachments hereto, may be confidential and is intended
> > to be
> > >> read only by the individual or entity to whom this message is
> > addressed. If
> > >> the reader of this message is not the intended recipient or an agent
> or
> > >> designee of the intended recipient, please note that any review, use,
> > >> disclosure or distribution of this message or its attachments, in any
> > form,
> > >> is strictly prohibited.  If you have received this message in error,
> > please
> > >> immediat--
> Best Regards
>
> 亦思科技  is-land Systems Inc.
> Tel:03-5630345 Ext.14
> Fax:03-5631345
> e-MAIL:stana@is-land.com.tw
>
> 何永安 Yung An He
>

Re: Regarding Hardware configuration for HBase cluster

Posted by stana <st...@is-land.com.tw>.
Hi suresh babu:

How many data nodes do you have?


2014-02-07 suresh babu <bi...@gmail.com>:

> refreshing the thread,
>
> Can you please  suggest any inputs for the hardware configuration(for the
> below mentioned use case).
>
>
>
>
> On Wed, Feb 5, 2014 at 10:31 AM, suresh babu <bi...@gmail.com>
> wrote:
>
> > Please find the data requirements for our use case below :
> >
> > Raw data processing
> > ----------------------------------
> > 1. Data is populated into hdfs , after etl around 3 billion puts per day
> > in to hbase
> >
> > 2. Oldest data after X days to be deleted from hbase
> >
> > Aggregates processing
> > ----------------------------------
> > 3 billion reads per day ... Large scan or reads
> >
> > KV size around 1 KB Daily Processing, raw and aggregates, via M/R jobs
> > Hive queries in future, but not of immediate focus
> > On Feb 5, 2014 12:48 AM, "Vladimir Rodionov" <vr...@carrieriq.com>
> > wrote:
> >
> >> Yes,
> >>
> >> 1. What is the expected avg and peak load in
> writes/updates/deletes/reads?
> >> 2. What is the average size of a KV?
> >> 3. Reads/small scans/medium/large scan %%
> >> 4. Do you plan M/R jobs, Hive query?
> >>
> >>
> >> Best regards,
> >> Vladimir Rodionov
> >> Principal Platform Engineer
> >> Carrier IQ, www.carrieriq.com
> >> e-mail: vrodionov@carrieriq.com
> >>
> >> ________________________________________
> >> From: Nick Xie [nick.xie.hadoop@gmail.com]
> >> Sent: Tuesday, February 04, 2014 10:02 AM
> >> To: user@hbase.apache.org
> >> Subject: Re: Regarding Hardware configuration for HBase cluster
> >>
> >> I guess you'd better describe a little bit more about your applications.
> >> Does the data increase over the time at all?
> >>
> >> Nick
> >>
> >>
> >> On Tue, Feb 4, 2014 at 5:22 AM, suresh babu <bi...@gmail.com>
> >> wrote:
> >>
> >> > Hi folks,
> >> >
> >> > We are trying to setup HBase cluster for the following requirement:
> >> >
> >> > We have to maintain data of size around 800TB,
> >> >
> >> > For the above requirement,please suggest me the best hardware
> >> configuration
> >> > details like
> >> >
> >> > 1)how many disks to consider for machine and the  capacity of disks
> ,for
> >> > example, 16/24 disks per node with 1/2TB capacity per each disk
> >> >
> >> > 2) which compression method is suited for production environment ,
> >> space is
> >> > not a major limitation , but speed is of prime concern for my use case
> >> >
> >> > 3) how many CPU Cores should be configured for each node/machine ?  Or
> >> > ideal ratio of number of cores to the number of disks,for example
> >> > 1core/1disk ?
> >> >
> >> > Regards,
> >> > Kaushik
> >> >
> >>
> >> Confidentiality Notice:  The information contained in this message,
> >> including any attachments hereto, may be confidential and is intended
> to be
> >> read only by the individual or entity to whom this message is
> addressed. If
> >> the reader of this message is not the intended recipient or an agent or
> >> designee of the intended recipient, please note that any review, use,
> >> disclosure or distribution of this message or its attachments, in any
> form,
> >> is strictly prohibited.  If you have received this message in error,
> please
> >> immediately notify the sender and/or Notifications@carrieriq.com and
> >> delete or destroy any copy of this message and its attachments.
> >>
> >
>



-- 
Best Regards

亦思科技  is-land Systems Inc.
Tel:03-5630345 Ext.14
Fax:03-5631345
e-MAIL:stana@is-land.com.tw

何永安 Yung An He

Re: Regarding Hardware configuration for HBase cluster

Posted by suresh babu <bi...@gmail.com>.
Refreshing the thread.

Could you please suggest any inputs on the hardware configuration for the
use case described below?




On Wed, Feb 5, 2014 at 10:31 AM, suresh babu <bi...@gmail.com> wrote:

> Please find the data requirements for our use case below :
>
> Raw data processing
> ----------------------------------
> 1. Data is populated into hdfs , after etl around 3 billion puts per day
> in to hbase
>
> 2. Oldest data after X days to be deleted from hbase
>
> Aggregates processing
> ----------------------------------
> 3 billion reads per day ... Large scan or reads
>
> KV size around 1 KB Daily Processing, raw and aggregates, via M/R jobs
> Hive queries in future, but not of immediate focus
> On Feb 5, 2014 12:48 AM, "Vladimir Rodionov" <vr...@carrieriq.com>
> wrote:
>
>> Yes,
>>
>> 1. What is the expected avg and peak load in writes/updates/deletes/reads?
>> 2. What is the average size of a KV?
>> 3. Reads/small scans/medium/large scan %%
>> 4. Do you plan M/R jobs, Hive query?
>>
>>
>> Best regards,
>> Vladimir Rodionov
>> Principal Platform Engineer
>> Carrier IQ, www.carrieriq.com
>> e-mail: vrodionov@carrieriq.com
>>
>> ________________________________________
>> From: Nick Xie [nick.xie.hadoop@gmail.com]
>> Sent: Tuesday, February 04, 2014 10:02 AM
>> To: user@hbase.apache.org
>> Subject: Re: Regarding Hardware configuration for HBase cluster
>>
>> I guess you'd better describe a little bit more about your applications.
>> Does the data increase over the time at all?
>>
>> Nick
>>
>>
>> On Tue, Feb 4, 2014 at 5:22 AM, suresh babu <bi...@gmail.com>
>> wrote:
>>
>> > Hi folks,
>> >
>> > We are trying to setup HBase cluster for the following requirement:
>> >
>> > We have to maintain data of size around 800TB,
>> >
>> > For the above requirement,please suggest me the best hardware
>> configuration
>> > details like
>> >
>> > 1)how many disks to consider for machine and the  capacity of disks ,for
>> > example, 16/24 disks per node with 1/2TB capacity per each disk
>> >
>> > 2) which compression method is suited for production environment ,
>> space is
>> > not a major limitation , but speed is of prime concern for my use case
>> >
>> > 3) how many CPU Cores should be configured for each node/machine ?  Or
>> > ideal ratio of number of cores to the number of disks,for example
>> > 1core/1disk ?
>> >
>> > Regards,
>> > Kaushik
>> >
>>
>> Confidentiality Notice:  The information contained in this message,
>> including any attachments hereto, may be confidential and is intended to be
>> read only by the individual or entity to whom this message is addressed. If
>> the reader of this message is not the intended recipient or an agent or
>> designee of the intended recipient, please note that any review, use,
>> disclosure or distribution of this message or its attachments, in any form,
>> is strictly prohibited.  If you have received this message in error, please
>> immediately notify the sender and/or Notifications@carrieriq.com and
>> delete or destroy any copy of this message and its attachments.
>>
>

RE: Regarding Hardware configuration for HBase cluster

Posted by suresh babu <bi...@gmail.com>.
Please find the data requirements for our use case below:

Raw data processing
----------------------------------
1. Data is populated into HDFS; after ETL, around 3 billion puts per day
into HBase

2. Data older than X days to be deleted from HBase
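
For reference, age-based expiry like this is usually handled with a
column-family TTL rather than explicit delete jobs: HBase filters expired
cells out of reads and physically removes them during compactions. Below is
a minimal sketch against the Java client API of that era; the table name
"raw_events", the family "d", and the 30-day TTL (standing in for X) are
assumptions for illustration only.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class RawEventsTableWithTtl {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Hypothetical table and column family names.
        HTableDescriptor table =
            new HTableDescriptor(TableName.valueOf("raw_events"));
        HColumnDescriptor cf = new HColumnDescriptor("d");
        // Cells older than the TTL (assumed 30 days here) are no longer
        // returned to readers and are removed at major compaction.
        cf.setTimeToLive(30 * 24 * 60 * 60);
        table.addFamily(cf);
        admin.createTable(table);
        admin.close();
      }
    }

The same TTL attribute can also be set on an existing column family from the
HBase shell.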

Aggregates processing
----------------------------------
3 billion reads per day, as large scans or point reads

KV size is around 1 KB. Daily processing of raw data and aggregates via M/R
jobs. Hive queries in the future, but not of immediate focus.
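
Taking these figures at face value, here is a rough back-of-envelope sketch
of the storage and write footprint. The replication factor, compression
ratio, usable disk per node, and per-node put throughput are assumptions,
not measurements, and would need to be validated on real hardware and data.

    public class BackOfEnvelopeSizing {
      public static void main(String[] args) {
        // Figures from the thread.
        double putsPerDay  = 3e9;     // ~3 billion puts/day
        double kvSizeBytes = 1024;    // ~1 KB per KV
        double totalRawTB  = 800;     // stated data size to maintain

        // Assumptions, not measurements.
        int    hdfsReplication   = 3;
        double compressionRatio  = 0.5;            // depends on codec and data
        double usableTBPerNode   = 12 * 1.0 * 0.7; // e.g. 12 x 1 TB drives, ~70% usable
        double putsPerSecPerNode = 10000;          // rough per-region-server write ceiling

        double ingestTBPerDay  = putsPerDay * kvSizeBytes / 1e12;   // ~3 TB/day raw
        double onDiskTB        = totalRawTB * compressionRatio * hdfsReplication;
        double nodesForStorage = Math.ceil(onDiskTB / usableTBPerNode);
        double avgPutsPerSec   = putsPerDay / 86400;                // ~35K/s average; peaks higher
        double nodesForWrites  = Math.ceil(avgPutsPerSec / putsPerSecPerNode);

        System.out.printf("Raw ingest: %.1f TB/day%n", ingestTBPerDay);
        System.out.printf("On-disk footprint: %.0f TB -> ~%.0f nodes for storage%n",
            onDiskTB, nodesForStorage);
        System.out.printf("Average write rate: %.0f puts/s -> ~%.0f nodes for writes%n",
            avgPutsPerSec, nodesForWrites);
      }
    }

Under these assumptions storage dominates the node count, which is why the
number and size of drives per node matters at least as much as core count.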
On Feb 5, 2014 12:48 AM, "Vladimir Rodionov" <vr...@carrieriq.com>
wrote:

> Yes,
>
> 1. What is the expected avg and peak load in writes/updates/deletes/reads?
> 2. What is the average size of a KV?
> 3. Reads/small scans/medium/large scan %%
> 4. Do you plan M/R jobs, Hive query?
>
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
> From: Nick Xie [nick.xie.hadoop@gmail.com]
> Sent: Tuesday, February 04, 2014 10:02 AM
> To: user@hbase.apache.org
> Subject: Re: Regarding Hardware configuration for HBase cluster
>
> I guess you'd better describe a little bit more about your applications.
> Does the data increase over the time at all?
>
> Nick
>
>
> On Tue, Feb 4, 2014 at 5:22 AM, suresh babu <bi...@gmail.com> wrote:
>
> > Hi folks,
> >
> > We are trying to setup HBase cluster for the following requirement:
> >
> > We have to maintain data of size around 800TB,
> >
> > For the above requirement,please suggest me the best hardware
> configuration
> > details like
> >
> > 1)how many disks to consider for machine and the  capacity of disks ,for
> > example, 16/24 disks per node with 1/2TB capacity per each disk
> >
> > 2) which compression method is suited for production environment , space
> is
> > not a major limitation , but speed is of prime concern for my use case
> >
> > 3) how many CPU Cores should be configured for each node/machine ?  Or
> > ideal ratio of number of cores to the number of disks,for example
> > 1core/1disk ?
> >
> > Regards,
> > Kaushik
> >
>
> Confidentiality Notice:  The information contained in this message,
> including any attachments hereto, may be confidential and is intended to be
> read only by the individual or entity to whom this message is addressed. If
> the reader of this message is not the intended recipient or an agent or
> designee of the intended recipient, please note that any review, use,
> disclosure or distribution of this message or its attachments, in any form,
> is strictly prohibited.  If you have received this message in error, please
> immediately notify the sender and/or Notifications@carrieriq.com and
> delete or destroy any copy of this message and its attachments.
>

RE: Regarding Hardware configuration for HBase cluster

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
Yes,

1. What is the expected avg and peak load in writes/updates/deletes/reads?
2. What is the average size of a KV?
3. What is the %% split among point reads and small/medium/large scans?
4. Do you plan M/R jobs or Hive queries?


Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Nick Xie [nick.xie.hadoop@gmail.com]
Sent: Tuesday, February 04, 2014 10:02 AM
To: user@hbase.apache.org
Subject: Re: Regarding Hardware configuration for HBase cluster

I guess you'd better describe a little bit more about your applications.
Does the data increase over the time at all?

Nick


On Tue, Feb 4, 2014 at 5:22 AM, suresh babu <bi...@gmail.com> wrote:

> Hi folks,
>
> We are trying to setup HBase cluster for the following requirement:
>
> We have to maintain data of size around 800TB,
>
> For the above requirement,please suggest me the best hardware configuration
> details like
>
> 1)how many disks to consider for machine and the  capacity of disks ,for
> example, 16/24 disks per node with 1/2TB capacity per each disk
>
> 2) which compression method is suited for production environment , space is
> not a major limitation , but speed is of prime concern for my use case
>
> 3) how many CPU Cores should be configured for each node/machine ?  Or
> ideal ratio of number of cores to the number of disks,for example
> 1core/1disk ?
>
> Regards,
> Kaushik
>


Re: Regarding Hardware configuration for HBase cluster

Posted by Nick Xie <ni...@gmail.com>.
I suggest you describe your applications in a bit more detail. Does the data
grow over time at all?

Nick


On Tue, Feb 4, 2014 at 5:22 AM, suresh babu <bi...@gmail.com> wrote:

> Hi folks,
>
> We are trying to setup HBase cluster for the following requirement:
>
> We have to maintain data of size around 800TB,
>
> For the above requirement,please suggest me the best hardware configuration
> details like
>
> 1)how many disks to consider for machine and the  capacity of disks ,for
> example, 16/24 disks per node with 1/2TB capacity per each disk
>
> 2) which compression method is suited for production environment , space is
> not a major limitation , but speed is of prime concern for my use case
>
> 3) how many CPU Cores should be configured for each node/machine ?  Or
> ideal ratio of number of cores to the number of disks,for example
> 1core/1disk ?
>
> Regards,
> Kaushik
>