Posted to user@hbase.apache.org by Weishung Chung <we...@gmail.com> on 2011/03/10 18:12:48 UTC

cost estimation

I am trying to estimate the cost of hosting my own HBase cluster vs. using EC2.
Could anyone give me some guidance?
Cluster size ~ 6 to 8 nodes
Usage ~ at least 12 hours/day with a lot of read/write operations. (I know I
need to have more concrete usage numbers here.)

Thank you so much :)

Re: cost estimation

Posted by Weishung Chung <we...@gmail.com>.
Thank you :)
I also found this cost calculator for EC2
http://calculator.s3.amazonaws.com/calc5.html

On Thu, Mar 10, 2011 at 11:38 AM, Ted Dunning <td...@maprtech.com> wrote:

> With no information whatsoever about size of the data, I would guess a cost
> of about $4000 / node with annual hosting and power requirements about
> $2000/year.
>
> This is probably no more accurate than one order of magnitude.  It has a
> decent chance of being on the close order of magnitude.   In particular, you
> might want
> a lot of memory.  It is unlikely you want a lot of disks.
>
> You can do the math yourself on the EC2 costs.
>
> On Thu, Mar 10, 2011 at 9:12 AM, Weishung Chung <we...@gmail.com>wrote:
>
>> I am trying to estimate the cost of hosting own HBase cluster vs using
>> EC2.
>> Could anyone give me some guidance?
>> Cluster size ~ 6 to 8 nodes
>> Usage ~ at least 12 hours/day with lot of read/write operations. (I know I
>> need to have more concrete usage number here)
>>
>> Thank you so much :)
>>
>
>

Re: cost estimation

Posted by Ted Dunning <td...@maprtech.com>.
With no information whatsoever about size of the data, I would guess a cost
of about $4000 / node with annual hosting and power requirements about
$2000/year.

This is probably accurate to no better than one order of magnitude, though it
has a decent chance of landing on the right order of magnitude. In particular,
you might want a lot of memory. It is unlikely you want a lot of disks.

You can do the math yourself on the EC2 costs.
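
For anyone who wants to run that math, here is a minimal sketch in Python. It takes the rough figures above ($4000 per node up front, about $2000/year for hosting and power) and assumes the yearly figure is per node, which the estimate does not state explicitly. The function names and the 12 hours/day default are illustrative only, and the EC2 hourly rate is deliberately left as an input to be read off the current price list.

```python
def owned_cost(nodes, years, purchase_per_node=4000.0, hosting_per_node_year=2000.0):
    """Total cost of buying and hosting `nodes` machines for `years` years.

    Assumption: the ~$2000/year hosting and power figure applies per node.
    """
    return nodes * (purchase_per_node + hosting_per_node_year * years)


def ec2_cost(nodes, years, hourly_rate, hours_per_day=12.0):
    """Total on-demand cost; `hourly_rate` must come from the EC2 price list."""
    return nodes * hourly_rate * hours_per_day * 365 * years


def break_even_hourly_rate(years, hours_per_day=12.0,
                           purchase_per_node=4000.0, hosting_per_node_year=2000.0):
    """Hourly EC2 rate at which renting costs the same as owning one node."""
    owned = purchase_per_node + hosting_per_node_year * years
    return owned / (hours_per_day * 365 * years)


if __name__ == "__main__":
    for years in (1, 2, 3):
        rate = break_even_hourly_rate(years)
        print(f"{years} year(s): owning wins if the instance costs more than ${rate:.2f}/hour")
```

This ignores storage (EBS/S3), bandwidth, staff time, and the possibility that EC2 needs more nodes for the same throughput, so treat the result as an order-of-magnitude check only.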

On Thu, Mar 10, 2011 at 9:12 AM, Weishung Chung <we...@gmail.com> wrote:

> I am trying to estimate the cost of hosting own HBase cluster vs using EC2.
> Could anyone give me some guidance?
> Cluster size ~ 6 to 8 nodes
> Usage ~ at least 12 hours/day with lot of read/write operations. (I know I
> need to have more concrete usage number here)
>
> Thank you so much :)
>

Re: Get Question

Posted by Ryan Rawson <ry...@gmail.com>.
Depends on how well cached you are.

Remember, random gets require disk seeks.  239 gets/sec is 239 * (1 to 3)
seeks/sec (approximately 1-3 store files per get).  So yes, that seems
reasonable, sorry.
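
The arithmetic behind that can be sketched in a few lines of Python. The 239 gets/sec, the 1-3 store files per get, and the 90 second / 27 minute timings come from this thread; the function names are made up for illustration, and the row count is only an estimate derived from the 27 minute figure.

```python
def seeks_per_second(gets_per_sec, store_files_low=1, store_files_high=3):
    """Range of disk seeks/sec implied by uncached random gets."""
    return gets_per_sec * store_files_low, gets_per_sec * store_files_high


def rows_fetched(duration_seconds, gets_per_sec):
    """Approximate number of gets issued over the whole run."""
    return duration_seconds * gets_per_sec


if __name__ == "__main__":
    low, high = seeks_per_second(239)
    print(f"239 gets/sec implies roughly {low}-{high} seeks/sec")

    rows = rows_fetched(27 * 60, 239)
    print(f"a 27 minute import at 239 gets/sec is about {rows} gets")
    print(f"the same rows in 90 seconds would be about {rows / 90:.0f} rows/sec via scan")
```

A few hundred seeks per second is in the range a handful of spinning disks can deliver, which is why the get rate looks plausible when the data is not cached.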

-ryan

On Thu, Mar 10, 2011 at 3:55 PM, Peter Haidinyak <ph...@local.com> wrote:
> For the first time I am using a get to retrieve a record vs. a scan to pull back a bunch of records. I discovered that I am only seeing 239 Gets per second. This is causing my import time to go from 90 seconds to over 27 minutes. Any idea of what would be a 'normal' get rate?
>
> Thanks
>
> -Pete
>

Get Question

Posted by Peter Haidinyak <ph...@local.com>.
For the first time I am using a get to retrieve a record vs. a scan to pull back a bunch of records. I discovered that I am only seeing 239 Gets per second. This is causing my import time to go from 90 seconds to over 27 minutes. Any idea of what would be a 'normal' get rate?

Thanks

-Pete

Re: cost estimation

Posted by Lars George <la...@gmail.com>.
Hi,

That is an interesting question and I noticed the same: stopped instances (backed by EBS) get a new IP at start. The IP only survives a reboot.

I'm not sure how to handle this other than adding some extra scripts to patch the configs on start.

Messy. 
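
As a rough illustration of what such a patch script might do, here is a minimal Python sketch that rewrites one property in a Hadoop-style configuration XML string. hbase.zookeeper.quorum is a real HBase property, but the function name, the sample address, and the idea of running this at instance start are assumptions for the example, not a tested recipe.

```python
import xml.etree.ElementTree as ET


def set_property(xml_text, name, value):
    """Return config XML with <property> `name` set to `value`.

    The property is updated in place if present, otherwise appended.
    """
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            val = prop.find("value")
            if val is None:
                val = ET.SubElement(prop, "value")
            val.text = value
            break
    else:
        # No matching property found: add a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    original = (
        "<configuration><property>"
        "<name>hbase.zookeeper.quorum</name><value>old-host</value>"
        "</property></configuration>"
    )
    # 10.0.0.5 stands in for whatever address the instance reports at boot.
    print(set_property(original, "hbase.zookeeper.quorum", "10.0.0.5"))
```

A start-up script would read hbase-site.xml, call something like this with the freshly discovered address, and write the file back before the daemons start.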

Anyone with experience willing to chime in?

Lars

On Mar 11, 2011, at 0:46, Peter Haidinyak <ph...@local.com> wrote:

> I just took a day course on the Amazon Cloud and he had mentioned the every time you spin up a VM it gets a different IP and Host name. If this is true how do you keep the configuration files current every time you add a new VM or power on an existing Cluster?
> 
> Thanks
> 
> -Pete
> 

RE: cost estimation

Posted by Andrew Purtell <ap...@apache.org>.
Hi Peter,

We boot the master first, then boot the slaves after the master's IP address is known. 

Instances are initialized using user-data scripts. 

We do substitutions on config details when creating the user-data for the instances.
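
A minimal sketch of that substitution step, in Python, might look like the following. The template text, the hbase-master alias, and the function name are hypothetical and only illustrate the idea of filling in the master's address once it is known; this is not the script we actually use.

```python
from string import Template

# Hypothetical user-data script; the placeholders are filled in before launch.
USER_DATA_TEMPLATE = Template("""#!/bin/sh
# Illustrative bootstrap only: make the master resolvable on this slave.
echo "${master_ip} ${master_alias}" >> /etc/hosts
""")


def render_user_data(master_ip, master_alias="hbase-master"):
    """Fill the template; substitute() raises KeyError if a placeholder is missed."""
    return USER_DATA_TEMPLATE.substitute(master_ip=master_ip, master_alias=master_alias)


if __name__ == "__main__":
    # 10.0.0.5 is a stand-in for the master's address after it has booted.
    print(render_user_data("10.0.0.5"))
```

The rendered text is what gets passed as user-data when the slave instances are launched.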

This is sufficient for transient/testing clusters. For a cluster that would run for a long time or needs to be reliable, you'd want a plan for when the master instance goes away. I think what would be relatively easy to do is grab an Elastic IP (which also gives you a "well known" DNS name), assign it to the current master, then use Red Hat Cluster Suite or similar with another instance as a hot spare, with DRBD replication of the fsimage from primary to secondary. The script that handles loss of the primary can then remap the Elastic IP and start a namenode on the secondary.

Best regards,

    - Andy

> From: Peter Haidinyak <ph...@local.com>
> Subject: RE: cost estimation
> To: "user@hbase.apache.org" <us...@hbase.apache.org>
> Date: Thursday, March 10, 2011, 3:46 PM
> I just took a day course on the
> Amazon Cloud and he had mentioned the every time you spin up
> a VM it gets a different IP and Host name. If this is true
> how do you keep the configuration files current every time
> you add a new VM or power on an existing Cluster?
> 
> Thanks
> 
> -Pete
> 


      

RE: cost estimation

Posted by Peter Haidinyak <ph...@local.com>.
I just took a day course on the Amazon Cloud and the instructor mentioned that every time you spin up a VM it gets a different IP and host name. If this is true, how do you keep the configuration files current every time you add a new VM or power on an existing cluster?

Thanks

-Pete

-----Original Message-----
From: Andrew Purtell [mailto:apurtell@apache.org] 
Sent: Thursday, March 10, 2011 3:31 PM
To: user@hbase.apache.org
Subject: Re: cost estimation

Everything Gary said.

Something interesting Netflix said this week at the ccevent conference was they were able to depreciate Reserved Instance payments as a capital expenditure.

Also, c1.xlarge is one of only three instance types that seem to get its own physical server for each instance (others are m2.4xlarge and cc1.xlarge iirc). 

> From: Gary Helmling <gh...@gmail.com>
> Subject: Re: cost estimation
> To: user@hbase.apache.org
> Date: Thursday, March 10, 2011, 9:37 AM
> Hi Weishung,
> 
> See the EC2 instance pricing details here:
> http://aws.amazon.com/ec2/#pricing
> 
> <http://aws.amazon.com/ec2/#pricing>and
> try to calculate it out vs. price
> quotes for hardware.
> 
> You'll need to run at _least_ m1.large or c1.xlarge instances for HBase.
>  There was a recent discussion thread covering EC2 performance.  You can
> look it up at search-hadoop.com.
> 
> If you don't need the cluster running 24x7, maybe you can make the EC2
> pricing work out.  Just be aware that you'll be taking a hit in raw IO
> performance per node, so you may need to balance that out with more nodes
> than you would need with using your own hardware.  If you need to persist
> data between cluster restarts, you'll also need either EBS or S3 storage, so
> be sure to factor that in.  Also factor in bandwidth costs if you need to
> transfer a lot of data in/out of AWS.
> 
> My own impression is that EC2 is great and very cost effective for short
> lived, on-demand computing resources.  We use it a great deal for functional
> testing.  For 24x7 services, it seems like you pay a premium long term over
> owning your own hardware, with advantage of no large up-front cost for
> acquisition and access to easy elasticity to expand to meet demand, but with
> a cost of reduced performance per node due to virtualization.
> 
> Best advice I can give is do some benchmarking to see how many nodes you
> need to satisfy your processing requirements in EC2 vs on raw hardware and
> try to comparatively price it out.
> 
> --gh
> 
> On Thu, Mar 10, 2011 at 9:12 AM, Weishung Chung <we...@gmail.com>
> wrote:
> 
> > I am trying to estimate the cost of hosting own HBase
> cluster vs using EC2.
> > Could anyone give me some guidance?
> > Cluster size ~ 6 to 8 nodes
> > Usage ~ at least 12 hours/day with lot of read/write
> operations. (I know I
> > need to have more concrete usage number here)
> >
> > Thank you so much :)
> >
> 


      

Re: cost estimation

Posted by Andrew Purtell <ap...@apache.org>.
Everything Gary said.

Something interesting Netflix said this week at the ccevent conference was they were able to depreciate Reserved Instance payments as a capital expenditure.

Also, c1.xlarge is one of only three instance types that seem to get their own physical server for each instance (the others are m2.4xlarge and cc1.4xlarge, IIRC).

> From: Gary Helmling <gh...@gmail.com>
> Subject: Re: cost estimation
> To: user@hbase.apache.org
> Date: Thursday, March 10, 2011, 9:37 AM
> Hi Weishung,
> 
> See the EC2 instance pricing details here:
> http://aws.amazon.com/ec2/#pricing
> 
> <http://aws.amazon.com/ec2/#pricing>and
> try to calculate it out vs. price
> quotes for hardware.
> 
> You'll need to run at _least_ m1.large or c1.xlarge instances for HBase.
>  There was a recent discussion thread covering EC2 performance.  You can
> look it up at search-hadoop.com.
> 
> If you don't need the cluster running 24x7, maybe you can make the EC2
> pricing work out.  Just be aware that you'll be taking a hit in raw IO
> performance per node, so you may need to balance that out with more nodes
> than you would need with using your own hardware.  If you need to persist
> data between cluster restarts, you'll also need either EBS or S3 storage, so
> be sure to factor that in.  Also factor in bandwidth costs if you need to
> transfer a lot of data in/out of AWS.
> 
> My own impression is that EC2 is great and very cost effective for short
> lived, on-demand computing resources.  We use it a great deal for functional
> testing.  For 24x7 services, it seems like you pay a premium long term over
> owning your own hardware, with advantage of no large up-front cost for
> acquisition and access to easy elasticity to expand to meet demand, but with
> a cost of reduced performance per node due to virtualization.
> 
> Best advice I can give is do some benchmarking to see how many nodes you
> need to satisfy your processing requirements in EC2 vs on raw hardware and
> try to comparatively price it out.
> 
> --gh
> 
> On Thu, Mar 10, 2011 at 9:12 AM, Weishung Chung <we...@gmail.com>
> wrote:
> 
> > I am trying to estimate the cost of hosting own HBase
> cluster vs using EC2.
> > Could anyone give me some guidance?
> > Cluster size ~ 6 to 8 nodes
> > Usage ~ at least 12 hours/day with lot of read/write
> operations. (I know I
> > need to have more concrete usage number here)
> >
> > Thank you so much :)
> >
> 


      

Re: cost estimation

Posted by Gary Helmling <gh...@gmail.com>.
Hi Weishung,

See the EC2 instance pricing details here:
http://aws.amazon.com/ec2/#pricing
and try to calculate it out vs. price quotes for hardware.

You'll need to run at _least_ m1.large or c1.xlarge instances for HBase.
 There was a recent discussion thread covering EC2 performance.  You can
look it up at search-hadoop.com.

If you don't need the cluster running 24x7, maybe you can make the EC2
pricing work out.  Just be aware that you'll be taking a hit in raw IO
performance per node, so you may need to balance that out with more nodes
than you would need with using your own hardware.  If you need to persist
data between cluster restarts, you'll also need either EBS or S3 storage, so
be sure to factor that in.  Also factor in bandwidth costs if you need to
transfer a lot of data in/out of AWS.

My own impression is that EC2 is great and very cost effective for
short-lived, on-demand computing resources.  We use it a great deal for
functional testing.  For 24x7 services, it seems like you pay a premium long
term over owning your own hardware.  The advantages are no large up-front
acquisition cost and easy elasticity to expand to meet demand; the cost is
reduced performance per node due to virtualization.

The best advice I can give is to do some benchmarking to see how many nodes
you need to satisfy your processing requirements in EC2 vs. on raw hardware,
and try to comparatively price it out.
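
Once you have benchmark numbers, turning them into a node count is simple arithmetic. Here is a minimal sketch in Python; the function name, the headroom parameter, and the sample throughput figures are all made up for illustration and would be replaced by your own measurements.

```python
import math


def nodes_needed(required_ops_per_sec, measured_ops_per_sec_per_node, headroom=0.0):
    """Nodes required to hit a target rate, given a measured per-node rate.

    `headroom` is the fraction of each node's capacity held in reserve
    (0.5 means plan to use only half of the measured throughput).
    """
    if measured_ops_per_sec_per_node <= 0:
        raise ValueError("measured per-node throughput must be positive")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    usable = measured_ops_per_sec_per_node * (1.0 - headroom)
    return math.ceil(required_ops_per_sec / usable)


if __name__ == "__main__":
    # Placeholder figures: benchmark both environments and plug in real ones.
    target = 10000
    print("own hardware:", nodes_needed(target, 2000), "nodes")
    print("EC2:", nodes_needed(target, 1000, headroom=0.5), "nodes")
```

Multiply each node count by the per-node cost in that environment to get the comparison.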

--gh

On Thu, Mar 10, 2011 at 9:12 AM, Weishung Chung <we...@gmail.com> wrote:

> I am trying to estimate the cost of hosting own HBase cluster vs using EC2.
> Could anyone give me some guidance?
> Cluster size ~ 6 to 8 nodes
> Usage ~ at least 12 hours/day with lot of read/write operations. (I know I
> need to have more concrete usage number here)
>
> Thank you so much :)
>