Posted to user@hbase.apache.org by Iulia Zidaru <iu...@1and1.ro> on 2011/05/02 08:53:05 UTC

Re: Hardware configuration

  Thank you both. How would you estimate really big clusters, with 
hundreds of nodes? Requirements might change over time, and replacing an 
entire cluster does not seem like the best solution...



On 04/29/2011 07:08 PM, Stack wrote:
> I agree with Michel Segel.  Distributed computing is hard enough.
> There is no need to add extra complexity.
>
> St.Ack
>
> On Fri, Apr 29, 2011 at 4:05 AM, Iulia Zidaru<iu...@1and1.ro>  wrote:
>>   Hi,
>> I'm wondering if having a cluster with different machines in terms of CPU,
>> RAM and disk space would be a big issue for HBase. For example, machines
>> with 12GBs RAM and machines with 48GBs. We suppose that we use them at full
>> capacity. What problems we might encounter if having this kind of
>> configuration?
>> Thank you,
>> Iulia
>>
>>


-- 
Iulia Zidaru
Java Developer

1&1 Internet AG - Bucharest/Romania - Web Components Romania
18 Mircea Eliade St
Sect 1, Bucharest
RO Bucharest, 012015
iulia.zidaru@1and1.ro
0040 31 223 9153

  


Re: Hardware configuration

Posted by Iulia Zidaru <iu...@1and1.ro>.
Thank you for your detailed explanation. It helps a lot.
Iulia


On 05/02/2011 04:57 PM, Michael Segel wrote:
> Hi,
>
> That's actually a really good question.
> Unfortunately, the answer isn't really simple.
>
> You're going to need to estimate your growth and you're going to need to estimate your configuration.
>
> Suppose I know that within 2 years, the amount of data that I want to retain is going to be 1PB, with a 3x replication factor, I'll need at least 3PB of disk. Assuming that I can fit 12x2TB drives in a node, I'll need 125-150 machines. (There's some overhead for logging and OS)
>
> Now this doesn't mean that I'll need to buy all of the machines today and build out the cluster.
> It means that I will need to figure out my machine room, (rack space, power, etc...) and also hardware configuration.
>
> You'll also need to plan out your hardware choices too. An example.. you may want 10GBe on the switch but not at the data node. However you're going to want to be able to expand your data nodes to be able to add 10GBe cards.
>
> The idea is that as I build out my cluster, all of the machines have the same look and feel. So if you buy quad core CPUs and they are 2.2 GHz but 6 months from now, you buy 2.6 GHz cpus, as long as they are 4 core cpus, your cluster will look the same.
>
> The point is that when you lay out your cluster to start with, you'll need to plan ahead and keep things similar. Also you'll need to make sure your NameNode has enough memory...
>
> Having said that... Yahoo! has written a paper detailing MR2 (next generation of map/reduce).  As the M/R Job scheduler becomes more intelligent about the types of jobs and types of hardware, the consistency of hardware becomes less important.
>
> With respect to HBase, I suspect there to be a parallel evolution.
>
> As to building out and replacing your cluster... if this is a production environment, you'll have to think about DR and building out a second cluster. So the cost of replacing clusters should also be factored in when you budget for hardware.
>
> Like I said, its not a simple answer and you have to approach each instance separately and fine tune your cluster plans.
>
> HTH
>
> -Mike
>
>
> ----------------------------------------
>> Date: Mon, 2 May 2011 09:53:05 +0300
>> From: iulia.zidaru@1and1.ro
>> To: user@hbase.apache.org
>> CC: stack@duboce.net
>> Subject: Re: Hardware configuration
>>
>> Thank you both. How would you estimate really big clusters, with
>> hundreds of nodes? Requirements might change in time and replacing an
>> entire cluster seems not the best solution...
>>
>>
>>
>> On 04/29/2011 07:08 PM, Stack wrote:
>>> I agree with Michel Segel. Distributed computing is hard enough.
>>> There is no need to add extra complexity.
>>>
>>> St.Ack
>>>
>>> On Fri, Apr 29, 2011 at 4:05 AM, Iulia Zidaru wrote:
>>>> Hi,
>>>> I'm wondering if having a cluster with different machines in terms of CPU,
>>>> RAM and disk space would be a big issue for HBase. For example, machines
>>>> with 12GBs RAM and machines with 48GBs. We suppose that we use them at full
>>>> capacity. What problems we might encounter if having this kind of
>>>> configuration?
>>>> Thank you,
>>>> Iulia
>>>>
>>>>
>>
>> --
>> Iulia Zidaru
>> Java Developer
>>
>> 1&1 Internet AG - Bucharest/Romania - Web Components Romania
>> 18 Mircea Eliade St
>> Sect 1, Bucharest
>> RO Bucharest, 012015
>> iulia.zidaru@1and1.ro
>> 0040 31 223 9153
>>
>>
>>
>   		 	   		



Re: Hardware configuration

Posted by Iulia Zidaru <iu...@1and1.ro>.
  Thank you all. It really helps to see the points that guide you when 
choosing the hardware.



On 05/02/2011 08:45 PM, Ted Dunning wrote:
> For map-reduce, the balancing is easier because you can configure slots.  It
> would be nice to be
> able to express cores and memory separately, but slots are pretty good.
>
> For HDFS, the situation is much worse because the balancing is based on
> percent fill.  That leaves
> you with much less available space on smaller machines.  You also wind up
> with odd segregation by
> age between different kinds of data.  That leads to poor I/O performance.
>
> On Mon, May 2, 2011 at 10:31 AM, Jean-Daniel Cryans<jd...@apache.org>wrote:
>
>> I think the first issues you would encounter (just regarding HBase,
>> not talking about MR) is that if you have wildly different HW some
>> nodes might be able to handle their share of the load but some others
>> might not. At the moment the master doesn't know about the HW on the
>> slave nodes so it will just balance the regions equally. You would put
>> yourself in a situation where you would need to disable the balancer
>> and then do its job by yourself.
>>
>> Problems like that.
>>
>> J-D
>>
>> On Mon, May 2, 2011 at 10:03 AM, Chris Tarnas<cf...@email.com>  wrote:
>>> What are some of the common pitfalls of having different configurations
>> for different nodes? Is the problem more management issues, making sure each
>> type of node has its own config (so a 12 core box has 12 mappers and
>> reduces, an 8 core has 8, drive layouts, etc) or are there problems that
>> configuration changes can't deal with?
>>> thanks,
>>> -chris
>>>
>>> On May 2, 2011, at 6:57 AM, Michael Segel wrote:
>>>
>>>> Hi,
>>>>
>>>> That's actually a really good question.
>>>> Unfortunately, the answer isn't really simple.
>>>>
>>>> You're going to need to estimate your growth and you're going to need to
>> estimate your configuration.
>>>> Suppose I know that within 2 years, the amount of data that I want to
>> retain is going to be 1PB, with a 3x replication factor, I'll need at least
>> 3PB of disk. Assuming that I can fit 12x2TB drives in a node, I'll need
>> 125-150 machines. (There's some overhead for logging and OS)
>>>> Now this doesn't mean that I'll need to buy all of the machines today
>> and build out the cluster.
>>>> It means that I will need to figure out my machine room, (rack space,
>> power, etc...) and also hardware configuration.
>>>> You'll also need to plan out your hardware choices too. An example.. you
>> may want 10GBe on the switch but not at the data node. However you're going
>> to want to be able to expand your data nodes to be able to add 10GBe cards.
>>>> The idea is that as I build out my cluster, all of the machines have the
>> same look and feel. So if you buy quad core CPUs and they are 2.2 GHz but 6
>> months from now, you buy 2.6 GHz cpus, as long as they are 4 core cpus, your
>> cluster will look the same.
>>>> The point is that when you lay out your cluster to start with, you'll
>> need to plan ahead and keep things similar. Also you'll need to make sure
>> your NameNode has enough memory...
>>>> Having said that... Yahoo! has written a paper detailing MR2 (next
>> generation of map/reduce).  As the M/R Job scheduler becomes more
>> intelligent about the types of jobs and types of hardware, the consistency
>> of hardware becomes less important.
>>>> With respect to HBase, I suspect there to be a parallel evolution.
>>>>
>>>> As to building out and replacing your cluster... if this is a production
>> environment, you'll have to think about DR and building out a second
>> cluster. So the cost of replacing clusters should also be factored in when
>> you budget for hardware.
>>>> Like I said, its not a simple answer and you have to approach each
>> instance separately and fine tune your cluster plans.
>>>> HTH
>>>>
>>>> -Mike
>>>>
>>>>
>>>> ----------------------------------------
>>>>> Date: Mon, 2 May 2011 09:53:05 +0300
>>>>> From: iulia.zidaru@1and1.ro
>>>>> To: user@hbase.apache.org
>>>>> CC: stack@duboce.net
>>>>> Subject: Re: Hardware configuration
>>>>>
>>>>> Thank you both. How would you estimate really big clusters, with
>>>>> hundreds of nodes? Requirements might change in time and replacing an
>>>>> entire cluster seems not the best solution...
>>>>>
>>>>>
>>>>>
>>>>> On 04/29/2011 07:08 PM, Stack wrote:
>>>>>> I agree with Michel Segel. Distributed computing is hard enough.
>>>>>> There is no need to add extra complexity.
>>>>>>
>>>>>> St.Ack
>>>>>>
>>>>>> On Fri, Apr 29, 2011 at 4:05 AM, Iulia Zidaru wrote:
>>>>>>> Hi,
>>>>>>> I'm wondering if having a cluster with different machines in terms of
>> CPU,
>>>>>>> RAM and disk space would be a big issue for HBase. For example,
>> machines
>>>>>>> with 12GBs RAM and machines with 48GBs. We suppose that we use them
>> at full
>>>>>>> capacity. What problems we might encounter if having this kind of
>>>>>>> configuration?
>>>>>>> Thank you,
>>>>>>> Iulia
>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> Iulia Zidaru
>>>>> Java Developer
>>>>>
>>>>> 1&1 Internet AG - Bucharest/Romania - Web Components Romania
>>>>> 18 Mircea Eliade St
>>>>> Sect 1, Bucharest
>>>>> RO Bucharest, 012015
>>>>> iulia.zidaru@1and1.ro
>>>>> 0040 31 223 9153
>>>>>
>>>>>
>>>>>
>>>



Re: Hardware configuration

Posted by Ted Dunning <td...@maprtech.com>.
For map-reduce, the balancing is easier because you can configure slots.
It would be nice to be able to express cores and memory separately, but
slots are pretty good.

For HDFS, the situation is much worse because the balancing is based on
percent fill. That leaves you with much less available space on smaller
machines. You also wind up with odd segregation by age between different
kinds of data. That leads to poor I/O performance.
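
A rough sketch of why percent-based balancing hurts a mixed cluster (the
capacities and fill ratio below are made-up numbers, purely illustrative):

    # HDFS balances toward an equal *percentage* of fill on every DataNode,
    # so the smaller machines end up with far less absolute headroom.
    nodes_tb = {"big-node": 24.0, "small-node": 12.0}  # usable TB, hypothetical
    target_fill = 0.70                                  # cluster-wide fill after balancing

    for name, capacity_tb in nodes_tb.items():
        used_tb = capacity_tb * target_fill
        free_tb = capacity_tb - used_tb
        print(f"{name}: {used_tb:.1f} TB used, {free_tb:.1f} TB free")
    # big-node:   16.8 TB used, 7.2 TB free
    # small-node:  8.4 TB used, 3.6 TB free  <- half the room for new blocks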

On Mon, May 2, 2011 at 10:31 AM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> I think the first issues you would encounter (just regarding HBase,
> not talking about MR) is that if you have wildly different HW some
> nodes might be able to handle their share of the load but some others
> might not. At the moment the master doesn't know about the HW on the
> slave nodes so it will just balance the regions equally. You would put
> yourself in a situation where you would need to disable the balancer
> and then do its job by yourself.
>
> Problems like that.
>
> J-D
>
> On Mon, May 2, 2011 at 10:03 AM, Chris Tarnas <cf...@email.com> wrote:
> > What are some of the common pitfalls of having different configurations
> for different nodes? Is the problem more management issues, making sure each
> type of node has its own config (so a 12 core box has 12 mappers and
> reduces, an 8 core has 8, drive layouts, etc) or are there problems that
> configuration changes can't deal with?
> >
> > thanks,
> > -chris
> >
> > On May 2, 2011, at 6:57 AM, Michael Segel wrote:
> >
> >>
> >> Hi,
> >>
> >> That's actually a really good question.
> >> Unfortunately, the answer isn't really simple.
> >>
> >> You're going to need to estimate your growth and you're going to need to
> estimate your configuration.
> >>
> >> Suppose I know that within 2 years, the amount of data that I want to
> retain is going to be 1PB, with a 3x replication factor, I'll need at least
> 3PB of disk. Assuming that I can fit 12x2TB drives in a node, I'll need
> 125-150 machines. (There's some overhead for logging and OS)
> >>
> >> Now this doesn't mean that I'll need to buy all of the machines today
> and build out the cluster.
> >> It means that I will need to figure out my machine room, (rack space,
> power, etc...) and also hardware configuration.
> >>
> >> You'll also need to plan out your hardware choices too. An example.. you
> may want 10GBe on the switch but not at the data node. However you're going
> to want to be able to expand your data nodes to be able to add 10GBe cards.
> >>
> >> The idea is that as I build out my cluster, all of the machines have the
> same look and feel. So if you buy quad core CPUs and they are 2.2 GHz but 6
> months from now, you buy 2.6 GHz cpus, as long as they are 4 core cpus, your
> cluster will look the same.
> >>
> >> The point is that when you lay out your cluster to start with, you'll
> need to plan ahead and keep things similar. Also you'll need to make sure
> your NameNode has enough memory...
> >>
> >> Having said that... Yahoo! has written a paper detailing MR2 (next
> generation of map/reduce).  As the M/R Job scheduler becomes more
> intelligent about the types of jobs and types of hardware, the consistency
> of hardware becomes less important.
> >>
> >> With respect to HBase, I suspect there to be a parallel evolution.
> >>
> >> As to building out and replacing your cluster... if this is a production
> environment, you'll have to think about DR and building out a second
> cluster. So the cost of replacing clusters should also be factored in when
> you budget for hardware.
> >>
> >> Like I said, its not a simple answer and you have to approach each
> instance separately and fine tune your cluster plans.
> >>
> >> HTH
> >>
> >> -Mike
> >>
> >>
> >> ----------------------------------------
> >>> Date: Mon, 2 May 2011 09:53:05 +0300
> >>> From: iulia.zidaru@1and1.ro
> >>> To: user@hbase.apache.org
> >>> CC: stack@duboce.net
> >>> Subject: Re: Hardware configuration
> >>>
> >>> Thank you both. How would you estimate really big clusters, with
> >>> hundreds of nodes? Requirements might change in time and replacing an
> >>> entire cluster seems not the best solution...
> >>>
> >>>
> >>>
> >>> On 04/29/2011 07:08 PM, Stack wrote:
> >>>> I agree with Michel Segel. Distributed computing is hard enough.
> >>>> There is no need to add extra complexity.
> >>>>
> >>>> St.Ack
> >>>>
> >>>> On Fri, Apr 29, 2011 at 4:05 AM, Iulia Zidaru wrote:
> >>>>> Hi,
> >>>>> I'm wondering if having a cluster with different machines in terms of
> CPU,
> >>>>> RAM and disk space would be a big issue for HBase. For example,
> machines
> >>>>> with 12GBs RAM and machines with 48GBs. We suppose that we use them
> at full
> >>>>> capacity. What problems we might encounter if having this kind of
> >>>>> configuration?
> >>>>> Thank you,
> >>>>> Iulia
> >>>>>
> >>>>>
> >>>
> >>>
> >>> --
> >>> Iulia Zidaru
> >>> Java Developer
> >>>
> >>> 1&1 Internet AG - Bucharest/Romania - Web Components Romania
> >>> 18 Mircea Eliade St
> >>> Sect 1, Bucharest
> >>> RO Bucharest, 012015
> >>> iulia.zidaru@1and1.ro
> >>> 0040 31 223 9153
> >>>
> >>>
> >>>
> >>
> >
> >
>

Re: Hardware configuration

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I think the first issue you would encounter (just regarding HBase,
not talking about MR) is that if you have wildly different HW, some
nodes might be able to handle their share of the load while others
might not. At the moment the master doesn't know about the HW on the
slave nodes, so it will just balance the regions equally. You would put
yourself in a situation where you would need to disable the balancer
and then do its job yourself.

Problems like that.

J-D
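
To put toy numbers on that (both the region count and the per-node capacities
below are hypothetical), equal region counts on unequal hardware look like this:

    # The master balances by region count only, so weaker nodes carry a
    # disproportionate load relative to what they can actually serve.
    total_regions = 900
    servers = {"rs-48g": 48, "rs-12g-a": 12, "rs-12g-b": 12}  # rough capacity units (e.g. GB of RAM)

    regions_each = total_regions // len(servers)
    for name, capacity in servers.items():
        print(f"{name}: {regions_each} regions, {regions_each / capacity:.1f} regions per capacity unit")
    # The 12GB nodes serve 4x as many regions per unit of capacity as the
    # 48GB node, which is why you'd end up placing regions by hand.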

On Mon, May 2, 2011 at 10:03 AM, Chris Tarnas <cf...@email.com> wrote:
> What are some of the common pitfalls of having different configurations for different nodes? Is the problem more management issues, making sure each type of node has its own config (so a 12 core box has 12 mappers and reduces, an 8 core has 8, drive layouts, etc) or are there problems that configuration changes can't deal with?
>
> thanks,
> -chris
>
> On May 2, 2011, at 6:57 AM, Michael Segel wrote:
>
>>
>> Hi,
>>
>> That's actually a really good question.
>> Unfortunately, the answer isn't really simple.
>>
>> You're going to need to estimate your growth and you're going to need to estimate your configuration.
>>
>> Suppose I know that within 2 years, the amount of data that I want to retain is going to be 1PB, with a 3x replication factor, I'll need at least 3PB of disk. Assuming that I can fit 12x2TB drives in a node, I'll need 125-150 machines. (There's some overhead for logging and OS)
>>
>> Now this doesn't mean that I'll need to buy all of the machines today and build out the cluster.
>> It means that I will need to figure out my machine room, (rack space, power, etc...) and also hardware configuration.
>>
>> You'll also need to plan out your hardware choices too. An example.. you may want 10GBe on the switch but not at the data node. However you're going to want to be able to expand your data nodes to be able to add 10GBe cards.
>>
>> The idea is that as I build out my cluster, all of the machines have the same look and feel. So if you buy quad core CPUs and they are 2.2 GHz but 6 months from now, you buy 2.6 GHz cpus, as long as they are 4 core cpus, your cluster will look the same.
>>
>> The point is that when you lay out your cluster to start with, you'll need to plan ahead and keep things similar. Also you'll need to make sure your NameNode has enough memory...
>>
>> Having said that... Yahoo! has written a paper detailing MR2 (next generation of map/reduce).  As the M/R Job scheduler becomes more intelligent about the types of jobs and types of hardware, the consistency of hardware becomes less important.
>>
>> With respect to HBase, I suspect there to be a parallel evolution.
>>
>> As to building out and replacing your cluster... if this is a production environment, you'll have to think about DR and building out a second cluster. So the cost of replacing clusters should also be factored in when you budget for hardware.
>>
>> Like I said, its not a simple answer and you have to approach each instance separately and fine tune your cluster plans.
>>
>> HTH
>>
>> -Mike
>>
>>
>> ----------------------------------------
>>> Date: Mon, 2 May 2011 09:53:05 +0300
>>> From: iulia.zidaru@1and1.ro
>>> To: user@hbase.apache.org
>>> CC: stack@duboce.net
>>> Subject: Re: Hardware configuration
>>>
>>> Thank you both. How would you estimate really big clusters, with
>>> hundreds of nodes? Requirements might change in time and replacing an
>>> entire cluster seems not the best solution...
>>>
>>>
>>>
>>> On 04/29/2011 07:08 PM, Stack wrote:
>>>> I agree with Michel Segel. Distributed computing is hard enough.
>>>> There is no need to add extra complexity.
>>>>
>>>> St.Ack
>>>>
>>>> On Fri, Apr 29, 2011 at 4:05 AM, Iulia Zidaru wrote:
>>>>> Hi,
>>>>> I'm wondering if having a cluster with different machines in terms of CPU,
>>>>> RAM and disk space would be a big issue for HBase. For example, machines
>>>>> with 12GBs RAM and machines with 48GBs. We suppose that we use them at full
>>>>> capacity. What problems we might encounter if having this kind of
>>>>> configuration?
>>>>> Thank you,
>>>>> Iulia
>>>>>
>>>>>
>>>
>>>
>>> --
>>> Iulia Zidaru
>>> Java Developer
>>>
>>> 1&1 Internet AG - Bucharest/Romania - Web Components Romania
>>> 18 Mircea Eliade St
>>> Sect 1, Bucharest
>>> RO Bucharest, 012015
>>> iulia.zidaru@1and1.ro
>>> 0040 31 223 9153
>>>
>>>
>>>
>>
>
>

Re: Hardware configuration

Posted by Chris Tarnas <cf...@email.com>.
What are some of the common pitfalls of having different configurations for different nodes? Is the problem mostly a management issue, making sure each type of node has its own config (so a 12-core box has 12 mappers and reducers, an 8-core box has 8, drive layouts, etc.), or are there problems that configuration changes can't deal with?

thanks,
-chris
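
For what it's worth, the per-node-config route is usually scripted; a minimal
sketch (the slot ratios are arbitrary examples, and the two property names are
the stock Hadoop 0.20 TaskTracker settings):

    # Derive per-node task slots from the core count and emit the matching
    # mapred-site.xml properties for that node class.
    def slot_properties(cores):
        map_slots = cores                  # e.g. one mapper per core
        reduce_slots = max(2, cores // 2)  # fewer reducers than mappers
        return (
            f"  <property><name>mapred.tasktracker.map.tasks.maximum</name>"
            f"<value>{map_slots}</value></property>\n"
            f"  <property><name>mapred.tasktracker.reduce.tasks.maximum</name>"
            f"<value>{reduce_slots}</value></property>"
        )

    for cores in (8, 12):
        print(f"# node class: {cores} cores")
        print(slot_properties(cores))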

On May 2, 2011, at 6:57 AM, Michael Segel wrote:

> 
> Hi,
> 
> That's actually a really good question.
> Unfortunately, the answer isn't really simple.
> 
> You're going to need to estimate your growth and you're going to need to estimate your configuration.
> 
> Suppose I know that within 2 years, the amount of data that I want to retain is going to be 1PB, with a 3x replication factor, I'll need at least 3PB of disk. Assuming that I can fit 12x2TB drives in a node, I'll need 125-150 machines. (There's some overhead for logging and OS)
> 
> Now this doesn't mean that I'll need to buy all of the machines today and build out the cluster.
> It means that I will need to figure out my machine room, (rack space, power, etc...) and also hardware configuration.
> 
> You'll also need to plan out your hardware choices too. An example.. you may want 10GBe on the switch but not at the data node. However you're going to want to be able to expand your data nodes to be able to add 10GBe cards.
> 
> The idea is that as I build out my cluster, all of the machines have the same look and feel. So if you buy quad core CPUs and they are 2.2 GHz but 6 months from now, you buy 2.6 GHz cpus, as long as they are 4 core cpus, your cluster will look the same.
> 
> The point is that when you lay out your cluster to start with, you'll need to plan ahead and keep things similar. Also you'll need to make sure your NameNode has enough memory...
> 
> Having said that... Yahoo! has written a paper detailing MR2 (next generation of map/reduce).  As the M/R Job scheduler becomes more intelligent about the types of jobs and types of hardware, the consistency of hardware becomes less important. 
> 
> With respect to HBase, I suspect there to be a parallel evolution.
> 
> As to building out and replacing your cluster... if this is a production environment, you'll have to think about DR and building out a second cluster. So the cost of replacing clusters should also be factored in when you budget for hardware.
> 
> Like I said, its not a simple answer and you have to approach each instance separately and fine tune your cluster plans.
> 
> HTH
> 
> -Mike
> 
> 
> ----------------------------------------
>> Date: Mon, 2 May 2011 09:53:05 +0300
>> From: iulia.zidaru@1and1.ro
>> To: user@hbase.apache.org
>> CC: stack@duboce.net
>> Subject: Re: Hardware configuration
>> 
>> Thank you both. How would you estimate really big clusters, with
>> hundreds of nodes? Requirements might change in time and replacing an
>> entire cluster seems not the best solution...
>> 
>> 
>> 
>> On 04/29/2011 07:08 PM, Stack wrote:
>>> I agree with Michel Segel. Distributed computing is hard enough.
>>> There is no need to add extra complexity.
>>> 
>>> St.Ack
>>> 
>>> On Fri, Apr 29, 2011 at 4:05 AM, Iulia Zidaru wrote:
>>>> Hi,
>>>> I'm wondering if having a cluster with different machines in terms of CPU,
>>>> RAM and disk space would be a big issue for HBase. For example, machines
>>>> with 12GBs RAM and machines with 48GBs. We suppose that we use them at full
>>>> capacity. What problems we might encounter if having this kind of
>>>> configuration?
>>>> Thank you,
>>>> Iulia
>>>> 
>>>> 
>> 
>> 
>> --
>> Iulia Zidaru
>> Java Developer
>> 
>> 1&1 Internet AG - Bucharest/Romania - Web Components Romania
>> 18 Mircea Eliade St
>> Sect 1, Bucharest
>> RO Bucharest, 012015
>> iulia.zidaru@1and1.ro
>> 0040 31 223 9153
>> 
>> 
>> 
> 		 	   		  


Re: Hardware configuration

Posted by Ian Roughley <ro...@gmail.com>.
Sorry - I meant to answer Iulia, not Michael.  I was speaking more generally, as there is also no
guarantee that MR jobs are running.  So perhaps I should add deployment / running server
architecture to the list of considerations.

/Ian

On 05/02/2011 01:47 PM, Jean-Daniel Cryans wrote:
> Ian,
> 
> Regarding your first point, I understand where the concern is coming
> from but I'd like to point out that with the new MemStore-Local
> Allocation Buffers the full GCs taking minutes might not be as much as
> an issue as it used to be. That said, I haven't tested that out yet
> and I don't know of anyone that did it.
> 
> Your second point is dead-on. Also not only it takes time to
> replicate, but it can also steal precious IO and in 0.20 it's pretty
> much impossible to limit the rate of re-replication.
> 
> J-D
> 
> On Mon, May 2, 2011 at 7:30 AM, Ian Roughley <ro...@gmail.com> wrote:
>> I think that there are two important considerations:
>> 1. Can the JVM you're planning on using support a heap of > 10GB, if not, you're wasting money
>> 2. Putting more disk on nodes, means that a failure will take longer to re-replicate back to it's
>> balanced state.  i.e. Given you're network topology, how long will even a 50TB machine take, a day a
>> week, longer?
>>
>> /Ian
>> Architect / Mgr - Novell Vibe
>>
>> On 05/02/2011 09:57 AM, Michael Segel wrote:
>>>
>>> Hi,
>>>
>>> That's actually a really good question.
>>> Unfortunately, the answer isn't really simple.
>>>
>>> You're going to need to estimate your growth and you're going to need to estimate your configuration.
>>>
>>> Suppose I know that within 2 years, the amount of data that I want to retain is going to be 1PB, with a 3x replication factor, I'll need at least 3PB of disk. Assuming that I can fit 12x2TB drives in a node, I'll need 125-150 machines. (There's some overhead for logging and OS)
>>>
>>> Now this doesn't mean that I'll need to buy all of the machines today and build out the cluster.
>>> It means that I will need to figure out my machine room, (rack space, power, etc...) and also hardware configuration.
>>>
>>> You'll also need to plan out your hardware choices too. An example.. you may want 10GBe on the switch but not at the data node. However you're going to want to be able to expand your data nodes to be able to add 10GBe cards.
>>>
>>> The idea is that as I build out my cluster, all of the machines have the same look and feel. So if you buy quad core CPUs and they are 2.2 GHz but 6 months from now, you buy 2.6 GHz cpus, as long as they are 4 core cpus, your cluster will look the same.
>>>
>>> The point is that when you lay out your cluster to start with, you'll need to plan ahead and keep things similar. Also you'll need to make sure your NameNode has enough memory...
>>>
>>> Having said that... Yahoo! has written a paper detailing MR2 (next generation of map/reduce).  As the M/R Job scheduler becomes more intelligent about the types of jobs and types of hardware, the consistency of hardware becomes less important.
>>>
>>> With respect to HBase, I suspect there to be a parallel evolution.
>>>
>>> As to building out and replacing your cluster... if this is a production environment, you'll have to think about DR and building out a second cluster. So the cost of replacing clusters should also be factored in when you budget for hardware.
>>>
>>> Like I said, its not a simple answer and you have to approach each instance separately and fine tune your cluster plans.
>>>
>>> HTH
>>>
>>> -Mike
>>>
>>>
>>> ----------------------------------------
>>>> Date: Mon, 2 May 2011 09:53:05 +0300
>>>> From: iulia.zidaru@1and1.ro
>>>> To: user@hbase.apache.org
>>>> CC: stack@duboce.net
>>>> Subject: Re: Hardware configuration
>>>>
>>>> Thank you both. How would you estimate really big clusters, with
>>>> hundreds of nodes? Requirements might change in time and replacing an
>>>> entire cluster seems not the best solution...
>>>>
>>>>
>>>>
>>>> On 04/29/2011 07:08 PM, Stack wrote:
>>>>> I agree with Michel Segel. Distributed computing is hard enough.
>>>>> There is no need to add extra complexity.
>>>>>
>>>>> St.Ack
>>>>>
>>>>> On Fri, Apr 29, 2011 at 4:05 AM, Iulia Zidaru wrote:
>>>>>> Hi,
>>>>>> I'm wondering if having a cluster with different machines in terms of CPU,
>>>>>> RAM and disk space would be a big issue for HBase. For example, machines
>>>>>> with 12GBs RAM and machines with 48GBs. We suppose that we use them at full
>>>>>> capacity. What problems we might encounter if having this kind of
>>>>>> configuration?
>>>>>> Thank you,
>>>>>> Iulia
>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>> Iulia Zidaru
>>>> Java Developer
>>>>
>>>> 1&1 Internet AG - Bucharest/Romania - Web Components Romania
>>>> 18 Mircea Eliade St
>>>> Sect 1, Bucharest
>>>> RO Bucharest, 012015
>>>> iulia.zidaru@1and1.ro
>>>> 0040 31 223 9153
>>>>
>>>>
>>>>
>>>
>>
>>


Re: Hardware configuration

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Ian,

Regarding your first point, I understand where the concern is coming
from, but I'd like to point out that with the new MemStore-Local
Allocation Buffers the full GCs taking minutes might not be as much of
an issue as they used to be. That said, I haven't tested that out yet
and I don't know of anyone who has.

Your second point is dead-on. Not only does re-replication take time,
it can also steal precious I/O, and in 0.20 it's pretty much impossible
to limit the rate of re-replication.

J-D

On Mon, May 2, 2011 at 7:30 AM, Ian Roughley <ro...@gmail.com> wrote:
> I think that there are two important considerations:
> 1. Can the JVM you're planning on using support a heap of > 10GB, if not, you're wasting money
> 2. Putting more disk on nodes, means that a failure will take longer to re-replicate back to it's
> balanced state.  i.e. Given you're network topology, how long will even a 50TB machine take, a day a
> week, longer?
>
> /Ian
> Architect / Mgr - Novell Vibe
>
> On 05/02/2011 09:57 AM, Michael Segel wrote:
>>
>> Hi,
>>
>> That's actually a really good question.
>> Unfortunately, the answer isn't really simple.
>>
>> You're going to need to estimate your growth and you're going to need to estimate your configuration.
>>
>> Suppose I know that within 2 years, the amount of data that I want to retain is going to be 1PB, with a 3x replication factor, I'll need at least 3PB of disk. Assuming that I can fit 12x2TB drives in a node, I'll need 125-150 machines. (There's some overhead for logging and OS)
>>
>> Now this doesn't mean that I'll need to buy all of the machines today and build out the cluster.
>> It means that I will need to figure out my machine room, (rack space, power, etc...) and also hardware configuration.
>>
>> You'll also need to plan out your hardware choices too. An example.. you may want 10GBe on the switch but not at the data node. However you're going to want to be able to expand your data nodes to be able to add 10GBe cards.
>>
>> The idea is that as I build out my cluster, all of the machines have the same look and feel. So if you buy quad core CPUs and they are 2.2 GHz but 6 months from now, you buy 2.6 GHz cpus, as long as they are 4 core cpus, your cluster will look the same.
>>
>> The point is that when you lay out your cluster to start with, you'll need to plan ahead and keep things similar. Also you'll need to make sure your NameNode has enough memory...
>>
>> Having said that... Yahoo! has written a paper detailing MR2 (next generation of map/reduce).  As the M/R Job scheduler becomes more intelligent about the types of jobs and types of hardware, the consistency of hardware becomes less important.
>>
>> With respect to HBase, I suspect there to be a parallel evolution.
>>
>> As to building out and replacing your cluster... if this is a production environment, you'll have to think about DR and building out a second cluster. So the cost of replacing clusters should also be factored in when you budget for hardware.
>>
>> Like I said, its not a simple answer and you have to approach each instance separately and fine tune your cluster plans.
>>
>> HTH
>>
>> -Mike
>>
>>
>> ----------------------------------------
>>> Date: Mon, 2 May 2011 09:53:05 +0300
>>> From: iulia.zidaru@1and1.ro
>>> To: user@hbase.apache.org
>>> CC: stack@duboce.net
>>> Subject: Re: Hardware configuration
>>>
>>> Thank you both. How would you estimate really big clusters, with
>>> hundreds of nodes? Requirements might change in time and replacing an
>>> entire cluster seems not the best solution...
>>>
>>>
>>>
>>> On 04/29/2011 07:08 PM, Stack wrote:
>>>> I agree with Michel Segel. Distributed computing is hard enough.
>>>> There is no need to add extra complexity.
>>>>
>>>> St.Ack
>>>>
>>>> On Fri, Apr 29, 2011 at 4:05 AM, Iulia Zidaru wrote:
>>>>> Hi,
>>>>> I'm wondering if having a cluster with different machines in terms of CPU,
>>>>> RAM and disk space would be a big issue for HBase. For example, machines
>>>>> with 12GBs RAM and machines with 48GBs. We suppose that we use them at full
>>>>> capacity. What problems we might encounter if having this kind of
>>>>> configuration?
>>>>> Thank you,
>>>>> Iulia
>>>>>
>>>>>
>>>
>>>
>>> --
>>> Iulia Zidaru
>>> Java Developer
>>>
>>> 1&1 Internet AG - Bucharest/Romania - Web Components Romania
>>> 18 Mircea Eliade St
>>> Sect 1, Bucharest
>>> RO Bucharest, 012015
>>> iulia.zidaru@1and1.ro
>>> 0040 31 223 9153
>>>
>>>
>>>
>>
>
>

Re: A question for release 0.90.3

Posted by Jean-Daniel Cryans <jd...@apache.org>.
We are planning to put out a release candidate (RC), possibly this
week, meaning that in the best case it would be released next week,
since the voting period is 7 days. If the RC is voted down, the issues
need to be fixed so that another RC can be built and voted on. And so on.

J-D

On Wed, May 4, 2011 at 6:38 PM, Gaojinchao <ga...@huawei.com> wrote:
> Our release is meant for production deploy next week.
> I have merged some issue to 0.90.2 and verified it.
>
> Can the version 0.90.3 release this week?
> if it can, I will use 0.90.3 and verify it next week.
>

A question for release 0.90.3

Posted by Gaojinchao <ga...@huawei.com>.
Our release is meant for production deployment next week.
I have merged some issues into 0.90.2 and verified them.

Can version 0.90.3 be released this week?
If it can, I will use 0.90.3 and verify it next week.

RE: Hardware configuration

Posted by Michael Segel <mi...@hotmail.com>.
Ian,

You're not running a single JVM per node.

You have your DataNode, TaskTracker, and then the M/R tasks that you run on the node.

With Xeon chips, depending on your configuration, you can run 8 mappers and 8 reducers.
Add in HBase, where you'll want to increase the RegionServer heap to the 4-8GB range, and you'll see your memory use going up, and that's with 8 cores. Add in the additional 4 cores if you have 6-core CPUs and you will end up with 48GB of memory.

And as to the number of disks per node...

With 4 disks per node, we end up seeing disk as our limiting factor. Cloudera and others recommend 2 disks per core, and that makes some sense so we're not blocked on disk I/O. With 8 cores that's 3 disks per core. With 12 cores that's only 2 disks per core.

And while it's been pointed out that 24TB per node is a lot of disk... add 10GbE to the mix and you won't have as much of an issue with respect to balancing.

So there's no money being wasted. 
Again... We're talking about 125-150 nodes in a cluster that has 1PB of HDFS...

If you limit yourself to 12TB of disk per node... that's 300 machines. You've essentially doubled your power consumption and footprint in your machine room. If you've got to expand past 1PB, you really need to plan for that density.

This is why I said that the answer isn't straightforward and that you have to plan out your cluster appropriately.

It goes back to the OP's initial question about starting with a heterogeneous cluster where the nodes aren't roughly the same size and configuration.

HTH

-Mike
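
To make that arithmetic concrete, a back-of-envelope RAM budget for one node
(every heap size below is an assumed round number, not a recommendation):

    # Combined DataNode + TaskTracker + RegionServer node.
    mappers, reducers = 8, 8
    heap_per_task_gb = 2      # assumed child JVM heap per map/reduce task
    regionserver_gb = 8       # upper end of the 4-8GB range mentioned above
    datanode_gb = 1
    tasktracker_gb = 1
    os_and_cache_gb = 4       # OS, page cache, headroom

    total_gb = ((mappers + reducers) * heap_per_task_gb
                + regionserver_gb + datanode_gb + tasktracker_gb + os_and_cache_gb)
    print(f"~{total_gb} GB of RAM per node")  # ~46 GB, i.e. a 48GB box is about right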
 


----------------------------------------
> Date: Mon, 2 May 2011 10:30:21 -0400
> From: roughley@gmail.com
> To: user@hbase.apache.org
> Subject: Re: Hardware configuration
>
> I think that there are two important considerations:
> 1. Can the JVM you're planning on using support a heap of > 10GB, if not, you're wasting money
> 2. Putting more disk on nodes, means that a failure will take longer to re-replicate back to it's
> balanced state. i.e. Given you're network topology, how long will even a 50TB machine take, a day a
> week, longer?
>
> /Ian
> Architect / Mgr - Novell Vibe
>
> On 05/02/2011 09:57 AM, Michael Segel wrote:
> >
> > Hi,
> >
> > That's actually a really good question.
> > Unfortunately, the answer isn't really simple.
> >
> > You're going to need to estimate your growth and you're going to need to estimate your configuration.
> >
> > Suppose I know that within 2 years, the amount of data that I want to retain is going to be 1PB, with a 3x replication factor, I'll need at least 3PB of disk. Assuming that I can fit 12x2TB drives in a node, I'll need 125-150 machines. (There's some overhead for logging and OS)
> >
> > Now this doesn't mean that I'll need to buy all of the machines today and build out the cluster.
> > It means that I will need to figure out my machine room, (rack space, power, etc...) and also hardware configuration.
> >
> > You'll also need to plan out your hardware choices too. An example.. you may want 10GBe on the switch but not at the data node. However you're going to want to be able to expand your data nodes to be able to add 10GBe cards.
> >
> > The idea is that as I build out my cluster, all of the machines have the same look and feel. So if you buy quad core CPUs and they are 2.2 GHz but 6 months from now, you buy 2.6 GHz cpus, as long as they are 4 core cpus, your cluster will look the same.
> >
> > The point is that when you lay out your cluster to start with, you'll need to plan ahead and keep things similar. Also you'll need to make sure your NameNode has enough memory...
> >
> > Having said that... Yahoo! has written a paper detailing MR2 (next generation of map/reduce). As the M/R Job scheduler becomes more intelligent about the types of jobs and types of hardware, the consistency of hardware becomes less important.
> >
> > With respect to HBase, I suspect there to be a parallel evolution.
> >
> > As to building out and replacing your cluster... if this is a production environment, you'll have to think about DR and building out a second cluster. So the cost of replacing clusters should also be factored in when you budget for hardware.
> >
> > Like I said, its not a simple answer and you have to approach each instance separately and fine tune your cluster plans.
> >
> > HTH
> >
> > -Mike
> >
> >
> > ----------------------------------------
> >> Date: Mon, 2 May 2011 09:53:05 +0300
> >> From: iulia.zidaru@1and1.ro
> >> To: user@hbase.apache.org
> >> CC: stack@duboce.net
> >> Subject: Re: Hardware configuration
> >>
> >> Thank you both. How would you estimate really big clusters, with
> >> hundreds of nodes? Requirements might change in time and replacing an
> >> entire cluster seems not the best solution...
> >>
> >>
> >>
> >> On 04/29/2011 07:08 PM, Stack wrote:
> >>> I agree with Michel Segel. Distributed computing is hard enough.
> >>> There is no need to add extra complexity.
> >>>
> >>> St.Ack
> >>>
> >>> On Fri, Apr 29, 2011 at 4:05 AM, Iulia Zidaru wrote:
> >>>> Hi,
> >>>> I'm wondering if having a cluster with different machines in terms of CPU,
> >>>> RAM and disk space would be a big issue for HBase. For example, machines
> >>>> with 12GBs RAM and machines with 48GBs. We suppose that we use them at full
> >>>> capacity. What problems we might encounter if having this kind of
> >>>> configuration?
> >>>> Thank you,
> >>>> Iulia
> >>>>
> >>>>
> >>
> >>
> >> --
> >> Iulia Zidaru
> >> Java Developer
> >>
> >> 1&1 Internet AG - Bucharest/Romania - Web Components Romania
> >> 18 Mircea Eliade St
> >> Sect 1, Bucharest
> >> RO Bucharest, 012015
> >> iulia.zidaru@1and1.ro
> >> 0040 31 223 9153
> >>
> >>
> >>
> >
>
 		 	   		  

Re: Hardware configuration

Posted by Iulia Zidaru <iu...@1and1.ro>.
  Thank you Ian. These are very important points to think about.

iulia

On 05/02/2011 05:30 PM, Ian Roughley wrote:
> I think that there are two important considerations:
> 1. Can the JVM you're planning on using support a heap of>  10GB, if not, you're wasting money
> 2. Putting more disk on nodes, means that a failure will take longer to re-replicate back to it's
> balanced state.  i.e. Given you're network topology, how long will even a 50TB machine take, a day a
> week, longer?
>
> /Ian
> Architect / Mgr - Novell Vibe
>
> On 05/02/2011 09:57 AM, Michael Segel wrote:
>> Hi,
>>
>> That's actually a really good question.
>> Unfortunately, the answer isn't really simple.
>>
>> You're going to need to estimate your growth and you're going to need to estimate your configuration.
>>
>> Suppose I know that within 2 years, the amount of data that I want to retain is going to be 1PB, with a 3x replication factor, I'll need at least 3PB of disk. Assuming that I can fit 12x2TB drives in a node, I'll need 125-150 machines. (There's some overhead for logging and OS)
>>
>> Now this doesn't mean that I'll need to buy all of the machines today and build out the cluster.
>> It means that I will need to figure out my machine room, (rack space, power, etc...) and also hardware configuration.
>>
>> You'll also need to plan out your hardware choices too. An example.. you may want 10GBe on the switch but not at the data node. However you're going to want to be able to expand your data nodes to be able to add 10GBe cards.
>>
>> The idea is that as I build out my cluster, all of the machines have the same look and feel. So if you buy quad core CPUs and they are 2.2 GHz but 6 months from now, you buy 2.6 GHz cpus, as long as they are 4 core cpus, your cluster will look the same.
>>
>> The point is that when you lay out your cluster to start with, you'll need to plan ahead and keep things similar. Also you'll need to make sure your NameNode has enough memory...
>>
>> Having said that... Yahoo! has written a paper detailing MR2 (next generation of map/reduce).  As the M/R Job scheduler becomes more intelligent about the types of jobs and types of hardware, the consistency of hardware becomes less important.
>>
>> With respect to HBase, I suspect there to be a parallel evolution.
>>
>> As to building out and replacing your cluster... if this is a production environment, you'll have to think about DR and building out a second cluster. So the cost of replacing clusters should also be factored in when you budget for hardware.
>>
>> Like I said, its not a simple answer and you have to approach each instance separately and fine tune your cluster plans.
>>
>> HTH
>>
>> -Mike
>>
>>
>> ----------------------------------------
>>> Date: Mon, 2 May 2011 09:53:05 +0300
>>> From: iulia.zidaru@1and1.ro
>>> To: user@hbase.apache.org
>>> CC: stack@duboce.net
>>> Subject: Re: Hardware configuration
>>>
>>> Thank you both. How would you estimate really big clusters, with
>>> hundreds of nodes? Requirements might change in time and replacing an
>>> entire cluster seems not the best solution...
>>>
>>>
>>>
>>> On 04/29/2011 07:08 PM, Stack wrote:
>>>> I agree with Michel Segel. Distributed computing is hard enough.
>>>> There is no need to add extra complexity.
>>>>
>>>> St.Ack
>>>>
>>>> On Fri, Apr 29, 2011 at 4:05 AM, Iulia Zidaru wrote:
>>>>> Hi,
>>>>> I'm wondering if having a cluster with different machines in terms of CPU,
>>>>> RAM and disk space would be a big issue for HBase. For example, machines
>>>>> with 12GBs RAM and machines with 48GBs. We suppose that we use them at full
>>>>> capacity. What problems we might encounter if having this kind of
>>>>> configuration?
>>>>> Thank you,
>>>>> Iulia
>>>>>
>>>>>
>>>
>>> --
>>> Iulia Zidaru
>>> Java Developer
>>>
>>> 1&1 Internet AG - Bucharest/Romania - Web Components Romania
>>> 18 Mircea Eliade St
>>> Sect 1, Bucharest
>>> RO Bucharest, 012015
>>> iulia.zidaru@1and1.ro
>>> 0040 31 223 9153
>>>
>>>
>>>
>>   		 	   		



Re: Hardware configuration

Posted by Ian Roughley <ro...@gmail.com>.
I think that there are two important considerations:
1. Can the JVM you're planning on using support a heap of > 10GB? If not, you're wasting money.
2. Putting more disk on nodes means that a failure will take longer to re-replicate back to its
balanced state, i.e. given your network topology, how long will even a 50TB machine take: a day, a
week, longer?

/Ian
Architect / Mgr - Novell Vibe
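
A quick way to put numbers on point 2 (both bandwidth figures are assumptions;
plug in your own topology):

    # When a DataNode dies, its blocks are re-replicated by the surviving nodes,
    # so the limit is the aggregate throughput you can spare for that work.
    lost_data_tb = 50                # disk on the failed node, worst case full
    surviving_nodes = 100            # DataNodes sharing the re-replication work
    spare_mb_per_sec_per_node = 30   # assumed rate left over per node

    aggregate_mb_per_sec = surviving_nodes * spare_mb_per_sec_per_node
    hours = lost_data_tb * 1024 * 1024 / aggregate_mb_per_sec / 3600
    print(f"~{hours:.1f} hours to restore full replication")  # ~4.9 hours with these numbers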

On 05/02/2011 09:57 AM, Michael Segel wrote:
> 
> Hi,
> 
> That's actually a really good question.
> Unfortunately, the answer isn't really simple.
> 
> You're going to need to estimate your growth and you're going to need to estimate your configuration.
> 
> Suppose I know that within 2 years, the amount of data that I want to retain is going to be 1PB, with a 3x replication factor, I'll need at least 3PB of disk. Assuming that I can fit 12x2TB drives in a node, I'll need 125-150 machines. (There's some overhead for logging and OS)
> 
> Now this doesn't mean that I'll need to buy all of the machines today and build out the cluster.
> It means that I will need to figure out my machine room, (rack space, power, etc...) and also hardware configuration.
> 
> You'll also need to plan out your hardware choices too. An example.. you may want 10GBe on the switch but not at the data node. However you're going to want to be able to expand your data nodes to be able to add 10GBe cards.
> 
> The idea is that as I build out my cluster, all of the machines have the same look and feel. So if you buy quad core CPUs and they are 2.2 GHz but 6 months from now, you buy 2.6 GHz cpus, as long as they are 4 core cpus, your cluster will look the same.
> 
> The point is that when you lay out your cluster to start with, you'll need to plan ahead and keep things similar. Also you'll need to make sure your NameNode has enough memory...
> 
> Having said that... Yahoo! has written a paper detailing MR2 (next generation of map/reduce).  As the M/R Job scheduler becomes more intelligent about the types of jobs and types of hardware, the consistency of hardware becomes less important. 
> 
> With respect to HBase, I suspect there to be a parallel evolution.
> 
> As to building out and replacing your cluster... if this is a production environment, you'll have to think about DR and building out a second cluster. So the cost of replacing clusters should also be factored in when you budget for hardware.
> 
> Like I said, its not a simple answer and you have to approach each instance separately and fine tune your cluster plans.
> 
> HTH
> 
> -Mike
> 
> 
> ----------------------------------------
>> Date: Mon, 2 May 2011 09:53:05 +0300
>> From: iulia.zidaru@1and1.ro
>> To: user@hbase.apache.org
>> CC: stack@duboce.net
>> Subject: Re: Hardware configuration
>>
>> Thank you both. How would you estimate really big clusters, with
>> hundreds of nodes? Requirements might change in time and replacing an
>> entire cluster seems not the best solution...
>>
>>
>>
>> On 04/29/2011 07:08 PM, Stack wrote:
>>> I agree with Michel Segel. Distributed computing is hard enough.
>>> There is no need to add extra complexity.
>>>
>>> St.Ack
>>>
>>> On Fri, Apr 29, 2011 at 4:05 AM, Iulia Zidaru wrote:
>>>> Hi,
>>>> I'm wondering if having a cluster with different machines in terms of CPU,
>>>> RAM and disk space would be a big issue for HBase. For example, machines
>>>> with 12GBs RAM and machines with 48GBs. We suppose that we use them at full
>>>> capacity. What problems we might encounter if having this kind of
>>>> configuration?
>>>> Thank you,
>>>> Iulia
>>>>
>>>>
>>
>>
>> --
>> Iulia Zidaru
>> Java Developer
>>
>> 1&1 Internet AG - Bucharest/Romania - Web Components Romania
>> 18 Mircea Eliade St
>> Sect 1, Bucharest
>> RO Bucharest, 012015
>> iulia.zidaru@1and1.ro
>> 0040 31 223 9153
>>
>>
>>
>  		 	   		  


RE: Hardware configuration

Posted by Michael Segel <mi...@hotmail.com>.
Hi,

That's actually a really good question.
Unfortunately, the answer isn't really simple.

You're going to need to estimate both your growth and your configuration.

Suppose I know that within 2 years the amount of data I want to retain will be 1PB; with a 3x replication factor, I'll need at least 3PB of disk. Assuming I can fit 12x2TB drives in a node, I'll need 125-150 machines. (There's some overhead for logging and the OS.)

Now this doesn't mean that I'll need to buy all of the machines today and build out the cluster.
It means that I will need to figure out my machine room (rack space, power, etc.) and also my hardware configuration.

You'll also need to plan out your hardware choices. For example, you may want 10GbE on the switch but not at the data node; however, you're going to want to be able to expand your data nodes later to add 10GbE cards.

The idea is that as I build out my cluster, all of the machines have the same look and feel. So if you buy quad-core CPUs at 2.2 GHz today and 6 months from now you buy 2.6 GHz CPUs, as long as they are 4-core CPUs your cluster will look the same.

The point is that when you lay out your cluster to start with, you'll need to plan ahead and keep things similar. Also you'll need to make sure your NameNode has enough memory...

Having said that... Yahoo! has written a paper detailing MR2 (the next generation of MapReduce). As the M/R job scheduler becomes more intelligent about the types of jobs and types of hardware, the consistency of hardware becomes less important.

With respect to HBase, I suspect there will be a parallel evolution.

As to building out and replacing your cluster... if this is a production environment, you'll have to think about DR and building out a second cluster. So the cost of replacing clusters should also be factored in when you budget for hardware.

Like I said, it's not a simple answer; you have to approach each instance separately and fine-tune your cluster plans.

HTH

-Mike
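
A sketch of that node-count arithmetic (the overhead fraction is an assumption;
adjust it for your own OS/logging/spill footprint):

    # Estimate data nodes needed from retained data, replication factor and drive layout.
    retained_pb = 1.0
    replication = 3
    drives_per_node, drive_tb = 12, 2
    overhead = 0.12            # assumed share of raw disk lost to OS, logs, MR spill

    raw_needed_tb = retained_pb * 1024 * replication
    usable_per_node_tb = drives_per_node * drive_tb * (1 - overhead)
    print(f"about {raw_needed_tb / usable_per_node_tb:.0f} data nodes")
    # ~145 with these assumptions, in line with the 125-150 figure above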


----------------------------------------
> Date: Mon, 2 May 2011 09:53:05 +0300
> From: iulia.zidaru@1and1.ro
> To: user@hbase.apache.org
> CC: stack@duboce.net
> Subject: Re: Hardware configuration
>
> Thank you both. How would you estimate really big clusters, with
> hundreds of nodes? Requirements might change in time and replacing an
> entire cluster seems not the best solution...
>
>
>
> On 04/29/2011 07:08 PM, Stack wrote:
> > I agree with Michel Segel. Distributed computing is hard enough.
> > There is no need to add extra complexity.
> >
> > St.Ack
> >
> > On Fri, Apr 29, 2011 at 4:05 AM, Iulia Zidaru wrote:
> >> Hi,
> >> I'm wondering if having a cluster with different machines in terms of CPU,
> >> RAM and disk space would be a big issue for HBase. For example, machines
> >> with 12GBs RAM and machines with 48GBs. We suppose that we use them at full
> >> capacity. What problems we might encounter if having this kind of
> >> configuration?
> >> Thank you,
> >> Iulia
> >>
> >>
>
>
> --
> Iulia Zidaru
> Java Developer
>
> 1&1 Internet AG - Bucharest/Romania - Web Components Romania
> 18 Mircea Eliade St
> Sect 1, Bucharest
> RO Bucharest, 012015
> iulia.zidaru@1and1.ro
> 0040 31 223 9153
>
>
>