Posted to user@hbase.apache.org by David Charle <db...@gmail.com> on 2012/11/26 14:53:57 UTC

recommended nodes

hi

What are the recommended node counts for the NN, HMaster, and ZK quorum for a larger cluster, let's say 50-100+ nodes?

Also, what would be the ideal replication factor for larger clusters when you have 3-4 racks?

--
David

Re: recommended nodes

Posted by Mohammad Tariq <do...@gmail.com>.
Hello David,

     Do you mean the recommended specs? IMHO, it depends more on the data
and the kind of processing you are going to perform than on the size of
your cluster.

Regards,
    Mohammad Tariq




Re: recommended nodes

Posted by Leonid Fedotov <lf...@hortonworks.com>.
Jean-Marc,
you are right, SATA may be too slow, especially when you only have 2 drives.
You may get reasonable performance out of those SATA drives, but there should be more than two, to spread the I/O operations across many spindles.
Better to go with SAS.
In any case, try to keep network I/O and disk I/O balanced.

Thank you!

Sincerely,
Leonid Fedotov


On Nov 27, 2012, at 11:52 AM, Jean-Marc Spaggiari wrote:

> Hi Michael,
> 
> So are you recommending 32 GB per node?
> 
> What about the disks? Are SATA drives too slow?
> 
> JM
> 
> 2012/11/26, Michael Segel <mi...@hotmail.com>:
>> Uhm, those specs are actually now out of date.
>> 
>> If you're running HBase, or want to also run R on top of Hadoop, you will
>> need to add more memory.
>> Also, forget 1 GbE; go 10 GbE. And with 2 SATA drives, you will be
>> disk-i/o-bound way too quickly.
>> 
>> 
>> On Nov 26, 2012, at 8:05 AM, Marcos Ortiz <ml...@uci.cu> wrote:
>> 
>>> Are you asking about hardware recommendations?
>>> Eric Sammer, in his "Hadoop Operations" book, did a great job on this.
>>> For mid-size clusters (up to 300 nodes):
>>> Processor: a dual quad-core, 2.6 GHz
>>> RAM: 24 GB DDR3
>>> Dual 1 Gb Ethernet NICs
>>> A SAS drive controller
>>> At least two SATA II drives in a JBOD configuration
>>> 
>>> The replication factor depends heavily on the primary use of your
>>> cluster.

Re: recommended nodes

Posted by Mohammad Tariq <do...@gmail.com>.
It depends on the data size and the processing you are going to do. A rough
rule of thumb is 1 GB of NameNode heap per 1 million blocks. SSDs could be a
better option if SATA turns out to be too slow, but keep the cost in mind.

Regards,
    Mohammad Tariq
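As a sanity check on that rule of thumb, here is a minimal sketch (the data
volume and block size are hypothetical inputs; the 1 GB-per-million-blocks
ratio is just the rough guideline quoted above):

    # Back-of-the-envelope NameNode heap estimate:
    # ~1 GB of heap per million HDFS blocks.
    data_tb = 500                  # hypothetical total data on the cluster
    block_size_mb = 64             # 2012-era default HDFS block size
    blocks = data_tb * 1024 * 1024 / block_size_mb
    print(f"~{blocks / 1e6:.0f}M blocks -> ~{blocks / 1e6:.0f} GB NameNode heap")
    # Small files push the block (and heap) count well above this estimate,
    # since every file occupies at least one block.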




Re: recommended nodes

Posted by Gregory Alan Bolcer <gr...@bolcer.org>.
You take a raw disk performance hit with LVM in exchange for
resizability and partitionability.

My biggest issue was that the internal SAS/SATA RAID controller in my
Dell T7400 wasn't upgradable to properly use the 3 TB disks. Swapping in
the LSI card turned out to be a pleasant hidden surprise for performance
and simplicity.

Greg

LSI MegaRAID 9260-4, 6 Gb/s, w/ 512 MB cache
Dual 128 GB OCZ4 SSDs, RAID-0 boot
4 x 3 TB RAID-0 data, XFS
Cloudera CDH 4.1.1

-- 
greg@bolcer.org, http://bolcer.org, c: +1.714.928.5476

Re: recommended nodes

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Mike,

It helped a lot. It just showed me that none of my nodes is right ;)
But now I know which way to go.

Regarding SATA II vs. SATA III, is there a big difference? I found many
JBOD cards that work with SATA II, but I did not find any (at a good
price) that handle SATA III.

Or can LVM replace a JBOD card? The documentation says that LVM is
suitable for "creating single logical volumes of multiple physical
volumes or entire hard disks (somewhat similar to RAID 0, but more
similar to JBOD), allowing for dynamic volume resizing." This is what we
want to achieve here, right?

JM
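For what it's worth, on the Hadoop side no volume manager or special card is
strictly required to use several drives: the DataNode accepts a
comma-separated list of directories, one per disk, and spreads block replicas
across them. A hedged sketch (mount points are made up; the property name is
dfs.data.dir in Hadoop 1.x and dfs.datanode.data.dir in Hadoop 2):

    # Emit the hdfs-site.xml property that points a DataNode at one
    # directory per physical disk (hypothetical mount points).
    mounts = ["/data/1", "/data/2", "/data/3", "/data/4"]
    dirs = ",".join(f"{m}/dfs/dn" for m in mounts)
    print("<property>")
    print("  <name>dfs.data.dir</name>")   # dfs.datanode.data.dir in Hadoop 2+
    print(f"  <value>{dirs}</value>")
    print("</property>")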


2012/11/28, Adrien Mogenet <ad...@gmail.com>:
> Does HBase really benefit from 64 GB of RAM, since allocating too large a
> heap might increase GC time?
>
> Another question: why not RAID 0, in order to aggregate disk bandwidth?
> (and thus keep the 3x replication factor)
>
>
> On Wed, Nov 28, 2012 at 5:58 PM, Michael Segel
> <mi...@hotmail.com>wrote:
>
>> Sorry,
>>
>> I need to clarify.
>>
>> 4 GB per physical core is a good starting point.
>> So with 2 quad-core chips, that is going to be 32 GB.
>>
>> IMHO that's a minimum. If you go with HBase, you will want more. (Actually,
>> you will need more.) The next logical jump would be to 48 or 64 GB.
>>
>> If we start to price out memory, depending on the vendor and your company's
>> procurement, there really isn't much of a price difference between
>> 32, 48, or 64 GB.
>> Note that it also depends on the chips themselves. Also, you need to see
>> how many memory channels exist on the motherboard. You may need to buy in
>> pairs or triplets. Your hardware vendor can help you. (Also, you need to
>> keep an eye on your hardware vendor. Sometimes they will give you
>> higher-density chips that are going to be more expensive...) ;-)
>>
>> I tend to like having extra memory from the start.
>> It gives you a bit more freedom and also protects you from 'fat' code.
>>
>> Looking at YARN... you will need more memory too.
>>
>>
>> With respect to the hard drives...
>>
>> The best recommendation is to keep the drives as JBOD and then use 3x
>> replication.
>> In this case, make sure that the disk controller cards can handle JBOD.
>> (Some don't support JBOD out of the box)
>>
>> With respect to RAID...
>>
>> If you are running MapR, no need for RAID.
>> If you are running an Apache derivative, you could use RAID 1 and then cut
>> your replication to 2x. This makes it easier to manage drive failures.
>> (It's not the norm, but it works...) In some clusters, they are using
>> appliances like NetApp's E-Series, where the machines see the drives as
>> local attached storage, and I think the appliances themselves are using
>> RAID. I haven't played with this configuration, however it could make
>> sense and it's a valid design.
>>
>> HTH
>>
>> -Mike
>>
>> On Nov 28, 2012, at 10:33 AM, Jean-Marc Spaggiari
>> <je...@spaggiari.org>
>> wrote:
>>
>> > Hi Mike,
>> >
>> > Thanks for all those details!
>> >
>> > So to simplify the equation, for 16 virtual cores we need 48 to 64 GB,
>> > which means 3 to 4 GB per core. So with quad cores, are 12 GB to 16 GB a
>> > good start? Or did I simplify it too much?
>> >
>> > Regarding the hard drives: if you add more than one drive, do you need
>> > to build them into RAID or a similar system? Or can Hadoop/HBase be
>> > configured to use more than one drive?
>> >
>> > Thanks,
>> >
>> > JM
>> >
>> > 2012/11/27, Michael Segel <mi...@hotmail.com>:
>> >>
>> >> OK... I don't know why Cloudera is so hung up on 32 GB. ;-) [It's an
>> >> inside joke...]
>> >>
>> >> So here's the problem...
>> >>
>> >> By default, your child processes in a map/reduce job get 512 MB.
>> >> The majority of the time, this gets raised to 1 GB.
>> >>
>> >> 8 cores (dual quad-core) show up as 16 virtual processors in Linux.
>> >> (Note: this is why, when people talk about the number of cores, you have
>> >> to specify physical cores or logical cores....)
>> >>
>> >> So if you were to oversubscribe and have, let's say, 12 mappers and 12
>> >> reducers, that's 24 slots, which means that you would need 24 GB of
>> >> memory reserved just for the child processes. This would leave 8 GB for
>> >> the DN, the TT, and the rest of the Linux OS processes.
>> >>
>> >> Can you live with that? Sure.
>> >> Now add in R, HBase, Impala, or some other set of tools on top of the
>> >> cluster.
>> >>
>> >> Ooops! Now you are in trouble because you will swap.
>> >> Also, adding in R, you may want to bump up those child procs from 1 GB to
>> >> 2 GB. That means the 24 slots would now require 48 GB. Now you have swap,
>> >> and if that happens you will see HBase in a cascading failure.
>> >>
>> >> So while you can do a rolling restart with the changed configuration
>> >> (reducing the number of mappers and reducers), you end up with fewer
>> >> slots, which will mean longer run times for your jobs. (Fewer slots ==
>> >> less parallelism.)
>> >>
>> >> Looking at the price of memory... you can get 48 GB or even 64 GB for
>> >> around the same price point. (8 GB chips.)
>> >>
>> >> And I didn't even talk about adding Solr, again a memory hog... ;-)
>> >>
>> >> Note that I matched the number of mappers with reducers. You could go
>> >> with fewer reducers if you want. I tend to recommend a ratio of 2:1
>> >> mappers to reducers, depending on the work flow....
>> >>
>> >> As to the disks... no, 7200 RPM SATA III drives are fine. The SATA III
>> >> interface is pretty much available in the new kit being shipped.
>> >> It's just that you don't have enough drives. 8 cores should be 8
>> >> spindles, if available.
>> >> Otherwise you end up seeing your CPU load climb on wait states as the
>> >> processes wait for the disk i/o to catch up.
>> >>
>> >> I mean, you could build out a cluster with 4 x 3.5" 2 TB drives in a 1U
>> >> chassis based on price. You're making a trade-off, and you should be
>> >> aware of the performance hit you will take.
>> >>
>> >> HTH
>> >>
>> >> -Mike
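Segel's slot arithmetic above is easy to redo. A minimal sketch using his own
figures (24 oversubscribed slots on a dual quad-core box, child heaps of 1-2
GB, roughly 8 GB left for the DataNode, TaskTracker, and OS):

    # MapReduce-era memory budget, following the slot math quoted above.
    mappers, reducers = 12, 12    # oversubscribed slots on 8 physical cores
    services_gb = 8               # DN + TT + the rest of the Linux processes
    slots = mappers + reducers
    for child_heap_gb in (1, 2):  # heaps often get bumped once R/HBase move in
        total_gb = slots * child_heap_gb + services_gb
        print(f"{slots} slots x {child_heap_gb} GB + {services_gb} GB = {total_gb} GB")
    # -> 32 GB with 1 GB children and 56 GB with 2 GB children, which is why
    #    32 GB reads as a floor and 48-64 GB as the next sensible step.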

Re: recommended nodes

Posted by Gregory Alan Bolcer <gr...@bolcer.org>.
Ubuntu's Disk Manager has a nice benchmark feature. I'm not sure if
anyone else would be, but I'd be interested if you posted your results.
Are you using that OS?

Greg

On 11/28/2012 10:05 AM, Jean-Marc Spaggiari wrote:
> Hi Gregory,
>
> I found this about LVM:
> -> http://blog.andrew.net.au/2006/08/09
> -> http://www.phoronix.com/scan.php?page=article&item=fedora_15_lvm&num=2
>
> It seems that performance is still acceptable with it. I will most
> probably give it a try and bench that too... I have one new hard drive
> which should arrive tomorrow. Perfect timing ;)
>
>
>
> JM
>

-- 
greg@bolcer.org, http://bolcer.org, c: +1.714.928.5476

Re: recommended nodes

Posted by Adrien Mogenet <ad...@gmail.com>.
Maybe you should give a little more information about your RAID controller
(write-back or write-through?) and the underlying filesystem (ext3? block
size?).

Very interesting benchmark and discussion by the way :-)


On Thu, Dec 20, 2012 at 11:07 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> I did the test with a 2GB file... So read and write were spread over the 2
> drives for RAID0.
>
> Those tests were meant to give an overall idea of performance vs. CPU
> usage, etc., and you might need to adjust them based on the way things are
> configured on your system.
>
> I don't know how RAID0 manages small files (<=64k), but maybe they're
> still spread over the 2 disks too?
>
> JM
>
> 2012/12/20 Varun Sharma <va...@pinterest.com>
>
> > Hmm, I thought that RAID0 simply stripes across all disks. So if you got
> > 4 disks, an HFile block, for example, could get striped across 4 disks.
> > So to read that block, you would need all 4 of them to seek so that you
> > could read all 4 stripes for that HFile block. This could make things as
> > slow as the slowest-seeking disk for that random read. Certainly, the
> > data transfer rate would be much faster with RAID0, but since an HFile
> > block is merely 64K, I would have expected the seek latency to play the
> > major role, and not really the data transfer latency.
> >
> > However, your tests indeed show that RAID0 still outperforms JBOD on
> > seeks. Am I missing something?
> >
> > On Thu, Dec 20, 2012 at 1:26 PM, Jean-Marc Spaggiari <
> > jean-marc@spaggiari.org> wrote:
> >
> > > Hi Varun,
> > >
> > > The hard drives I used are now in the Hadoop/HBase cluster, but they
> > > were clean and freshly formatted for the tests I did. The computer where
> > > I ran those tests was one of the region servers. It was re-installed to
> > > be very clean, and it's now running a DataNode and an RS.
> > >
> > > Regarding RAID, I think you are confusing RAID0 and RAID1. It's RAID1
> > > which needs to access the 2 copies each time. RAID0 is more like JBOD,
> > > but faster.
> > >
> > > JM
> > >
> > > 2012/12/20 Varun Sharma <va...@pinterest.com>
> > >
> > > > Hi Jean,
> > > >
> > > > Very interesting benchmark - how were these numbers arrived at? Is
> > > > this on a real HBase cluster? To me, it feels kind of
> > > > counter-intuitive that RAID0 beats JBOD on random seeks, because with
> > > > RAID0 all disks need to seek at the same time, and the performance
> > > > should basically be as bad as the slowest-seeking disk.
> > > >
> > > > Varun
> > > >
> > > > On Wed, Dec 19, 2012 at 5:14 PM, Michael Segel <
> > > michael_segel@hotmail.com
> > > > >wrote:
> > > >
> > > > > Yeah,
> > > > > I couldn't argue against LVMs when talking with the system admins.
> > > > > In terms of speed, it's noise, because the CPUs are pretty
> > > > > efficient, and unless you have more than 1 drive per physical core,
> > > > > you will end up saturating your disk I/O.
> > > > >
> > > > > In terms of MapR, you want the raw disk. (But we're talking Apache)
> > > > >
> > > > >
> > > > > On Dec 19, 2012, at 4:59 PM, Jean-Marc Spaggiari <
> > > > jean-marc@spaggiari.org>
> > > > > wrote:
> > > > >
> > > > > > Finally, it took me a while to run those tests because it was way
> > > > > > longer than expected, but here are the results:
> > > > > >
> > > > > > http://www.spaggiari.org/bonnie.html
> > > > > >
> > > > > > LVM is not really slower than JBOD, and it doesn't really take
> > > > > > more CPU. So I will say, if you have to choose between the 2, take
> > > > > > the one you prefer. Personally, I prefer LVM because it's easy to
> > > > > > configure.
> > > > > >
> > > > > > The big winner here is RAID0. It's WAY faster than anything else.
> > > > > > But it's using twice the space... Your choice.
> > > > > >
> > > > > > I did not get a chance to test with the Ubuntu tool because it
> > > > > > doesn't work with LVM drives.
> > > > > >
> > > > > > JM
> > > > > >
> > > > > > 2012/11/28, Michael Segel <mi...@hotmail.com>:
> > > > > >> Ok, just a caveat.
> > > > > >>
> > > > > >> I am discussing MapR as part of a complete response. As Mohit
> > > > > >> posted, MapR takes the raw device for their MapR File System.
> > > > > >> They do stripe on their own within what they call a volume.
> > > > > >>
> > > > > >> But going back to Apache...
> > > > > >> You can stripe drives, however I wouldn't recommend it. I don't
> > > > > >> think the performance gains would really matter.
> > > > > >> You're going to end up getting blocked first by disk i/o, then
> > > > > >> your controller card, then your network... assuming 10 GbE.
> > > > > >>
> > > > > >> With only 2 disks on an 8-core system, you will hit disk i/o
> > > > > >> first, and then you'll watch your CPU wait-I/O climb.
> > > > > >>
> > > > > >> HTH
> > > > > >>
> > > > > >> -Mike
> > > > > >>
> > > > > >> On Nov 28, 2012, at 7:28 PM, Jean-Marc Spaggiari <
> > > > > jean-marc@spaggiari.org>
> > > > > >> wrote:
> > > > > >>
> > > > > >>> Hi Mike,
> > > > > >>>
> > > > > >>> Why not use LVM with MapR? Since LVM reads from 2 drives almost
> > > > > >>> at the same time, it should be better than RAID0 or a single
> > > > > >>> drive, no?
> > > > > >>>
> > > > > >>> 2012/11/28, Michael Segel <mi...@hotmail.com>:
> > > > > >>>> Just a couple of things.
> > > > > >>>>
> > > > > >>>> I'm neutral on the use of LVMs. Some would point out that
> > > > > >>>> there's some overhead, but on the flip side, it can make
> > > > > >>>> managing the machines easier.
> > > > > >>>> If you're using MapR, you don't want to use LVMs but raw devices.
> > > > > >>>>
> > > > > >>>> In terms of GC, it's going to depend on the heap size and not
> > > > > >>>> the total memory. With respect to HBase... MSLAB is the way to go.
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> On Nov 28, 2012, at 12:05 PM, Jean-Marc Spaggiari
> > > > > >>>> <je...@spaggiari.org>
> > > > > >>>> wrote:
> > > > > >>>>
> > > > > >>>>> Hi Gregory,
> > > > > >>>>>
> > > > > >>>>> I found this about LVM:
> > > > > >>>>> -> http://blog.andrew.net.au/2006/08/09
> > > > > >>>>> -> http://www.phoronix.com/scan.php?page=article&item=fedora_15_lvm&num=2
> > > > > >>>>>
> > > > > >>>>> It seems that performance is still acceptable with it. I will
> > > > > >>>>> most probably give it a try and bench that too... I have one
> > > > > >>>>> new hard drive which should arrive tomorrow. Perfect timing ;)
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> JM
> > > > > >>>>>
> > > > > >>>>> 2012/11/28, Mohit Anchlia <mo...@gmail.com>:
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> On Nov 28, 2012, at 9:07 AM, Adrien Mogenet <
> > > > > >>>>>> adrien.mogenet@gmail.com> wrote:
> > > > > >>>>>>
> > > > > >>>>>>> Does HBase really benefit from 64 GB of RAM, since allocating
> > > > > >>>>>>> too large a heap might increase GC time?
> > > > > >>>>>>>
> > > > > >>>>>> The benefit you get is from the OS cache.
> > > > > >>>>>>> Another question: why not RAID 0, in order to aggregate disk
> > > > > >>>>>>> bandwidth? (and thus keep the 3x replication factor)



-- 
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me

Re: recommended nodes

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
I did the test with a 2GB file... So read and write were spread over the 2
drives for RAID0.

Those tests were meant to give an overall idea of performance vs. CPU usage,
etc., and you might need to adjust them based on the way things are
configured on your system.

I don't know how RAID0 manages small files (<=64k), but maybe they're still
spread over the 2 disks too?

JM


Re: recommended nodes

Posted by Varun Sharma <va...@pinterest.com>.
Hmm, I thought that RAID0 simply stripes across all disks. So if you got 4
disks, an HFile block, for example, could get striped across 4 disks. So to
read that block, you would need all 4 of them to seek so that you could
read all 4 stripes for that HFile block. This could make things as slow as
the slowest-seeking disk for that random read. Certainly, the data transfer
rate would be much faster with RAID0, but since an HFile block is merely
64K, I would have expected the seek latency to play the major role, and not
really the data transfer latency.

However, your tests indeed show that RAID0 still outperforms JBOD on seeks.
Am I missing something?
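One possible answer, offered as a toy model rather than a measured fact:
RAID0 stripes in fixed-size chunks (commonly 64 KB or larger), so a 64 KB
HFile-block read that fits inside one chunk seeks on a single disk; only
reads that straddle a chunk boundary touch more spindles.

    # How many disks a read touches in a chunked RAID0 stripe (toy model;
    # the chunk sizes and the 4-disk array are assumptions).
    def disks_touched(read_kb, chunk_kb, ndisks, offset_kb=0):
        first = offset_kb // chunk_kb
        last = (offset_kb + read_kb - 1) // chunk_kb
        return min(ndisks, last - first + 1)

    for chunk_kb in (64, 256, 512):
        aligned = disks_touched(64, chunk_kb, ndisks=4)
        straddle = disks_touched(64, chunk_kb, ndisks=4, offset_kb=chunk_kb - 32)
        print(f"{chunk_kb} KB chunks: aligned read -> {aligned} disk(s), "
              f"boundary-straddling read -> {straddle} disk(s)")
    # With chunks >= 64 KB, a random 64 KB read seeks on 1-2 disks, not all
    # 4, which is consistent with RAID0 beating JBOD in the benchmark above.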

> > > >>>>> ->
> > > >>>>>
> > > http://www.phoronix.com/scan.php?page=article&item=fedora_15_lvm&num=2
> > > >>>>>
> > > >>>>> Seems that performances are still correct with it. I will most
> > > >>>>> probably give it a try and bench that too... I have one new hard
> > > drive
> > > >>>>> which should arrived tomorrow. Perfect timing ;)
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> JM
> > > >>>>>
> > > >>>>> 2012/11/28, Mohit Anchlia <mo...@gmail.com>:
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Nov 28, 2012, at 9:07 AM, Adrien Mogenet <
> > > adrien.mogenet@gmail.com>
> > > >>>>>> wrote:
> > > >>>>>>
> > > >>>>>>> Does HBase really benefit from 64 GB of RAM since allocating
> too
> > > >>>>>>> large
> > > >>>>>>> heap
> > > >>>>>>> might increase GC time ?
> > > >>>>>>>
> > > >>>>>> Benefit you get is from OS cache
> > > >>>>>>> Another question : why not RAID 0, in order to aggregate disk
> > > >>>>>>> bandwidth
> > > >>>>>>> ?
> > > >>>>>>> (and thus keep 3x replication factor)
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On Wed, Nov 28, 2012 at 5:58 PM, Michael Segel
> > > >>>>>>> <mi...@hotmail.com>wrote:
> > > >>>>>>>
> > > >>>>>>>> Sorry,
> > > >>>>>>>>
> > > >>>>>>>> I need to clarify.
> > > >>>>>>>>
> > > >>>>>>>> 4GB per physical core is a good starting point.
> > > >>>>>>>> So with 2 quad core chips, that is going to be 32GB.
> > > >>>>>>>>
> > > >>>>>>>> IMHO that's a minimum. If you go with HBase, you will want
> more.
> > > >>>>>>>> (Actually
> > > >>>>>>>> you will need more.) The next logical jump would be to 48 or
> > 64GB.
> > > >>>>>>>>
> > > >>>>>>>> If we start to price out memory, depending on vendor, your
> > > company's
> > > >>>>>>>> procurement,  there really isn't much of a price difference in
> > > terms
> > > >>>>>>>> of
> > > >>>>>>>> 32,48, or 64 GB.
> > > >>>>>>>> Note that it also depends on the chips themselves. Also you
> need
> > > to
> > > >>>>>>>> see
> > > >>>>>>>> how many memory channels exist in the mother board. You may
> need
> > > to
> > > >>>>>>>> buy
> > > >>>>>>>> in
> > > >>>>>>>> pairs or triplets. Your hardware vendor can help you. (Also
> you
> > > need
> > > >>>>>>>> to
> > > >>>>>>>> keep an eye on your hardware vendor. Sometimes they will give
> > you
> > > >>>>>>>> higher
> > > >>>>>>>> density chips that are going to be more expensive...) ;-)
> > > >>>>>>>>
> > > >>>>>>>> I tend to like having extra memory from the start.
> > > >>>>>>>> It gives you a bit more freedom and also protects you from
> 'fat'
> > > >>>>>>>> code.
> > > >>>>>>>>
> > > >>>>>>>> Looking at YARN... you will need more memory too.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> With respect to the hard drives...
> > > >>>>>>>>
> > > >>>>>>>> The best recommendation is to keep the drives as JBOD and then
> > use
> > > >>>>>>>> 3x
> > > >>>>>>>> replication.
> > > >>>>>>>> In this case, make sure that the disk controller cards can
> > handle
> > > >>>>>>>> JBOD.
> > > >>>>>>>> (Some don't support JBOD out of the box)
> > > >>>>>>>>
> > > >>>>>>>> With respect to RAID...
> > > >>>>>>>>
> > > >>>>>>>> If you are running MapR, no need for RAID.
> > > >>>>>>>> If you are running an Apache derivative, you could use RAID 1.
> > > Then
> > > >>>>>>>> cut
> > > >>>>>>>> your replication to 2X. This makes it easier to manage drive
> > > >>>>>>>> failures.
> > > >>>>>>>> (Its not the norm, but it works...) In some clusters, they are
> > > using
> > > >>>>>>>> appliances like Net App's e series where the machines see the
> > > drives
> > > >>>>>>>> as
> > > >>>>>>>> local attached storage and I think the appliances themselves
> are
> > > >>>>>>>> using
> > > >>>>>>>> RAID.  I haven't played with this configuration, however it
> > could
> > > >>>>>>>> make
> > > >>>>>>>> sense and its a valid design.
> > > >>>>>>>>
> > > >>>>>>>> HTH
> > > >>>>>>>>
> > > >>>>>>>> -Mike
> > > >>>>>>>>
> > > >>>>>>>> On Nov 28, 2012, at 10:33 AM, Jean-Marc Spaggiari
> > > >>>>>>>> <je...@spaggiari.org>
> > > >>>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Hi Mike,
> > > >>>>>>>>>
> > > >>>>>>>>> Thanks for all those details!
> > > >>>>>>>>>
> > > >>>>>>>>> So to simplify the equation, for 16 virtual cores we need 48
> to
> > > >>>>>>>>> 64GB.
> > > >>>>>>>>> Which mean 3 to 4GB per core. So with quad cores, 12GB to
> 16GB
> > > are
> > > >>>>>>>>> a
> > > >>>>>>>>> good start? Or I simplified it to much?
> > > >>>>>>>>>
> > > >>>>>>>>> Regarding the hard drives. If you add more than one drive, do
> > you
> > > >>>>>>>>> need
> > > >>>>>>>>> to build them on RAID or similar systems? Or can Hadoop/HBase
> > be
> > > >>>>>>>>> configured to use more than one drive?
> > > >>>>>>>>>
> > > >>>>>>>>> Thanks,
> > > >>>>>>>>>
> > > >>>>>>>>> JM
> > > >>>>>>>>>
> > > >>>>>>>>> 2012/11/27, Michael Segel <mi...@hotmail.com>:
> > > >>>>>>>>>>
> > > >>>>>>>>>> OK... I don't know why Cloudera is so hung up on 32GB. ;-)
> > [Its
> > > an
> > > >>>>>>>> inside
> > > >>>>>>>>>> joke ...]
> > > >>>>>>>>>>
> > > >>>>>>>>>> So here's the problem...
> > > >>>>>>>>>>
> > > >>>>>>>>>> By default, your child processes in a map/reduce job get a
> > > default
> > > >>>>>>>> 512MB.
> > > >>>>>>>>>> The majority of the time, this gets raised to 1GB.
> > > >>>>>>>>>>
> > > >>>>>>>>>> 8 cores (dual quad cores) shows up at 16 virtual processors
> in
> > > >>>>>>>>>> Linux.
> > > >>>>>>>> (Note:
> > > >>>>>>>>>> This is why when people talk about the number of cores, you
> > have
> > > >>>>>>>>>> to
> > > >>>>>>>> specify
> > > >>>>>>>>>> physical cores or logical cores....)
> > > >>>>>>>>>>
> > > >>>>>>>>>> So if you were to over subscribe and have lets say 12
>  mappers
> > > and
> > > >>>>>>>>>> 12
> > > >>>>>>>>>> reducers, that's 24 slots. Which means that you would need
> > 24GB
> > > of
> > > >>>>>>>> memory
> > > >>>>>>>>>> reserved just for the child processes. This would leave 8GB
> > for
> > > >>>>>>>>>> DN,
> > > >>>>>>>>>> TT
> > > >>>>>>>> and
> > > >>>>>>>>>> the rest of the linux OS processes.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Can you live with that? Sure.
> > > >>>>>>>>>> Now add in R, HBase, Impala, or some other set of tools on
> top
> > > of
> > > >>>>>>>>>> the
> > > >>>>>>>>>> cluster.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Ooops! Now you are in trouble because you will swap.
> > > >>>>>>>>>> Also adding in R, you may want to bump up those child procs
> > from
> > > >>>>>>>>>> 1GB
> > > >>>>>>>>>> to
> > > >>>>>>>> 2
> > > >>>>>>>>>> GB. That means the 24 slots would now require 48GB.  Now you
> > > have
> > > >>>>>>>>>> swap
> > > >>>>>>>> and
> > > >>>>>>>>>> if that happens you will see HBase in a cascading failure.
> > > >>>>>>>>>>
> > > >>>>>>>>>> So while you can do a rolling restart with the changed
> > > >>>>>>>>>> configuration
> > > >>>>>>>>>> (reducing the number of mappers and reducers) you end up
> with
> > > less
> > > >>>>>>>>>> slots
> > > >>>>>>>>>> which will mean in longer run time for your jobs. (Less
> slots
> > ==
> > > >>>>>>>>>> less
> > > >>>>>>>>>> parallelism )
> > > >>>>>>>>>>
> > > >>>>>>>>>> Looking at the price of memory... you can get 48GB or even
> > 64GB
> > > >>>>>>>>>> for
> > > >>>>>>>> around
> > > >>>>>>>>>> the same price point. (8GB chips)
> > > >>>>>>>>>>
> > > >>>>>>>>>> And I didn't even talk about adding SOLR either again a
> memory
> > > >>>>>>>>>> hog...
> > > >>>>>>>> ;-)
> > > >>>>>>>>>>
> > > >>>>>>>>>> Note that I matched the number of mappers w reducers. You
> > could
> > > go
> > > >>>>>>>>>> with
> > > >>>>>>>>>> fewer reducers if you want. I tend to recommend a ratio of
> 2:1
> > > >>>>>>>>>> mappers
> > > >>>>>>>> to
> > > >>>>>>>>>> reducers, depending on the work flow....
> > > >>>>>>>>>>
> > > >>>>>>>>>> As to the disks... no 7200 SATA III drives are fine. SATA
> III
> > > >>>>>>>>>> interface
> > > >>>>>>>> is
> > > >>>>>>>>>> pretty much available in the new kit being shipped.
> > > >>>>>>>>>> Its just that you don't have enough drives. 8 cores should
> be
> > 8
> > > >>>>>>>> spindles if
> > > >>>>>>>>>> available.
> > > >>>>>>>>>> Otherwise you end up seeing your CPU load climb on wait
> states
> > > as
> > > >>>>>>>>>> the
> > > >>>>>>>>>> processes wait for the disk i/o to catch up.
> > > >>>>>>>>>>
> > > >>>>>>>>>> I mean you could build out a cluster w 4 x 3 3.5" 2TB drives
> > in
> > > a
> > > >>>>>>>>>> 1
> > > >>>>>>>>>> U
> > > >>>>>>>>>> chassis based on price. You're making a trade off and you
> > should
> > > >>>>>>>>>> be
> > > >>>>>>>> aware of
> > > >>>>>>>>>> the performance hit you will take.
> > > >>>>>>>>>>
> > > >>>>>>>>>> HTH
> > > >>>>>>>>>>
> > > >>>>>>>>>> -Mike
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Nov 27, 2012, at 1:52 PM, Jean-Marc Spaggiari <
> > > >>>>>>>> jean-marc@spaggiari.org>
> > > >>>>>>>>>> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Hi Michael,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> so are you recommanding 32Gb per node?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> What about the disks? SATA drives are to slow?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> JM
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> 2012/11/26, Michael Segel <mi...@hotmail.com>:
> > > >>>>>>>>>>>> Uhm, those specs are actually now out of date.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> If you're running HBase, or want to also run R on top of
> > > Hadoop,
> > > >>>>>>>>>>>> you
> > > >>>>>>>>>>>> will
> > > >>>>>>>>>>>> need to add more memory.
> > > >>>>>>>>>>>> Also forget 1GBe got 10GBe,  and w 2 SATA drives, you will
> > be
> > > >>>>>>>>>>>> disk
> > > >>>>>>>>>>>> i/o
> > > >>>>>>>>>>>> bound
> > > >>>>>>>>>>>> way too quickly.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Nov 26, 2012, at 8:05 AM, Marcos Ortiz <mlortiz@uci.cu
> >
> > > >>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> Are you asking about hardware recommendations?
> > > >>>>>>>>>>>>> Eric Sammer on his "Hadoop Operations" book, did a great
> > job
> > > >>>>>>>>>>>>> about
> > > >>>>>>>>>>>>> this:
> > > >>>>>>>>>>>>> For middle size clusters (until 300 nodes):
> > > >>>>>>>>>>>>> Processor: A dual quad-core 2.6 Ghz
> > > >>>>>>>>>>>>> RAM: 24 GB DDR3
> > > >>>>>>>>>>>>> Dual 1 Gb Ethernet NICs
> > > >>>>>>>>>>>>> a SAS drive controller
> > > >>>>>>>>>>>>> at least two SATA II drives in a JBOD configuration
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> The replication factor depends heavily of the primary use
> > of
> > > >>>>>>>>>>>>> your
> > > >>>>>>>>>>>>> cluster.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> On 11/26/2012 08:53 AM, David Charle wrote:
> > > >>>>>>>>>>>>>> hi
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> what's the recommended nodes for NN, hmaster and zk
> nodes
> > > for
> > > >>>>>>>>>>>>>> a
> > > >>>>>>>> larger
> > > >>>>>>>>>>>>>> cluster, lets say 50-100+
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> also, what would be the ideal replication factor for
> > larger
> > > >>>>>>>>>>>>>> clusters
> > > >>>>>>>>>>>>>> when
> > > >>>>>>>>>>>>>> u have 3-4 racks ?
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> --
> > > >>>>>>>>>>>>>> David
> > > >>>>>>>>>>>>>> 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE
> LAS
> > > >>>>>>>>>>>>>> CIENCIAS
> > > >>>>>>>>>>>>>> INFORMATICAS...
> > > >>>>>>>>>>>>>> CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> http://www.uci.cu
> > > >>>>>>>>>>>>>> http://www.facebook.com/universidad.uci
> > > >>>>>>>>>>>>>> http://www.flickr.com/photos/universidad_uci
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> --
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Marcos Luis Ortíz Valmaseda
> > > >>>>>>>>>>>>> about.me/marcosortiz <http://about.me/marcosortiz>
> > > >>>>>>>>>>>>> @marcosluis2186 <http://twitter.com/marcosluis2186>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS
> > > >>>>>>>>>>>>> CIENCIAS
> > > >>>>>>>>>>>>> INFORMATICAS...
> > > >>>>>>>>>>>>> CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> http://www.uci.cu
> > > >>>>>>>>>>>>> http://www.facebook.com/universidad.uci
> > > >>>>>>>>>>>>> http://www.flickr.com/photos/universidad_uci
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> --
> > > >>>>>>> Adrien Mogenet
> > > >>>>>>> 06.59.16.64.22
> > > >>>>>>> http://www.mogenet.me
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> > >
> >
>

Re: recommended nodes

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Varun,

The hard drives I used are now part of the hadoop/hbase cluster, but they
were clean and freshly formatted for the tests I did. The computer where I
ran those tests was one of the region servers. It was re-installed to be
completely clean, and it's now running a datanode and a RS.

Regarding RAID, I think you are confusing RAID0 and RAID1. It's RAID1 that
needs to access both drives each time. RAID0 is more like JBOD, but faster.
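
A toy model of what I mean (not real md/LVM code, just which drives each
write touches; 2 drives assumed):

def raid0_disks(chunk_no, ndisks=2):
    return [chunk_no % ndisks]      # each chunk lives on exactly one disk

def raid1_disks(chunk_no, ndisks=2):
    return list(range(ndisks))      # every write hits every mirror

print(raid0_disks(4), raid0_disks(5))  # [0] [1] -> spread out, like JBOD
print(raid1_disks(4), raid1_disks(5))  # [0, 1] [0, 1] -> both, each time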

JM

2012/12/20 Varun Sharma <va...@pinterest.com>

> Hi Jean,
>
> Very interesting benchmark - how are these numbers arrived at ? Is this on
> a real hbase cluster ? To me, it felt kind of counter intuitive that RAID0
> beats JBOD on random seeks because with RAID0 all disks need to seek at the
> same time and the performance should basically be as bad as the slowest
> seeking disk.
>
> Varun
>
> On Wed, Dec 19, 2012 at 5:14 PM, Michael Segel <michael_segel@hotmail.com
> >wrote:
>
> > Yeah,
> > I couldn't argue against LVMs when talking with the system admins.
> > In terms of speed its noise because the CPUs are pretty efficient and
> > unless you have more than 1 drive per physical core, you will end up
> > saturating your disk I/O.
> >
> > In terms of MapR, you want the raw disk. (But we're talking Apache)
> >
> >
> > On Dec 19, 2012, at 4:59 PM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org>
> > wrote:
> >
> > > Finally, it took me a while to run those tests because it was way
> > > longer than expected, but here are the results:
> > >
> > > http://www.spaggiari.org/bonnie.html
> > >
> > > LVM is not really slower than JBOD and not really taking more CPU. So
> > > I will say, if you have to choose between the 2, take the one you
> > > prefer. Personally, I prefer LVM because it's easy to configure.
> > >
> > > The big winner here is RAID0. It's WAY faster than anything else. But
> > > it's using twice the space... Your choice.
> > >
> > > I did not get a chance to test with the Ubuntu tool because it's not
> > > working with LVM drives.
> > >
> > > JM
> > >
> > > 2012/11/28, Michael Segel <mi...@hotmail.com>:
> > >> Ok, just a caveat.
> > >>
> > >> I am discussing MapR as part of a complete response. As Mohit posted
> > MapR
> > >> takes the raw device for their MapR File System.
> > >> They do stripe on their own within what they call a volume.
> > >>
> > >> But going back to Apache...
> > >> You can stripe drives, however I wouldn't recommend it. I don't think
> > the
> > >> performance gains would really matter.
> > >> You're going to end up getting blocked first by disk i/o, then your
> > >> controller card, then your network... assuming 10GBe.
> > >>
> > >> With only 2 disks on an 8 core system, you will hit disk i/o first and
> > then
> > >> you'll watch your CPU Wait I/O climb.
> > >>
> > >> HTH
> > >>
> > >> -Mike
> > >>
> > >> On Nov 28, 2012, at 7:28 PM, Jean-Marc Spaggiari <
> > jean-marc@spaggiari.org>
> > >> wrote:
> > >>
> > >>> Hi Mike,
> > >>>
> > >>> Why not using LVM with MapR? Since LVM is reading from 2 drives
> almost
> > >>> at the same time, it should be better than RAID0 or a single drive,
> > >>> no?
> > >>>
> > >>> 2012/11/28, Michael Segel <mi...@hotmail.com>:
> > >>>> Just a couple of things.
> > >>>>
> > >>>> I'm neutral on the use of LVMs. Some would point out that there's
> some
> > >>>> overhead, but on the flip side, it can make managing the machines
> > >>>> easier.
> > >>>> If you're using MapR, you don't want to use LVMs but raw devices.
> > >>>>
> > >>>> In terms of GC, its going to depend on the heap size and not the
> total
> > >>>> memory. With respect to HBase. ... MSLABS is the way to go.
> > >>>>
> > >>>>
> > >>>> On Nov 28, 2012, at 12:05 PM, Jean-Marc Spaggiari
> > >>>> <je...@spaggiari.org>
> > >>>> wrote:
> > >>>>
> > >>>>> Hi Gregory,
> > >>>>>
> > >>>>> I founs this about LVM:
> > >>>>> -> http://blog.andrew.net.au/2006/08/09
> > >>>>> ->
> > >>>>>
> > http://www.phoronix.com/scan.php?page=article&item=fedora_15_lvm&num=2
> > >>>>>
> > >>>>> Seems that performances are still correct with it. I will most
> > >>>>> probably give it a try and bench that too... I have one new hard
> > drive
> > >>>>> which should arrived tomorrow. Perfect timing ;)
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> JM
> > >>>>>
> > >>>>> 2012/11/28, Mohit Anchlia <mo...@gmail.com>:
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On Nov 28, 2012, at 9:07 AM, Adrien Mogenet <
> > adrien.mogenet@gmail.com>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Does HBase really benefit from 64 GB of RAM since allocating too
> > >>>>>>> large
> > >>>>>>> heap
> > >>>>>>> might increase GC time ?
> > >>>>>>>
> > >>>>>> Benefit you get is from OS cache
> > >>>>>>> Another question : why not RAID 0, in order to aggregate disk
> > >>>>>>> bandwidth
> > >>>>>>> ?
> > >>>>>>> (and thus keep 3x replication factor)
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Wed, Nov 28, 2012 at 5:58 PM, Michael Segel
> > >>>>>>> <mi...@hotmail.com>wrote:
> > >>>>>>>
> > >>>>>>>> Sorry,
> > >>>>>>>>
> > >>>>>>>> I need to clarify.
> > >>>>>>>>
> > >>>>>>>> 4GB per physical core is a good starting point.
> > >>>>>>>> So with 2 quad core chips, that is going to be 32GB.
> > >>>>>>>>
> > >>>>>>>> IMHO that's a minimum. If you go with HBase, you will want more.
> > >>>>>>>> (Actually
> > >>>>>>>> you will need more.) The next logical jump would be to 48 or
> 64GB.
> > >>>>>>>>
> > >>>>>>>> If we start to price out memory, depending on vendor, your
> > company's
> > >>>>>>>> procurement,  there really isn't much of a price difference in
> > terms
> > >>>>>>>> of
> > >>>>>>>> 32,48, or 64 GB.
> > >>>>>>>> Note that it also depends on the chips themselves. Also you need
> > to
> > >>>>>>>> see
> > >>>>>>>> how many memory channels exist in the mother board. You may need
> > to
> > >>>>>>>> buy
> > >>>>>>>> in
> > >>>>>>>> pairs or triplets. Your hardware vendor can help you. (Also you
> > need
> > >>>>>>>> to
> > >>>>>>>> keep an eye on your hardware vendor. Sometimes they will give
> you
> > >>>>>>>> higher
> > >>>>>>>> density chips that are going to be more expensive...) ;-)
> > >>>>>>>>
> > >>>>>>>> I tend to like having extra memory from the start.
> > >>>>>>>> It gives you a bit more freedom and also protects you from 'fat'
> > >>>>>>>> code.
> > >>>>>>>>
> > >>>>>>>> Looking at YARN... you will need more memory too.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> With respect to the hard drives...
> > >>>>>>>>
> > >>>>>>>> The best recommendation is to keep the drives as JBOD and then
> use
> > >>>>>>>> 3x
> > >>>>>>>> replication.
> > >>>>>>>> In this case, make sure that the disk controller cards can
> handle
> > >>>>>>>> JBOD.
> > >>>>>>>> (Some don't support JBOD out of the box)
> > >>>>>>>>
> > >>>>>>>> With respect to RAID...
> > >>>>>>>>
> > >>>>>>>> If you are running MapR, no need for RAID.
> > >>>>>>>> If you are running an Apache derivative, you could use RAID 1.
> > Then
> > >>>>>>>> cut
> > >>>>>>>> your replication to 2X. This makes it easier to manage drive
> > >>>>>>>> failures.
> > >>>>>>>> (Its not the norm, but it works...) In some clusters, they are
> > using
> > >>>>>>>> appliances like Net App's e series where the machines see the
> > drives
> > >>>>>>>> as
> > >>>>>>>> local attached storage and I think the appliances themselves are
> > >>>>>>>> using
> > >>>>>>>> RAID.  I haven't played with this configuration, however it
> could
> > >>>>>>>> make
> > >>>>>>>> sense and its a valid design.
> > >>>>>>>>
> > >>>>>>>> HTH
> > >>>>>>>>
> > >>>>>>>> -Mike
> > >>>>>>>>
> > >>>>>>>> On Nov 28, 2012, at 10:33 AM, Jean-Marc Spaggiari
> > >>>>>>>> <je...@spaggiari.org>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi Mike,
> > >>>>>>>>>
> > >>>>>>>>> Thanks for all those details!
> > >>>>>>>>>
> > >>>>>>>>> So to simplify the equation, for 16 virtual cores we need 48 to
> > >>>>>>>>> 64GB.
> > >>>>>>>>> Which mean 3 to 4GB per core. So with quad cores, 12GB to 16GB
> > are
> > >>>>>>>>> a
> > >>>>>>>>> good start? Or I simplified it to much?
> > >>>>>>>>>
> > >>>>>>>>> Regarding the hard drives. If you add more than one drive, do
> you
> > >>>>>>>>> need
> > >>>>>>>>> to build them on RAID or similar systems? Or can Hadoop/HBase
> be
> > >>>>>>>>> configured to use more than one drive?
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>>
> > >>>>>>>>> JM
> > >>>>>>>>>
> > >>>>>>>>> 2012/11/27, Michael Segel <mi...@hotmail.com>:
> > >>>>>>>>>>
> > >>>>>>>>>> OK... I don't know why Cloudera is so hung up on 32GB. ;-)
> [Its
> > an
> > >>>>>>>> inside
> > >>>>>>>>>> joke ...]
> > >>>>>>>>>>
> > >>>>>>>>>> So here's the problem...
> > >>>>>>>>>>
> > >>>>>>>>>> By default, your child processes in a map/reduce job get a
> > default
> > >>>>>>>> 512MB.
> > >>>>>>>>>> The majority of the time, this gets raised to 1GB.
> > >>>>>>>>>>
> > >>>>>>>>>> 8 cores (dual quad cores) shows up at 16 virtual processors in
> > >>>>>>>>>> Linux.
> > >>>>>>>> (Note:
> > >>>>>>>>>> This is why when people talk about the number of cores, you
> have
> > >>>>>>>>>> to
> > >>>>>>>> specify
> > >>>>>>>>>> physical cores or logical cores....)
> > >>>>>>>>>>
> > >>>>>>>>>> So if you were to over subscribe and have lets say 12  mappers
> > and
> > >>>>>>>>>> 12
> > >>>>>>>>>> reducers, that's 24 slots. Which means that you would need
> 24GB
> > of
> > >>>>>>>> memory
> > >>>>>>>>>> reserved just for the child processes. This would leave 8GB
> for
> > >>>>>>>>>> DN,
> > >>>>>>>>>> TT
> > >>>>>>>> and
> > >>>>>>>>>> the rest of the linux OS processes.
> > >>>>>>>>>>
> > >>>>>>>>>> Can you live with that? Sure.
> > >>>>>>>>>> Now add in R, HBase, Impala, or some other set of tools on top
> > of
> > >>>>>>>>>> the
> > >>>>>>>>>> cluster.
> > >>>>>>>>>>
> > >>>>>>>>>> Ooops! Now you are in trouble because you will swap.
> > >>>>>>>>>> Also adding in R, you may want to bump up those child procs
> from
> > >>>>>>>>>> 1GB
> > >>>>>>>>>> to
> > >>>>>>>> 2
> > >>>>>>>>>> GB. That means the 24 slots would now require 48GB.  Now you
> > have
> > >>>>>>>>>> swap
> > >>>>>>>> and
> > >>>>>>>>>> if that happens you will see HBase in a cascading failure.
> > >>>>>>>>>>
> > >>>>>>>>>> So while you can do a rolling restart with the changed
> > >>>>>>>>>> configuration
> > >>>>>>>>>> (reducing the number of mappers and reducers) you end up with
> > less
> > >>>>>>>>>> slots
> > >>>>>>>>>> which will mean in longer run time for your jobs. (Less slots
> ==
> > >>>>>>>>>> less
> > >>>>>>>>>> parallelism )
> > >>>>>>>>>>
> > >>>>>>>>>> Looking at the price of memory... you can get 48GB or even
> 64GB
> > >>>>>>>>>> for
> > >>>>>>>> around
> > >>>>>>>>>> the same price point. (8GB chips)
> > >>>>>>>>>>
> > >>>>>>>>>> And I didn't even talk about adding SOLR either again a memory
> > >>>>>>>>>> hog...
> > >>>>>>>> ;-)
> > >>>>>>>>>>
> > >>>>>>>>>> Note that I matched the number of mappers w reducers. You
> could
> > go
> > >>>>>>>>>> with
> > >>>>>>>>>> fewer reducers if you want. I tend to recommend a ratio of 2:1
> > >>>>>>>>>> mappers
> > >>>>>>>> to
> > >>>>>>>>>> reducers, depending on the work flow....
> > >>>>>>>>>>
> > >>>>>>>>>> As to the disks... no 7200 SATA III drives are fine. SATA III
> > >>>>>>>>>> interface
> > >>>>>>>> is
> > >>>>>>>>>> pretty much available in the new kit being shipped.
> > >>>>>>>>>> Its just that you don't have enough drives. 8 cores should be
> 8
> > >>>>>>>> spindles if
> > >>>>>>>>>> available.
> > >>>>>>>>>> Otherwise you end up seeing your CPU load climb on wait states
> > as
> > >>>>>>>>>> the
> > >>>>>>>>>> processes wait for the disk i/o to catch up.
> > >>>>>>>>>>
> > >>>>>>>>>> I mean you could build out a cluster w 4 x 3 3.5" 2TB drives
> in
> > a
> > >>>>>>>>>> 1
> > >>>>>>>>>> U
> > >>>>>>>>>> chassis based on price. You're making a trade off and you
> should
> > >>>>>>>>>> be
> > >>>>>>>> aware of
> > >>>>>>>>>> the performance hit you will take.
> > >>>>>>>>>>
> > >>>>>>>>>> HTH
> > >>>>>>>>>>
> > >>>>>>>>>> -Mike
> > >>>>>>>>>>
> > >>>>>>>>>> On Nov 27, 2012, at 1:52 PM, Jean-Marc Spaggiari <
> > >>>>>>>> jean-marc@spaggiari.org>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Hi Michael,
> > >>>>>>>>>>>
> > >>>>>>>>>>> so are you recommanding 32Gb per node?
> > >>>>>>>>>>>
> > >>>>>>>>>>> What about the disks? SATA drives are to slow?
> > >>>>>>>>>>>
> > >>>>>>>>>>> JM
> > >>>>>>>>>>>
> > >>>>>>>>>>> 2012/11/26, Michael Segel <mi...@hotmail.com>:
> > >>>>>>>>>>>> Uhm, those specs are actually now out of date.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> If you're running HBase, or want to also run R on top of
> > Hadoop,
> > >>>>>>>>>>>> you
> > >>>>>>>>>>>> will
> > >>>>>>>>>>>> need to add more memory.
> > >>>>>>>>>>>> Also forget 1GBe got 10GBe,  and w 2 SATA drives, you will
> be
> > >>>>>>>>>>>> disk
> > >>>>>>>>>>>> i/o
> > >>>>>>>>>>>> bound
> > >>>>>>>>>>>> way too quickly.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Nov 26, 2012, at 8:05 AM, Marcos Ortiz <ml...@uci.cu>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Are you asking about hardware recommendations?
> > >>>>>>>>>>>>> Eric Sammer on his "Hadoop Operations" book, did a great
> job
> > >>>>>>>>>>>>> about
> > >>>>>>>>>>>>> this:
> > >>>>>>>>>>>>> For middle size clusters (until 300 nodes):
> > >>>>>>>>>>>>> Processor: A dual quad-core 2.6 Ghz
> > >>>>>>>>>>>>> RAM: 24 GB DDR3
> > >>>>>>>>>>>>> Dual 1 Gb Ethernet NICs
> > >>>>>>>>>>>>> a SAS drive controller
> > >>>>>>>>>>>>> at least two SATA II drives in a JBOD configuration
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> The replication factor depends heavily of the primary use
> of
> > >>>>>>>>>>>>> your
> > >>>>>>>>>>>>> cluster.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On 11/26/2012 08:53 AM, David Charle wrote:
> > >>>>>>>>>>>>>> hi
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> what's the recommended nodes for NN, hmaster and zk nodes
> > for
> > >>>>>>>>>>>>>> a
> > >>>>>>>> larger
> > >>>>>>>>>>>>>> cluster, lets say 50-100+
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> also, what would be the ideal replication factor for
> larger
> > >>>>>>>>>>>>>> clusters
> > >>>>>>>>>>>>>> when
> > >>>>>>>>>>>>>> u have 3-4 racks ?
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>> David
> > >>>>>>>>>>>>>> 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS
> > >>>>>>>>>>>>>> CIENCIAS
> > >>>>>>>>>>>>>> INFORMATICAS...
> > >>>>>>>>>>>>>> CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> http://www.uci.cu
> > >>>>>>>>>>>>>> http://www.facebook.com/universidad.uci
> > >>>>>>>>>>>>>> http://www.flickr.com/photos/universidad_uci
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> --
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Marcos Luis Ortíz Valmaseda
> > >>>>>>>>>>>>> about.me/marcosortiz <http://about.me/marcosortiz>
> > >>>>>>>>>>>>> @marcosluis2186 <http://twitter.com/marcosluis2186>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS
> > >>>>>>>>>>>>> CIENCIAS
> > >>>>>>>>>>>>> INFORMATICAS...
> > >>>>>>>>>>>>> CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> http://www.uci.cu
> > >>>>>>>>>>>>> http://www.facebook.com/universidad.uci
> > >>>>>>>>>>>>> http://www.flickr.com/photos/universidad_uci
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Adrien Mogenet
> > >>>>>>> 06.59.16.64.22
> > >>>>>>> http://www.mogenet.me
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> > >>
> > >
> >
> >
>

Re: recommended nodes

Posted by Varun Sharma <va...@pinterest.com>.
Hi Jean,

Very interesting benchmark - how were these numbers arrived at? Is this on
a real hbase cluster? To me, it feels counterintuitive that RAID0 beats
JBOD on random seeks, because with RAID0 all disks need to seek at the same
time, so performance should basically be as bad as the slowest-seeking
disk.
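
A quick back-of-the-envelope simulation of that intuition (the seek
times below are made up for illustration, not measured on your boxes):

import random

random.seed(0)
seek = lambda: random.uniform(4.0, 12.0)   # ms per seek, hypothetical

n = 100000
single = sum(seek() for _ in range(n)) / n
worst_of_4 = sum(max(seek() for _ in range(4)) for _ in range(n)) / n
print("avg seek, one disk:    %.1f ms" % single)      # ~8.0 ms
print("avg max over 4 disks:  %.1f ms" % worst_of_4)  # ~10.4 ms

That is why I would have expected RAID0 to lose on small random reads.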

Varun

On Wed, Dec 19, 2012 at 5:14 PM, Michael Segel <mi...@hotmail.com>wrote:

> Yeah,
> I couldn't argue against LVMs when talking with the system admins.
> In terms of speed its noise because the CPUs are pretty efficient and
> unless you have more than 1 drive per physical core, you will end up
> saturating your disk I/O.
>
> In terms of MapR, you want the raw disk. (But we're talking Apache)
>
>
> On Dec 19, 2012, at 4:59 PM, Jean-Marc Spaggiari <je...@spaggiari.org>
> wrote:
>
> > Finally, it took me a while to run those tests because it was way
> > longer than expected, but here are the results:
> >
> > http://www.spaggiari.org/bonnie.html
> >
> > LVM is not really slower than JBOD and not really taking more CPU. So
> > I will say, if you have to choose between the 2, take the one you
> > prefer. Personally, I prefer LVM because it's easy to configure.
> >
> > The big winner here is RAID0. It's WAY faster than anything else. But
> > it's using twice the space... Your choice.
> >
> > I did not get a chance to test with the Ubuntu tool because it's not
> > working with LVM drives.
> >
> > JM
> >
> > 2012/11/28, Michael Segel <mi...@hotmail.com>:
> >> Ok, just a caveat.
> >>
> >> I am discussing MapR as part of a complete response. As Mohit posted
> MapR
> >> takes the raw device for their MapR File System.
> >> They do stripe on their own within what they call a volume.
> >>
> >> But going back to Apache...
> >> You can stripe drives, however I wouldn't recommend it. I don't think
> the
> >> performance gains would really matter.
> >> You're going to end up getting blocked first by disk i/o, then your
> >> controller card, then your network... assuming 10GBe.
> >>
> >> With only 2 disks on an 8 core system, you will hit disk i/o first and
> then
> >> you'll watch your CPU Wait I/O climb.
> >>
> >> HTH
> >>
> >> -Mike
> >>
> >> On Nov 28, 2012, at 7:28 PM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org>
> >> wrote:
> >>
> >>> Hi Mike,
> >>>
> >>> Why not using LVM with MapR? Since LVM is reading from 2 drives almost
> >>> at the same time, it should be better than RAID0 or a single drive,
> >>> no?
> >>>
> >>> 2012/11/28, Michael Segel <mi...@hotmail.com>:
> >>>> Just a couple of things.
> >>>>
> >>>> I'm neutral on the use of LVMs. Some would point out that there's some
> >>>> overhead, but on the flip side, it can make managing the machines
> >>>> easier.
> >>>> If you're using MapR, you don't want to use LVMs but raw devices.
> >>>>
> >>>> In terms of GC, its going to depend on the heap size and not the total
> >>>> memory. With respect to HBase. ... MSLABS is the way to go.
> >>>>
> >>>>
> >>>> On Nov 28, 2012, at 12:05 PM, Jean-Marc Spaggiari
> >>>> <je...@spaggiari.org>
> >>>> wrote:
> >>>>
> >>>>> Hi Gregory,
> >>>>>
> >>>>> I founs this about LVM:
> >>>>> -> http://blog.andrew.net.au/2006/08/09
> >>>>> ->
> >>>>>
> http://www.phoronix.com/scan.php?page=article&item=fedora_15_lvm&num=2
> >>>>>
> >>>>> Seems that performances are still correct with it. I will most
> >>>>> probably give it a try and bench that too... I have one new hard
> drive
> >>>>> which should arrived tomorrow. Perfect timing ;)
> >>>>>
> >>>>>
> >>>>>
> >>>>> JM
> >>>>>
> >>>>> 2012/11/28, Mohit Anchlia <mo...@gmail.com>:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Nov 28, 2012, at 9:07 AM, Adrien Mogenet <
> adrien.mogenet@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Does HBase really benefit from 64 GB of RAM since allocating too
> >>>>>>> large
> >>>>>>> heap
> >>>>>>> might increase GC time ?
> >>>>>>>
> >>>>>> Benefit you get is from OS cache
> >>>>>>> Another question : why not RAID 0, in order to aggregate disk
> >>>>>>> bandwidth
> >>>>>>> ?
> >>>>>>> (and thus keep 3x replication factor)
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Nov 28, 2012 at 5:58 PM, Michael Segel
> >>>>>>> <mi...@hotmail.com>wrote:
> >>>>>>>
> >>>>>>>> Sorry,
> >>>>>>>>
> >>>>>>>> I need to clarify.
> >>>>>>>>
> >>>>>>>> 4GB per physical core is a good starting point.
> >>>>>>>> So with 2 quad core chips, that is going to be 32GB.
> >>>>>>>>
> >>>>>>>> IMHO that's a minimum. If you go with HBase, you will want more.
> >>>>>>>> (Actually
> >>>>>>>> you will need more.) The next logical jump would be to 48 or 64GB.
> >>>>>>>>
> >>>>>>>> If we start to price out memory, depending on vendor, your
> company's
> >>>>>>>> procurement,  there really isn't much of a price difference in
> terms
> >>>>>>>> of
> >>>>>>>> 32,48, or 64 GB.
> >>>>>>>> Note that it also depends on the chips themselves. Also you need
> to
> >>>>>>>> see
> >>>>>>>> how many memory channels exist in the mother board. You may need
> to
> >>>>>>>> buy
> >>>>>>>> in
> >>>>>>>> pairs or triplets. Your hardware vendor can help you. (Also you
> need
> >>>>>>>> to
> >>>>>>>> keep an eye on your hardware vendor. Sometimes they will give you
> >>>>>>>> higher
> >>>>>>>> density chips that are going to be more expensive...) ;-)
> >>>>>>>>
> >>>>>>>> I tend to like having extra memory from the start.
> >>>>>>>> It gives you a bit more freedom and also protects you from 'fat'
> >>>>>>>> code.
> >>>>>>>>
> >>>>>>>> Looking at YARN... you will need more memory too.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> With respect to the hard drives...
> >>>>>>>>
> >>>>>>>> The best recommendation is to keep the drives as JBOD and then use
> >>>>>>>> 3x
> >>>>>>>> replication.
> >>>>>>>> In this case, make sure that the disk controller cards can handle
> >>>>>>>> JBOD.
> >>>>>>>> (Some don't support JBOD out of the box)
> >>>>>>>>
> >>>>>>>> With respect to RAID...
> >>>>>>>>
> >>>>>>>> If you are running MapR, no need for RAID.
> >>>>>>>> If you are running an Apache derivative, you could use RAID 1.
> Then
> >>>>>>>> cut
> >>>>>>>> your replication to 2X. This makes it easier to manage drive
> >>>>>>>> failures.
> >>>>>>>> (Its not the norm, but it works...) In some clusters, they are
> using
> >>>>>>>> appliances like Net App's e series where the machines see the
> drives
> >>>>>>>> as
> >>>>>>>> local attached storage and I think the appliances themselves are
> >>>>>>>> using
> >>>>>>>> RAID.  I haven't played with this configuration, however it could
> >>>>>>>> make
> >>>>>>>> sense and its a valid design.
> >>>>>>>>
> >>>>>>>> HTH
> >>>>>>>>
> >>>>>>>> -Mike
> >>>>>>>>
> >>>>>>>> On Nov 28, 2012, at 10:33 AM, Jean-Marc Spaggiari
> >>>>>>>> <je...@spaggiari.org>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Mike,
> >>>>>>>>>
> >>>>>>>>> Thanks for all those details!
> >>>>>>>>>
> >>>>>>>>> So to simplify the equation, for 16 virtual cores we need 48 to
> >>>>>>>>> 64GB.
> >>>>>>>>> Which mean 3 to 4GB per core. So with quad cores, 12GB to 16GB
> are
> >>>>>>>>> a
> >>>>>>>>> good start? Or I simplified it to much?
> >>>>>>>>>
> >>>>>>>>> Regarding the hard drives. If you add more than one drive, do you
> >>>>>>>>> need
> >>>>>>>>> to build them on RAID or similar systems? Or can Hadoop/HBase be
> >>>>>>>>> configured to use more than one drive?
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> JM
> >>>>>>>>>
> >>>>>>>>> 2012/11/27, Michael Segel <mi...@hotmail.com>:
> >>>>>>>>>>
> >>>>>>>>>> OK... I don't know why Cloudera is so hung up on 32GB. ;-) [Its
> an
> >>>>>>>> inside
> >>>>>>>>>> joke ...]
> >>>>>>>>>>
> >>>>>>>>>> So here's the problem...
> >>>>>>>>>>
> >>>>>>>>>> By default, your child processes in a map/reduce job get a
> default
> >>>>>>>> 512MB.
> >>>>>>>>>> The majority of the time, this gets raised to 1GB.
> >>>>>>>>>>
> >>>>>>>>>> 8 cores (dual quad cores) shows up at 16 virtual processors in
> >>>>>>>>>> Linux.
> >>>>>>>> (Note:
> >>>>>>>>>> This is why when people talk about the number of cores, you have
> >>>>>>>>>> to
> >>>>>>>> specify
> >>>>>>>>>> physical cores or logical cores....)
> >>>>>>>>>>
> >>>>>>>>>> So if you were to over subscribe and have lets say 12  mappers
> and
> >>>>>>>>>> 12
> >>>>>>>>>> reducers, that's 24 slots. Which means that you would need 24GB
> of
> >>>>>>>> memory
> >>>>>>>>>> reserved just for the child processes. This would leave 8GB for
> >>>>>>>>>> DN,
> >>>>>>>>>> TT
> >>>>>>>> and
> >>>>>>>>>> the rest of the linux OS processes.
> >>>>>>>>>>
> >>>>>>>>>> Can you live with that? Sure.
> >>>>>>>>>> Now add in R, HBase, Impala, or some other set of tools on top
> of
> >>>>>>>>>> the
> >>>>>>>>>> cluster.
> >>>>>>>>>>
> >>>>>>>>>> Ooops! Now you are in trouble because you will swap.
> >>>>>>>>>> Also adding in R, you may want to bump up those child procs from
> >>>>>>>>>> 1GB
> >>>>>>>>>> to
> >>>>>>>> 2
> >>>>>>>>>> GB. That means the 24 slots would now require 48GB.  Now you
> have
> >>>>>>>>>> swap
> >>>>>>>> and
> >>>>>>>>>> if that happens you will see HBase in a cascading failure.
> >>>>>>>>>>
> >>>>>>>>>> So while you can do a rolling restart with the changed
> >>>>>>>>>> configuration
> >>>>>>>>>> (reducing the number of mappers and reducers) you end up with
> less
> >>>>>>>>>> slots
> >>>>>>>>>> which will mean in longer run time for your jobs. (Less slots ==
> >>>>>>>>>> less
> >>>>>>>>>> parallelism )
> >>>>>>>>>>
> >>>>>>>>>> Looking at the price of memory... you can get 48GB or even 64GB
> >>>>>>>>>> for
> >>>>>>>> around
> >>>>>>>>>> the same price point. (8GB chips)
> >>>>>>>>>>
> >>>>>>>>>> And I didn't even talk about adding SOLR either again a memory
> >>>>>>>>>> hog...
> >>>>>>>> ;-)
> >>>>>>>>>>
> >>>>>>>>>> Note that I matched the number of mappers w reducers. You could
> go
> >>>>>>>>>> with
> >>>>>>>>>> fewer reducers if you want. I tend to recommend a ratio of 2:1
> >>>>>>>>>> mappers
> >>>>>>>> to
> >>>>>>>>>> reducers, depending on the work flow....
> >>>>>>>>>>
> >>>>>>>>>> As to the disks... no 7200 SATA III drives are fine. SATA III
> >>>>>>>>>> interface
> >>>>>>>> is
> >>>>>>>>>> pretty much available in the new kit being shipped.
> >>>>>>>>>> Its just that you don't have enough drives. 8 cores should be 8
> >>>>>>>> spindles if
> >>>>>>>>>> available.
> >>>>>>>>>> Otherwise you end up seeing your CPU load climb on wait states
> as
> >>>>>>>>>> the
> >>>>>>>>>> processes wait for the disk i/o to catch up.
> >>>>>>>>>>
> >>>>>>>>>> I mean you could build out a cluster w 4 x 3 3.5" 2TB drives in
> a
> >>>>>>>>>> 1
> >>>>>>>>>> U
> >>>>>>>>>> chassis based on price. You're making a trade off and you should
> >>>>>>>>>> be
> >>>>>>>> aware of
> >>>>>>>>>> the performance hit you will take.
> >>>>>>>>>>
> >>>>>>>>>> HTH
> >>>>>>>>>>
> >>>>>>>>>> -Mike
> >>>>>>>>>>
> >>>>>>>>>> On Nov 27, 2012, at 1:52 PM, Jean-Marc Spaggiari <
> >>>>>>>> jean-marc@spaggiari.org>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Michael,
> >>>>>>>>>>>
> >>>>>>>>>>> so are you recommanding 32Gb per node?
> >>>>>>>>>>>
> >>>>>>>>>>> What about the disks? SATA drives are to slow?
> >>>>>>>>>>>
> >>>>>>>>>>> JM
> >>>>>>>>>>>
> >>>>>>>>>>> 2012/11/26, Michael Segel <mi...@hotmail.com>:
> >>>>>>>>>>>> Uhm, those specs are actually now out of date.
> >>>>>>>>>>>>
> >>>>>>>>>>>> If you're running HBase, or want to also run R on top of
> Hadoop,
> >>>>>>>>>>>> you
> >>>>>>>>>>>> will
> >>>>>>>>>>>> need to add more memory.
> >>>>>>>>>>>> Also forget 1GBe got 10GBe,  and w 2 SATA drives, you will be
> >>>>>>>>>>>> disk
> >>>>>>>>>>>> i/o
> >>>>>>>>>>>> bound
> >>>>>>>>>>>> way too quickly.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Nov 26, 2012, at 8:05 AM, Marcos Ortiz <ml...@uci.cu>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Are you asking about hardware recommendations?
> >>>>>>>>>>>>> Eric Sammer on his "Hadoop Operations" book, did a great job
> >>>>>>>>>>>>> about
> >>>>>>>>>>>>> this:
> >>>>>>>>>>>>> For middle size clusters (until 300 nodes):
> >>>>>>>>>>>>> Processor: A dual quad-core 2.6 Ghz
> >>>>>>>>>>>>> RAM: 24 GB DDR3
> >>>>>>>>>>>>> Dual 1 Gb Ethernet NICs
> >>>>>>>>>>>>> a SAS drive controller
> >>>>>>>>>>>>> at least two SATA II drives in a JBOD configuration
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The replication factor depends heavily of the primary use of
> >>>>>>>>>>>>> your
> >>>>>>>>>>>>> cluster.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 11/26/2012 08:53 AM, David Charle wrote:
> >>>>>>>>>>>>>> hi
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> what's the recommended nodes for NN, hmaster and zk nodes
> for
> >>>>>>>>>>>>>> a
> >>>>>>>> larger
> >>>>>>>>>>>>>> cluster, lets say 50-100+
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> also, what would be the ideal replication factor for larger
> >>>>>>>>>>>>>> clusters
> >>>>>>>>>>>>>> when
> >>>>>>>>>>>>>> u have 3-4 racks ?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> David
> >>>>>>>>>>>>>> 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS
> >>>>>>>>>>>>>> CIENCIAS
> >>>>>>>>>>>>>> INFORMATICAS...
> >>>>>>>>>>>>>> CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> http://www.uci.cu
> >>>>>>>>>>>>>> http://www.facebook.com/universidad.uci
> >>>>>>>>>>>>>> http://www.flickr.com/photos/universidad_uci
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Marcos Luis Ortíz Valmaseda
> >>>>>>>>>>>>> about.me/marcosortiz <http://about.me/marcosortiz>
> >>>>>>>>>>>>> @marcosluis2186 <http://twitter.com/marcosluis2186>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS
> >>>>>>>>>>>>> CIENCIAS
> >>>>>>>>>>>>> INFORMATICAS...
> >>>>>>>>>>>>> CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> http://www.uci.cu
> >>>>>>>>>>>>> http://www.facebook.com/universidad.uci
> >>>>>>>>>>>>> http://www.flickr.com/photos/universidad_uci
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Adrien Mogenet
> >>>>>>> 06.59.16.64.22
> >>>>>>> http://www.mogenet.me
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >
>
>

Re: recommended nodes

Posted by Michael Segel <mi...@hotmail.com>.
Yeah, 
I couldn't argue against LVMs when talking with the system admins. 
In terms of speed, it's noise, because the CPUs are pretty efficient and
unless you have more than 1 drive per physical core, you will end up
saturating your disk I/O.
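
The napkin math, with round numbers that are assumptions rather than
benchmarks:

drives = 2
mb_per_drive = 100                  # ballpark sustained MB/s, 7200rpm SATA
disk_mb = drives * mb_per_drive     # ~200 MB/s off the spindles
nic_mb = 10 * 1000 / 8              # 10GbE is ~1250 MB/s
print(disk_mb, "MB/s from disks vs", nic_mb, "MB/s from the network")
# 8 busy cores can request data far faster than 2 drives can deliver it,
# so the CPUs sit in iowait.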

In terms of MapR, you want the raw disk. (But we're talking Apache)


On Dec 19, 2012, at 4:59 PM, Jean-Marc Spaggiari <je...@spaggiari.org> wrote:

> Finally, it took me a while to run those tests because it was way
> longer than expected, but here are the results:
> 
> http://www.spaggiari.org/bonnie.html
> 
> LVM is not really slower than JBOD and not really taking more CPU. So
> I will say, if you have to choose between the 2, take the one you
> prefer. Personally, I prefer LVM because it's easy to configure.
> 
> The big winner here is RAID0. It's WAY faster than anything else. But
> it's using twice the space... Your choice.
> 
> I did not get a chance to test with the Ubuntu tool because it's not
> working with LVM drives.
> 
> JM
> 
> 2012/11/28, Michael Segel <mi...@hotmail.com>:
>> Ok, just a caveat.
>> 
>> I am discussing MapR as part of a complete response. As Mohit posted MapR
>> takes the raw device for their MapR File System.
>> They do stripe on their own within what they call a volume.
>> 
>> But going back to Apache...
>> You can stripe drives, however I wouldn't recommend it. I don't think the
>> performance gains would really matter.
>> You're going to end up getting blocked first by disk i/o, then your
>> controller card, then your network... assuming 10GBe.
>> 
>> With only 2 disks on an 8 core system, you will hit disk i/o first and then
>> you'll watch your CPU Wait I/O climb.
>> 
>> HTH
>> 
>> -Mike
>> 
>> On Nov 28, 2012, at 7:28 PM, Jean-Marc Spaggiari <je...@spaggiari.org>
>> wrote:
>> 
>>> Hi Mike,
>>> 
>>> Why not using LVM with MapR? Since LVM is reading from 2 drives almost
>>> at the same time, it should be better than RAID0 or a single drive,
>>> no?
>>> 
>>> 2012/11/28, Michael Segel <mi...@hotmail.com>:
>>>> Just a couple of things.
>>>> 
>>>> I'm neutral on the use of LVMs. Some would point out that there's some
>>>> overhead, but on the flip side, it can make managing the machines
>>>> easier.
>>>> If you're using MapR, you don't want to use LVMs but raw devices.
>>>> 
>>>> In terms of GC, its going to depend on the heap size and not the total
>>>> memory. With respect to HBase. ... MSLABS is the way to go.
>>>> 
>>>> 
>>>> On Nov 28, 2012, at 12:05 PM, Jean-Marc Spaggiari
>>>> <je...@spaggiari.org>
>>>> wrote:
>>>> 
>>>>> Hi Gregory,
>>>>> 
>>>>> I founs this about LVM:
>>>>> -> http://blog.andrew.net.au/2006/08/09
>>>>> ->
>>>>> http://www.phoronix.com/scan.php?page=article&item=fedora_15_lvm&num=2
>>>>> 
>>>>> Seems that performances are still correct with it. I will most
>>>>> probably give it a try and bench that too... I have one new hard drive
>>>>> which should arrived tomorrow. Perfect timing ;)
>>>>> 
>>>>> 
>>>>> 
>>>>> JM
>>>>> 
>>>>> 2012/11/28, Mohit Anchlia <mo...@gmail.com>:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Nov 28, 2012, at 9:07 AM, Adrien Mogenet <ad...@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Does HBase really benefit from 64 GB of RAM since allocating too
>>>>>>> large
>>>>>>> heap
>>>>>>> might increase GC time ?
>>>>>>> 
>>>>>> Benefit you get is from OS cache
>>>>>>> Another question : why not RAID 0, in order to aggregate disk
>>>>>>> bandwidth
>>>>>>> ?
>>>>>>> (and thus keep 3x replication factor)
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Nov 28, 2012 at 5:58 PM, Michael Segel
>>>>>>> <mi...@hotmail.com>wrote:
>>>>>>> 
>>>>>>>> Sorry,
>>>>>>>> 
>>>>>>>> I need to clarify.
>>>>>>>> 
>>>>>>>> 4GB per physical core is a good starting point.
>>>>>>>> So with 2 quad core chips, that is going to be 32GB.
>>>>>>>> 
>>>>>>>> IMHO that's a minimum. If you go with HBase, you will want more.
>>>>>>>> (Actually
>>>>>>>> you will need more.) The next logical jump would be to 48 or 64GB.
>>>>>>>> 
>>>>>>>> If we start to price out memory, depending on vendor, your company's
>>>>>>>> procurement,  there really isn't much of a price difference in terms
>>>>>>>> of
>>>>>>>> 32,48, or 64 GB.
>>>>>>>> Note that it also depends on the chips themselves. Also you need to
>>>>>>>> see
>>>>>>>> how many memory channels exist in the mother board. You may need to
>>>>>>>> buy
>>>>>>>> in
>>>>>>>> pairs or triplets. Your hardware vendor can help you. (Also you need
>>>>>>>> to
>>>>>>>> keep an eye on your hardware vendor. Sometimes they will give you
>>>>>>>> higher
>>>>>>>> density chips that are going to be more expensive...) ;-)
>>>>>>>> 
>>>>>>>> I tend to like having extra memory from the start.
>>>>>>>> It gives you a bit more freedom and also protects you from 'fat'
>>>>>>>> code.
>>>>>>>> 
>>>>>>>> Looking at YARN... you will need more memory too.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> With respect to the hard drives...
>>>>>>>> 
>>>>>>>> The best recommendation is to keep the drives as JBOD and then use
>>>>>>>> 3x
>>>>>>>> replication.
>>>>>>>> In this case, make sure that the disk controller cards can handle
>>>>>>>> JBOD.
>>>>>>>> (Some don't support JBOD out of the box)
>>>>>>>> 
>>>>>>>> With respect to RAID...
>>>>>>>> 
>>>>>>>> If you are running MapR, no need for RAID.
>>>>>>>> If you are running an Apache derivative, you could use RAID 1. Then
>>>>>>>> cut
>>>>>>>> your replication to 2X. This makes it easier to manage drive
>>>>>>>> failures.
>>>>>>>> (Its not the norm, but it works...) In some clusters, they are using
>>>>>>>> appliances like Net App's e series where the machines see the drives
>>>>>>>> as
>>>>>>>> local attached storage and I think the appliances themselves are
>>>>>>>> using
>>>>>>>> RAID.  I haven't played with this configuration, however it could
>>>>>>>> make
>>>>>>>> sense and its a valid design.
>>>>>>>> 
>>>>>>>> HTH
>>>>>>>> 
>>>>>>>> -Mike
>>>>>>>> 
>>>>>>>> On Nov 28, 2012, at 10:33 AM, Jean-Marc Spaggiari
>>>>>>>> <je...@spaggiari.org>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Mike,
>>>>>>>>> 
>>>>>>>>> Thanks for all those details!
>>>>>>>>> 
>>>>>>>>> So to simplify the equation, for 16 virtual cores we need 48 to
>>>>>>>>> 64GB.
>>>>>>>>> Which mean 3 to 4GB per core. So with quad cores, 12GB to 16GB are
>>>>>>>>> a
>>>>>>>>> good start? Or I simplified it to much?
>>>>>>>>> 
>>>>>>>>> Regarding the hard drives. If you add more than one drive, do you
>>>>>>>>> need
>>>>>>>>> to build them on RAID or similar systems? Or can Hadoop/HBase be
>>>>>>>>> configured to use more than one drive?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> 
>>>>>>>>> JM
>>>>>>>>> 
>>>>>>>>> 2012/11/27, Michael Segel <mi...@hotmail.com>:
>>>>>>>>>> 
>>>>>>>>>> OK... I don't know why Cloudera is so hung up on 32GB. ;-) [Its an
>>>>>>>> inside
>>>>>>>>>> joke ...]
>>>>>>>>>> 
>>>>>>>>>> So here's the problem...
>>>>>>>>>> 
>>>>>>>>>> By default, your child processes in a map/reduce job get a default
>>>>>>>> 512MB.
>>>>>>>>>> The majority of the time, this gets raised to 1GB.
>>>>>>>>>> 
>>>>>>>>>> 8 cores (dual quad cores) shows up at 16 virtual processors in
>>>>>>>>>> Linux.
>>>>>>>> (Note:
>>>>>>>>>> This is why when people talk about the number of cores, you have
>>>>>>>>>> to
>>>>>>>> specify
>>>>>>>>>> physical cores or logical cores....)
>>>>>>>>>> 
>>>>>>>>>> So if you were to over subscribe and have lets say 12  mappers and
>>>>>>>>>> 12
>>>>>>>>>> reducers, that's 24 slots. Which means that you would need 24GB of
>>>>>>>> memory
>>>>>>>>>> reserved just for the child processes. This would leave 8GB for
>>>>>>>>>> DN,
>>>>>>>>>> TT
>>>>>>>> and
>>>>>>>>>> the rest of the linux OS processes.
>>>>>>>>>> 
>>>>>>>>>> Can you live with that? Sure.

Re: recommended nodes

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Finally, it took me a while to run those tests because they took way
longer than expected, but here are the results:

http://www.spaggiari.org/bonnie.html
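
For anyone who wants to reproduce a similar comparison, a bonnie++
run along these lines is a reasonable starting point (the mount
point, size and user below are placeholders, not my exact
invocation):

  bonnie++ -d /mnt/data1/bonnie -s 16384 -n 0 -m jbod-sda -u hadoop

-d points at the filesystem under test, -s is the working-set size
in MB (roughly twice the RAM, so the page cache can't absorb it),
-n 0 skips the small-file tests and -m just labels the run.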

LVM is not really slower than JBOD and does not really take more CPU.
So I would say, if you have to choose between the two, take the one
you prefer. Personally, I prefer LVM because it's easy to configure.
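
(For the JBOD case there is nothing special to do on the Hadoop side:
each drive is mounted separately and listed as a comma-separated
value, something like this in hdfs-site.xml and mapred-site.xml, with
example paths only:

  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/d0/dfs/data,/mnt/d1/dfs/data</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/mnt/d0/mapred/local,/mnt/d1/mapred/local</value>
  </property>

HDFS then round-robins new blocks across the listed directories.)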

The big winner here is RAID0. It's WAY faster than anything else. But
it uses twice the space... Your choice.

I did not get a chance to test with the Ubuntu tool because it does
not work with LVM drives.

JM


Re: recommended nodes

Posted by Mohit Anchlia <mo...@gmail.com>.
MapR has its own concept of storage pools and stripe width.

Sent from my iPhone


Re: recommended nodes

Posted by Michael Segel <mi...@hotmail.com>.
Ok, just a caveat.

I am discussing MapR as part of a complete response. As Mohit posted, MapR takes the raw devices for its MapR File System.
They do their own striping within what they call a volume.

But going back to Apache...
You can stripe drives; however, I wouldn't recommend it. I don't think the performance gains would really matter.
You're going to end up getting blocked first by disk i/o, then your controller card, then your network... assuming 10GbE.

With only 2 disks on an 8-core system, you will hit disk i/o first, and then you'll watch your CPU I/O wait climb.
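
Rough numbers make the point (ballpark figures, not measurements): two 7200 RPM SATA drives sustain somewhere around 100 MB/s each, so call it ~200 MB/s aggregate, while 10GbE is good for roughly 1.2 GB/s and a modern SAS controller for several GB/s. The disks saturate long before anything else, which is exactly the wait-state climb you see in top.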

HTH

-Mike


Re: recommended nodes

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Mike,

Why not use LVM with MapR? Since LVM can read from 2 drives almost
at the same time, it should be better than RAID0 or a single drive,
no?
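
(For concreteness, the striped layout I have in mind is something
like this; the device names are hypothetical:

  pvcreate /dev/sdb /dev/sdc
  vgcreate hadoop-vg /dev/sdb /dev/sdc
  lvcreate -i 2 -I 64 -l 100%FREE -n data-lv hadoop-vg
  mkfs.ext4 /dev/hadoop-vg/data-lv

-i 2 stripes the logical volume across both physical volumes and
-I 64 sets a 64 KB stripe size.)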


Re: recommended nodes

Posted by Michael Segel <mi...@hotmail.com>.
Just a couple of things. 

I'm neutral on the use of LVM. Some would point out that there's some overhead, but on the flip side, it can make managing the machines easier.
If you're using MapR, you don't want LVM but raw devices.

In terms of GC, it's going to depend on the heap size, not the total memory. With respect to HBase, MSLAB is the way to go.
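
(MSLAB is on by default in recent HBase releases; if you need to set
it explicitly, the property is, to the best of my recollection:

  <property>
    <name>hbase.hregion.memstore.mslab.enabled</name>
    <value>true</value>
  </property>

in hbase-site.xml. It allocates memstore data out of fixed-size
chunks, so a flush frees whole chunks instead of scattering garbage
through the old generation; that fragmentation is what drives the
long CMS pauses.)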



Re: recommended nodes

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Gregory,

I found this about LVM:
-> http://blog.andrew.net.au/2006/08/09
-> http://www.phoronix.com/scan.php?page=article&item=fedora_15_lvm&num=2

It seems that performance is still decent with it. I will most
probably give it a try and benchmark that too... I have one new hard
drive which should arrive tomorrow. Perfect timing ;)



JM


Re: recommended nodes

Posted by Mohit Anchlia <mo...@gmail.com>.



On Nov 28, 2012, at 9:07 AM, Adrien Mogenet <ad...@gmail.com> wrote:

> Does HBase really benefit from 64 GB of RAM, since allocating too large a
> heap might increase GC time?
> 
The benefit you get is from the OS cache.
> Another question: why not RAID 0, to aggregate disk bandwidth (and thus
> keep the 3x replication factor)?
> 
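
To put rough numbers on the OS-cache point (a sketch only; the heap size,
daemon allowance and cache fraction below are hypothetical, not
recommendations):

    // Where the RAM on a hypothetical 64 GB region server goes when the
    // JVM heap is kept modest to avoid long GC pauses.
    public class RegionServerRamSketch {
        public static void main(String[] args) {
            double totalRamGb   = 64.0;
            double hbaseHeapGb  = 12.0;  // e.g. HBASE_HEAPSIZE; bigger heaps mean longer GC pauses
            double blockCacheGb = hbaseHeapGb * 0.25;  // fraction handed to the block cache (hfile.block.cache.size)
            double daemonsGb    = 4.0;   // rough allowance for DN, TT and the OS
            double osCacheGb    = totalRamGb - hbaseHeapGb - daemonsGb;
            System.out.printf("block cache: %.0f GB, OS page cache: ~%.0f GB%n",
                    blockCacheGb, osCacheGb);
            // Most of the extra RAM ends up as OS page cache serving HFile
            // reads, which is why 64 GB still helps with a small heap.
        }
    }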

Re: recommended nodes

Posted by Adrien Mogenet <ad...@gmail.com>.
Does HBase really benefit from 64 GB of RAM, since allocating too large a
heap might increase GC time?

Another question: why not RAID 0, to aggregate disk bandwidth (and thus
keep the 3x replication factor)?



-- 
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me

Re: recommended nodes

Posted by Michael Segel <mi...@hotmail.com>.
Sorry, 

I need to clarify. 

4GB per physical core is a good starting point. 
So with 2 quad core chips, that is going to be 32GB. 

IMHO that's a minimum. If you go with HBase, you will want more. (Actually you will need more.) The next logical jump would be to 48 or 64GB. 

If we start to price out memory, depending on vendor and your company's procurement, there really isn't much of a price difference between 32, 48, or 64 GB. 
Note that it also depends on the chips themselves. Also you need to see how many memory channels exist in the motherboard. You may need to buy in pairs or triplets. Your hardware vendor can help you. (Also you need to keep an eye on your hardware vendor. Sometimes they will give you higher density chips that are going to be more expensive...) ;-) 

I tend to like having extra memory from the start.  
It gives you a bit more freedom and also protects you from 'fat' code. 

Looking at YARN... you will need more memory too. 


With respect to the hard drives... 

The best recommendation is to keep the drives as JBOD and then use 3x replication. 
In this case, make sure that the disk controller cards can handle JBOD. (Some don't support JBOD out of the box) 
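
Concretely (a sketch with made-up paths, using the Hadoop 1.x property
names): each JBOD disk gets its own entry in dfs.data.dir, and HDFS spreads
blocks across them, so no RAID layer is involved:

    import org.apache.hadoop.conf.Configuration;

    public class JbodDirsSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // One directory per physical spindle; the DataNode spreads
            // block writes across them.
            conf.set("dfs.data.dir",
                    "/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn");
            // MapReduce intermediate output can be spread the same way.
            conf.set("mapred.local.dir",
                    "/data/1/mapred,/data/2/mapred,/data/3/mapred,/data/4/mapred");
            System.out.println(conf.get("dfs.data.dir"));
        }
    }

That is also how Hadoop uses more than one drive without any RAID.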

With respect to RAID... 

If you are running MapR, no need for RAID. 
If you are running an Apache derivative, you could use RAID 1. Then cut your replication to 2X. This makes it easier to manage drive failures. 
(It's not the norm, but it works...) In some clusters, they are using appliances like NetApp's E-Series, where the machines see the drives as local attached storage and I think the appliances themselves are using RAID. I haven't played with this configuration, however it could make sense and it's a valid design. 

HTH

-Mike


Re: recommended nodes

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Mike,

Thanks for all those details!

So to simplify the equation: for 16 virtual cores we need 48 to 64GB,
which means 3 to 4GB per core. So with a quad core, is 12GB to 16GB a
good start? Or did I simplify it too much?

Regarding the hard drives: if you add more than one drive, do you need
to build them into RAID or similar systems? Or can Hadoop/HBase be
configured to use more than one drive?

Thanks,

JM


Re: recommended nodes

Posted by Michael Segel <mi...@hotmail.com>.
OK... I don't know why Cloudera is so hung up on 32GB. ;-) [It's an inside joke ...]

So here's the problem... 

By default, the child processes in a map/reduce job get 512MB. The majority of the time, this gets raised to 1GB.

8 cores (dual quad cores) show up as 16 virtual processors in Linux. (Note: this is why, when people talk about the number of cores, you have to specify physical cores or logical cores....) 

So if you were to oversubscribe and have, let's say, 12 mappers and 12 reducers, that's 24 slots, which means you would need 24GB of memory reserved just for the child processes. This would leave 8GB for the DN, the TT and the rest of the Linux OS processes. 
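
The same arithmetic in code form (a sketch; the numbers are the
hypothetical 12+12 slot layout above, with the Hadoop 1.x knob names in
the comments):

    public class SlotMemoryMath {
        public static void main(String[] args) {
            int mapSlots = 12;         // mapred.tasktracker.map.tasks.maximum
            int reduceSlots = 12;      // mapred.tasktracker.reduce.tasks.maximum
            double childHeapGb = 1.0;  // mapred.child.java.opts = -Xmx1g
            double totalRamGb = 32.0;
            double childrenGb = (mapSlots + reduceSlots) * childHeapGb;  // 24 GB
            System.out.printf("children: %.0f GB, left for DN/TT/OS: %.0f GB%n",
                    childrenGb, totalRamGb - childrenGb);                // 8 GB
            // Bump childHeapGb to 2.0 (e.g. for R) and the same 24 slots
            // need 48 GB -- which is where a 32 GB box starts to swap.
        }
    }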

Can you live with that? Sure. 
Now add in R, HBase, Impala, or some other set of tools on top of the cluster. 

Ooops! Now you are in trouble because you will swap. 
Also, adding in R, you may want to bump those child procs from 1GB to 2GB. That means the 24 slots would now require 48GB. Now you will swap, and if that happens you will see HBase in a cascading failure. 

So while you can do a rolling restart with the changed configuration (reducing the number of mappers and reducers), you end up with fewer slots, which will mean longer run times for your jobs. (Fewer slots == less parallelism) 

Looking at the price of memory... you can get 48GB or even 64GB for around the same price point. (8GB chips) 

And I didn't even talk about adding SOLR, which is again a memory hog... ;-) 

Note that I matched the number of mappers with reducers. You could go with fewer reducers if you want; I tend to recommend a ratio of 2:1 mappers to reducers, depending on the workflow.... 

As to the disks... no, 7200 RPM SATA III drives are fine. The SATA III interface is pretty much standard in the new kit being shipped. 
It's just that you don't have enough drives: 8 cores should mean 8 spindles, if available. 
Otherwise you end up seeing your CPU load climb on wait states as the processes wait for the disk I/O to catch up. 

I mean, you could build out a cluster with 4 x 3.5" 2TB drives in a 1U chassis based on price. You're making a trade-off, and you should be aware of the performance hit you will take. 

HTH 

-Mike


Re: recommended nodes

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Michael,

so are you recommending 32GB per node?

What about the disks? Are SATA drives too slow?

JM


Re: recommended nodes

Posted by Michael Segel <mi...@hotmail.com>.
Uhm, those specs are actually now out of date. 

If you're running HBase, or want to also run R on top of Hadoop, you will need to add more memory. 
Also, forget 1GbE, go 10GbE; and with 2 SATA drives, you will be disk I/O bound way too quickly. 



Re: recommended nodes

Posted by Marcos Ortiz <ml...@uci.cu>.
Are you asking about hardware recommendations?
Eric Sammer, in his "Hadoop Operations" book, did a great job on this.
For mid-size clusters (up to 300 nodes):
Processor: A dual quad-core 2.6 GHz
RAM: 24 GB DDR3
Dual 1 Gb Ethernet NICs
A SAS drive controller
At least two SATA II drives in a JBOD configuration

The replication factor depends heavily on the primary use of your cluster.
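
As a sketch (the file path below is hypothetical): the cluster-wide
default lives in dfs.replication, and it can also be changed per file:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("dfs.replication", "3");  // cluster-wide default
            FileSystem fs = FileSystem.get(conf);
            // Raise replication on a hot file to spread reads across racks.
            fs.setReplication(new Path("/data/hot/part-00000"), (short) 4);
        }
    }

With rack awareness configured, the default of 3 already places one
replica on a second rack, so having 3-4 racks does not by itself call for
a higher factor.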

-- 

Marcos Luis Ortíz Valmaseda
about.me/marcosortiz
@marcosluis2186


