Posted to common-user@hadoop.apache.org by Sandeep Reddy P <sa...@gmail.com> on 2012/06/13 16:36:59 UTC

Hardware specs calculation for io

Hi,
 I need to know the difference between the two hardware configurations below
for 24 TB of data (slave machines only, for Hadoop, Hive and Pig).

TYPE A: 2 x quad-core, 32 GB memory, 6 x 1 TB drives (6 TB / machine)

TYPE B: 4 x quad-core, 48 GB memory, 12 x 1 TB drives (12 TB / machine)

Suppose we choose either 4 Type A machines or 2 Type B machines for 24 TB of
data. Assume disk I/O speed is constant (7200 RPM SATA) and the cost is the
same for 4 Type A and 2 Type B machines.

I need to know which type of machine will give me the best results in terms
of performance.


-- 
Thanks,
sandeep

Re: Hardware specs calculation for io

Posted by Sandeep Reddy P <sa...@gmail.com>.
Thanks for the reply, Matt.
We have 6 TB of raw data, and we are I/O bound.


On Wed, Jun 13, 2012 at 11:44 AM, Matt Davies <ma...@mattdavies.net> wrote:

> Sandeep,
>
> I think one critical piece missing is whether or not you are counting the
> 24 TB as raw or as replicated.  In a standard environment with a rep factor
> of 3 you really need 72 TB disk space which triples your hardware
> requirements.
>
> Regardless, my experience has been to favor A and scale out vs a scale up.
>  A simple metric might be a 2 quad core would equate to 8+ worker threads
> and B would be 16+.  So, if you take out 1-2 GB for OS, 1 GB JT, and 1 GB
> for DN you have 28/8 (~3.5) for each worker.  The same overhead on B would
> be 44/16 (2.75 GB ) per worker.  This is but one metric.
>
> The other is amount of HD per core.  I've heard anywhere from .8 to 1.5 TB/
> core so that would definitely favor A.
>
> Perhaps the biggest factor of all is expected workload.  Will you be
> computationally bound or IO bound?  I.e. all things being equal
> hardware-wise will you be spending most of your time crunching or reading
> data?
>
> A few thoughts.
>
> -Matt



-- 
Thanks,
sandeep

Re: Hardware specs calculation for io

Posted by Matt Davies <ma...@mattdavies.net>.
Sandeep,

I think one critical piece missing is whether you are counting the 24 TB as
raw or as replicated.  In a standard environment with a replication factor
of 3 you really need 72 TB of disk space, which triples your hardware
requirements.
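
To make the replication arithmetic concrete, here is a minimal Python sketch.
The function names are hypothetical and the node count assumes no headroom
for temporary or shuffle space; it only uses figures mentioned in this thread
(24 TB raw, replication factor 3, 6 TB or 12 TB of disk per machine).

```python
import math


def replicated_tb(raw_tb, replication=3):
    """Disk space needed once every block is stored `replication` times."""
    return raw_tb * replication


def machines_needed(raw_tb, tb_per_machine, replication=3):
    """Machines needed just to hold the replicated data.

    Assumption: all disk space is usable for HDFS, with no headroom.
    """
    return math.ceil(replicated_tb(raw_tb, replication) / tb_per_machine)


if __name__ == "__main__":
    print("Replicated size (TB):", replicated_tb(24))
    print("Type A machines (6 TB each):", machines_needed(24, 6))
    print("Type B machines (12 TB each):", machines_needed(24, 12))
```

If the 24 TB is raw, this works out to 12 Type A or 6 Type B machines rather
than the 4 and 2 in the original question.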

Regardless, my experience has been to favor A and scale out rather than
scale up.  A simple metric: 2 quad cores would equate to 8+ worker threads
on A, and B would be 16+.  So, if you take out 1-2 GB for the OS, 1 GB for
JT, and 1 GB for DN, you have 28/8 (~3.5 GB) for each worker on A.  The same
overhead on B would be 44/16 (2.75 GB) per worker.  This is but one metric.

The other is the amount of disk per core.  I've heard anywhere from 0.8 to
1.5 TB/core, so that would definitely favor A.
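
The two metrics above can be worked through with a short Python sketch. The
dictionaries and function names are hypothetical; the 2 GB OS and 2 GB
daemon overheads are the upper end of the figures quoted above, and one
worker per core is an assumption.

```python
# Node specs from the original question (cores = sockets x 4).
TYPE_A = {"cores": 8, "memory_gb": 32, "disk_tb": 6}
TYPE_B = {"cores": 16, "memory_gb": 48, "disk_tb": 12}


def memory_per_worker_gb(node, os_gb=2, daemons_gb=2):
    """Memory left per worker after OS and daemon overhead.

    Assumption: one worker per core, 2 GB for the OS, 2 GB for the daemons.
    """
    return (node["memory_gb"] - os_gb - daemons_gb) / node["cores"]


def disk_per_core_tb(node):
    """Raw disk capacity divided by core count."""
    return node["disk_tb"] / node["cores"]


if __name__ == "__main__":
    for name, node in (("A", TYPE_A), ("B", TYPE_B)):
        print(
            f"Type {name}: {memory_per_worker_gb(node):.2f} GB/worker, "
            f"{disk_per_core_tb(node):.2f} TB/core"
        )
```

With these inputs the memory per worker is 3.5 GB on A and 2.75 GB on B,
while the disk-per-core division gives 0.75 TB/core for both types, so under
this simple calculation it is the memory metric that separates them.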

Perhaps the biggest factor of all is expected workload.  Will you be
computationally bound or IO bound?  I.e. all things being equal
hardware-wise will you be spending most of your time crunching or reading
data?

A few thoughts.

-Matt


Re: Hardware specs calculation for io

Posted by Michael Segel <mi...@hotmail.com>.
You will want something in between...

8 cores means 8 spindles. 

16 cores means 16 spindles. 

You may want to up the memory, especially if you're running or thinking about running HBase. 

If you go beyond 4 spindles, you will saturate your 1 GbE link. If you are thinking about Type B, you will need 10 GbE.
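
A rough way to check the link-saturation point is sketched below in Python.
The function names are hypothetical, and the 30 MB/s per-spindle rate is an
assumed effective throughput chosen only for illustration; sequential reads
on a 7200 RPM SATA drive can be considerably higher, which would saturate the
link with fewer disks. Protocol overhead on the network is ignored.

```python
def link_capacity_mb_s(gbit_per_s):
    """Nominal link capacity in MB/s (1 Gbit/s = 125 MB/s, no overhead)."""
    return gbit_per_s * 1000 / 8


def aggregate_disk_mb_s(spindles, mb_s_per_spindle):
    """Combined throughput if every spindle streams at the same rate."""
    return spindles * mb_s_per_spindle


def saturates_link(spindles, mb_s_per_spindle, gbit_per_s):
    """True if the disks together can push more than the link can carry."""
    return aggregate_disk_mb_s(spindles, mb_s_per_spindle) > link_capacity_mb_s(gbit_per_s)


if __name__ == "__main__":
    ASSUMED_MB_S = 30  # assumed effective rate per spindle, not a measurement
    for spindles in (4, 6, 12):
        for link in (1, 10):
            print(
                f"{spindles} spindles on {link} GbE: "
                f"saturated={saturates_link(spindles, ASSUMED_MB_S, link)}"
            )
```

Under that assumed rate, 4 spindles stay just under a 1 GbE link, 6 and 12
spindles exceed it, and 12 spindles fit comfortably within 10 GbE.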

