Posted to common-user@hadoop.apache.org by Sandeep Reddy P <sa...@gmail.com> on 2012/06/13 16:36:59 UTC
Hardware specs calculation for io
Hi,
I need to know the difference between the two hardware configurations below for
24 TB of data (slave machines only, running Hadoop, Hive and Pig).
TYPE A: 2 quad-core CPUs, 32 GB memory, 6 x 1 TB drives (6 TB / machine)
TYPE B: 4 quad-core CPUs, 48 GB memory, 12 x 1 TB drives (12 TB / machine)
Suppose we choose either 4 Type A machines or 2 Type B machines for the 24 TB of
data. Assume disk IO speed is the same in both cases (7200 RPM SATA) and that 4
Type A machines cost the same as 2 Type B machines.
Which type of machine will give the best performance?
--
Thanks,
sandeep
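One quick way to see what is actually being traded off is to total the resources of each option. The sketch below assumes each "quad core" is one 4-core CPU; `NodeSpec` and `cluster_totals` are hypothetical names. With the figures above, 4 Type A machines and 2 Type B machines both come to 32 cores, 24 spindles and 24 TB of raw disk; what differs is total memory (128 GB vs 96 GB) and the number of machines (4 vs 2).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeSpec:
    """Per-machine hardware for one slave node (hypothetical helper type)."""
    cores: int        # 2 x quad-core = 8 cores, 4 x quad-core = 16 cores
    memory_gb: int
    disks: int        # number of drives (spindles)
    disk_tb: float = 1.0

def cluster_totals(node: NodeSpec, count: int) -> dict:
    """Aggregate resources across `count` identical machines."""
    return {
        "cores": node.cores * count,
        "memory_gb": node.memory_gb * count,
        "spindles": node.disks * count,
        "raw_tb": node.disks * node.disk_tb * count,
    }

TYPE_A = NodeSpec(cores=8, memory_gb=32, disks=6)
TYPE_B = NodeSpec(cores=16, memory_gb=48, disks=12)

if __name__ == "__main__":
    print("4 x Type A:", cluster_totals(TYPE_A, 4))
    print("2 x Type B:", cluster_totals(TYPE_B, 2))
```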
Re: Hardware specs calculation for io
Posted by Sandeep Reddy P <sa...@gmail.com>.
Thanks for the reply, Matt.
We have 6 TB of raw data. We are IO-bound.
On Wed, Jun 13, 2012 at 11:44 AM, Matt Davies <ma...@mattdavies.net> wrote:
--
Thanks,
sandeep
Re: Hardware specs calculation for io
Posted by Matt Davies <ma...@mattdavies.net>.
Sandeep,
I think one critical piece missing is whether you are counting the 24 TB as
raw or as replicated. In a standard environment with a replication factor of 3
you really need 72 TB of disk space, which triples your hardware requirements.
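The replication point is simple arithmetic: HDFS stores each block `dfs.replication` times (3 by default), so the disk needed is the raw data size times that factor. A minimal sketch; the function name is hypothetical, and it deliberately ignores non-HDFS space such as the OS and intermediate MapReduce output, which a real sizing would also have to budget for.

```python
def required_disk_tb(raw_tb: float, replication: int = 3) -> float:
    """Disk HDFS needs to hold `raw_tb` of data at the given replication factor.

    Ignores non-HDFS space (OS, intermediate MapReduce output).
    """
    return raw_tb * replication

if __name__ == "__main__":
    print(required_disk_tb(24))  # 24 TB counted as raw -> 72 TB of disk
    print(required_disk_tb(6))   # 6 TB of raw data -> 18 TB of disk
```

With the 6 TB of raw data mentioned elsewhere in the thread, the same arithmetic gives 18 TB at replication factor 3.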
Regardless, my experience has been to favor A and scale out rather than scale up.
A simple metric: two quad-core CPUs would equate to 8+ worker threads, and B
would be 16+. So if you take out 1-2 GB for the OS, 1 GB for the JT, and 1 GB
for the DN, you have 28/8 (~3.5 GB) for each worker on A. The same overhead on
B would leave 44/16 (2.75 GB) per worker. This is but one metric.
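The per-worker memory figures can be reproduced as a small calculation. This sketch follows the message's own assumptions: 2 GB for the OS (the upper end of the 1-2 GB range, which is what yields 28 GB), 1 GB each for the two Hadoop daemons, and one worker per core. The function and parameter names are hypothetical.

```python
def memory_per_worker_gb(total_gb: float, workers: int,
                         os_gb: float = 2.0,
                         mapred_daemon_gb: float = 1.0,
                         datanode_gb: float = 1.0) -> float:
    """Memory left per worker after reserving RAM for the OS and Hadoop daemons.

    mapred_daemon_gb is the 1 GB the message labels "JT"; on a slave node in
    MRv1 the resident MapReduce daemon is normally the TaskTracker.
    """
    usable_gb = total_gb - os_gb - mapred_daemon_gb - datanode_gb
    return usable_gb / workers

if __name__ == "__main__":
    print("Type A:", memory_per_worker_gb(32, 8))    # 28 / 8 = 3.5
    print("Type B:", memory_per_worker_gb(48, 16))   # 44 / 16 = 2.75
```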
The other is the amount of disk per core. I've heard anywhere from 0.8 to
1.5 TB per core, so that would definitely favor A.
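The disk-per-core ratio can be checked the same way. With the drive and core counts from the original question, both types work out to 0.75 TB per core (6 TB / 8 cores and 12 TB / 16 cores), just below the 0.8-1.5 TB range quoted, so on this metric taken alone the two come out even. A sketch; the function names and the constant are hypothetical, and the range is simply the one quoted above.

```python
# Disk-per-core guideline quoted in the thread, in TB per core.
GUIDELINE_TB_PER_CORE = (0.8, 1.5)

def tb_per_core(disks: int, disk_tb: float, cores: int) -> float:
    """Raw disk capacity per CPU core on one machine."""
    return disks * disk_tb / cores

def within_guideline(ratio: float) -> bool:
    low, high = GUIDELINE_TB_PER_CORE
    return low <= ratio <= high

if __name__ == "__main__":
    for name, disks, cores in (("A", 6, 8), ("B", 12, 16)):
        ratio = tb_per_core(disks, 1.0, cores)
        print(f"Type {name}: {ratio:.2f} TB/core, in range: {within_guideline(ratio)}")
```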
Perhaps the biggest factor of all is the expected workload. Will you be
compute-bound or IO-bound? That is, all things being equal hardware-wise, will
you spend most of your time crunching data or reading it?
A few thoughts.
-Matt
Re: Hardware specs calculation for io
Posted by Michael Segel <mi...@hotmail.com>.
You will want something in between...
8 cores means you want 8 spindles.
16 cores means you want 16 spindles.
You may want to up the memory, especially if you're running or thinking about running HBase.
If you go beyond 4 spindles, you will saturate your 1 GbE link. If you're considering Type B, you will need 10 GbE.
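The reasoning behind the network remark can be sketched as a bandwidth comparison: once a node's disks can move more data than its link carries, the link becomes the bottleneck for whatever traffic has to cross the network. The per-disk rate below is a placeholder, not a measurement; it was picked only so that the 1 GbE case reproduces the 4-spindle threshold, and should be replaced with figures measured on the actual drives. Function names are hypothetical, and the link ceiling ignores protocol overhead.

```python
import math

def link_mb_per_s(gbit: float) -> float:
    """Theoretical Ethernet ceiling in MB/s (1 Gbit/s = 125 MB/s), no protocol overhead."""
    return gbit * 1000 / 8

def max_spindles(link_gbit: float, per_disk_mb_s: float) -> int:
    """Largest number of disks whose combined throughput still fits in the link."""
    return math.floor(link_mb_per_s(link_gbit) / per_disk_mb_s)

# Hypothetical effective per-disk rate, chosen to match the "4 spindles" rule of thumb.
PER_DISK_MB_S = 30.0

if __name__ == "__main__":
    for gbit in (1, 10):
        print(f"{gbit} GbE: up to {max_spindles(gbit, PER_DISK_MB_S)} spindles")
    for name, spindles in (("A", 6), ("B", 12)):
        fits = spindles <= max_spindles(1, PER_DISK_MB_S)
        print(f"Type {name}: {spindles} spindles, fits 1 GbE: {fits}")
```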