Posted to common-user@hadoop.apache.org by Bhagaban Khatai <em...@gmail.com> on 2015/05/29 08:00:39 UTC

Cluster sizing

Hi,

I wanted to know how I can determine how many nodes (with how many cores, how
much storage in TB, and how much RAM) are needed if the data volume we receive
increases from 1 TB to 100 TB per day. Can someone help me create an Excel
sheet based on this?

Thanks

Re: Cluster sizing

Posted by Krishna Kalyan <kr...@gmail.com>.
Hi Bhagaban,
Here are some good articles you may want to look at:
http://info.hortonworks.com/SizingGuide.html
http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/

I would say that initially you should aim for about 50% cluster utilization.
Also, how fast do you need this data processed? Faster turnaround requires
more slots. (Generally, one slot handles 500 MB to 1 GB.)
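That rule of thumb can be turned into a quick estimate. A minimal sketch, assuming a 1 GB split per slot, an average map-task runtime of 5 minutes, and a 6-hour batch window (all three are illustrative assumptions, not measurements):

```python
import math

# Rule of thumb from above: one map slot handles ~500 MB to 1 GB of input.

def tasks_per_day(daily_tb, split_gb=1.0):
    """Map tasks needed to chew through one day's input at split_gb per task."""
    return int(daily_tb * 1024 / split_gb)

def slots_needed(daily_tb, window_hours, avg_task_minutes=5, split_gb=1.0):
    """Concurrent map slots needed to finish a day's input inside a window.

    avg_task_minutes is an assumed average map-task runtime; measure it on
    your own workload before trusting the result.
    """
    tasks = tasks_per_day(daily_tb, split_gb)
    waves = (window_hours * 60) / avg_task_minutes  # task "waves" in the window
    return math.ceil(tasks / waves)

# 100 TB/day, processed within a 6-hour batch window:
print(tasks_per_day(100))    # -> 102400 map tasks per day
print(slots_needed(100, 6))  # -> 1423 concurrent map slots
```

Dividing the slot count by the slots you configure per node then gives a first-cut node count for the compute side.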

Best,
Krishna

On Fri, May 29, 2015 at 12:42 PM, Bhagaban Khatai <em...@gmail.com>
wrote:

> Thanks, Ashish, for your help.
>
> We don't have a clear picture yet; we are approaching a few clients on
> this, and many of them are asking for a cluster-size configuration.
>
> One customer gave us a requirement like this: they will process 100 TB of
> data per day, and we need to come up with node details and how much memory
> and how many cores are required. Note: the 100 TB is without replication.
>
> We may be going with Cloudera.
>
> Please suggest.
>
> On Fri, May 29, 2015 at 11:45 AM, Ashish Kumar9 <as...@in.ibm.com>
> wrote:
>
>> Can you share some more inputs on the requirements?
>>
>> What is the analytics use case? (Batch processing, real-time, in-memory
>> requirements)
>> Which distribution of Hadoop?
>> What is the storage growth rate?
>> What are the data ingest requirements?
>> What kind of jobs will run on the cluster?
>> What is the nature of the data? Is data compression applicable?
>> What are the HA requirements? What are the performance expectations?
>>
>> Based on these requirements, you would have to design the compute,
>> storage, and network elements.
>>
>> Thanks and Regards,
>> Ashish Kumar
>> IBM Systems BigData Analytics Solutions Architect
>>
>>
>> From:        Bhagaban Khatai <em...@gmail.com>
>> To:        user@hadoop.apache.org
>> Date:        05/29/2015 11:32 AM
>> Subject:        Cluster sizing
>> ------------------------------
>>
>>
>>
>> Hi,
>>
>> I wanted to know how I can determine how many nodes (with how many cores,
>> how much storage in TB, and how much RAM) are needed if the data volume we
>> receive increases from 1 TB to 100 TB per day. Can someone help me create
>> an Excel sheet based on this?
>>
>> Thanks
>>
>
>

Re: Cluster sizing

Posted by Bhagaban Khatai <em...@gmail.com>.
Thanks, Ashish, for your help.

We don't have a clear picture yet; we are approaching a few clients on this,
and many of them are asking for a cluster-size configuration.

One customer gave us a requirement like this: they will process 100 TB of data
per day, and we need to come up with node details and how much memory and how
many cores are required. Note: the 100 TB is without replication.

We may be going with Cloudera.

Please suggest.
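The 100 TB/day figure can be turned into a back-of-envelope storage estimate. A minimal sketch, assuming HDFS replication factor 3, 25% of disk reserved for shuffle/temp/OS, a 30-day retention window, and 48 TB of raw disk per datanode (every one of these is an assumption to be replaced with the customer's real numbers):

```python
import math

def nodes_for_storage(raw_tb_per_day, retention_days,
                      replication=3, overhead=0.25,
                      disk_per_node_tb=48):
    """Datanodes needed to hold retention_days of raw input.

    overhead is the fraction of each node's disk reserved for
    shuffle spills, temp space, and the OS.
    """
    stored_tb = raw_tb_per_day * retention_days * replication
    usable_per_node_tb = disk_per_node_tb * (1 - overhead)
    return math.ceil(stored_tb / usable_per_node_tb)

# 100 TB/day raw (pre-replication), 30-day retention,
# 12 x 4 TB disks per node:
n = nodes_for_storage(100, 30)
print(n)  # -> 250 datanodes
# Compute then follows storage: e.g. 16 cores and 128 GB RAM per node
# is a common starting ratio to tune from, not a fixed rule.
print(n * 16, "cores,", n * 128, "GB RAM cluster-wide")
```

Putting the inputs (daily volume, retention, replication, disks per node) in one row per scenario is essentially the Excel sheet asked for above.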

On Fri, May 29, 2015 at 11:45 AM, Ashish Kumar9 <as...@in.ibm.com> wrote:

> Can you share some more inputs on the requirements?
>
> What is the analytics use case? (Batch processing, real-time, in-memory
> requirements)
> Which distribution of Hadoop?
> What is the storage growth rate?
> What are the data ingest requirements?
> What kind of jobs will run on the cluster?
> What is the nature of the data? Is data compression applicable?
> What are the HA requirements? What are the performance expectations?
>
> Based on these requirements, you would have to design the compute, storage,
> and network elements.
>
> Thanks and Regards,
> Ashish Kumar
> IBM Systems BigData Analytics Solutions Architect
>
>
> From:        Bhagaban Khatai <em...@gmail.com>
> To:        user@hadoop.apache.org
> Date:        05/29/2015 11:32 AM
> Subject:        Cluster sizing
> ------------------------------
>
>
>
> Hi,
>
> I wanted to know how I can determine how many nodes (with how many cores,
> how much storage in TB, and how much RAM) are needed if the data volume we
> receive increases from 1 TB to 100 TB per day. Can someone help me create
> an Excel sheet based on this?
>
> Thanks
>

Re: Cluster sizing

Posted by Ashish Kumar9 <as...@in.ibm.com>.
Can you share some more inputs on the requirements?

What is the analytics use case? (Batch processing, real-time, in-memory
requirements)
Which distribution of Hadoop?
What is the storage growth rate?
What are the data ingest requirements?
What kind of jobs will run on the cluster?
What is the nature of the data? Is data compression applicable?
What are the HA requirements? What are the performance expectations?

Based on these requirements, you would have to design the compute, storage,
and network elements.

Thanks and Regards,
Ashish Kumar
IBM Systems BigData Analytics Solutions Architect


From:   Bhagaban Khatai <em...@gmail.com>
To:     user@hadoop.apache.org
Date:   05/29/2015 11:32 AM
Subject:        Cluster sizing



Hi,

I wanted to know how I can determine how many nodes (with how many cores, how
much storage in TB, and how much RAM) are needed if the data volume we receive
increases from 1 TB to 100 TB per day. Can someone help me create an Excel
sheet based on this?

Thanks
