Posted to user@spark.apache.org by Nasir Khan <na...@gmail.com> on 2015/03/08 22:29:16 UTC

General Purpose Spark Cluster Hardware Requirements?

Hi, I am going to submit a proposal to my university to set up a standalone
Spark cluster. What hardware should I include in the proposal?

I will be working on classification (Spark MLlib) of data streams (Spark
Streaming).
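Roughly the kind of job I have in mind is something like this minimal sketch
(assuming Spark 1.3+, where StreamingLogisticRegressionWithSGD is available;
the input paths, feature format and feature count are placeholders):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.classification.StreamingLogisticRegressionWithSGD

    val conf = new SparkConf().setAppName("StreamingClassification")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Placeholder directories; each new file is a batch of lines "label,f1 f2 f3 ..."
    def parse(line: String): LabeledPoint = {
      val Array(label, feats) = line.split(',')
      LabeledPoint(label.toDouble, Vectors.dense(feats.trim.split(' ').map(_.toDouble)))
    }
    val trainingStream = ssc.textFileStream("hdfs:///streams/train").map(parse)
    val testStream     = ssc.textFileStream("hdfs:///streams/test").map(parse)

    // Online logistic regression, updated on every batch of the training stream
    val numFeatures = 10   // placeholder
    val model = new StreamingLogisticRegressionWithSGD()
      .setInitialWeights(Vectors.zeros(numFeatures))

    model.trainOn(trainingStream)
    model.predictOnValues(testStream.map(lp => (lp.label, lp.features))).print()

    ssc.start()
    ssc.awaitTermination()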

If somebody can fill in these answers, that would be great! Thanks.

Cores = (example: 64 nodes, 1024 cores; your figures) ____________?

Performance = (example: ~5.12 TFLOPS, ~2 TFLOPS; your figures) ___________?

GPU = YES/NO ___________?

Fat node = YES/NO ___________?

CPU hours/year = (example: 2000, 8000; your figures) ___________?

RAM/CPU = (example: 256 GB; your figures) ___________?

Storage (processing) = (example: 200 TB; your figures) ___________?

Storage (output) = (example: 5 TB, 4 TB HDD/SSD; your figures) ___________?

* Most processors today carry out 4 FLOPs per cycle, so a single-core 2.5 GHz
processor has a theoretical peak of 10 billion FLOPs/s = 10 GFLOPS.
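Working that figure out (theoretical peak, ignoring memory and I/O limits):

    2.5 GHz x 4 FLOPs/cycle = 10 x 10^9 FLOPs/s = 10 GFLOPS per core
    512 such cores   ~  5.12 TFLOPS aggregate peak
    1,024 such cores ~ 10.24 TFLOPS aggregate peak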

Note: I need a *general-purpose* cluster, neither very high end nor very low
spec. It will not be dedicated to just one project, I guess. You people
already have experience setting up clusters, which is why I posted it here :)





--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/General-Purpose-Spark-Cluster-Hardware-Requirements-tp21963.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: General Purpose Spark Cluster Hardware Requirements?

Posted by Ted Yu <yu...@gmail.com>.
Recycling some related bits from an earlier thread:
http://search-hadoop.com/m/LgpTk2DLMvc


Re: General Purpose Spark Cluster Hardware Requirements?

Posted by Cui Lin <cu...@hds.com>.
No wonder I had out-of-memory issues before…

I doubt we really need such a configuration at the production level…

Best regards,

Cui Lin

From: Krishna Sankar <ks...@gmail.com>
Date: Sunday, March 8, 2015 at 3:27 PM
To: Nasir Khan <na...@gmail.com>
Cc: "user@spark.apache.org" <us...@spark.apache.org>
Subject: Re: General Purpose Spark Cluster Hardware Requirements?



Re: General Purpose Spark Cluster Hardware Requirements?

Posted by Krishna Sankar <ks...@gmail.com>.
Without knowing the data size or the computation & storage requirements, roughly:

   - Dual 6- or 8-core machines, 256 GB memory each, 12-15 TB of disk per
   machine. Probably 5-10 machines.
   - Don't go for the most exotic machines; on the other hand, don't go for
   the cheapest ones either.
      - Find a sweet spot with your vendor, i.e. if dual 6-core CPUs are a lot
      cheaper than dual 10-core, go with the less expensive ones. Same with
      disks - maybe 2 TB is a lot cheaper than 3 TB.
   - Decide whether these are going to be storage intensive or compute
   intensive (I assume the latter) and configure accordingly.
   - Make sure you can add storage to the machines later, i.e. have free
   storage bays.
      - The other way is to add more machines and buy smaller-specced ones.
   - Unless one has very firm I/O and compute requirements, I have found that
   FLOPS, and things of that nature, do not make much sense.
      - Think in terms of RAM, CPU and storage - those are what will become
      the initial limitations (see the sketch after this list).
      - Once there are enough production jobs, you can figure out the FLOPS
      et al.
   - A 10 GbE network is the better choice, so price in a 24-48 port ToR
   switch.
      - The bigger concern is the bandwidth between the cluster nodes, for
      shuffles et al.
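To make the RAM/CPU/storage framing concrete, here is a rough sketch of how
boxes like that (say 8 nodes, each with 2 x 6 cores and 256 GB) might translate
into standalone-mode submit settings. The host name, class name, jar and the
exact numbers are placeholders to illustrate the arithmetic, not recommendations:

    # Hypothetical 8-node standalone cluster: 12 cores and 256 GB RAM per node.
    # Leave a couple of cores and some memory per node for the OS, HDFS and the
    # standalone worker daemon itself.
    spark-submit \
      --master spark://master-host:7077 \
      --class com.example.StreamingClassifier \
      --total-executor-cores 80 \
      --executor-memory 48G \
      --conf spark.default.parallelism=160 \
      streaming-classifier.jar

Roughly: 8 nodes x 10 usable cores = 80 executor cores, and about 2 tasks per
core gives a default parallelism of 160. Memory and storage, not FLOPS, are
usually what you run out of first.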

Cheers
<k/>
