Posted to user@spark.apache.org by Jem Tucker <je...@gmail.com> on 2015/07/13 09:26:21 UTC

Spark Parallelism

Hi All,

We have recently begun performance testing our Spark application and have
found that changing the default parallelism has a much larger effect on
performance than expected, meaning there seems to be an elusive sweet spot
that depends on the input size.

Does anyone have any idea of a good starting point for setting the
parallelism, depending on cluster spec and data size?
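
For reference, the Spark tuning guide suggests roughly 2-3 tasks per CPU
core in the cluster as a starting point. Below is a minimal sketch of
applying that rule of thumb; the cluster numbers and app name are
illustrative assumptions, not our real setup:

    import org.apache.spark.{SparkConf, SparkContext}

    // Assumed cluster shape -- purely illustrative numbers
    val numExecutors = 10
    val coresPerExecutor = 4
    val totalCores = numExecutors * coresPerExecutor

    // Tuning guide rule of thumb: 2-3 tasks per core as a starting
    // point, then adjust empirically against the input size.
    val conf = new SparkConf()
      .setAppName("ParallelismTest")
      .set("spark.default.parallelism", (totalCores * 3).toString)

    val sc = new SparkContext(conf)

The same value can also be passed at submit time via
--conf spark.default.parallelism=... rather than hard-coding it.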

Thanks

Jem