Posted to user@spark.apache.org by Mike Sam <mi...@gmail.com> on 2014/09/11 21:42:05 UTC

single worker vs multiple workers on each machine

Hi There,

I am new to Spark and I was wondering when you have so much memory on each
machine of the cluster, is it better to run multiple workers with limited
memory on each machine or is it better to run a single worker with access
to the majority of the machine memory? If the answer is "it depends", would
you please elaborate?

Thanks,
Mike

Re: single worker vs multiple workers on each machine

Posted by Mayur Rustagi <ma...@gmail.com>.
Another aspect to keep in mind is that JVMs with heaps above 8-10 GB start to misbehave (long GC pauses, mostly).
It is typically better to split memory into chunks of roughly 15 GB per JVM.
If you are choosing machines, about 10 GB per core is a good approximation to maintain.
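On the Spark standalone cluster manager, this kind of split can be sketched in conf/spark-env.sh. The values below are illustrative only (assuming, say, a 64 GB / 16-core machine), not a tested recommendation:

```shell
# conf/spark-env.sh -- illustrative split of one big machine into four ~15 GB JVMs
export SPARK_WORKER_INSTANCES=4   # number of worker JVMs started per machine
export SPARK_WORKER_MEMORY=15g    # memory each worker can hand out to executors
export SPARK_WORKER_CORES=4       # cores each worker may use
```

With settings like these, each machine runs four independent worker JVMs instead of one large one.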

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>


On Fri, Sep 12, 2014 at 2:59 AM, Sean Owen <so...@cloudera.com> wrote:

> As I understand it, there's generally no advantage to running many
> executors per machine. Each will already use all the cores, and
> multiple executors just mean splitting the available memory instead
> of having one big pool. I think there may be an argument at extremes
> of scale, where one JVM with a huge heap might have excessive GC
> pauses, or too many open files, that kind of thing.

Re: single worker vs multiple workers on each machine

Posted by Sean Owen <so...@cloudera.com>.
As I understand it, there's generally no advantage to running many
executors per machine. Each will already use all the cores, and
multiple executors just mean splitting the available memory instead
of having one big pool. I think there may be an argument at extremes
of scale, where one JVM with a huge heap might have excessive GC
pauses, or too many open files, that kind of thing.
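The single-JVM alternative described above might look like the following sketch; the hostname, sizes, and application jar are placeholders, and spark.executor.memory must fit inside what the worker offers:

```shell
# conf/spark-env.sh -- one worker per machine owning most of its memory
export SPARK_WORKER_MEMORY=60g

# At submit time, the application requests one large executor per worker;
# --executor-memory must not exceed SPARK_WORKER_MEMORY above.
spark-submit --master spark://master:7077 \
  --executor-memory 56g \
  my_app.jar
```

This gives each machine a single big memory pool, at the cost of the large-heap GC behavior mentioned above.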


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org