Posted to user@spark.apache.org by maddenpj <ma...@gmail.com> on 2014/09/30 03:20:44 UTC

Re: shuffle memory requirements

Hey Ameet,

Thanks for the info. I'm running into the same issue myself; my last
attempt crashed with a ulimit of 16834. I'm going to raise it and try
again, but yeah, I'd like to know the best practice for computing this.
Can you talk about the worker nodes and what their specs are? At least 45
GB of memory and 6 cores?

Also, I left my worker at the default memory size (512m, I think) and gave
all of the memory to the executor. My understanding was that the worker
just spawns the executor and all of the work is done in the executor.
What was your reasoning for using 24G on the worker?
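
To make that concrete, here is roughly the kind of setup I mean. This is a
sketch only; the master URL and the 24g figure are placeholders for
illustration, not my exact config:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: in standalone mode, SPARK_WORKER_MEMORY (set in spark-env.sh)
    // caps what a worker can hand out to its executors, while
    // spark.executor.memory is the heap each executor JVM actually gets
    // for tasks and shuffles.
    val conf = new SparkConf()
      .setAppName("shuffle-memory-sketch")
      .setMaster("spark://master-host:7077")  // placeholder master URL
      .set("spark.executor.memory", "24g")    // illustrative executor heap
    val sc = new SparkContext(conf)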



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/shuffle-memory-requirements-tp4048p15375.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: shuffle memory requirements

Posted by Andrew Ash <an...@andrewash.com>.
Hi Maddenpj,

Right now the best estimate I've heard for the open file limit is that
you'll need the square of the largest partition count in your dataset.
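
To give a rough sense of scale, here is what that rule of thumb implies for
the 16834 figure you mentioned (illustrative arithmetic only):

    // Rough illustration of the square rule above; 16834 is the ulimit
    // value mentioned earlier in this thread, everything else follows
    // from it.
    val ulimit = 16834
    val maxPartitions = math.sqrt(ulimit).toInt  // about 129
    println(s"A ulimit of $ulimit only covers roughly $maxPartitions x $maxPartitions partitions")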

I filed a ticket to log the ulimit value when it is too low:
https://issues.apache.org/jira/browse/SPARK-3750
