Posted to user@spark.apache.org by Ognen Duzlevski <og...@nengoiksvelzud.com> on 2014/01/21 23:27:36 UTC

spark.default.parallelism

This is what docs/configuration.md says about the property:
" Default number of tasks to use for distributed shuffle operations
(<code>groupByKey</code>,
    <code>reduceByKey</code>, etc) when not set by user.
"

If I set this property to, let's say, 4 - what does this mean? 4 tasks per
core, per worker, per...? :)

Thanks!
Ognen

Re: spark.default.parallelism

Posted by Andrew Ash <an...@andrewash.com>.
https://github.com/apache/incubator-spark/pull/489


On Tue, Jan 21, 2014 at 3:41 PM, Ognen Duzlevski <ognen@plainvanillagames.com> wrote:

> On Tue, Jan 21, 2014 at 10:37 PM, Andrew Ash <an...@andrewash.com> wrote:
>
>> Documentation suggestion:
>>
>> Default number of tasks to use *across the cluster* for distributed
>> shuffle operations (<code>groupByKey</code>, <code>reduceByKey</code>,
>> etc) when not set by user.
>>
>> Ognen would that have clarified for you?
>>
>
> Of course :)
>
> Thanks!
> Ognen
>

Re: spark.default.parallelism

Posted by Ognen Duzlevski <og...@plainvanillagames.com>.
On Tue, Jan 21, 2014 at 10:37 PM, Andrew Ash <an...@andrewash.com> wrote:

> Documentation suggestion:
>
> Default number of tasks to use *across the cluster* for distributed
> shuffle operations (<code>groupByKey</code>, <code>reduceByKey</code>,
> etc) when not set by user.
>
> Ognen would that have clarified for you?
>

Of course :)

Thanks!
Ognen

Re: spark.default.parallelism

Posted by Andrew Ash <an...@andrewash.com>.
Documentation suggestion:

Default number of tasks to use *across the cluster* for distributed shuffle
operations (<code>groupByKey</code>, <code>reduceByKey</code>, etc) when
not set by user.

Ognen would that have clarified for you?


On Tue, Jan 21, 2014 at 3:35 PM, Matei Zaharia <ma...@gmail.com> wrote:

> It’s just 4 over the whole cluster.
>
> Matei
>
> On Jan 21, 2014, at 2:27 PM, Ognen Duzlevski <og...@nengoiksvelzud.com>
> wrote:
>
> This is what docs/configuration.md says about the property:
> " Default number of tasks to use for distributed shuffle operations
> (<code>groupByKey</code>,
>     <code>reduceByKey</code>, etc) when not set by user.
> "
>
> If I set this property to, let's say, 4 - what does this mean? 4 tasks per
> core, per worker, per...? :)
>
> Thanks!
> Ognen
>
>
>

Re: spark.default.parallelism

Posted by Matei Zaharia <ma...@gmail.com>.
It’s just 4 over the whole cluster.

Matei

On Jan 21, 2014, at 2:27 PM, Ognen Duzlevski <og...@nengoiksvelzud.com> wrote:

> This is what docs/configuration.md says about the property:
> " Default number of tasks to use for distributed shuffle operations (<code>groupByKey</code>,
>     <code>reduceByKey</code>, etc) when not set by user.
> "
> 
> If I set this property to, let's say, 4 - what does this mean? 4 tasks per core, per worker, per...? :)
> 
> Thanks!
> Ognen
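
A minimal sketch of what the thread concludes, for anyone landing here later. The local[8] master, application name, and toy data below are illustrative assumptions, not anything from the thread; the only point demonstrated is that spark.default.parallelism sets the default number of shuffle tasks for the whole application, not per core or per worker.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

object DefaultParallelismSketch {
  def main(args: Array[String]) {
    // Even with 8 local cores available, spark.default.parallelism = 4 means
    // shuffles default to 4 tasks in total, not 4 per core or per worker.
    val conf = new SparkConf()
      .setAppName("default-parallelism-sketch")   // illustrative name
      .setMaster("local[8]")                      // illustrative master
      .set("spark.default.parallelism", "4")
    val sc = new SparkContext(conf)

    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // No partition count given: reduceByKey falls back to the default, so 4 reduce tasks.
    val usingDefault = pairs.reduceByKey(_ + _)
    println(usingDefault.partitions.length)   // 4

    // An explicit count overrides the default for this one shuffle.
    val explicit = pairs.reduceByKey(_ + _, 16)
    println(explicit.partitions.length)       // 16

    sc.stop()
  }
}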