Posted to dev@flink.apache.org by Timo Walther <fl...@twalthr.com> on 2014/07/22 14:05:28 UTC

Which is the right degree of parallelism?

Hey everyone,

I want to get the maximum performance out of my small two-node cluster. At
the moment, my execution plan shows a "parallelism" of "1" at each operator.
Which "-p XX" argument should I pass to the job: the number of nodes, the
number of CPUs, or the number of slots?

Thanks and regards,
Timo

Re: Which is the right degree of parallelism?

Posted by Stephan Ewen <se...@apache.org>.
You define the number of slots that each machine offers in the config.
Setting it to #cores is reasonable in many cases.

Have a look at the default config under "/conf"; it has an entry where
you set the slots per machine.
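
A minimal sketch of how these numbers might combine, assuming that the
value passed to "-p" should not exceed the total number of slots in the
cluster (nodes times slots per node) and that slots per node is set to the
core count as suggested above. The function name and the 4-core figure are
hypothetical, chosen only for illustration; the right value still depends
on the job.

```python
import os


def suggested_parallelism(num_nodes, slots_per_node=None):
    """Return a starting value for -p: the total number of slots in the cluster.

    Assumption: every node offers the same number of slots. If
    slots_per_node is omitted, the core count of the local machine is
    used as a stand-in for the per-node core count.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    if slots_per_node is None:
        # os.cpu_count() can return None, so fall back to a single slot.
        slots_per_node = os.cpu_count() or 1
    if slots_per_node < 1:
        raise ValueError("slots_per_node must be at least 1")
    return num_nodes * slots_per_node


if __name__ == "__main__":
    # Hypothetical 2-node cluster with 4 cores (and thus 4 slots) per node.
    print("-p", suggested_parallelism(num_nodes=2, slots_per_node=4))
```

Treat the result as a starting point and adjust it after measuring the
actual job.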

Re: Which is the right degree of parallelism?

Posted by Ufuk Celebi <u....@fu-berlin.de>.
On 22 Jul 2014, at 14:58, Aljoscha Krettek <al...@apache.org> wrote:

> Do the slots correlate with the number of cores? I think the slots business
> might be confusing for some users.

I think it depends as well. The number of cores would be a reasonable default for the number of slots, though.

Re: Which is the right degree of parallelism?

Posted by Aljoscha Krettek <al...@apache.org>.
Do the slots correlate with the number of cores? I think the slots business
might be confusing for some users.


On Tue, Jul 22, 2014 at 2:31 PM, Stephan Ewen <se...@apache.org> wrote:

> Hey!
>
> That depends on the job, but in general, #cores is a good point to start.
>
> Stephan

Re: Which is the right degree of parallelism?

Posted by Stephan Ewen <se...@apache.org>.
Hey!

That depends on the job, but in general, #cores is a good point to start.

Stephan


