Posted to user@flink.apache.org by Prez Cannady <re...@correlatesystems.com> on 2016/04/18 04:42:26 UTC

Configuring task slots and parallelism for single node Maven executed

Some background.

I’m running a Flink application on a single machine, instrumented by Spring Boot and launched via the Maven Spring Boot plugin. Basically, I’m trying to figure out how much I can squeeze out of a single node processing my task before committing to a cluster solution.

Couple of questions.

1. I assume the configuration options taskmanager.numberOfTaskSlots and parallelism.default pertain to division of work on a single node. Am I correct?
2. Is there a way to configure these options programmatically instead of through the configuration YAML? Or is there some Maven tooling that can ingest a properly formatted Flink config? For the record, I’m currently trying GlobalConfiguration.getConfiguration.setInteger("<config option name>", <config option value>). I am also going to try supplying them as properties in the pom. I’m preparing some tests to see if either of these does what I expect, but thought I’d ask in case I’m heading down a rabbit hole.
3. I figure the number of task slots is limited to the number of processors/cores/whatever available (and that the JVM can get at). Is this accurate?

Any feedback would be appreciated.


Prez Cannady
p: 617 500 3378
e: revprez@opencorrelate.org
GH: https://github.com/opencorrelate
LI: https://www.linkedin.com/in/revprez

Re: Configuring task slots and parallelism for single node Maven executed

Posted by Prez Cannady <re...@correlatesystems.com>.
Thank you both.  Will let you guys know how it works out.



Re: Configuring task slots and parallelism for single node Maven executed

Posted by Till Rohrmann <tr...@apache.org>.
Hi Prez,

   1. The configuration setting taskmanager.numberOfTaskSlots specifies how
   many task slots a TaskManager is started with. As a rough rule of thumb,
   set this value to the number of cores of the machine the TM is running on.
   See this link [1] for further information. The configuration value
   parallelism.default is the default parallelism with which a program will
   be executed if the user didn’t specify it via the submission tool or from
   within the program.
   2. You can configure the parallelism programmatically by calling
   setParallelism on the ExecutionEnvironment. The GlobalConfiguration
   approach won’t work in a distributed setting.
   3. See 1.

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.0/concepts/concepts.html#workers-slots-resources
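
To make the rule of thumb and the fallback behaviour in points 1 and 2 concrete, here is a minimal sketch in plain Java with no Flink dependency. The class and method names (ParallelismSketch, suggestedTaskSlots, effectiveParallelism) are hypothetical and not part of Flink. The ordering between a value set in the program and one given at submission is an assumption made for illustration; the text above only states that parallelism.default applies when neither is specified.

```java
import java.util.OptionalInt;

// Hypothetical helper, not part of Flink: illustrates the sizing rule of thumb
// and the "use the default only if nothing else was specified" fallback.
final class ParallelismSketch {

    // Rule of thumb: one task slot per core that the JVM can see.
    static int suggestedTaskSlots() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    // Assumed order: value set in the program, then value given at submission,
    // then the configured parallelism.default.
    static int effectiveParallelism(OptionalInt setInProgram,
                                    OptionalInt setAtSubmission,
                                    int configuredDefault) {
        if (setInProgram.isPresent()) {
            return setInProgram.getAsInt();
        }
        if (setAtSubmission.isPresent()) {
            return setAtSubmission.getAsInt();
        }
        return configuredDefault;
    }

    public static void main(String[] args) {
        int slots = suggestedTaskSlots();
        System.out.println("suggested taskmanager.numberOfTaskSlots = " + slots);

        // Nothing set explicitly, so the configured default is used.
        int p = effectiveParallelism(OptionalInt.empty(), OptionalInt.empty(), slots);
        System.out.println("effective parallelism = " + p);
    }
}
```

On a single node, a parallelism equal to the suggested slot count is a starting point for measurement rather than a guaranteed optimum.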

Cheers,
Till

Re: Configuring task slots and parallelism for single node Maven executed

Posted by Balaji Rajagopalan <ba...@olacabs.com>.
Answered based on my understanding.

On Mon, Apr 18, 2016 at 8:12 AM, Prez Cannady <re...@correlatesystems.com>
wrote:

> Some background.
>
> I’m running a Flink application on a single machine, instrumented by Spring
> Boot and launched via the Maven Spring Boot plugin. Basically, I’m trying
> to figure out how much I can squeeze out of a single node processing my
> task before committing to a cluster solution.
>
> Couple of questions.
>
>    1. I assume the configuration options taskmanager.numberOfTaskSlots
>    and parallelism.default pertain to division of work on a single node.
>    Am I correct?

You will be running with a single instance of the task manager. If, say, you
are running on a 4-core machine, you can set the parallelism to 4.
>
>
>    2. Is there a way to configure these options programmatically instead
>    of the configuration YAML? Or some Maven tooling that can ingest a properly
>    formatted Flink config? For the record, I’m currently trying
>    GlobalConfiguration.getConfiguration.setInteger("<config option name>",
>    <config option value>). I am also going to try supplying them as
>    properties in the pom. I’m preparing some tests to see if either of these
>    do as I expect, but thought I’d ask in case I’m heading down a rabbit
>    hole.

I have been using GlobalConfiguration with no issues, but there is one thing
you have to be aware of: in a clustered environment, you will have to copy
the YAML file to all the nodes. For example, I read the file from
/usr/share/flink/conf, and I have made sure this file is available on the
master node and the task nodes as well. Why do you want to ingest the config
from a Maven tool? You can do this in the main routine of your application
code.
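
As a rough illustration of reading the two settings from a config file in the main routine, here is a sketch in plain Java. It is not Flink's own loader: FlatConfSketch, parse, and intOrDefault are hypothetical names, and the code assumes the file consists of flat "key: value" lines with "#" comment lines. The file path is taken from the first command-line argument; nothing here is specific to /usr/share/flink/conf.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Flink: reads flat "key: value" lines.
final class FlatConfSketch {

    // Parses lines of the form "key: value"; skips blank lines, "#" comment
    // lines, and lines without a key. Trailing inline comments are not handled.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> conf = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int sep = line.indexOf(':');
            if (sep <= 0) {
                continue;
            }
            conf.put(line.substring(0, sep).trim(), line.substring(sep + 1).trim());
        }
        return conf;
    }

    // Returns the integer stored under key, or fallback if missing or malformed.
    static int intOrDefault(Map<String, String> conf, String key, int fallback) {
        String value = conf.get(key);
        if (value == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) throws IOException {
        // Use the file given as the first argument, or a small built-in sample.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("# sample settings",
                                "taskmanager.numberOfTaskSlots: 4",
                                "parallelism.default: 4");
        Map<String, String> conf = parse(lines);
        System.out.println("slots = " + intOrDefault(conf, "taskmanager.numberOfTaskSlots", 1));
        System.out.println("parallelism = " + intOrDefault(conf, "parallelism.default", 1));
    }
}
```

The fallback of 1 for both settings is an arbitrary choice for this sketch; pick whatever default suits your job.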

>
>    3. I figure task slots is limited to the number of
>    processors/cores/whatever available (and the JVM can get at). Is this
>    accurate?
>
> Any feedback would be appreciated.