Posted to user@flink.apache.org by Maciek Próchniak <mp...@touk.pl> on 2016/03/08 15:48:32 UTC

rebalance of streaming job after taskManager restart

Hi,

We have a streaming job with parallelism 2 and two task managers. The job
occupies one slot on each task manager. When I stop manager2, the job
is restarted and runs on manager1, occupying two of its slots.
How can I trigger a restart (or a similar process) that will cause the
job to be balanced across the task managers?

thanks,
maciek

Re: rebalance of streaming job after taskManager restart

Posted by Aljoscha Krettek <al...@apache.org>.
Yes, there are plans to make this more streamlined but we are not there yet, unfortunately.
> On 08 Mar 2016, at 16:07, Maciek Próchniak <mp...@touk.pl> wrote:
> 
> Hi,
> 
> thanks for the quick answer - yes, it does what I want to accomplish,
> but I was hoping for some "easier" solution.
> Are there any plans for a "restart" button/command or something similar? I mean, the whole restart process already exists, as I understand it, since it's triggered when a task manager dies.
> 
> thanks,
> maciek
> 


Re: rebalance of streaming job after taskManager restart

Posted by Maciek Próchniak <mp...@touk.pl>.
Hi,

thanks for the quick answer - yes, it does what I want to accomplish,
but I was hoping for some "easier" solution.
Are there any plans for a "restart" button/command or something similar? I mean,
the whole restart process already exists, as I understand it, since it's
triggered when a task manager dies.

thanks,
maciek

On 08/03/2016 16:03, Aljoscha Krettek wrote:
> Hi,
> I think what you can do is make a savepoint of your program, then cancel it and restart it from the savepoint. This should make Flink redistribute it on all TaskManagers.
>
> See https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/savepoints.html
> and
> https://ci.apache.org/projects/flink/flink-docs-master/apis/cli.html#savepoints
> for documentation about savepoints.
>
> The steps to follow should be:
>
> bin/flink savepoint <your job id>
>
> This will print a savepoint path that you will need later.
>
> bin/flink cancel <your job id>
>
> bin/flink run -s <savepoint path> …
>
> The last command is your usual run command but with the additional “-s” parameter to continue from a savepoint.
>
> I hope that helps.
>
> Cheers,
> Aljoscha


Re: rebalance of streaming job after taskManager restart

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,
I think what you can do is make a savepoint of your program, then cancel it and restart it from the savepoint. This should make Flink redistribute it on all TaskManagers.

See https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/savepoints.html
and
https://ci.apache.org/projects/flink/flink-docs-master/apis/cli.html#savepoints
for documentation about savepoints.

The steps to follow should be:

bin/flink savepoint <your job id>

This will print a savepoint path that you will need later.

bin/flink cancel <your job id>

bin/flink run -s <savepoint path> …

The last command is your usual run command but with the additional “-s” parameter to continue from a savepoint.
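
The steps above could be wrapped in a small shell helper, roughly as sketched below. This is only an illustration: the FLINK variable and the function names take_savepoint and restart_from_savepoint are made up for this example, and the only Flink commands used are the three shown above. The savepoint path is passed in by hand rather than parsed from the CLI output, since the exact output format is not assumed here.

```shell
#!/bin/sh
# Sketch only: wraps the three commands from the steps above.
# FLINK is a hypothetical variable pointing at the Flink CLI (defaults to bin/flink).
FLINK="${FLINK:-bin/flink}"

# Step 1: trigger a savepoint for the given job id.
# The CLI prints the savepoint path; note it down for the next step.
take_savepoint() {
    "$FLINK" savepoint "$1"
}

# Steps 2 and 3: cancel the job, then resubmit it from the savepoint.
#   $1 = job id
#   $2 = savepoint path printed by step 1
#   remaining arguments = your usual "run" arguments
restart_from_savepoint() {
    job_id="$1"
    savepoint_path="$2"
    shift 2
    # Only resubmit if the cancel command succeeded.
    "$FLINK" cancel "$job_id" &&
    "$FLINK" run -s "$savepoint_path" "$@"
}
```

After sourcing the file, you would call take_savepoint with your job id, copy the printed path, and then call restart_from_savepoint with the job id, that path, and whatever you normally pass to "run".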

I hope that helps.

Cheers,
Aljoscha
> On 08 Mar 2016, at 15:48, Maciek Próchniak <mp...@touk.pl> wrote:
> 
> Hi,
> 
> We have a streaming job with parallelism 2 and two task managers. The job occupies one slot on each task manager. When I stop manager2, the job is restarted and runs on manager1, occupying two of its slots.
> How can I trigger a restart (or a similar process) that will cause the job to be balanced across the task managers?
> 
> thanks,
> maciek