Posted to user@flink.apache.org by Josh <jo...@gmail.com> on 2016/06/29 20:13:30 UTC

Flink on YARN - how to resize a running cluster?

I'm running a Flink cluster as a YARN application, started by:
./bin/yarn-session.sh -n 4 -jm 2048 -tm 4096 -d
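For context, the flags in that command roughly mean the following (semantics as of the Flink 1.x yarn-session script; check `./bin/yarn-session.sh --help` for your version):

```shell
./bin/yarn-session.sh \
  -n 4     `# number of YARN containers, i.e. number of TaskManagers` \
  -jm 2048 `# JobManager container memory in MB` \
  -tm 4096 `# memory per TaskManager container in MB` \
  -d       `# detached mode: return to the shell once the session is up`
```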

There are 2 worker nodes, so each is allocated 2 task managers. There is a
stateful Flink job running on the cluster with a parallelism of 2.

If I now want to increase the number of worker nodes to 3, and add 2 extra
task managers, and then increase the job parallelism, how should I do this?

I'm using EMR, so adding an extra worker node and making it available to
YARN is easy to do via the AWS console. But I haven't been able to find any
information in Flink docs about how to resize a running Flink cluster on
YARN. Is it possible to resize it while the YARN application is running, or
do I need to stop the YARN application and redeploy the cluster? Also do I
need to redeploy my Flink job from a savepoint to increase its parallelism,
or do I do this while the job is running?

I tried redeploying the cluster after adding a third worker node, via:

> yarn application -kill myflinkcluster

> ./bin/yarn-session.sh -n 6 -jm 2048 -tm 4096 -d

(note increasing the number of task managers from 4 to 6)

Surprisingly, this redeployed a Flink cluster with 4 task managers (not 6!)
and restored my job from the last checkpoint.

Can anyone point me in the right direction?

Thanks,

Josh

Re: Flink on YARN - how to resize a running cluster?

Posted by Till Rohrmann <tr...@apache.org>.
Yes that's the way to go at the moment.

Cheers,
Till


Re: Flink on YARN - how to resize a running cluster?

Posted by Márton Balassi <ba...@gmail.com>.
Hi Josh,

Yes, currently that is a reasonable workaround.

Best,

Marton


Re: Flink on YARN - how to resize a running cluster?

Posted by Josh <jo...@gmail.com>.
Hi Till,

Thanks, that's very helpful!
So I guess in that case, since it isn't possible to increase the job
parallelism later, it might be sensible to use, say, 10x the parallelism that
I need right now (even if only running on a couple of task managers) - so
that it's possible to scale the job in the future if I need to?

Josh


Re: Flink on YARN - how to resize a running cluster?

Posted by Till Rohrmann <tr...@apache.org>.
Hi Josh,

at the moment it is not possible to dynamically increase the parallelism of
your job. The same holds true for restarting a job from a savepoint. But
we're currently working on exactly this. So in order to change the
parallelism of your job, you would have to restart the job from scratch.

Adding task managers dynamically to your running Flink cluster is possible
if you allocate new YARN containers and then start a TaskManager process
manually with the current job manager address and port. You can find out the
address and port either via the web dashboard, under the job manager
configuration, or by looking them up in the .yarn-properties file stored in
the temp directory on your machine. But the easier way would be to stop your
YARN session and then restart it with an increased number of containers,
because then you wouldn't have to manually ship the lib directory, which
might contain user code classes.
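
A rough sketch of the manual route described above. The file locations and
config keys are assumptions for this generation of Flink; verify them
against your installation before relying on them:

```shell
# The JobManager address/port can be read from the web dashboard
# ("Job Manager -> Configuration") or from the hidden properties file
# that yarn-session.sh writes (location depends on java.io.tmpdir):
cat /tmp/.yarn-properties-$USER

# On the new worker node, point the local Flink config at the running
# session's JobManager, then start a TaskManager process so it registers
# with the existing cluster (<jm-host>/<jm-port> are placeholders):
echo "jobmanager.rpc.address: <jm-host>" >> conf/flink-conf.yaml
echo "jobmanager.rpc.port: <jm-port>"    >> conf/flink-conf.yaml
./bin/taskmanager.sh start
```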

Cheers,
Till
