Posted to user@flink.apache.org by Al-Isawi Rami <Ra...@comptel.com> on 2016/12/07 12:20:31 UTC

Replace Flink job while cluster is down

Hi,

I have a faulty Flink streaming program running on a cluster, consuming from Kafka, so I brought the cluster down. I now have a new version that contains the fix. If I bring the Flink cluster up again, the old faulty program will be recovered and will consume and stream faulty results. How can I cancel it before bringing the cluster up again? There are millions of Kafka messages waiting to be consumed and I do not want the old program to consume them. The cluster is backed by S3 and I found some blobs there from which Flink will recover the old program, but it sounds like a bad idea to just delete them.

Any ideas?


Regards,
-Rami
Disclaimer: This message and any attachments thereto are intended solely for the addressed recipient(s) and may contain confidential information. If you are not the intended recipient, please notify the sender by reply e-mail and delete the e-mail (including any attachments thereto) without producing, distributing or retaining any copies thereof. Any review, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient(s) is prohibited. Thank you.

Re: Replace Flink job while cluster is down

Posted by Ufuk Celebi <uc...@apache.org>.
With HA enabled, Flink checks the configured ZooKeeper node for pre-existing jobs and checkpoints when starting.

What Stefan meant was that you can configure a different ZooKeeper node, which will start the cluster with a clean state.

You can check the available config options here: https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/config.html#high-availability-ha
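For reference, the relevant flink-conf.yaml entries look roughly like the sketch below (option names as in the 1.2 docs linked above; the quorum hosts, cluster-id and S3 path are placeholders, not values from this thread):

```yaml
# Enable ZooKeeper-based high availability
high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
# Root znode under which Flink stores its recovery data
high-availability.zookeeper.path.root: /flink
# Pointing the cluster at a fresh cluster-id starts it with a clean
# state, because recovery data is looked up under this node
high-availability.cluster-id: /my-cluster-v2
# Durable storage for job metadata (checkpoints, blobs)
high-availability.storageDir: s3://my-bucket/flink/recovery
```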


On 8 December 2016 at 15:05:52, Al-Isawi Rami (rami.al-isawi@comptel.com) wrote:


Re: Replace Flink job while cluster is down

Posted by Al-Isawi Rami <Ra...@comptel.com>.
Hi Stefan,

Yes, a cluster of 3 machines. Version 1.1.1

I did not get the difference between “remove the entry from ZooKeeper” and “use Flink’s ZooKeeper namespaces feature”.

Eventually, I started the cluster and it did recover the old program. However, I was fast enough to click Cancel within 4 seconds, before the first checkpoint kicked in (5-second interval).

Wouldn’t it make sense if we could still use the flink CLI to manage and cancel jobs even while the cluster is offline or semi-offline? I am not sure what the best solution is for the case I faced.

Regards,
-Rami
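For the record, once a JobManager is reachable again, stray jobs can be listed and cancelled with the flink CLI, which is what the race against the first checkpoint above comes down to (the job ID shown is a placeholder):

```shell
# List running and scheduled jobs on the cluster
bin/flink list

# Cancel a specific job by its ID (placeholder ID shown)
bin/flink cancel 5e20cb6b0f357591171dfcca2eea09de
```

The catch raised in this thread is that both commands need a running JobManager, so they cannot be used while the cluster is fully down.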

> On 7 Dec 2016, at 16:10, Stefan Richter <s....@data-artisans.com> wrote:
>
> Hi,
>
> first a few quick questions: I assume you are running in HA mode, right? Also what version of Flink are you running?
>
> In case you are not running HA, nothing is automatically recovered. With HA, you would need to manually remove the corresponding entry from Zookeeper. If this is the problem, I suggest using Flink’s Zookeeper namespaces feature, to isolate different runs of a job.
>
> Best,
> Stefan


Re: Replace Flink job while cluster is down

Posted by Stefan Richter <s....@data-artisans.com>.
Hi,

first a few quick questions: I assume you are running in HA mode, right? Also what version of Flink are you running?

In case you are not running HA, nothing is automatically recovered. With HA, you would need to manually remove the corresponding entry from Zookeeper. If this is the problem, I suggest using Flink’s Zookeeper namespaces feature, to isolate different runs of a job.

Best,
Stefan
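In case it helps others reading the archive: with the default path root of /flink, the recovery entries Stefan mentions can be inspected and, if you are certain the jobs should not be recovered, removed with the ZooKeeper CLI. The znode paths below assume a default layout and are illustrative only; double-check your configured root and namespace before deleting anything.

```shell
# Connect to one of the ZooKeeper servers
bin/zkCli.sh -server zk1:2181

# Inspect what Flink has stored (paths depend on your configured
# path.root and namespace)
ls /flink

# Recursively remove the stored job graphs so that nothing is
# recovered on the next cluster start
rmr /flink/jobgraphs
```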

