Posted to user@flink.apache.org by Seye Jin <se...@gmail.com> on 2019/03/02 04:20:18 UTC

Flink 1.7.1 Inaccessible

I am getting "service temporarily unavailable due to an ongoing leader
election" when I try to access the Flink UI. The jobmanager has HA configured.
I have tried restarting the jobmanager multiple times, but no luck. I also
tried submitting my job from the console, but I get the same message.
When I view the logs during a JM restart I see no errors; it even says
"jobmanager was granted leadership with ..."
Any hints to help remediate this issue would be much appreciated. I have
multiple stateful applications running, so I cannot start a new
cluster (I am also unable to take a savepoint).
Thanks

Re: EXT :Re: Flink 1.7.1 Inaccessible

Posted by Seye Jin <se...@gmail.com>.
You will have to copy the link in its entirety; Gmail is not recognizing it
correctly:
http://mail-archives.apache.org/mod_mbox/flink-user/201709.mbox/<533686A2-71EE-4356-8961-68CF3F858EEC@expedia.com>

On Wed, Mar 6, 2019, 5:26 AM Till Rohrmann <tr...@apache.org> wrote:

> Hmm this is strange. Retrieving more information from the logs would be
> helpful to better understand the problem.
>
> The link to the related discussion does not work. Maybe you could repost
> it.
>
> Cheers,
> Till

Re: EXT :Re: Flink 1.7.1 Inaccessible

Posted by Till Rohrmann <tr...@apache.org>.
Hmm this is strange. Retrieving more information from the logs would be
helpful to better understand the problem.

The link to the related discussion does not work. Maybe you could repost it.

Cheers,
Till

On Wed, Mar 6, 2019 at 4:32 AM Seye Jin <se...@gmail.com> wrote:

>
> Hi Till, there were no warn or error log messages. We have been using
> Flink for a long time now and never experienced this issue (although we just
> migrated from 1.4 to 1.7). It was a critical app, and after multiple attempts
> to resolve it, we updated the *high-availability.cluster-id* and attached
> the TMs to a new JM (even though we sadly lost state).
>
> @Nick we are indeed running Flink and ZooKeeper in Docker, and we verified
> that it could resolve the hostname. It also got a new leader id and even
> acknowledged registering the jobs running on the cluster (even though
> checkpoints were not getting triggered).
>
> We are keeping a close eye on this issue, trying to replicate it and sifting
> through Kibana logs, and will post here if we find anything.
>
> P.S. It looks similar to this issue from a while back:
> http://mail-archives.apache.org/mod_mbox/flink-user/201709.mbox/<533686A2-71EE-4356-8961-68CF3F858EEC@expedia.com>

Re: EXT :Re: Flink 1.7.1 Inaccessible

Posted by Seye Jin <se...@gmail.com>.
Hi Till, there were no warn or error log messages. We have been using Flink
for a long time now and never experienced this issue (although we just
migrated from 1.4 to 1.7). It was a critical app, and after multiple attempts
to resolve it, we updated the *high-availability.cluster-id* and attached the
TMs to a new JM (even though we sadly lost state).

@Nick we are indeed running Flink and ZooKeeper in Docker, and we verified
that it could resolve the hostname. It also got a new leader id and even
acknowledged registering the jobs running on the cluster (even though
checkpoints were not getting triggered).

We are keeping a close eye on this issue, trying to replicate it and sifting
through Kibana logs, and will post here if we find anything.

P.S. It looks similar to this issue from a while back:
http://mail-archives.apache.org/mod_mbox/flink-user/201709.mbox/<533686A2-71EE-4356-8961-68CF3F858EEC@expedia.com>


On Mon, Mar 4, 2019, 12:38 PM Martin, Nick <Ni...@ngc.com> wrote:

> Seye, are you running Flink and Zookeeper in Docker? I’ve had problems
> with Jobmanagers not resolving the hostnames for Zookeeper when starting a
> stack on Docker.

RE: EXT :Re: Flink 1.7.1 Inaccessible

Posted by "Martin, Nick" <Ni...@ngc.com>.
Seye, are you running Flink and Zookeeper in Docker? I’ve had problems with Jobmanagers not resolving the hostnames for Zookeeper when starting a stack on Docker.
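
A rough way to test the hostname-resolution problem described above is to run a small check from inside the JobManager container. This is only a sketch: the quorum string is assumed to have the same "host:port,host:port" form as the high-availability.zookeeper.quorum setting, and the function names are made up for illustration.

```python
import socket


def parse_quorum(quorum):
    """Split 'zk1:2181,zk2:2181' into [('zk1', 2181), ('zk2', 2181)]."""
    servers = []
    for entry in quorum.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.partition(":")
        # 2181 is ZooKeeper's conventional client port when none is given.
        servers.append((host, int(port) if port else 2181))
    return servers


def unresolvable_hosts(quorum):
    """Return the quorum hosts that fail name resolution from this machine."""
    failed = []
    for host, port in parse_quorum(quorum):
        try:
            socket.getaddrinfo(host, port)
        except socket.gaierror:
            failed.append(host)
    return failed


# Example (hypothetical hostnames):
#   print(unresolvable_hosts("zookeeper-1:2181,zookeeper-2:2181"))
```

An empty result only shows that the names resolve from where the script runs; it does not prove that the JobManager can actually reach ZooKeeper on those ports.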

From: Till Rohrmann [mailto:trohrmann@apache.org]
Sent: Monday, March 04, 2019 7:02 AM
To: Seye Jin <se...@gmail.com>
Cc: user <us...@flink.apache.org>
Subject: EXT :Re: Flink 1.7.1 Inaccessible

Hi Seye,

usually, Flink's web UI should be accessible after a successful leader election. Could you share with us the cluster logs to see what's going on? Without this information it is hard to tell what's going wrong.

What you could also do is to check the ZooKeeper znode which represents the cluster id (if you are using Yarn it should be something like /flink/application_...). There you could check the contents of the leader znode of the web ui (leader/rest_server_lock). It should contain the address of the current leader if there is one.

Cheers,
Till



________________________________
Notice: This e-mail is intended solely for use of the individual or entity to which it is addressed and may contain information that is proprietary, privileged and/or exempt from disclosure under applicable law. If the reader is not the intended recipient or agent responsible for delivering the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. This communication may also contain data subject to U.S. export laws. If so, data subject to the International Traffic in Arms Regulation cannot be disseminated, distributed, transferred, or copied, whether incorporated or in its original form, to foreign nationals residing in the U.S. or abroad, absent the express prior approval of the U.S. Department of State. Data subject to the Export Administration Act may not be disseminated, distributed, transferred or copied contrary to U. S. Department of Commerce regulations. If you have received this communication in error, please notify the sender by reply e-mail and destroy the e-mail message and any physical copies made of the communication.
 Thank you.
*********************



Re: Flink 1.7.1 Inaccessible

Posted by Till Rohrmann <tr...@apache.org>.
Hi Seye,

Usually, Flink's web UI should be accessible after a successful leader
election. Could you share the cluster logs with us so we can see what's going
on? Without this information it is hard to tell what's going wrong.

What you could also do is check the ZooKeeper znode which represents the
cluster id (if you are using Yarn it should be something like
/flink/application_...). There you could check the contents of the leader
znode of the web UI (leader/rest_server_lock). It should contain the
address of the current leader if there is one.
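
The znode check described above could be scripted roughly as follows. This is a sketch under assumptions, not a definitive tool: it assumes the third-party kazoo ZooKeeper client is installed, that the HA root path is /flink and the leader sub-path is /leader (adjust if your configuration overrides them), and the host name and cluster id in the usage comment are hypothetical. The stored payload may be Java-serialized rather than plain text, so the raw bytes are returned as-is.

```python
def rest_leader_znode(cluster_id, root="/flink", leader_path="/leader"):
    """Build the znode path <root>/<cluster-id>/leader/rest_server_lock."""
    parts = [root, cluster_id, leader_path, "rest_server_lock"]
    # Strip stray slashes so "/flink/" + "/default" does not produce "//".
    cleaned = [p.strip("/") for p in parts if p.strip("/")]
    return "/" + "/".join(cleaned)


def read_rest_leader(zk_hosts, cluster_id):
    """Return the raw bytes in the REST leader znode, or None if it is absent."""
    # Imported lazily so the path helper works without kazoo installed.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts=zk_hosts)
    zk.start(timeout=10)
    try:
        path = rest_leader_znode(cluster_id)
        if zk.exists(path) is None:
            return None
        data, _stat = zk.get(path)
        return data
    finally:
        zk.stop()
        zk.close()


# Example (hypothetical quorum address and cluster id):
#   print(read_rest_leader("zookeeper:2181", "/default"))
```

If the function returns None or an empty payload while the logs claim leadership was granted, that mismatch is worth including when sharing the logs.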

Cheers,
Till

On Sat, Mar 2, 2019 at 5:29 AM Seye Jin <se...@gmail.com> wrote:

> I am getting "service temporarily unavailable due to an ongoing leader
> election" when I try to access the Flink UI. The jobmanager has HA configured.
> I have tried restarting the jobmanager multiple times, but no luck. I also
> tried submitting my job from the console, but I get the same message.
> When I view the logs during a JM restart I see no errors; it even says
> "jobmanager was granted leadership with ..."
> Any hints to help remediate this issue would be much appreciated. I have
> multiple stateful applications running, so I cannot start a new
> cluster (I am also unable to take a savepoint).
> Thanks
>
>