Posted to user@flink.apache.org by Yang Wang <da...@gmail.com> on 2020/03/02 02:44:25 UTC

Re: Timeout error in ZooKeeper

Hi Samir.

It seems that your ZooKeeper connection timeout is set to 3000 ms, and the
client could not connect to the server for 14305 ms, maybe due to a full GC
or a network problem. When it reconnects, a "ConnectionLossException" is
thrown.
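
To make the numbers in that log line explicit, here is a small sketch that
pulls the timeout and elapsed values out of the Curator message quoted below
and computes how far the timeout was exceeded. The class name
CuratorTimeoutLine and its parse method are hypothetical helpers written for
this mail; they are not part of Flink or Curator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Flink or Curator) for reading the Curator timeout message. */
public class CuratorTimeoutLine {
    // Matches the "timeout (N) / elapsed (M)" fragment of the log message quoted in this thread.
    private static final Pattern TIMEOUT_ELAPSED =
            Pattern.compile("timeout \\((\\d+)\\)\\s*/\\s*elapsed\\s*\\((\\d+)\\)");

    /** Returns {timeoutMs, elapsedMs}, or null if the message has no such fragment. */
    public static long[] parse(String message) {
        Matcher m = TIMEOUT_ELAPSED.matcher(message);
        if (!m.find()) {
            return null;
        }
        return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2))};
    }

    public static void main(String[] args) {
        // The message text is copied from the Flink log in this thread, with the line wrap removed.
        String message = "Connection timed out for connection string (zk-cs:2181) "
                + "and timeout (3000) / elapsed (14305)";
        long[] parsed = parse(message);
        long overshootMs = parsed[1] - parsed[0];
        System.out.println("timeout=" + parsed[0] + " ms, elapsed=" + parsed[1]
                + " ms, overshoot=" + overshootMs + " ms");
    }
}
```

For the line in your log this gives an overshoot of 11305 ms, i.e. the
client waited almost five times longer than the configured timeout.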


Have you ever changed the ZooKeeper client timeout configuration in Flink?
Or could you confirm the timeout settings on the ZooKeeper server side?
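
For reference, these are the client-side options I have in mind. This is
only a sketch of a flink-conf.yaml fragment: the keys are the ZooKeeper
client options from the Flink configuration documentation, and the values
are, as far as I know, the documented defaults rather than a recommendation
for your setup. Please check them against the documentation for your Flink
version.

```yaml
# flink-conf.yaml (illustrative; values believed to be the documented defaults)
high-availability.zookeeper.client.session-timeout: 60000      # ms
high-availability.zookeeper.client.connection-timeout: 15000   # ms
high-availability.zookeeper.client.retry-wait: 5000            # ms
high-availability.zookeeper.client.max-retry-attempts: 3
```

On the server side, the session timeout a client can negotiate is bounded
by minSessionTimeout and maxSessionTimeout in zoo.cfg, which default to 2
and 20 times tickTime respectively.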


Best,
Yang

Samir Tusharbhai Chauhan <sa...@prudential.com.sg>
wrote on Sunday, March 1, 2020 at 12:57 AM:

> Hi @Till Rohrmann <tr...@apache.org>,
>
>
>
> Thanks for the response. Unfortunately I could not capture many logs on
> the Flink side. I am attaching whatever I could collect.
>
>
>
> I found this old ticket about the same error. I am not sure whether it is related.
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-1582
>
>
>
> I also read somewhere that it could be related to ZNodes containing too
> much data or having too many children. By default ZooKeeper has a 1 MB
> transport limit.
>
>
>
> Warm Regards,
>
> *Samir Chauhan*
>
> *From:* Till Rohrmann <tr...@apache.org>
> *Sent:* Saturday, February 29, 2020 11:28 PM
> *To:* Samir Tusharbhai Chauhan <samir.tusharbhai.chauhan@prudential.com.sg>
> *Cc:* user@flink.apache.org
> *Subject:* Re: Timeout error in ZooKeeper
>
>
>
> Hi Samir,
>
>
>
> it is hard to tell what exactly happened without the Flink logs. However,
> newer Flink versions include some ZooKeeper improvements and fixes for some
> bugs [1]. Hence, it might make sense to try to upgrade your Flink version.
>
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-14091
>
>
>
> Cheers,
>
> Till
>
>
>
> On Fri, Feb 28, 2020 at 7:41 PM Samir Tusharbhai Chauhan <
> samir.tusharbhai.chauhan@prudential.com.sg> wrote:
>
> *Hi,*
>
>
>
> Yesterday morning I got the error below in ZooKeeper. After this error,
> Flink did not reconnect to ZooKeeper and the jobs hung. I had to cancel
> and redeploy all my jobs to bring them back to a normal state.
>
> 2020-02-28 02:45:56,811 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid 0x1701028573403f3, likely client has closed socket
>         at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
>         at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
>         at java.lang.Thread.run(Thread.java:748)
>
> At the same time I saw the error below in Flink.
>
> 2020-02-28 02:46:49,095 ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (zk-cs:2181) and timeout (3000) / elapsed (14305)
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
>       at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:225)
>       at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:94)
>       at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:117)
>       at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835)
>       at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
>       at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
>       at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
>
>
>
> Has anyone faced a similar error before?
>
>
>
> *My environment is*
>
> Azure Kubernetes 1.15.7
>
> Flink 1.6.0
>
> Zookeeper 3.4.10
>
>
>
> Warm Regards,
>
> *Samir Chauhan*
>
>
>
>
>
>
> There's a reason we support Fair Dealing. YOU.
>
>
> This email and any files transmitted with it or attached to it (the
> [Email]) may contain confidential, proprietary or legally privileged
> information and is intended solely for the use of the individual or entity
> to whom it is addressed. If you are not the intended recipient of the
> Email, you must not, directly or indirectly, copy, use, print, distribute,
> disclose to any other party or take any action in reliance on any part of
> the Email. Please notify the system manager or sender of the error and
> delete all copies of the Email immediately.
>
> No statement in the Email should be construed as investment advice being
> given within or outside Singapore. Prudential Assurance Company Singapore
> (Pte) Limited (PACS) and each of its related entities shall not be
> responsible for any losses, claims, penalties, costs or damages arising
> from or in connection with the use of the Email or the information therein,
> in whole or in part. You are solely responsible for conducting any virus
> checks prior to opening, accessing or disseminating the Email.
>
> PACS (Company Registration No. 199002477Z) is a company incorporated under
> the laws of Singapore and has its registered office at 30 Cecil Street,
> #30-01, Prudential Tower, Singapore 049712.
>
> PACS is an indirect wholly owned subsidiary of Prudential plc of the
> United Kingdom. PACS and Prudential plc are not affiliated in any manner
> with Prudential Financial, Inc., a company whose principal place of
> business is in the United States of America.