Posted to issues@flink.apache.org by "Dong Lin (Jira)" <ji...@apache.org> on 2023/03/31 08:19:00 UTC
[jira] [Created] (FLINK-31681) Network connection timeout between operators should trigger either network re-connection or job failover
Dong Lin created FLINK-31681:
--------------------------------
Summary: Network connection timeout between operators should trigger either network re-connection or job failover
Key: FLINK-31681
URL: https://issues.apache.org/jira/browse/FLINK-31681
Project: Flink
Issue Type: Bug
Reporter: Dong Lin
If a network connection error occurs between two operators, the upstream operator may log the following error in PartitionRequestQueue#handleException and then close the connection. When this happens, the Flink job may hang indefinitely, neither completing nor failing:
org.apache.flink.runtime.io.network.netty.PartitionRequestQueue - Encountered error while consuming partitions org.apache.flink.shaded.netty4.io.netty.channel.unix.Errors$NativeIoException: writeAccess(...) failed: Connection timed out.
To avoid this, we can either let the upstream operator re-establish the connection with the downstream operator, or trigger a job failover so that users can take corrective action promptly.
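One way to combine the two remedies is a single policy: retry the connection a bounded number of times, and fail over only when every retry has failed. The following is a minimal Java sketch of that idea, not Flink code. The `ReconnectOrFailover` class, its `ConnectAttempt` hook and the `Outcome` enum are hypothetical names introduced for illustration, and the attempt count and backoff values are assumptions a real fix would need to make configurable.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not Flink API: after a connection error, retry the
 * connection a bounded number of times with capped exponential backoff; if
 * every attempt fails, report FAILOVER so the caller fails the task instead
 * of leaving the job hanging.
 */
final class ReconnectOrFailover {

    /** One attempt to re-establish the connection (hypothetical hook). */
    @FunctionalInterface
    public interface ConnectAttempt {
        void connect() throws IOException;
    }

    public enum Outcome { RECONNECTED, FAILOVER }

    private final int maxAttempts;
    private final Duration initialBackoff;
    private final Duration maxBackoff;
    // Delays slept during the most recent handle() call, kept for inspection.
    private final List<Duration> waited = new ArrayList<>();

    ReconnectOrFailover(int maxAttempts, Duration initialBackoff, Duration maxBackoff) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.initialBackoff = initialBackoff;
        this.maxBackoff = maxBackoff;
    }

    public Outcome handle(ConnectAttempt attempt) {
        waited.clear();
        Duration backoff = initialBackoff;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                attempt.connect();
                return Outcome.RECONNECTED;
            } catch (IOException e) {
                if (i == maxAttempts) {
                    break; // retries exhausted: escalate instead of waiting forever
                }
                waited.add(backoff);
                try {
                    Thread.sleep(backoff.toMillis());
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return Outcome.FAILOVER;
                }
                // Double the delay for the next attempt, capped at maxBackoff.
                Duration doubled = backoff.multipliedBy(2);
                backoff = doubled.compareTo(maxBackoff) > 0 ? maxBackoff : doubled;
            }
        }
        return Outcome.FAILOVER;
    }

    public List<Duration> waitedDelays() {
        return List.copyOf(waited);
    }
}
```

A caller receiving `Outcome.FAILOVER` would be expected to fail the task (for example by rethrowing the original exception) so that the job's restart strategy takes over, rather than silently closing the connection. Bounding the retries keeps a transient timeout cheap to recover from while guaranteeing that a persistent one still surfaces as a failure.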
--
This message was sent by Atlassian Jira
(v8.20.10#820010)