Posted to issues@flink.apache.org by "Flink Jira Bot (Jira)" <ji...@apache.org> on 2022/01/04 10:40:00 UTC

[jira] [Updated] (FLINK-16030) Add heartbeat between netty server and client to detect long connection alive

     [ https://issues.apache.org/jira/browse/FLINK-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flink Jira Bot updated FLINK-16030:
-----------------------------------
      Labels: auto-deprioritized-major auto-deprioritized-minor auto-unassigned  (was: auto-deprioritized-major auto-unassigned stale-minor)
    Priority: Not a Priority  (was: Minor)

This issue was labeled "stale-minor" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Minor, please raise the priority and ask a committer to assign you the issue or revive the public discussion.


> Add heartbeat between netty server and client to detect long connection alive
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-16030
>                 URL: https://issues.apache.org/jira/browse/FLINK-16030
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>    Affects Versions: 1.7.2, 1.8.3, 1.9.2, 1.10.0
>            Reporter: begginghard
>            Priority: Not a Priority
>              Labels: auto-deprioritized-major, auto-deprioritized-minor, auto-unassigned
>
> As reported on [the user mailing list|https://lists.apache.org/list.html?user@flink.apache.org:lte=1M:Encountered%20error%20while%20consuming%20partitions]
> Networks can fail in many ways, some of them quite subtle (e.g. a high packet loss ratio).
> When the long-lived tcp connection between the netty client and server is lost, the server fails to send its response to the client and then shuts down the channel. Meanwhile, the netty client does not know that the connection has been closed, so it keeps waiting for two hours.
> To detect whether the long-lived tcp connection between the netty client and server is still alive, we should have two mechanisms: tcp keepalive and heartbeat.
>  
> The default tcp keepalive time is 2 hours. When the long-lived tcp connection dies, the netty client keeps waiting for the full 2 hours before it triggers an exception and enters failover recovery.
> To detect a dead connection faster, netty provides IdleStateHandler, which can be used to build a ping-pong mechanism: if the netty client sends n consecutive ping messages without receiving a single pong message, it triggers an exception.
>  
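
The ping-pong rule described above (n consecutive pings with no pong means the connection is treated as dead) can be sketched in plain Java. This is only an illustration under stated assumptions: HeartbeatMonitor and its method names are hypothetical and are not part of Flink or Netty, and the sketch leaves out the actual sending of ping and pong messages.

```java
/**
 * Hypothetical helper (not a Flink or Netty class): tracks pings that
 * have not yet been answered by a pong.
 */
final class HeartbeatMonitor {
    // The "n" from the issue description (assumed to be configurable).
    private final int maxUnansweredPings;
    private int unansweredPings = 0;

    HeartbeatMonitor(int maxUnansweredPings) {
        if (maxUnansweredPings < 1) {
            throw new IllegalArgumentException("maxUnansweredPings must be >= 1");
        }
        this.maxUnansweredPings = maxUnansweredPings;
    }

    /**
     * Call on every idle timeout. Returns true if another ping should be
     * sent, or false if n pings are already unanswered and the caller
     * should treat the connection as dead.
     */
    synchronized boolean onIdleTimeout() {
        if (unansweredPings >= maxUnansweredPings) {
            return false;
        }
        unansweredPings++;
        return true;
    }

    /** Call whenever a pong arrives; the peer is alive, so reset the count. */
    synchronized void onPongReceived() {
        unansweredPings = 0;
    }

    /** Number of pings sent since the last pong. */
    synchronized int unansweredPings() {
        return unansweredPings;
    }
}
```

In a Netty pipeline, one plausible wiring would be to place an IdleStateHandler in front of a custom handler whose userEventTriggered method reacts to IdleStateEvent by calling onIdleTimeout(), sending a ping when it returns true and failing the channel when it returns false. The idle interval and the value of n are assumptions that the issue does not specify.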



--
This message was sent by Atlassian Jira
(v8.20.1#820001)