Posted to issues@geode.apache.org by "Dong Yang (JIRA)" <ji...@apache.org> on 2018/10/18 12:39:00 UTC

[jira] [Commented] (GEODE-5896) Function sendResult can not finish correctly when client stop receive data

    [ https://issues.apache.org/jira/browse/GEODE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655175#comment-16655175 ] 

Dong Yang commented on GEODE-5896:
----------------------------------

Root cause 

A client-side onRegion function invocation needs two kinds of metadata to be ready before the user-defined function is executed. The first is static metadata, which includes colocateWith, bucketCount, partitionResolver, etc. The second is dynamic metadata, which maps each bucketId to a ServerLocation.
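
To make the two kinds of metadata concrete, here is a minimal sketch in plain Java. It is not Geode code: the class and method names (ClientMetadataSketch, StaticMeta, bucketIdFor, pickServer, needsExtraHop) are hypothetical, servers are plain strings instead of ServerLocation objects, and the hash-modulo bucket routing is an assumption made only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Illustrative only: hypothetical names, not the Geode client implementation.
final class ClientMetadataSketch {

    // Static metadata: the fields named in the comment above.
    static final class StaticMeta {
        final String colocateWith;                          // may be null
        final int bucketCount;
        final Function<Object, Object> partitionResolver;   // key -> routing object

        StaticMeta(String colocateWith, int bucketCount,
                   Function<Object, Object> partitionResolver) {
            this.colocateWith = colocateWith;
            this.bucketCount = bucketCount;
            this.partitionResolver = partitionResolver;
        }
    }

    // Assumption: a bucket is chosen by hashing the routing object modulo bucketCount.
    static int bucketIdFor(StaticMeta meta, Object key) {
        Object routing = meta.partitionResolver.apply(key);
        return Math.floorMod(routing.hashCode(), meta.bucketCount);
    }

    // Dynamic metadata: bucketId -> server. Returns null if the bucket is unknown.
    static String pickServer(StaticMeta meta, Map<Integer, String> bucketToServer, Object key) {
        return bucketToServer.get(bucketIdFor(meta, key));
    }

    // True when the client's cached view disagrees with where the bucket really
    // lives, so the request lands on the wrong server and has to be redirected.
    static boolean needsExtraHop(StaticMeta meta, Map<Integer, String> clientView,
                                 Map<Integer, String> clusterView, Object key) {
        return !Objects.equals(pickServer(meta, clientView, key),
                               pickServer(meta, clusterView, key));
    }

    public static void main(String[] args) {
        StaticMeta meta = new StaticMeta(null, 4, k -> k);
        Map<Integer, String> clientView = new HashMap<>();
        Map<Integer, String> clusterView = new HashMap<>();
        clientView.put(3, "nodeB");   // stale entry cached by the client
        clusterView.put(3, "nodeA");  // where the bucket actually is
        System.out.println("client sends to: " + pickServer(meta, clientView, 7));
        System.out.println("extra hop needed: " + needsExtraHop(meta, clientView, clusterView, 7));
    }
}
```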

The client should send each request to the right server based on this metadata. But GemFire is a dynamic cluster: there may be a network issue, a node may go down, or a new node may join, and the client-side metadata may not catch up with the change. A request that should go to node A then unfortunately goes to node B, and node B redirects it to node A. When the function finishes its logic and streams the result with sendResult, the result data is sent first from node A to node B, then from node B to the client. If the client program is killed or cancelled, node B catches an exception:

java.net.SocketException: Connection reset by peer: socket write error.

The server-to-client channel is then closed. But node A does not get any exception, because the channel between the servers is shared. Node A sees nothing wrong, so it keeps sending data to node B until all the data has been sent. Depending on the data volume, this can take minutes, hours, or even days. If the client program sends a new request and is killed again, server-side resources will be exhausted. At that point, the only option is to restart the cluster.
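
The failure mode can be sketched as a small, single-threaded simulation in plain Java. It does not use any Geode API: RelaySketch, simulate and writeToClient are hypothetical names, a "chunk" stands for one sendResult call, and the client disconnect is modelled by throwing the same SocketException that node B reports. The only point is that the failure stays on node B and node A keeps producing.

```java
import java.net.SocketException;

// Illustrative only: models node A -> node B -> client, not real Geode messaging.
final class RelaySketch {

    static final class Outcome {
        final int sentByNodeA;            // chunks node A wrote to the shared channel
        final int deliveredToClient;      // chunks node B forwarded successfully
        final int wastedAfterDisconnect;  // chunks node A sent that never reached the client
        final boolean nodeBSawException;

        Outcome(int sent, int delivered, int wasted, boolean sawException) {
            this.sentByNodeA = sent;
            this.deliveredToClient = delivered;
            this.wastedAfterDisconnect = wasted;
            this.nodeBSawException = sawException;
        }
    }

    // Hypothetical client socket: fails once the client has gone away.
    static void writeToClient(int alreadyDelivered, int clientDiesAfter) throws SocketException {
        if (alreadyDelivered >= clientDiesAfter) {
            throw new SocketException("Connection reset by peer: socket write error");
        }
    }

    static Outcome simulate(int totalChunks, int clientDiesAfter) {
        boolean clientChannelOpen = true;
        boolean nodeBSawException = false;
        int sent = 0, delivered = 0, wasted = 0;

        for (int chunk = 0; chunk < totalChunks; chunk++) {
            // Node A: the shared server-to-server channel accepts every write,
            // so node A never learns that the client is gone.
            sent++;

            // Node B: tries to forward the chunk to the client.
            if (clientChannelOpen) {
                try {
                    writeToClient(delivered, clientDiesAfter);
                    delivered++;
                } catch (SocketException e) {
                    // Only node B sees this; the client channel is closed.
                    nodeBSawException = true;
                    clientChannelOpen = false;
                    wasted++;
                }
            } else {
                wasted++; // nowhere to forward, but node A keeps sending
            }
        }
        return new Outcome(sent, delivered, wasted, nodeBSawException);
    }

    public static void main(String[] args) {
        Outcome o = simulate(1000, 10);
        System.out.println("sent by node A: " + o.sentByNodeA
                + ", delivered: " + o.deliveredToClient
                + ", wasted: " + o.wastedAfterDisconnect);
    }
}
```

In this sketch the wasted work grows with the size of the result, which matches the observation that the cost depends on the data volume.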

 

> Function sendResult can not finish correctly when client stop receive data
> --------------------------------------------------------------------------
>
>                 Key: GEODE-5896
>                 URL: https://issues.apache.org/jira/browse/GEODE-5896
>             Project: Geode
>          Issue Type: Bug
>          Components: functions
>            Reporter: Dong Yang
>            Priority: Major
>
> Scenario:
>  # TCP client-server mode
>  # onRegion invocation with a filter
>  # single-hop enabled on the client side
>  # a large amount of data transferred from server to client
>  # sendResult used to stream data from server to client
> Incident:
> The client program is killed or exits normally. The server side cannot detect the exception, so it keeps sending data to the client. Resources are sometimes occupied for a very long time, and the situation gets worse when the client resends the request. As a result, the cluster appears to hang and cannot respond to any request, including API invocations, gfsh commands, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)