Posted to dev@oozie.apache.org by "Abhishek Bafna (JIRA)" <ji...@apache.org> on 2017/01/06 07:41:58 UTC

[jira] [Commented] (OOZIE-2654) Zookeeper dependent services should not depend on Connectionstate to be valid before cleaning up

    [ https://issues.apache.org/jira/browse/OOZIE-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803873#comment-15803873 ] 

Abhishek Bafna commented on OOZIE-2654:
---------------------------------------

+1. 
Committed to master. 
Thanks [~venkatnrangan] for the patch.

> Zookeeper dependent services should not depend on Connectionstate to be valid before cleaning up
> ------------------------------------------------------------------------------------------------
>
>                 Key: OOZIE-2654
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2654
>             Project: Oozie
>          Issue Type: Bug
>          Components: HA
>    Affects Versions: 4.2.0
>            Reporter: Venkat Ranganathan
>            Assignee: Venkat Ranganathan
>             Fix For: 5.0.0
>
>         Attachments: OOZIE-2654.diff
>
>
> Currently, the ZKUtils, ZKLocks and ZKJobsConcurrency services do not properly tear down their ZooKeeper connections when no callback has been received from ZooKeeper to change the connection state.
> We can get into this situation if, for example, the ZK session was closed by ZK before any callback was received to update the connection state. This can prevent an Oozie server in HA mode from terminating, leaving one or more sockets in the CLOSE_WAIT state.
> Here is an instance of this issue.
> In the network connections, one connection is still in CLOSE_WAIT, waiting indefinitely.
> {quote} tcp6 143 0 x.x.x.1:46710 x.x.x.2:2181 CLOSE_WAIT 4688/java off (0.00/0/0)
> {quote}
> From the ZooKeeper logs:
> {quote}
> 016-08-18 20:45:29,921 - INFO NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868 - Client attempting to establish new session at /x.x.x.1:46710
> 2016-08-18 20:45:29,926 - INFO CommitProcessor:1:ZooKeeperServer@617 - Established session 0x1569f576843000e with negotiated timeout 40000 for client /x.x.x.1:46710
> {quote}
> and later
> {quote}
> 2016-08-18 20:46:34,008 - INFO CommitProcessor:1:NIOServerCnxn@1007 - Closed socket connection for client /x.x.x.1:46710 which had sessionid 0x1569f576843000e
> {quote}
> The fix is to stop checking the connection state during service destroy and to tear down the ZK connections unconditionally.
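
The failure mode and the fix described above can be illustrated with a minimal, self-contained sketch. It is not the committed patch (see OOZIE-2654.diff for that) and it does not use the real Oozie or Curator classes: FakeZkClient, ZkBackedService and their methods are hypothetical stand-ins, and the boolean "connected" flag is an assumed simplification of the connection state that is normally updated by ZooKeeper callbacks.

```java
// Hypothetical sketch: none of these names come from Oozie, Curator or ZooKeeper.
class ZkTeardownSketch {

    /** Stand-in for a ZooKeeper client handle; records only whether close() ran. */
    static class FakeZkClient implements AutoCloseable {
        private boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Stand-in for a ZK-backed service whose state is set only by callbacks. */
    static class ZkBackedService {
        private final FakeZkClient client;
        // Assumed to be updated from connection-state callbacks;
        // it stays false if no callback ever arrives.
        private volatile boolean connected = false;

        ZkBackedService(FakeZkClient client) {
            this.client = client;
        }

        void onConnectionStateChanged(boolean nowConnected) {
            connected = nowConnected;
        }

        /** Problematic pattern: teardown is skipped when no callback was seen. */
        void destroyGuarded() {
            if (connected) {
                client.close();
            }
        }

        /** Pattern described by the fix: teardown ignores the recorded state. */
        void destroyUnconditional() {
            client.close();
        }
    }

    public static void main(String[] args) {
        // No callback is ever delivered to either service in this demo.
        FakeZkClient a = new FakeZkClient();
        new ZkBackedService(a).destroyGuarded();
        System.out.println("guarded destroy closed client: " + a.isClosed());

        FakeZkClient b = new FakeZkClient();
        new ZkBackedService(b).destroyUnconditional();
        System.out.println("unconditional destroy closed client: " + b.isClosed());
    }
}
```

In this sketch the guarded variant leaves the client open when no state callback was delivered, which corresponds to the lingering socket in the report, while the unconditional variant always closes it.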



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)