Posted to dev@manifoldcf.apache.org by "Karl Wright (JIRA)" <ji...@apache.org> on 2014/02/20 20:45:19 UTC

[jira] [Commented] (CONNECTORS-898) Agents fail to start if ZK ensemble member missing

    [ https://issues.apache.org/jira/browse/CONNECTORS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907392#comment-13907392 ] 

Karl Wright commented on CONNECTORS-898:
----------------------------------------

What is the proposed fix for this?



> Agents fail to start if ZK ensemble member missing
> --------------------------------------------------
>
>                 Key: CONNECTORS-898
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-898
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Framework agents process
>    Affects Versions: ManifoldCF 1.5
>         Environment: 4 Agents
> 3 member ZK ensemble (2 live, 1 dead)
>            Reporter: Graeme Seaton
>
> If a member of the ZK ensemble is down but a majority of members is still active (so that ZK is 'live'), then when the agents start up, any agent that tries to connect to the missing member aborts with:
> Opening socket connection to server overlorddev03/10.250.0.36:2181. Will not attempt to authenticate using SASL (unknown error)
> 71 [main-SendThread(overlorddev03:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> followed by:
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Initialization failed: KeeperErrorCode = ConnectionLoss for /org.apache.manifoldcf.configuration
>         at org.apache.manifoldcf.core.system.ManifoldCF.initializeEnvironment(ManifoldCF.java:269)
>         at org.apache.manifoldcf.agents.system.ManifoldCF.initializeEnvironment(ManifoldCF.java:43)
>         at org.apache.manifoldcf.agents.BaseAgentsInitializationCommand.execute(BaseAgentsInitializationCommand.java:36)
>         at org.apache.manifoldcf.agents.AgentRun.main(AgentRun.java:93)
> This has a knock-on effect on the other agents, which then eventually fail with 'agents process could not start - shutting down'.
> Besides exceptions of this type:
> 5401 [main-SendThread(overlorddev03:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server overlorddev03/10.250.0.36:2181. Will not attempt to authenticate using SASL (unknown error)
> 5403 [main-SendThread(overlorddev03:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 5506 [main-SendThread(overlorddev04:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server overlorddev04/10.250.0.46:2181. Will not attempt to authenticate using SASL (unknown error)
> 5507 [main-SendThread(overlorddev04:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to overlorddev04/10.250.0.46:2181, initiating session
> the only other notable exception is:
> 5509 [main-SendThread(overlorddev04:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server overlorddev04/10.250.0.46:2181, sessionid = 0x4444f2cb0590087, negotiated timeout = 8000
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: KeeperErrorCode = ConnectionLoss for /org.apache.manifoldcf.flags-_AGENTRUN_
>         at org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection.checkGlobalFlag(ZooKeeperConnection.java:499)
>         at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager.checkGlobalFlag(ZooKeeperLockManager.java:787)
>         at org.apache.manifoldcf.agents.system.AgentsDaemon.runAgents(AgentsDaemon.java:110)
>         at org.apache.manifoldcf.agents.AgentRun.doExecute(AgentRun.java:64)
>         at org.apache.manifoldcf.agents.BaseAgentsInitializationCommand.execute(BaseAgentsInitializationCommand.java:37)
>         at org.apache.manifoldcf.agents.AgentRun.main(AgentRun.java:93)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)