Posted to issues@nifi.apache.org by markap14 <gi...@git.apache.org> on 2016/07/27 15:56:02 UTC

[GitHub] nifi pull request #729: NIFI-2406: Ensure that heartbeat monitor continues to...

GitHub user markap14 opened a pull request:

    https://github.com/apache/nifi/pull/729

    NIFI-2406: Ensure that heartbeat monitor continues to run while instance …

    …is running. This way if a node sends heartbeat to this node as elected coordinator changes, we notify the node accordingly. Handle Exceptions more gracefully in leader election code.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/markap14/nifi NIFI-2406

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/729.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #729
    
----
commit e9f5a2182362f32e837d4eabd47e6d3604a37e55
Author: Mark Payne <ma...@hotmail.com>
Date:   2016-07-27T15:55:02Z

    NIFI-2406: Ensure that heartbeat monitor continues to run while instance is running. This way if a node sends heartbeat to this node as elected coordinator changes, we notify the node accordingly. Handle Exceptions more gracefully in leader election code.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi issue #729: NIFI-2406: Ensure that heartbeat monitor continues to run wh...

Posted by YolandaMDavis <gi...@git.apache.org>.
Github user YolandaMDavis commented on the issue:

    https://github.com/apache/nifi/pull/729
  
    @markap14 thanks! I have retested, disconnecting nodes as well as restarting nodes in the cluster.  I did see the error "This node was elected Leader for Role 'Cluster Coordinator' but failed to take leadership. Will relinquish leadership role." logged on one node, with the error thrown because the node was not part of the cluster.  That was followed by the logged warning "Failed to determine which node is elected active Cluster Coordinator: ZooKeeper reports the address as 127.0.0.1:11001, but there is no node with this address".  This occurred during initial startup and appeared to resolve once the cluster coordinator (node 2, on port 11001) came online.
    
    I also shut down node 2 (which was Cluster Coordinator) and noted the repeated "Failed to send heartbeat due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed to send message to Cluster Coordinator due to: java.net.ConnectException: Connection refused" warning in node 1's log.  This was expected based on the notes in Jira.  Node 1 was primary node at the time.  When node 2 was back online, the heartbeat warning no longer appeared in node 1's log; however, I could not access node 1's UI (it was displaying the error shown in the screenshot below).
    
    I've included just the nifi-app logs from both servers since they seemed to have the most relevant information. I would focus on log entries with timestamps beginning at 13:35 (08-04-2016). Please let me know if you need any additional information or logs.
    
    
    <img width="1440" alt="node1-error-cc-restart" src="https://cloud.githubusercontent.com/assets/1371858/17412342/3f28890a-5a4a-11e6-8ff8-d3acd5a5b0b8.png">
    
    [logs.zip](https://github.com/apache/nifi/files/402267/logs.zip)
    
    
    
    




[GitHub] nifi issue #729: NIFI-2406: Ensure that heartbeat monitor continues to run wh...

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/729
  
    @YolandaMDavis I've been able to replicate this running a 2-node cluster. I tweaked a few things to address how nodes reconnect and to ensure that they are in sync with the rest of the cluster. I pushed this as a separate commit so you can see what's changed.



[GitHub] nifi issue #729: NIFI-2406: Ensure that heartbeat monitor continues to run wh...

Posted by YolandaMDavis <gi...@git.apache.org>.
Github user YolandaMDavis commented on the issue:

    https://github.com/apache/nifi/pull/729
  
    @markap14 started review



[GitHub] nifi issue #729: NIFI-2406: Ensure that heartbeat monitor continues to run wh...

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/729
  
    @YolandaMDavis is this cluster running an embedded ZooKeeper? If so, on which node(s)?



[GitHub] nifi issue #729: NIFI-2406: Ensure that heartbeat monitor continues to run wh...

Posted by YolandaMDavis <gi...@git.apache.org>.
Github user YolandaMDavis commented on the issue:

    https://github.com/apache/nifi/pull/729
  
    @markap14 awesome, will review as soon as possible. Thanks!



[GitHub] nifi issue #729: NIFI-2406: Ensure that heartbeat monitor continues to run wh...

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/729
  
    @YolandaMDavis thanks - PR was rebased & pushed



[GitHub] nifi issue #729: NIFI-2406: Ensure that heartbeat monitor continues to run wh...

Posted by YolandaMDavis <gi...@git.apache.org>.
Github user YolandaMDavis commented on the issue:

    https://github.com/apache/nifi/pull/729
  
    @markap14 I was able to rebase and resolve conflicts locally on FlowController (the conflict was in the imports). Running on a small cluster, I was able to trigger an exception in takeLeadership (listener.onLeaderElection), and it properly logged an error instead of throwing the exception. I did not see the messages detailed in the log in [NIFI-2406](https://issues.apache.org/jira/browse/NIFI-2406) after trying different combinations of removing and restoring nodes.
    
    +1 (pending rebase/merge)
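
A minimal sketch of the behaviour tested above: an exception thrown while taking leadership is logged and the role is relinquished, instead of the exception propagating. This is illustrative only and is not the code from the PR. `LeaderElectionSketch`, `StateChangeListener`, the `takeLeadership` signature and the in-memory `LOG` list are hypothetical names chosen for this example, and the log wording is borrowed from the error message quoted elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.List;

final class LeaderElectionSketch {

    /** Hypothetical callback interface; the names are assumptions for this sketch. */
    interface StateChangeListener {
        void onLeaderElection() throws Exception;
        void onLeaderRelinquish();
    }

    /** In-memory log so the sketch needs no logging library. */
    static final List<String> LOG = new ArrayList<>();

    /**
     * Invokes the listener and reports whether leadership was actually taken.
     * Any exception is logged and swallowed so the caller keeps running.
     */
    static boolean takeLeadership(final String roleName, final StateChangeListener listener) {
        try {
            listener.onLeaderElection();
            return true;
        } catch (final Exception e) {
            LOG.add("This node was elected Leader for Role '" + roleName
                + "' but failed to take leadership. Will relinquish leadership role. Cause: " + e);
            try {
                // Give the listener a chance to undo any partial leadership state.
                listener.onLeaderRelinquish();
            } catch (final RuntimeException re) {
                LOG.add("Failed to relinquish leadership for Role '" + roleName + "': " + re);
            }
            return false;
        }
    }

    public static void main(final String[] args) {
        // A listener that always fails, standing in for a node that is not in the cluster.
        final StateChangeListener failing = new StateChangeListener() {
            @Override
            public void onLeaderElection() throws Exception {
                throw new IllegalStateException("node is not part of the cluster");
            }

            @Override
            public void onLeaderRelinquish() {
                // nothing to clean up in this sketch
            }
        };

        final boolean leader = takeLeadership("Cluster Coordinator", failing);
        System.out.println("took leadership: " + leader);
        LOG.forEach(System.out::println);
    }
}
```

Returning a boolean rather than rethrowing keeps the decision about what to do next (retry, wait for another election) with the caller; that choice is part of the sketch, not something the thread specifies.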



[GitHub] nifi issue #729: NIFI-2406: Ensure that heartbeat monitor continues to run wh...

Posted by YolandaMDavis <gi...@git.apache.org>.
Github user YolandaMDavis commented on the issue:

    https://github.com/apache/nifi/pull/729
  
    @markap14 thank you for this update. I executed my tests as previously described and ran into no issues. When disconnecting the node designated as cluster coordinator, both from the UI and on the command line (through stop/restart), the remaining nodes behave as expected. They properly display an error to the user and note in the logs their inability to determine a cluster coordinator. When the coordinator is back online, the logs indicate successful reconnection and communication. Refreshing the screen properly displays the flow, and the configured flow (which uses remote process groups communicating with a standalone NiFi node) operates successfully.
    
    +1
    
    thank you @markap14 for the quick turnaround on this!




[GitHub] nifi pull request #729: NIFI-2406: Ensure that heartbeat monitor continues to...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/nifi/pull/729

