Posted to dev@flume.apache.org by "Jonathan Hsieh (JIRA)" <ji...@apache.org> on 2011/08/11 11:12:27 UTC

[jira] [Updated] (FLUME-706) Flume nodes launch duplicate logical nodes

     [ https://issues.apache.org/jira/browse/FLUME-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated FLUME-706:
---------------------------------

    Attachment: 0001-FLUME-706-Flume-nodes-launch-duplicate-logical-nodes.patch

Eric's logs and initial logs were extremely helpful in digging into this.

I've attached a cut of my attempt to fix the problem, based on the threading analysis in the previous comment.  This patch passes some manual testing, and I believe the fixes make sense.

Caveat: It is not polished yet, and I have not written new tests to check for this error condition.  I have just kicked off a job to run the full test suite to make sure there are no new regressions.

> Flume nodes launch duplicate logical nodes
> ------------------------------------------
>
>                 Key: FLUME-706
>                 URL: https://issues.apache.org/jira/browse/FLUME-706
>             Project: Flume
>          Issue Type: Bug
>          Components: Master, Node
>    Affects Versions: v0.9.5
>            Reporter: E. Sammer
>            Assignee: E. Sammer
>            Priority: Critical
>             Fix For: v0.9.5
>
>         Attachments: 0001-FLUME-706-Flume-nodes-launch-duplicate-logical-nodes.patch, FLUME-706.log
>
>
> When submitting a config command to the flume master, it seems as if the downstream node attempts to load the config twice.
> In a test case, starting a single master and a single node, I submitted a "config node rpcSource(12345) console". The node sees the config change on the next heartbeat and updates its config and starts the thrift source on port 12345. Immediately after, it logs "Taking another heartbeat" (DEBUG) and attempts to create another logical node with the same config. This leads to thrift errors in bind() and "Could not create ServerSocket on address ...". Looking at the root cause in a debugger (thrift swallows the original exception) I can see it's an "Address already in use" IOException.
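
The symptom in the description above (a second listener on a port that is already bound) can be reproduced outside Flume with plain java.net sockets. The following is only a minimal sketch of that failure mode, not Flume or Thrift code: the class name DoubleBindDemo and the method secondBindFails are hypothetical, and it uses an OS-assigned port rather than 12345 so it does not depend on that port being free.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical demo class; it is not part of Flume or Thrift.
class DoubleBindDemo {

    /**
     * Opens a listener on a free port, then tries to open a second
     * listener on the same port while the first is still open.
     * Returns true if the second attempt is rejected with a BindException.
     */
    static boolean secondBindFails() throws IOException {
        // Port 0 asks the OS to pick any free port.
        try (ServerSocket first = new ServerSocket(0)) {
            int port = first.getLocalPort();
            // Stands in for the duplicate logical node opening the same source.
            try (ServerSocket second = new ServerSocket(port)) {
                return false; // the second bind unexpectedly succeeded
            } catch (BindException e) {
                // This is the exception the description says Thrift swallows.
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("second bind rejected: " + secondBindFails());
    }
}
```

In the reported scenario, the first ServerSocket plays the role of the source started by the first logical node, and the second plays the role of the duplicate node created after "Taking another heartbeat".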

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira