Posted to dev@flume.apache.org by "Jonathan Hsieh (JIRA)" <ji...@apache.org> on 2011/09/08 08:39:29 UTC

[jira] [Commented] (FLUME-745) Fix Race condition in NaiveFileWALDeco and retransmit logic

    [ https://issues.apache.org/jira/browse/FLUME-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100105#comment-13100105 ] 

Jonathan Hsieh commented on FLUME-745:
--------------------------------------

The unit test that stresses the synchronization and the potential race can be run manually with the command below. It executes until the given number of messages and rotations (here 100000) has been handled.

'flume class com.cloudera.flume.agent.durability.TestFlumeNodeWALNotifierRacy 100000'

While it runs, the test attempts to inject a retry every 10 ms.
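
For readers who want the general shape of such a stress test, here is a minimal sketch. It is not the code of TestFlumeNodeWALNotifierRacy and does not use any Flume class: the class RetryRaceSketch, its counters, and its methods are all hypothetical. It assumes a main thread that performs rotations in a tight loop while a background thread injects a retry every 10 ms, with both paths guarded by one lock so that the bookkeeping stays consistent.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from Flume.
public class RetryRaceSketch {
    private final Object lock = new Object();
    private long rotated = 0;        // rotations performed so far
    private long pending = 0;        // rotated chunks not yet retransmitted
    private long retransmitted = 0;  // chunks picked up by a retry

    // State transition driven by the main (writer) thread.
    public void rotate() {
        synchronized (lock) {
            rotated++;
            pending++;
        }
    }

    // State transition driven by the periodic retry thread.
    public void retry() {
        synchronized (lock) {
            retransmitted += pending;
            pending = 0;
        }
    }

    // Invariant that unsynchronized updates to the counters could violate.
    public boolean consistent() {
        synchronized (lock) {
            return rotated == pending + retransmitted;
        }
    }

    public long rotated() {
        synchronized (lock) {
            return rotated;
        }
    }

    // Runs the given number of rotations while retries fire every retryPeriodMs.
    public static RetryRaceSketch run(long rotations, long retryPeriodMs)
            throws InterruptedException {
        RetryRaceSketch wal = new RetryRaceSketch();
        ScheduledExecutorService retrier = Executors.newSingleThreadScheduledExecutor();
        retrier.scheduleAtFixedRate(wal::retry, 0, retryPeriodMs, TimeUnit.MILLISECONDS);
        try {
            for (long i = 0; i < rotations; i++) {
                wal.rotate();
            }
        } finally {
            retrier.shutdownNow();
            retrier.awaitTermination(1, TimeUnit.SECONDS);
        }
        return wal;
    }

    public static void main(String[] args) throws InterruptedException {
        long rotations = args.length > 0 ? Long.parseLong(args[0]) : 100000L;
        RetryRaceSketch wal = run(rotations, 10);
        System.out.println("rotations=" + wal.rotated()
                + " consistent=" + wal.consistent());
    }
}
```

Removing the synchronized blocks in this sketch makes the counter updates racy, which is the kind of rare inconsistency a long-running stress loop is meant to surface.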

> Fix Race condition in NaiveFileWALDeco and retransmit logic
> -----------------------------------------------------------
>
>                 Key: FLUME-745
>                 URL: https://issues.apache.org/jira/browse/FLUME-745
>             Project: Flume
>          Issue Type: Bug
>    Affects Versions: v0.9.5
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>              Labels: wal
>             Fix For: v0.9.5
>
>         Attachments: 0001-FLUME-745-Race-condition-in-NaiveFileWALDeco-and-ret.patch
>
>
> There is a race condition in the state transitions that happen in NaiveFileWALDeco and in retransmits.  The condition is fairly rare, but when it occurs it causes an agent or collector to hang.
