Posted to dev@flume.apache.org by "Ashish Paliwal (JIRA)" <ji...@apache.org> on 2014/11/05 11:17:34 UTC

[jira] [Resolved] (FLUME-629) DFO failure, stops buffering to disk, messages lost

     [ https://issues.apache.org/jira/browse/FLUME-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Paliwal resolved FLUME-629.
----------------------------------
       Resolution: Won't Fix
    Fix Version/s: v0.9.5

Won't fix. The 0.X branch is no longer maintained.

> DFO failure, stops buffering to disk, messages lost
> ---------------------------------------------------
>
>                 Key: FLUME-629
>                 URL: https://issues.apache.org/jira/browse/FLUME-629
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v0.9.3
>            Reporter: Disabled imported user
>            Priority: Critical
>             Fix For: v0.9.5
>
>
> Single master
> agent: syslogTcp | agentE2EChain
> collector: collectorSource | collectorSink("hdfs://...")
> From reading through various logs, this is, I believe, the order of events:
> - NameNode crashed
> - This caused collector to fail writes to hdfs
> - Which in turn caused agents to start backing up and buffering on disk (correct so far)
> - WatchDog caught a crash and restarted the Flume Master
> - Eventually the DFO stopped writing to disk but kept trying to pass messages
> - ACKs continued to fail and eventually nothing was passed
> Disk space was fine throughout. We had another agent node which continued to operate normally during this period and buffered all messages as expected. Here's a snip of some of the relevant sections of log files:
> http://pastie.org/pastes/1883087/text?key=ouxaqhuodprfrmsailunw
> I can provide the full log files if they will be of use. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)