Posted to user@flume.apache.org by Michael Luban <mi...@gmail.com> on 2011/10/17 16:37:09 UTC

HDFS Failover sink

Flume-users,

In the event of an HDFS failure, I would like to durably fail events over to
the local collector disk.  To that end, I've configured a failover sink in
the following manner :

config [logicalNodeName, rpcSource(54002),
  < lazyOpen stubbornAppend collector(60000)
      {escapedCustomDfs("hdfs://namenode/user/flume/%Y-%m-%d","send-%{rolltag}")}
    ? diskFailover insistentOpen stubbornAppend collector(60000)
      {escapedCustomDfs("hdfs://namenode/user/flume/%Y-%m-%d","send-%{rolltag}")}
  >]

I simulate an HDFS connection failure by setting the permissions on the
/user/flume/%Y-%m-%d directory to read-only while events are streaming.
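
For reference, here's roughly what I run to flip the permissions (the
date-resolved path below is just an example of what the escaped directory
expands to on a given day):

  # drop write permission on the target directory so that appends fail
  # with AccessControlException
  hadoop fs -chmod -R 555 /user/flume/2011-10-16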

When I examine the log in this case, however, it appears that although the
sink keeps retrying HDFS per the backoff policy:

2011-10-16 23:25:19,375 INFO
com.cloudera.flume.handlers.debug.InsistentAppendDecorator: append attempt 9
failed, backoff (60000ms):
org.apache.hadoop.security.AccessControlException: Permission denied:
user=flume, access=WRITE

and a sequence failover file is created locally:

2011-10-16 23:25:20,644 INFO
com.cloudera.flume.handlers.hdfs.SeqfileEventSink: constructed new seqfile
event sink:
file=/tmp/flume-flume/agent/logicalNodeName/dfo_writing/20111016-232520644-0600.9362465244700638.00007977
2011-10-16 23:25:20,644 INFO
com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: opening new
file for 20111016-232510634-0600.9362455234272014.00007977

The sequence file is, in fact, empty, and events seem to be merely queued up
in memory rather than persisted to disk.
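
For what it's worth, this is how I'm checking the spool (path copied from the
log message above); the newly created seqfile reports a size of 0 bytes:

  # list the disk-failover staging directory for the logical node
  ls -l /tmp/flume-flume/agent/logicalNodeName/dfo_writing/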

Is this a valid use case?  This might be overly cautious, but I would like
to persist events durably and prevent the logical node from queuing events
in memory in the event of an HDFS connection failure.

Re: HDFS Failover sink

Posted by Chetan Sarva <cs...@evidon.com>.
The best-practice approach to handling this type of failure is to handle it
on the agent where the event is generated, using an agent sink (agentE2ESink
or agentE2EChain) connected to a collectorSource/collectorSink pair that then
writes to HDFS. This way, your events are written durably to disk on the
agent node. See section 4.1 in the user guide for more info:

http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html#_using_default_values
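
As a rough sketch (the hostname below is a placeholder and 35853 is just the
default collector port, so substitute your own values), the agent tier runs
where events originate and journals them to the agent's local disk until the
collector acknowledges delivery, while the collector tier writes to HDFS:

  agentNode : rpcSource(54002) | agentE2ESink("collectorHost", 35853) ;
  collectorNode : collectorSource(35853) | collectorSink("hdfs://namenode/user/flume/%Y-%m-%d", "send-") ;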
