Posted to dev@flume.apache.org by Gary Malouf <ma...@gmail.com> on 2014/08/21 02:11:33 UTC

Issues around HDFS and File Channels

My team has been battling severe problems with our Flume clusters on EC2
over the last several weeks - I wanted to share some of the pains we are
having and see if there are any configuration/deployment changes we should
be making.

We are on the Flume 1.4 release that is packaged with the CDH 5.0.2
distribution.  We've been in production with Flume for well over a year,
but have only in the last month or two started seeing more significant
issues as our traffic has ramped up.

We have 2 sets of Flume 'clusters/configurations' - each instance runs an
agent with 2 or more source-channel-sink combinations.  File channels are
configured for all of them because they allow us to buffer when downstream
issues occur (usually when replacing HDFS nodes - more on this at the
bottom).  All of the channels use a capacity of 100000000 items (again,
using Flume as a buffer during downtime).

For reference, we have a 12 node HDFS cluster that these are writing to
with the default replication factor of 3.

An example channel configuration:

agent-1.channels.trdbuy-bid-req-ch1.type = file

agent-1.channels.trdbuy-bid-req-ch1.checkpointDir =
/opt/flume/trdbuy-req-ch1/checkpoint

agent-1.channels.trdbuy-bid-req-ch1.dataDirs =
/opt/flume/trdbuy-req-ch1/data

agent-1.channels.trdbuy-bid-req-ch1.capacity = 100000000

We use HDFS sinks for everything, an example configuration:

agent-1.sinks.hdfs-trdbuy-bid-req-sink1.channel = trdbuy-bid-req-ch1

agent-1.sinks.hdfs-trdbuy-bid-req-sink1.type = hdfs

agent-1.sinks.hdfs-trdbuy-bid-req-sink1.hdfs.path =
hdfs://nn-01:8020/flume/trading/buying/bidrequests/yr=%Y/mo=%m/d=%d/

agent-1.sinks.hdfs-trdbuy-bid-req-sink1.hdfs.filePrefix = %{host}s1

agent-1.sinks.hdfs-trdbuy-bid-req-sink1.hdfs.batchSize = 10000

agent-1.sinks.hdfs-trdbuy-bid-req-sink1.hdfs.rollInterval = 3600

agent-1.sinks.hdfs-trdbuy-bid-req-sink1.hdfs.rollCount = 0

agent-1.sinks.hdfs-trdbuy-bid-req-sink1.hdfs.rollSize = 0

agent-1.sinks.hdfs-trdbuy-bid-req-sink1.hdfs.idleTimeout = 1800



We are running into a few distinct problems:

1) Losing a single datanode causes the Flume instances that are under
higher volume (approximately 6k messages/second) to be unable to write
(others seem to continue to hum along).  They do not recover until we
restart the instance.  I think this exception is related:

20 Aug 2014 01:16:23,942 ERROR
[SinkRunner-PollingRunner-DefaultSinkProcessor]
(org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated:82)
- Unexpected error while checking replication factor

java.lang.reflect.InvocationTargetException

        at sun.reflect.GeneratedMethodAccessor51.invoke(Unknown Source)

        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

        at java.lang.reflect.Method.invoke(Method.java:601)

        at
org.apache.flume.sink.hdfs.AbstractHDFSWriter.getNumCurrentReplicas(AbstractHDFSWriter.java:147)

        at
org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated(AbstractHDFSWriter.java:68)

        at
org.apache.flume.sink.hdfs.BucketWriter.shouldRotate(BucketWriter.java:505)

        at
org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:440)

        at
org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:401)

        at
org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)

        at
org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)

        at java.lang.Thread.run(Thread.java:722)

Caused by: java.io.IOException: Failed to replace a bad datanode on the
existing pipeline due to no more good datanodes being available to try.
(Nodes: current=[10.9.178.151:50010, 10.144.197.136:50010], original=[
10.9.178.151:50010, 10.144.197.136:50010]). The current failed datanode
replacement policy is DEFAULT, and a client may configure this via
'dfs.client.block.write.replace-datanode-on-failure.policy' in its
configuration.

        at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:960)

        at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1026)

        at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1174)

        at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:924)

        at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
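
Based on that message, we have been looking at the client-side
pipeline-recovery settings it names.  A rough sketch of what we are
considering adding to the hdfs-site.xml on the Flume hosts (the property
names come straight from the exception; the NEVER value is just what we
are experimenting with, not something we have validated):

  <property>
    <!-- Named in the exception above: how the client reacts when a
         datanode in the write pipeline fails.  NEVER keeps writing to the
         surviving nodes instead of hunting for a replacement; the
         trade-off is fewer live replicas until the block is closed. -->
    <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
    <value>NEVER</value>
  </property>
  <property>
    <!-- Default is true; setting this to false turns the replacement
         behaviour off entirely. -->
    <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
    <value>true</value>
  </property>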

2) Stopping the namenode (as expected) also causes the Flume instances to
stop writing.  When the namenode comes back up, Flume does not recover
either; it again needs a restart.

3) After restarting Flume in the case of either (1) or (2), it takes an
eternity (anywhere from 5 to 30 minutes, depending on the amount of data in
the channel) for the source ports to open for listening.  During this time,
a single CPU on the machine is pegged but no data gets copied to HDFS.
Once the ports finally open, the data built up in the channel starts to
copy and the catch-up process begins.
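
For what it is worth, the main knobs we have found for shortening that
replay are the file channel's checkpoint settings.  A sketch of what we are
considering, assuming the dual-checkpoint options documented for the 1.4
file channel are present in our CDH build (the backup directory path is
just an example, and we have not tried this in production yet):

# Keep a backup copy of the checkpoint so a restart can load it instead of
# replaying every data file from scratch.
agent-1.channels.trdbuy-bid-req-ch1.useDualCheckpoints = true
agent-1.channels.trdbuy-bid-req-ch1.backupCheckpointDir = /opt/flume/trdbuy-req-ch1/backup-checkpoint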


This last issue (the slow replay) has caused us to consider other logging
solutions, as our system ends up backed up waiting for Flume to sort itself
out.  The embedded agents (on the client/writer side) have protected us to
date, but they too can be subject to the same backup issues with file
channels.

If there are any tips/obvious things we should be looking for and
modifying, we would really appreciate it.

Re: Issues around HDFS and File Channels

Posted by Hari Shreedharan <hs...@cloudera.com>.
Can you share the full logs covering the replay that takes so long? I'd
like to see what is happening here. With 100 million events in your
channel, it is expected that the replay can take some time.
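
If it is easier than attaching the whole file, something like the following
should pull out the lines I am after (adjust the path for wherever your
packaging writes the agent log - the location below is just a guess at the
CDH default):

grep -iE 'replay|checkpoint' /var/log/flume-ng/flume.log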
