Posted to user@flume.apache.org by Jagadish Bihani <ja...@pubmatic.com> on 2013/05/29 16:12:17 UTC
HDFS sink data loss possible?
Hi
Based on observations of our production Flume setup, the file roll sink
delivers almost 1% more events per day than the HDFS sink.
(We have a replicating setup, with a separate file channel
for each of the two sinks.)
Configuration:
========
Flume version: 1.3.1
Flume topology: 30 first-tier machines and 3 second-tier machines (which
deliver to HDFS and the local file system)
HDFS compression codec: lzop
Channels: a file channel for every source-sink pair
Hadoop version: 1.0.3 (Apache Hadoop)
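For reference, here is a simplified sketch of what one of our second-tier
agents looks like. All names, ports, and paths below are illustrative, not
our exact config; the point is the replicating source fanning out into two
file channels, one feeding the HDFS sink and one feeding the file roll sink.

```properties
# Hypothetical second-tier agent: replicate each event into two file channels
agent.sources = avroSrc
agent.channels = hdfsCh rollCh
agent.sinks = hdfsSink rollSink

# Source replicates every event to both channels
agent.sources.avroSrc.type = avro
agent.sources.avroSrc.bind = 0.0.0.0
agent.sources.avroSrc.port = 4545
agent.sources.avroSrc.channels = hdfsCh rollCh
agent.sources.avroSrc.selector.type = replicating

# One file channel per sink (separate checkpoint/data dirs)
agent.channels.hdfsCh.type = file
agent.channels.hdfsCh.checkpointDir = /flume/hdfsCh/checkpoint
agent.channels.hdfsCh.dataDirs = /flume/hdfsCh/data

agent.channels.rollCh.type = file
agent.channels.rollCh.checkpointDir = /flume/rollCh/checkpoint
agent.channels.rollCh.dataDirs = /flume/rollCh/data

# HDFS sink with lzop compression
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = hdfsCh
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events/%Y-%m-%d
agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
agent.sinks.hdfsSink.hdfs.codeC = lzop

# File roll sink writing the local copy we compare against
agent.sinks.rollSink.type = file_roll
agent.sinks.rollSink.channel = rollCh
agent.sinks.rollSink.sink.directory = /flume/local-copy
```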
Things are mostly working fine, but we see some data loss on the HDFS
side (though not very large: about 1 million out of 1 billion events).
Is data loss possible in some scenario? (Just to add, the datanodes of
the Hadoop cluster are highly loaded. Can that lead to any disaster?)
Regards,
Jagadish