Posted to user@flume.apache.org by Jagadish Bihani <ja...@pubmatic.com> on 2013/05/29 16:12:17 UTC

HDFS sink data loss possible?

Hi

Based on observations from our production Flume setup:

Per day, the file roll sink delivers almost 1% more events than the
HDFS sink.
(We use a replicating setup, with a separate file channel for each sink.)

Configuration:
==============
Flume version: 1.3.1
Flume topology: 30 first-tier machines and 3 second-tier machines (which
deliver to HDFS and the local file system)
HDFS compression codec: lzop
Channels: file channel for every source-sink pair
Hadoop version: 1.0.3 (Apache Hadoop)
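
For illustration, a second-tier agent in a setup like this might be
configured roughly as follows. This is a sketch only, not the actual
production configuration: the agent name (tier2), the component names,
the Avro source type, the port, and all paths are hypothetical
placeholders. Only the replicating selector, the two file channels, the
HDFS and file roll sinks, and the lzop codec come from the description
above.

```properties
# Hypothetical second-tier agent; every name, port, and path is a placeholder.
tier2.sources = avroIn
tier2.channels = hdfsCh localCh
tier2.sinks = hdfsOut localOut

# One source replicated into two file channels, one per sink.
# The Avro source type is an assumption; the post does not name it.
tier2.sources.avroIn.type = avro
tier2.sources.avroIn.bind = 0.0.0.0
tier2.sources.avroIn.port = 4545
tier2.sources.avroIn.channels = hdfsCh localCh
tier2.sources.avroIn.selector.type = replicating

# File channel feeding the HDFS sink.
tier2.channels.hdfsCh.type = file
tier2.channels.hdfsCh.checkpointDir = /data/flume/hdfsCh/checkpoint
tier2.channels.hdfsCh.dataDirs = /data/flume/hdfsCh/data

# File channel feeding the file roll sink.
tier2.channels.localCh.type = file
tier2.channels.localCh.checkpointDir = /data/flume/localCh/checkpoint
tier2.channels.localCh.dataDirs = /data/flume/localCh/data

# HDFS sink writing compressed output with the codec named in the post.
tier2.sinks.hdfsOut.type = hdfs
tier2.sinks.hdfsOut.channel = hdfsCh
tier2.sinks.hdfsOut.hdfs.path = hdfs://namenode.example.com/flume/events
tier2.sinks.hdfsOut.hdfs.fileType = CompressedStream
tier2.sinks.hdfsOut.hdfs.codeC = lzop

# File roll sink writing to the local file system.
tier2.sinks.localOut.type = file_roll
tier2.sinks.localOut.channel = localCh
tier2.sinks.localOut.sink.directory = /data/flume/events
```

With this layout each event is copied into both channels, so the two
sinks would be expected to deliver the same event counts.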

Things are working fine, but we see some data loss in HDFS (though
not very large:
1 million in 1 billion events).

Is this possible in some scenario? (Just to add: the datanodes of the
Hadoop cluster are highly loaded. Could that lead to such loss?)

Regards,
Jagadish