Posted to user@flume.apache.org by Balasubramanian Jayaraman <ba...@autodesk.com> on 2015/05/20 05:51:15 UTC

Reg S3 Flume HDFS SINK Compression

Hi,

I am trying to write the Flume events to Amazon S3. The events written to S3 are in compressed format. My Flume configuration is given below. I am facing data loss: based on the configuration below, if I publish 20000 events, I receive only 1000 events and all the other data is lost. But when I disable the rollCount, rollSize and rollInterval configurations, all the events are received, but 2000 small files are created. Is there anything wrong with my configuration settings? Should I add any other configurations?

    injector.sinks.s3_3store.type = hdfs
    injector.sinks.s3_3store.channel = disk_backed4
    injector.sinks.s3_3store.hdfs.fileType = CompressedStream
    injector.sinks.s3_3store.hdfs.codeC = gzip
    injector.sinks.s3_3store.serializer = TEXT
    injector.sinks.s3_3store.hdfs.path = s3n://CID:SecretKey@bucketName/dth=%Y-%m-%d-%H
    injector.sinks.s3_3store.hdfs.filePrefix = events-%{receiver}
    # Roll files when they reach 256 MB, or close them after 10 minutes of inactivity, whichever comes first
    injector.sinks.s3_3store.hdfs.rollCount = 0
    injector.sinks.s3_3store.hdfs.idleTimeout = 600
    injector.sinks.s3_3store.hdfs.rollSize = 268435456
    #injector.sinks.s3_3store.hdfs.rollInterval = 3600
    # Flush data to the bucket every 10k events
    injector.sinks.s3_3store.hdfs.batchSize = 10000
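
For reference, here is roughly what I mean by "disabling" the roll settings (a sketch; I am assuming that setting rollCount, rollSize and rollInterval to 0 is the way to turn each roll trigger off, so that only idleTimeout closes the files). With this variant all 20000 events arrive, but as roughly 2000 small files:

    # Size-, count- and time-based rolling all disabled (0 = off);
    # files are only closed after 10 minutes of inactivity
    injector.sinks.s3_3store.hdfs.rollCount = 0
    injector.sinks.s3_3store.hdfs.rollSize = 0
    injector.sinks.s3_3store.hdfs.rollInterval = 0
    injector.sinks.s3_3store.hdfs.idleTimeout = 600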

Thanks
Bala