Posted to dev@flume.apache.org by "Hari Shreedharan (JIRA)" <ji...@apache.org> on 2014/09/13 02:21:34 UTC

[jira] [Commented] (FLUME-2352) HDFSCompressedDataStream should support appendBatch

    [ https://issues.apache.org/jira/browse/FLUME-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132358#comment-14132358 ] 

Hari Shreedharan commented on FLUME-2352:
-----------------------------------------

This seems like a good idea. I did a quick review and it looks good. Since the serializer is the same for the life of the sink, we don't need to do an instanceof check every time we write an event; we only need to do it once and reuse the result. We should fix that.
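A minimal sketch of the suggested fix (class and interface names here are illustrative, not the actual Flume types): compute the instanceof result once in the constructor and store it in a final field, since the serializer never changes for the life of the sink.

```java
// Hypothetical illustration of hoisting a per-event instanceof check
// into a field computed once at construction time.
interface EventSerializer {}

class BatchCapableSerializer implements EventSerializer {}

class SimpleSerializer implements EventSerializer {}

class CompressedStreamSketch {
  private final EventSerializer serializer;
  // Computed once; the serializer is fixed for the life of the sink.
  private final boolean batchCapable;

  CompressedStreamSketch(EventSerializer serializer) {
    this.serializer = serializer;
    this.batchCapable = serializer instanceof BatchCapableSerializer;
  }

  String append(String event) {
    // Reuse the cached flag instead of re-checking instanceof per event.
    return batchCapable ? "batched:" + event : "single:" + event;
  }
}
```

The check moves from O(events) to O(1) per writer, which matters at batch sizes like the 200000 mentioned below.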

> HDFSCompressedDataStream should support appendBatch
> ---------------------------------------------------
>
>                 Key: FLUME-2352
>                 URL: https://issues.apache.org/jira/browse/FLUME-2352
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.5.0
>            Reporter: chenshangan
>            Assignee: chenshangan
>             Fix For: v1.6.0
>
>         Attachments: FLUME-2352.patch
>
>
> Compressing events in a batch is much more efficient than compressing them one by one.
> With hdfs.batchSize set to 200000, using appendBatch() in BucketWriter the append operation costs less than 1 second, while appending one by one can cost 10 seconds.
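
The quoted speedup can be illustrated with a small standalone sketch (using java.util.zip as a stand-in; this is not the actual BucketWriter/HDFS codec path): per-event compression pays a full compressor setup, header, and flush for every event, while a batch pays them once and lets the codec exploit redundancy across events.

```java
// Hypothetical illustration of batch vs. per-event compression overhead.
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.zip.GZIPOutputStream;

class BatchCompressor {
  // Per-event path: a fresh compressed stream per event, so each event
  // carries its own header/trailer and a cold compressor dictionary.
  static byte[] compressOneByOne(List<byte[]> events) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte[] e : events) {
      GZIPOutputStream gz = new GZIPOutputStream(out);
      gz.write(e);
      gz.finish();  // compressor state torn down after every event
    }
    return out.toByteArray();
  }

  // Batch path: one compression pass over the whole batch.
  static byte[] compressBatch(List<byte[]> events) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    GZIPOutputStream gz = new GZIPOutputStream(out);
    for (byte[] e : events) {
      gz.write(e);
    }
    gz.finish();
    return out.toByteArray();
  }
}
```

For a batch of similar events, the batched output is both smaller and far cheaper to produce than the event-by-event output.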



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)