Posted to dev@flume.apache.org by "Hari Shreedharan (JIRA)" <ji...@apache.org> on 2014/04/01 20:13:16 UTC

[jira] [Commented] (FLUME-2352) HDFSCompressedDataStream should support appendBatch

    [ https://issues.apache.org/jira/browse/FLUME-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956835#comment-13956835 ] 

Hari Shreedharan commented on FLUME-2352:
-----------------------------------------

[~chenshangan521@163.com] - I like the idea, but we cannot add a new method to the EventSerializer interface. The change is not binary compatible with older code, so it affects more than the built-in serializers: custom serializers out there would break if the interface changed. If possible, can you try to do this without changing the serializer interface?
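
One possible way to do this without touching the existing interface is an opt-in sub-interface that the writer detects at runtime. The following is only a rough sketch under stated assumptions, not the attached patch and not existing Flume API: BatchEventSerializer, writeBatch, BatchAppender, LegacySerializer and BatchingSerializer are hypothetical names, and EventSerializer here is a simplified stand-in (the real interface works on Flume Event objects and declares more methods).

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the existing serializer interface. It is left
// unchanged, so serializers compiled against it keep working.
interface EventSerializer {
    void write(String event);
}

// Hypothetical opt-in extension (not part of Flume): serializers that can
// handle a whole batch implement this in addition to the base contract.
interface BatchEventSerializer extends EventSerializer {
    void writeBatch(List<String> events);
}

// Hypothetical helper showing how a writer could pick the batch path.
final class BatchAppender {
    private BatchAppender() {}

    static void appendBatch(EventSerializer serializer, List<String> events) {
        if (serializer instanceof BatchEventSerializer) {
            // Serializer opted in: hand over the whole batch in one call.
            ((BatchEventSerializer) serializer).writeBatch(events);
        } else {
            // Older or custom serializer: fall back to one write per event.
            for (String event : events) {
                serializer.write(event);
            }
        }
    }
}

// Example of an existing custom serializer that knows nothing about batches.
final class LegacySerializer implements EventSerializer {
    final List<String> calls = new ArrayList<>();

    @Override
    public void write(String event) {
        calls.add("write:" + event);
    }
}

// Example of a serializer that opts in to batch writes.
final class BatchingSerializer implements BatchEventSerializer {
    final List<String> calls = new ArrayList<>();

    @Override
    public void write(String event) {
        calls.add("write:" + event);
    }

    @Override
    public void writeBatch(List<String> events) {
        calls.add("writeBatch:" + events.size());
    }
}
```

With this shape, passing a LegacySerializer to BatchAppender.appendBatch results in one write call per event, while a BatchingSerializer receives a single writeBatch call. Existing custom serializers need no recompilation because the interface they implement is untouched.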

> HDFSCompressedDataStream should support appendBatch
> ---------------------------------------------------
>
>                 Key: FLUME-2352
>                 URL: https://issues.apache.org/jira/browse/FLUME-2352
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.5.0
>            Reporter: chenshangan
>            Assignee: chenshangan
>             Fix For: v1.5.0
>
>         Attachments: FLUME-2352.patch
>
>
> Compressing events in a batch is much more efficient than compressing them one by one.
> With hdfs.batchSize set to 200000, the append operation takes less than 1 second when I use appendBatch() in BucketWriter, while appending events one by one can take around 10 seconds.



--
This message was sent by Atlassian JIRA
(v6.2#6252)