Posted to dev@flume.apache.org by "chenshangan (JIRA)" <ji...@apache.org> on 2014/04/02 10:38:18 UTC

[jira] [Comment Edited] (FLUME-2352) HDFSCompressedDataStream should support appendBatch

    [ https://issues.apache.org/jira/browse/FLUME-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957466#comment-13957466 ] 

chenshangan edited comment on FLUME-2352 at 4/2/14 8:38 AM:
------------------------------------------------------------

batchSize: 200000

Append in batch:
| all     | take    | append  | sync    |
| 50254.6 | 47443.7 | 815.389 | 885.778 |

Append one by one:
| all     | take    | append  | sync    |
| 52779.9 | 29259   | 18647.8 | 2243.67 |

Column definitions:
all: total time spent processing a batch
take: time spent taking events from the channel
append: time spent in the append operation
sync: time spent in flush
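
A minimal sketch of how a per-phase breakdown like the one above could be collected. This is not the instrumentation used for these numbers; PhaseTimer and its methods are hypothetical names, and reporting in milliseconds is an assumption, since the comment does not state the unit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: accumulates wall-clock time per named phase.
class PhaseTimer {
    private final Map<String, Long> nanos = new LinkedHashMap<>();

    // Runs the task and adds its elapsed time to the named phase.
    void time(String phase, Runnable task) {
        long start = System.nanoTime();
        try {
            task.run();
        } finally {
            nanos.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    // Accumulated time for the phase in milliseconds (0 if never timed).
    double millis(String phase) {
        return nanos.getOrDefault(phase, 0L) / 1_000_000.0;
    }
}
```

Wrapping the whole batch in time("all", ...) and the individual steps in nested time("take", ...), time("append", ...) and time("sync", ...) calls would yield the four columns; whatever "all" contains beyond the three nested phases shows up as the difference.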

The append time decreases significantly (from 18647.8 to 815.389), but the take time increases (from 29259 to 47443.7); I don't know why.
In another experiment, I sent a large file and measured the total time: batch append took only about half the time of appending one by one.
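
One plausible reason batching helps is that per-event work on the compressor (such as flushing it) gets paid once per batch instead of once per event. The sketch below illustrates that effect with java.util.zip only. It is not the Flume code or the attached patch: BatchCompressionSketch and its methods are hypothetical, and the assumption that the one-by-one path flushes the compressor after every event is made purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration; not taken from Flume.
class BatchCompressionSketch {

    // Assumed one-by-one path: write an event, then flush the compressor.
    static byte[] appendOneByOne(List<byte[]> events) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // syncFlush=true makes flush() force out pending compressed data.
        try (GZIPOutputStream out = new GZIPOutputStream(sink, true)) {
            for (byte[] event : events) {
                out.write(event);
                out.flush();
            }
        }
        return sink.toByteArray();
    }

    // Batch path: write every event, then flush once.
    static byte[] appendBatch(List<byte[]> events) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(sink, true)) {
            for (byte[] event : events) {
                out.write(event);
            }
            out.flush();
        }
        return sink.toByteArray();
    }

    // Inflates a gzip byte array so both paths can be compared.
    static byte[] decompress(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> events = new ArrayList<>();
        for (int i = 0; i < 20000; i++) {
            events.add(("event-" + i + "\n").getBytes(StandardCharsets.UTF_8));
        }
        long t0 = System.nanoTime();
        byte[] one = appendOneByOne(events);
        long t1 = System.nanoTime();
        byte[] batch = appendBatch(events);
        long t2 = System.nanoTime();
        System.out.printf("one by one: %d bytes, %.1f ms%n", one.length, (t1 - t0) / 1e6);
        System.out.printf("batch:      %d bytes, %.1f ms%n", batch.length, (t2 - t1) / 1e6);
    }
}
```

Both paths decompress to the same payload; the per-event flush also inflates the compressed output, because each flush closes the current deflate block. The timings printed by main are from a single unwarmed run and are only indicative.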


> HDFSCompressedDataStream should support appendBatch
> ---------------------------------------------------
>
>                 Key: FLUME-2352
>                 URL: https://issues.apache.org/jira/browse/FLUME-2352
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.5.0
>            Reporter: chenshangan
>            Assignee: chenshangan
>             Fix For: v1.5.0
>
>         Attachments: FLUME-2352.patch
>
>
> Compressing events in batches is much more efficient than compressing them one by one.
> With hdfs.batchSize set to 200000, using appendBatch() in BucketWriter brings the append operation down to less than 1 second, while appending one by one can take about 10 seconds.


