Posted to user@flume.apache.org by Adam Higginson <Ad...@bjss.com> on 2014/07/07 17:55:53 UTC

Embedded Agent File Channel Performance

Hi,

I'm currently investigating whether an Embedded Agent with a file-based channel is suitable for writing to another agent, also with a file-based channel, and ultimately into an HDFS sequence file. I've done some early testing in a local environment (a VM acting as a small Hadoop setup with a Flume agent running on it) and found that the file-based channel is very slow compared to the memory channel. We're currently writing around 150,000 - 200,000 messages/sec (each message ranging from a few hundred bytes up to 6KB), which we achieve by writing directly to a sequence file using Hadoop's FileSystem API. However, I've read that the best we could hope for on a single channel is around 8,000 events/sec, with each event being around 2KB. I believe this was achieved by putting the checkpoint file on one disk and using other disks for the data directories. Is this the best performance we can get on a single machine with a file-based channel in an Embedded Agent?
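
To put the quoted figures side by side, here is a rough back-of-the-envelope sketch. It only restates the numbers above; the class and method names are hypothetical, and the 1 KB = 1000 bytes convention is an assumption made for illustration.

```java
final class ThroughputGapSketch {

    // How many times faster the direct write rate is than the channel rate.
    static double ratio(double directEventsPerSec, double channelEventsPerSec) {
        return directEventsPerSec / channelEventsPerSec;
    }

    // Approximate data rate in MB/s, assuming 1 KB = 1000 bytes.
    static double megabytesPerSecond(double eventsPerSec, double eventKilobytes) {
        return eventsPerSec * eventKilobytes / 1000.0;
    }

    public static void main(String[] args) {
        double channelRate = 8000.0; // events/sec figure quoted for a single file channel
        System.out.println("Low-end gap:  " + ratio(150000.0, channelRate) + "x");
        System.out.println("High-end gap: " + ratio(200000.0, channelRate) + "x");
        System.out.println("Channel data rate at 2KB/event: "
                + megabytesPerSecond(channelRate, 2.0) + " MB/s");
    }
}
```

On these numbers, the direct sequence-file write is roughly 19 to 25 times faster than the quoted single-channel figure.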

Thanks and kind regards,

Adam

Re: Embedded Agent File Channel Performance

Posted by Roshan Naik <ro...@hortonworks.com>.
AFAIK, the checkpointing function is synchronized so that no other updates can
happen while it runs. So putting the checkpoint & data on separate disks may
not buy much. Splitting the data dirs across multiple disks (use a
comma-separated list of dirs, I think) should help more.
Can you work with the memory channel or the spillable memory channel?
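
As a minimal sketch of the comma-separated data dirs suggestion, the snippet below builds the kind of property map an Embedded Agent is configured with. The `channel.checkpointDir` and `channel.dataDirs` keys follow the File Channel's documented property names with the embedded agent's `channel.` prefix; the helper class, the disk paths, and the sink host and port are hypothetical placeholders. Check the key names against your Flume version before relying on them.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FileChannelConfigSketch {

    // Builds a property map for an embedded agent using a file channel
    // whose data directories are spread over several disks.
    static Map<String, String> buildProperties(String checkpointDir, List<String> dataDirs) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("channel.type", "file");
        props.put("channel.checkpointDir", checkpointDir);
        // Multiple data directories are given as one comma-separated value.
        props.put("channel.dataDirs", String.join(",", dataDirs));
        // Sink settings below are placeholders, not values from this thread.
        props.put("sinks", "sink1");
        props.put("sink1.type", "avro");
        props.put("sink1.hostname", "collector.example.com");
        props.put("sink1.port", "4141");
        props.put("processor.type", "default");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical mount points: one disk for the checkpoint, two for data.
        Map<String, String> props = buildProperties(
                "/disk1/flume/checkpoint",
                Arrays.asList("/disk2/flume/data", "/disk3/flume/data"));
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

In a real application the resulting map would be handed to the embedded agent's `configure` call before starting it; that step is left out here so the sketch runs without Flume on the classpath.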


