Posted to user@flume.apache.org by Christopher Shannon <cs...@gmail.com> on 2013/11/11 16:16:30 UTC

Avro 2 Avro transfer on Flume 1.3 screamingly slow

Hello,
I am trying to get decent throughput for an Avro sink to Avro source
transfer, but the best I've been able to get is about 300 records a
minute. This thread
(http://mail-archives.apache.org/mod_mbox/flume-user/201309.mbox/%3cCAHsgzwvDBDvpzR4-19DsJjTokSv0BTQrQMRn5gRJ87LcmjK7iQ@mail.gmail.com%3e)
on the mailing list suggests that hardware and disk partitioning are the
main considerations in achieving good throughput.
My setup is simple:
*Agent 1:*
Source: SpoolingDirectory
Channel: Memory
Sink: Avro
*Agent 2:*
Source: Avro
Channel: Memory
Sink: HDFS
So, what else needs to be done to optimize the data transfer?
All the best,
Chris
P.S. Anticipating the suggestion that I upgrade to Flume 1.4: I work for a big
company that has acquired Flume bundled through another product from an
even bigger company. They will not consider an upgrade at this time. (And
given the very slow throughput I'm seeing, I would think that 1.3 can do a
lot better.)
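
For readers following along, the two-agent layout described above could be written as a Flume properties file roughly like the sketch below. This is not the poster's actual configuration: the agent and component names (agent1, agent2, spool, mem, avroOut, avroIn, hdfsOut), the hostnames, the port, and the paths are all placeholders chosen for illustration.

```properties
# Agent 1: spooling directory source -> memory channel -> Avro sink
# (all names, hosts, ports, and paths are hypothetical placeholders)
agent1.sources = spool
agent1.channels = mem
agent1.sinks = avroOut

agent1.sources.spool.type = spooldir
agent1.sources.spool.spoolDir = /var/spool/flume
agent1.sources.spool.channels = mem

agent1.channels.mem.type = memory

agent1.sinks.avroOut.type = avro
agent1.sinks.avroOut.hostname = agent2.example.com
agent1.sinks.avroOut.port = 4141
agent1.sinks.avroOut.channel = mem

# Agent 2: Avro source -> memory channel -> HDFS sink
agent2.sources = avroIn
agent2.channels = mem
agent2.sinks = hdfsOut

agent2.sources.avroIn.type = avro
agent2.sources.avroIn.bind = 0.0.0.0
agent2.sources.avroIn.port = 4141
agent2.sources.avroIn.channels = mem

agent2.channels.mem.type = memory

agent2.sinks.hdfsOut.type = hdfs
agent2.sinks.hdfsOut.hdfs.path = hdfs://namenode.example.com/flume/events
agent2.sinks.hdfsOut.channel = mem
```

Each agent would normally live in its own file and be started under its own agent name. With nothing else set, every component runs with its default batch and capacity values.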

Re: Avro 2 Avro transfer on Flume 1.3 screamingly slow

Posted by Hari Shreedharan <hs...@cloudera.com>.
Are you sure it is not the spooling directory source that is unable to write more data? What makes you think that the Avro sink to Avro source hop is the issue? It might just be that your source is not writing data fast enough, or that your channel is filling up too fast and backing off too often.


Thanks,
Hari
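
If the source or the channel does turn out to be the limiting factor, the settings usually examined first are the batch sizes and the memory channel limits. The fragment below is only a sketch: the agent and component names (agent1, agent2, spool, mem, avroOut, hdfsOut) are hypothetical, the numbers are illustrative rather than recommendations, and the property names should be checked against the Flume 1.3 user guide before use.

```properties
# Hypothetical component names; values are illustrative, not tuned.

# Agent 1: events per batch that the spooling directory source puts
# into the channel.
agent1.sources.spool.batchSize = 1000

# Memory channel: total events it may hold, and the maximum events
# per transaction. transactionCapacity should be at least as large
# as the batch size of any source or sink attached to the channel.
agent1.channels.mem.capacity = 100000
agent1.channels.mem.transactionCapacity = 1000

# Events the Avro sink takes from the channel and sends per batch.
agent1.sinks.avroOut.batch-size = 1000

# Agent 2: the same channel limits on the receiving side, plus the
# number of events the HDFS sink writes before flushing to HDFS.
agent2.channels.mem.capacity = 100000
agent2.channels.mem.transactionCapacity = 1000
agent2.sinks.hdfsOut.hdfs.batchSize = 1000
```

Changing one setting at a time and measuring the record rate after each change makes it easier to tell which stage is actually holding things back.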

