Posted to user@flume.apache.org by "Meyer, Dennis" <de...@adtech.com> on 2011/11/25 18:46:25 UTC

Flume Performance - experiencing bad throughput

Hi,

We're trying to use Flume in a special way:
We have a system that logs data over a TCP connection to a logging box. We want to replace this box with a self-written custom source that receives packets via TCP, each containing a binary payload encoded in AVRO (and no, there's no better way, we can't change this to a more Flume-like approach ;-)
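
To make this concrete, here is a stripped-down sketch of what our source does (the 0.9.x plugin class names are from memory and may be slightly off, and the listen port and 4-byte length-prefix framing are specific to our protocol, so treat this as a sketch, not working code):

import java.io.DataInputStream;
import java.io.IOException;
import java.net.ServerSocket;

import com.cloudera.flume.core.Event;
import com.cloudera.flume.core.EventImpl;
import com.cloudera.flume.core.EventSource;

// Plugin registration/builder boilerplate omitted; single connection for brevity.
public class AvroTcpSource extends EventSource.Base {
  private ServerSocket server;
  private DataInputStream in;

  @Override
  public void open() throws IOException {
    server = new ServerSocket(4141);        // our listen port (example)
    in = new DataInputStream(server.accept().getInputStream());
  }

  @Override
  public Event next() throws IOException {
    int len = in.readInt();                 // our wire format: 4-byte length prefix
    byte[] payload = new byte[len];
    in.readFully(payload);
    return new EventImpl(payload);          // AVRO bytes pass through untouched
  }

  @Override
  public void close() throws IOException {
    in.close();
    server.close();
  }
}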

Problem:

Our DEV system runs on 3 nodes (2-core VMs with 8GB memory each): one master, one node running the custom TCP source, and one node running the HDFS sink. We're testing with a 3MB and a 120MB logfile containing rather small log entries (resulting in many messages of approx. 10kB each).
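
The wiring looks roughly like this in the flume shell (node names are ours, the customTcpSource plugin name and the ports are examples):

exec config tcp-node 'customTcpSource(4141)' 'agentSink("hdfs-node", 35853)'
exec config hdfs-node 'collectorSource(35853)' 'collectorSink("hdfs://namenode:8020/flume/%Y-%m-%d/", "events-")'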

With the 3MB file we get a throughput of a little less than 1MB/sec; with the 120MB file it's much less! The bottleneck seems to be the CPU on the first node (the one receiving the TCP messages): it climbs to 100%, and the node keeps sending messages at reduced throughput for a long time. We are using the byteArray input and passing the AVRO payload into a Flume event.

Questions:

1) What could be the cause?
2) What throughput rate should Flume be able to handle? We want to pipe far more data through the system.
3) Might there be compression going on that makes no sense here, since the binary AVRO payload is already compressed itself?
4) Is there a performance-optimized config around for this kind of messaging (unfortunately we cannot go for fire-and-forget and need reliable transport)? See the config sketch below.
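
Regarding 3) and 4): are these the right knob and mode to look at? (Property and sink names taken from the 0.9.x docs as far as I can tell, so please correct me if they're off.) To rule out double compression on the collector, in flume-site.xml:

<property>
  <name>flume.collector.dfs.compress.codec</name>
  <value>None</value>
</property>

And for reliable delivery, the end-to-end agent mode instead of best-effort:

exec config tcp-node 'customTcpSource(4141)' 'agentE2ESink("hdfs-node", 35853)'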

Thanks,
Dennis

Re: Flume Performance - experiencing bad throughput

Posted by "Meyer, Dennis" <de...@adtech.com>.
Just so this thread doesn't sit unanswered forever in the big search engines' caches:

1) We got around the issue by installing the source and sink on the same node, so it looks like a networking issue in our environment.
2) We see rates of 40k messages per second in our QA system on a single node (each message carries one binary AVRO payload; all AVRO payloads are < 100kB).
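
For anyone who wants to reproduce such numbers: a trivial counter in the source's next() path is enough, roughly like this (plain JDK, nothing Flume-specific, names are mine):

// Logs the per-second event rate; call tick() once per event.
public class RateCounter {
  private long count = 0;
  private long windowStart = System.currentTimeMillis();

  public void tick() {
    count++;
    long now = System.currentTimeMillis();
    if (now - windowStart >= 1000) {
      System.err.println(count + " events/s");
      count = 0;
      windowStart = now;
    }
  }
}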

Hope this helps somebody looking for numbers.

From: Dennis Meyer <de...@adtech.com>
Reply-To: <fl...@incubator.apache.org>
Date: Fri, 25 Nov 2011 17:46:25 +0000
To: "flume-user@incubator.apache.org" <fl...@incubator.apache.org>
Subject: Flume Performance - experiencing bad throughput
