Posted to user@flume.apache.org by Guillermo Ortiz <ko...@gmail.com> on 2016/02/02 09:38:11 UTC

Problems performance with FileChannel and HDFS Sink.

Hello,

I am having performance problems with the HDFS Sink. I have only one
sink and one file channel.

I was thinking of increasing the number of sinks on my channel, but I also
saw the parameter threadsPoolSize. What is the difference between raising
this parameter and creating more sinks?

I assumed the sinks should go in a sink group, but I read this in another
thread, in answer to a question similar to mine:
"You can add more sinks to your config.
Don't put them in a sink group just have multiple sinks pulling from the
same channel. This should increase your throughput."

Could someone explain this a little?

Re: Problems performance with FileChannel and HDFS Sink.

Posted by Roshan Naik <ro...@hortonworks.com>.
Take a look at this. It might help.
https://cwiki.apache.org/confluence/display/FLUME/Performance+Measurements+-+round+2



Re: Problems performance with FileChannel and HDFS Sink.

Posted by Gonzalo Herreros <gh...@gmail.com> on 2016/02/02.
I don't know the internal details, but I guess all those threads write to a
single file, so at some point adding more of them stops helping.
On the other hand, having multiple sinks will create multiple files, which
should scale better, but you need to make sure the files are written to
different folders or with different file name patterns. That can be
inconvenient, because events for the same period end up in multiple files.
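
A minimal sketch of the multiple-sinks setup discussed in this thread, under
stated assumptions: the agent name (a1), channel name (c1), sink names (k1,
k2), HDFS path and file prefixes are all hypothetical, and the source
definition is omitted. The property the thread calls threadsPoolSize is
written hdfs.threadsPoolSize on the HDFS sink, where the Flume user guide
describes it as the number of threads per HDFS sink for HDFS IO operations.
Both sinks read from the same file channel, no sink group is declared, and
each sink gets its own hdfs.filePrefix so the output files do not collide.

```properties
# Hypothetical names throughout: a1, c1, k1, k2, and the HDFS path.
a1.channels = c1
a1.sinks = k1 k2

# One file channel shared by both sinks.
a1.channels.c1.type = file

# First HDFS sink, draining c1.
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events
a1.sinks.k1.hdfs.filePrefix = events-k1
# IO thread pool of this sink only; 10 is the documented default.
a1.sinks.k1.hdfs.threadsPoolSize = 10

# Second HDFS sink, draining the same channel with a different file prefix.
a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c1
a1.sinks.k2.hdfs.path = /flume/events
a1.sinks.k2.hdfs.filePrefix = events-k2
a1.sinks.k2.hdfs.threadsPoolSize = 10
```

With a layout like this, each event taken from the channel is delivered to
one of the two sinks, so files for the same period appear under both
prefixes. Whether it actually improves throughput depends on the workload
and would need to be measured.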

Regards,
Gonzalo
