Posted to user@flume.apache.org by Jagadish Bihani <ja...@pubmatic.com> on 2012/09/04 12:50:27 UTC

Flume netcat source related problems

Hi

I encountered a problem with the netcat source in my scenario. The setup is:
Host A: Netcat source - file channel - Avro sink
Host B: Avro source - file channel - HDFS sink
But to simplify it I have created a single agent with a netcat source
and a file_roll sink. It is:
Host A: Netcat source - file channel - file_roll sink

*Problem*:
1. To simulate our production scenario, I have created a script which runs
for 15 sec and in a while loop sends requests to the netcat source on a
given port. For a large value of the sleep delay, events are delivered
correctly to the destination. But as I reduce the delay, events are
accepted by the source but are not delivered to the destination. E.g. I
write 9108 records within 15 sec using the script and only 1708 get
delivered, and I don't get any exception. If it were a flow-control
problem then I should have seen some exception in the agent logs. But
with a file channel and huge disk space, why does this happen?

*Machine Configuration:*
RAM : 8 GB
JVM : 200 MB
CPU: 2.0 GHz Quad core processor

*Flume Agent Configuration*
adServerAgent.sources = netcatSource
adServerAgent.channels = fileChannel memoryChannel
adServerAgent.sinks = fileSink

# For each one of the sources, the type is defined
adServerAgent.sources.netcatSource.type = netcat
adServerAgent.sources.netcatSource.bind = 10.0.17.231
adServerAgent.sources.netcatSource.port = 55355

# The channel can be defined as follows.
adServerAgent.sources.netcatSource.channels = fileChannel
#adServerAgent.sources.netcatSource.channels = memoryChannel

# Each sink's type must be defined
adServerAgent.sinks.fileSink.type = file_roll
adServerAgent.sinks.fileSink.sink.directory = /root/flume/flume_sink

#Specify the channel the sink should use
#adServerAgent.sinks.fileSink.channel = memoryChannel
adServerAgent.sinks.fileSink.channel = fileChannel

adServerAgent.channels.memoryChannel.type = memory
adServerAgent.channels.memoryChannel.capacity = 100000
adServerAgent.channels.memoryChannel.transactionCapacity = 10000

adServerAgent.channels.fileChannel.type = file
adServerAgent.channels.fileChannel.dataDirs = /root/jagadish/flume_channel1/dataDir3
adServerAgent.channels.fileChannel.checkpointDir = /root/jagadish/flume_channel1/checkpointDir3

*Script snippet being used:*
...
eval
{
         # Abort the loop after $TIMEOUT seconds via SIGALRM.
         local $SIG{ALRM} = sub { die "alarm\n"; };
         alarm $TIMEOUT;
         my $i = 0;
         my $str = "";
         my $counter = 1;
         while(1)
         {
                         # Build one tab-separated record of $NO_ELE_PER_ROW counter values.
                         $str = "";
                         for($i = 0; $i < $NO_ELE_PER_ROW; $i++)
                         {
                                 $str .= $counter."\t";
                                 $counter++;
                         }
                         chop($str);   # drop the trailing tab
                         # send() returns undef on failure, so the die below already
                         # covers the error case; a $? check here would only test the
                         # exit status of the last external command, not the send.
                         $socket->send($str."\n") or die "Didn't send";
                         print "$str\n";
                         Time::HiRes::usleep($SLEEP_TIME);
         }
         alarm 0;
};
if ($@) {
......
if ($@) {
......

- The script itself is working fine: with a very large delay, all events
are transmitted correctly.
- The same problem occurs with the memory channel too, only at lower
values of sleep.

*Problem 2*:
-- With this setup I am getting very low throughput, i.e. I am able to
transfer only ~1 KB/sec to the destination file sink. Similar
performance was observed with the HDFS sink.
-- I tried increasing batch sizes in my original scenario without much
gain in throughput.
-- Using 'tail -F' as the source I have seen almost 10 times better
throughput.
-- Is there any tunable parameter for the netcat source?

Please help me with the above 2 cases: i) netcat source usage, and
ii) Flume's typical expected throughput with a file channel and a
file/HDFS sink on a single machine.
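[Editor's note: to confirm how many of the sent records actually reached the file_roll sink, a small count check can be sketched in Python; the directory path is the one from the configuration above, and the sent count is whatever the test script reports.]

```python
import glob
import os

def count_delivered(sink_dir):
    """Count newline-terminated events across all files the
    file_roll sink has written into sink_dir."""
    total = 0
    for path in glob.glob(os.path.join(sink_dir, "*")):
        with open(path, "r") as f:
            total += sum(1 for _ in f)
    return total

# e.g. compare against the number of records the client sent:
# sent = 9108
# print(count_delivered("/root/flume/flume_sink"), "of", sent, "delivered")
```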

Regards,
Jagadish

Re: Flume netcat source related problems

Posted by Juhani Connolly <ju...@cyberagent.co.jp>.
Would you be able to attach using JMX with jconsole (or similar) and
check the numbers you are getting for events delivered and number of
batches? (There are beans exposing these values for sink/channel/source.)
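[Editor's note: if your Flume version also supports the HTTP JSON metrics reporter, the same source/channel counters can be compared from a script; the port, component names, and counter names below follow this thread's configuration and Flume's standard counter names, but treat them as assumptions for your version.]

```python
def delivery_gap(metrics):
    """Given Flume's /metrics JSON (already parsed into a dict), return
    how many events the source accepted that the sink has not yet taken
    from the channel. Component names match the configuration in this
    thread; counter names are Flume's standard counter names."""
    accepted = int(metrics["SOURCE.netcatSource"]["EventAcceptedCount"])
    taken = int(metrics["CHANNEL.fileChannel"]["EventTakeSuccessCount"])
    return accepted - taken

# Against a running agent started with, e.g.:
#   -Dflume.monitoring.type=http -Dflume.monitoring.port=34545
# import json; from urllib.request import urlopen
# metrics = json.load(urlopen("http://localhost:34545/metrics"))
# print(delivery_gap(metrics))
```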

If I can, I'll try to recreate your scenario when I have some time, but 
that's not happening right now, sorry

On 09/05/2012 03:05 PM, Jagadish Bihani wrote:
> ...


Re: Flume netcat source related problems

Posted by Steve Johnson <st...@webninja.com>.
I had the same issues. I was able to get it to run much more smoothly,
and it only dropped events in about 40% of the test runs. Not 40% of the
events: 40% of the runs would drop a small number of events. For
instance, I'd send through about 2 million log events over about a
minute. 4 out of 10 runs would drop a very small number of events, maybe
10 or fewer; in the other 6 out of 10 runs I'd receive every log event in
the file_roll sink and the checksums would match exactly. I'm still
looking into this, but the throughput I was able to tune was impressive.
However, this was with only a single Flume agent, using a similar setup
(netcat source, memory channel, file_roll sink).

In my original testing I tried to use the file channel, since I need this
to be recoverable, but to no avail, so I decided to go with the memory
channel since my initial goal was to test the framework first. I'll
revisit the file channel later.

Anyway, in doing so I found an interesting comment in the source, and
this ended up resolving some of my issues. I set the Java options like so
in flume-env.sh in the Flume conf dir:

JAVA_OPTS="-Xms1g -Xmx5g -Dcom.sun.management.jmxremote -XX:MaxDirectMemorySize=5g"

I know that the file channel uses direct memory. I don't know if the
memory channel needs it, but I left it set and things seemed to work
better for me; I'm not sure whether that was a coincidence. I just made
sure these settings were higher than what I had set for the channel
capacity, and things seemed to run more smoothly. I'm planning to follow
up with an email about my further findings.

Also, it seems your capacity is set too low. You may have an issue with
it trying to write too often to disk instead of buffering more. I have
mine set to 4g (assuming the capacity setting is in bytes). This worked
well for me; just make sure your JAVA_OPTS tuning has at least this much
RAM allocated. (I went 1 GB higher on the OPTS to be safe. My server has
48 GB of RAM, but even with 8 GB, if you're not using it all elsewhere,
you should be fine.)

testagent.channels.memChannel.capacity = 4294967296

On Wed, Sep 5, 2012 at 1:05 AM, Jagadish Bihani <jagadish.bihani@pubmatic.com> wrote:
> ...

-- 
Steve Johnson
steve@webninja.com

Re: Flume netcat source related problems

Posted by Jagadish Bihani <ja...@pubmatic.com>.
Hi Juhani

Thanks for the inputs.
I made the following changes:
-- I sent my string events to the socket in batches of 1000 and 10000
events.
-- I have also started using the DEBUG log level for the Flume agent.
-- I have also increased the max-line-length property of the netcat
source from the default 512.
But both problems remain: events get lost without any exception,
and performance didn't improve much (from ~1 KB/sec to ~1.3 KB/sec
approx.).
Is there anything else to be considered?

Regards,
Jagadish


On 09/04/2012 04:40 PM, Juhani Connolly wrote:
> ...


Re: Flume netcat source related problems

Posted by Juhani Connolly <ju...@cyberagent.co.jp>.
Hi Jagadish,

NetcatSource doesn't use any batching when receiving events. It writes
one event at a time, and in the FileChannel that translates to a flush
to disk, so when you're writing many events your disk just won't keep
up. One way to improve this is to use separate physical disks for your
checkpoint and data directories.

TailSource used to have the same problem until we added batching to it.
From a cursory examination of NetcatSource, it looks to me like you can
also force some batching by sending multiple lines in each socket->send.

As to the first problem, with lines going missing, I'm not entirely sure,
as I can't dive deeply into the source right now. I wouldn't be
surprised if it's some kind of congestion problem and a lack of logging
(or your log levels are just too high; try switching them to INFO or
DEBUG?) that will be resolved once you get the throughput up.
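[Editor's note: the multiple-lines-per-send idea above can be sketched as follows in Python; the netcat source frames events on newlines, so one send carrying many newline-terminated records arrives as many events. Host, port, and batch size are illustrative.]

```python
import socket

def send_batched(host, port, records, batch_size=100):
    """Send newline-terminated records in batches of batch_size per
    sendall() call, instead of one record per call as the Perl script
    in this thread does."""
    with socket.create_connection((host, port)) as sock:
        for i in range(0, len(records), batch_size):
            chunk = records[i:i + batch_size]
            # One syscall delivers batch_size events to the netcat source.
            sock.sendall(("\n".join(chunk) + "\n").encode())

# send_batched("10.0.17.231", 55355, [str(n) for n in range(10000)])
```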

On 09/04/2012 07:50 PM, Jagadish Bihani wrote:
> ...