Posted to user@flume.apache.org by "Rakos, Rudolf" <Ru...@morganstanley.com> on 2012/12/18 17:07:33 UTC

Flume 1.3.0 - NFS + File Channel Performance

Hi,

We’ve run into a strange problem regarding NFS and File Channel performance while evaluating the new version of Flume.
We had no issues with the previous version (1.2.0).

Our configuration looks like this:

· Node1:
(Avro RPC Clients ->) Avro Source and Custom Sources -> File Channel -> Avro Sink (-> Node 2)

· Node2:
(Node1s ->) Avro Source -> File Channel -> Custom Sink

Both the checkpoint and the data directories of the File Channels are on NFS shares. We use the same share for checkpoint and data directories, but different shares for each Node. Unfortunately it is not an option for us to use local directories.
The events are about 1 KB in size, and the batch sizes are the following (a rough config sketch follows this list):

· Avro RPC Clients: 1000

· Custom Sources: 2000

· Avro Sink: 5000

· Custom Sink: 10000
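
For reference, a rough sketch of what the Node1 agent could look like in Flume properties form; the hostnames, ports, and paths below are placeholders rather than our real values, and the custom source class name is made up:

    # Placeholder Node1 agent definition (names, ports, and paths are illustrative only)
    node1.sources = avro-src custom-src
    node1.channels = file-ch
    node1.sinks = avro-sink

    node1.sources.avro-src.type = avro
    node1.sources.avro-src.bind = 0.0.0.0
    node1.sources.avro-src.port = 41414
    node1.sources.avro-src.channels = file-ch

    # Custom source: type is the fully qualified class name of the implementation
    node1.sources.custom-src.type = com.example.flume.CustomSource
    node1.sources.custom-src.channels = file-ch

    # Checkpoint and data directories both point at the NFS share
    node1.channels.file-ch.type = file
    node1.channels.file-ch.checkpointDir = /nfs/share1/flume/checkpoint
    node1.channels.file-ch.dataDirs = /nfs/share1/flume/data

    node1.sinks.avro-sink.type = avro
    node1.sinks.avro-sink.hostname = node2.example.com
    node1.sinks.avro-sink.port = 41414
    node1.sinks.avro-sink.batch-size = 5000
    node1.sinks.avro-sink.channel = file-ch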

We are experiencing very slow File Channel performance compared to the previous version, and a high number of timeouts (almost constantly) in the Avro RPC Clients and the Avro Sink.
Something like this:

· 2012-12-18 15:43:31,828 [SinkRunner-PollingRunner-ExceptionCatchingSinkProcessor] WARN  org.apache.flume.sink.AvroSink - Failed to send event batch
org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: ***, port: *** }: Failed to send batch
        at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:236) ~[flume-ng-sdk-1.3.0.jar:1.3.0]
        ***
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147) [flume-ng-core-1.3.0.jar:1.3.0]
        at java.lang.Thread.run(Thread.java:662) [na:1.6.0_31]
Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: ***, port: *** }: Handshake timed out after 20000ms
        at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:280) ~[flume-ng-sdk-1.3.0.jar:1.3.0]
        at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:224) ~[flume-ng-sdk-1.3.0.jar:1.3.0]
        ... 5 common frames omitted
Caused by: java.util.concurrent.TimeoutException: null
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228) ~[na:1.6.0_31]
        at java.util.concurrent.FutureTask.get(FutureTask.java:91) ~[na:1.6.0_31]
        at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:278) ~[flume-ng-sdk-1.3.0.jar:1.3.0]
        ... 6 common frames omitted
(I had to remove some details, sorry for that.)

We managed to narrow down the root cause of the issue to the File Channel, because:

· Everything works fine if we switch to the Memory Channel or to the Old File Channel (1.2.0).

· Everything works fine if we use local directories.
We’ve tested this on multiple different PCs (both Windows and Linux).

I spent the day debugging and profiling, but I could not find anything worth mentioning (nothing with excessive CPU usage, no threads waiting too long, etc.). The only symptom is that File Channel takes and puts take far longer than with the previous version.


Could someone please try the File Channel on an NFS share?
Does anyone have similar issues?

Thank you for your help.

Regards,
Rudolf

Rudolf Rakos
Morgan Stanley | ISG Technology
Lechner Odon fasor 8 | Floor 06
Budapest, 1095
Phone: +36 1 881-4011
Rudolf.Rakos@morganstanley.com



Re: Flume 1.3.0 - NFS + File Channel Performance

Posted by Juhani Connolly <ju...@cyberagent.co.jp>.
I can confirm that this fixed issues with a non-NFS FileChannel as well (the difference is still very noticeable, and I would recommend that anyone with high throughput patch this in).

On 12/19/2012 06:08 AM, Brock Noland wrote:
> Hi,
>
> If you do have a chance, it would be great to hear if the patch attached
> to this JIRA (https://issues.apache.org/jira/browse/FLUME-1794) fixes
> the performance problem.
>
> Brock
>


Re: Flume 1.3.0 - NFS + File Channel Performance

Posted by Brock Noland <br...@cloudera.com>.
Hi,

Yes, it'd be great to get 1763 and 1794 into 1.3.1. I don't have time right now. If another committer does, I'd love to vote on an RC! :)

Brock




-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

RE: Flume 1.3.0 - NFS + File Channel Performance

Posted by "Rakos, Rudolf" <Ru...@morganstanley.com>.
Brock, Hari,

I can confirm that the patch in FLUME-1794 fixes the performance issue.

I was wondering whether it would be possible to ask for a new release (1.3.1) that includes the recent File Channel bug fixes.

  Trunk: https://git-wip-us.apache.org/repos/asf?p=flume.git;a=history;f=flume-ng-channels/flume-file-channel;h=cc779e886b4d6290723a43b4f874239150d93475;hb=trunk
  1.3.0: https://git-wip-us.apache.org/repos/asf?p=flume.git;a=history;f=flume-ng-channels/flume-file-channel;h=cc93d99eac6d631e9200d122928d5e307621b4fe;hb=refs/heads/flume-1.3.0

Unfortunately we cannot use trunk, and waiting for Flume 1.4.0 could take a few months.
It's not a big problem if we need to stick with Flume 1.2.0, but according to Juhani Connolly this was causing high CPU usage with non-NFS File Channels too, so I think a new release would benefit the community as well.

Regards,
Rudolf


RE: Flume 1.3.0 - NFS + File Channel Performance

Posted by "Rakos, Rudolf" <Ru...@morganstanley.com>.
Brock, Hari,

Thank you very much for looking so quickly into this.

We're aware that the general performance will not be that great over NFS, but having some "last minute" data in failover scenarios could be worth the performance cost.

You were right.
I've taken some thread dumps and I can confirm that the File.getUsableSpace calls introduced by FLUME-1609 are causing the issue. (I just don't understand how I could have missed this hot spot during profiling.)

I'll check whether the patch in FLUME-1794 fixes this. 

Thanks,
Rudolf


Re: Flume 1.3.0 - NFS + File Channel Performance

Posted by Brock Noland <br...@cloudera.com>.
Hi,

If you do have a chance, it would be great to hear whether the patch attached to this JIRA (https://issues.apache.org/jira/browse/FLUME-1794) fixes the performance problem.

Brock




-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

Re: Flume 1.3.0 - NFS + File Channel Performance

Posted by Brock Noland <br...@cloudera.com>.
Yeah, I think we should do that check in the background and then update a flag. This is how hdfs and mapred do it.
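
In rough terms, something like the sketch below (just an illustration of the approach with made-up names, not the actual FLUME-1794 patch):

    import java.io.File;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Illustrative only: poll the data directory's usable space on a background
    // thread and cache the result, so the channel's put/take path never waits
    // on a filesystem (and therefore NFS) call.
    public class BackgroundSpaceChecker {

      private volatile boolean spaceAvailable = true; // read on every put/take
      private final ScheduledExecutorService scheduler =
          Executors.newSingleThreadScheduledExecutor();

      public BackgroundSpaceChecker(final File dataDir, final long minimumRequiredSpace) {
        // One getUsableSpace() call every few seconds instead of one per event.
        scheduler.scheduleWithFixedDelay(new Runnable() {
          public void run() {
            spaceAvailable = dataDir.getUsableSpace() >= minimumRequiredSpace;
          }
        }, 0, 5, TimeUnit.SECONDS);
      }

      // Hot path: just a volatile read, no filesystem access.
      public boolean hasEnoughSpace() {
        return spaceAvailable;
      }

      public void stop() {
        scheduler.shutdownNow();
      }
    }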

On Tue, Dec 18, 2012 at 11:04 AM, Hari Shreedharan
<hs...@cloudera.com> wrote:
> Yep. The disk space calls require an NFS call for each write, and that slows
> things down a lot.
>
> --
> Hari Shreedharan



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

Re: Flume 1.3.0 - NFS + File Channel Performance

Posted by Hari Shreedharan <hs...@cloudera.com>.
Yep. The disk space calls require an NFS call for each write, and that slows things down a lot.  
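
One rough way to see the cost is to time the call directly; the hypothetical snippet below, run once against a local directory and once against the NFS mount, should make the per-call latency difference visible:

    import java.io.File;

    // Illustration only: average latency of File.getUsableSpace() for a directory
    // passed on the command line (e.g. a local dir vs. an NFS-mounted dir).
    public class UsableSpaceTimer {
      public static void main(String[] args) {
        File dir = new File(args[0]);
        int iterations = 10000;
        long space = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
          space = dir.getUsableSpace(); // one filesystem query per call
        }
        long avgMicros = (System.nanoTime() - start) / 1000 / iterations;
        System.out.println("usable space: " + space + " bytes, avg "
            + avgMicros + " us per getUsableSpace() call");
      }
    }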

--  
Hari Shreedharan


On Tuesday, December 18, 2012 at 8:43 AM, Brock Noland wrote:

> We'd need those thread dumps to help confirm but I bet that FLUME-1609
> results in an NFS call on each operation on the channel.
>  
> If that is true, that would explain why it works well on local disk.
>  
> Brock
>  
> On Tue, Dec 18, 2012 at 10:17 AM, Brock Noland <brock@cloudera.com> wrote:
> > Hi,
> >  
> > Hmm, yes in general performance is not going to be great over NFS, but
> > there haven't been any FC changes that stick out here.
> >  
> > Could you take 10 thread dumps of the agent running the file channel
> > and 10 thread dumps of the agent sending data to the agent with the
> > file channel? (You can address them to myself directly since the list
> > won't take attachments.)
> >  
> > Are there any patterns, like it works for 40 seconds then times out
> > and then works for 39 seconds, etc?
> >  
> > Brock



Re: Flume 1.3.0 - NFS + File Channel Performance

Posted by Brock Noland <br...@cloudera.com>.
We'd need those thread dumps to help confirm, but I bet that FLUME-1609
results in an NFS call on each operation on the channel.

If that is true, that would explain why it works well on local disk.

Brock
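
Purely to illustrate why a per-operation check is so much worse than a periodic one (a hypothetical sketch, not how the file channel is actually written): if the usable-space value were refreshed on a background timer, each put/take would read a cached number instead of paying an NFS round trip.

    import java.io.File;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical sketch: move the filesystem call off the hot path.
    public class CachedSpaceCheck {
        private final AtomicLong usableSpace = new AtomicLong(Long.MAX_VALUE);
        private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

        public CachedSpaceCheck(final File dir, long refreshSeconds) {
            scheduler.scheduleAtFixedRate(new Runnable() {
                public void run() {
                    // The only NFS/stat call happens here, once per interval.
                    usableSpace.set(dir.getUsableSpace());
                }
            }, 0L, refreshSeconds, TimeUnit.SECONDS);
        }

        // Hot-path check used on every put/take: no filesystem call at all.
        public boolean hasSpace(long requiredBytes) {
            return usableSpace.get() >= requiredBytes;
        }

        public void shutdown() {
            scheduler.shutdownNow();
        }
    }

The obvious trade-off is that the check can lag by up to one refresh interval, which is presumably why checking on every operation is the straightforward default and why it only hurts once each check costs a network round trip.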

On Tue, Dec 18, 2012 at 10:17 AM, Brock Noland <br...@cloudera.com> wrote:
> Hi,
>
> Hmm, yes in general performance is not going to be great over NFS, but
> there haven't been any FC changes that stick out here.
>
> Could you take 10 thread dumps of the agent running the file channel
> and 10 thread dumps of the agent sending data to the agent with the
> file channel? (You can address them to myself directly since the list
> won't take attachments.)
>
> Are there any patterns, like it works for 40 seconds then times out
> and then works for 39 seconds, etc?
>
> Brock
>



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

Re: Flume 1.3.0 - NFS + File Channel Performance

Posted by Brock Noland <br...@cloudera.com>.
Hi,

Hmm, yes, in general performance is not going to be great over NFS, but
there haven't been any FC changes that stick out here.

Could you take 10 thread dumps of the agent running the file channel
and 10 thread dumps of the agent sending data to the agent with the
file channel? (You can address them to myself directly since the list
won't take attachments.)

Are there any patterns, like it works for 40 seconds then times out
and then works for 39 seconds, etc?

Brock
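
The usual way to grab these is jstack <agent-pid> (or kill -QUIT on Unix) ten times, a few seconds apart. If that is awkward on some of the test boxes, below is a rough in-process alternative; the wiring is hypothetical, and it has to run inside the agent JVM (for example, kicked off from one of the custom sources during a test run), because Thread.getAllStackTraces() only sees its own JVM.

    import java.util.Map;

    // Rough in-process thread dumper: prints N dumps to stderr, a few seconds apart.
    public class InProcessThreadDumper {

        public static void dumpAll(int count, long intervalMillis) throws InterruptedException {
            for (int i = 1; i <= count; i++) {
                StringBuilder sb = new StringBuilder();
                sb.append("=== Thread dump ").append(i)
                  .append(" at ").append(new java.util.Date()).append(" ===\n");
                for (Map.Entry<Thread, StackTraceElement[]> e
                        : Thread.getAllStackTraces().entrySet()) {
                    Thread t = e.getKey();
                    sb.append('"').append(t.getName()).append("\" state=")
                      .append(t.getState()).append('\n');
                    for (StackTraceElement frame : e.getValue()) {
                        sb.append("        at ").append(frame).append('\n');
                    }
                    sb.append('\n');
                }
                System.err.print(sb);
                Thread.sleep(intervalMillis);
            }
        }

        public static void main(String[] args) throws InterruptedException {
            dumpAll(10, 5000L);  // run standalone, this only dumps its own JVM's threads
        }
    }

The dumps taken while a batch is stalling are the interesting ones; if most of them show the channel threads sitting in filesystem calls, that points the same way as the FLUME-1609 theory above.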




-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/