Posted to users@camel.apache.org by David Hoffer <dh...@gmail.com> on 2015/11/11 19:31:03 UTC

sftp endpoint is not as performant as expected

I have a Spring-configured (XML) Camel project where there are many routes
but the starting point of all the data is an sftp URI.  E.g.

<from uri="sftp://{{gp.camel.sg.username}}@{{gp.camel.sg.host}}:{{gp.camel.sg.port}}/{{gp.camel.sg.path}}?password={{gp.camel.sg.password}}&amp;delete=true&amp;exclusiveReadLockStrategy=#nonBlockingSftpReadLockStrategy&amp;readLockCheckInterval=60000&amp;readLockTimeout=360000000&amp;filter=#sgHiFilter&amp;reconnectDelay=30000&amp;delay=60000"/>
<to uri="file://{{gp.home}}/work/sg_decrypt/?tempFileName=${file:name}.partial"/>

There are several more subsequent routes that are all file based except the
very last one, which calls a bean URI (a web service call).

This mostly works fine, but Camel is not pulling the data from the SFTP
server fast enough.  At times the SFTP server can get a lot of files, say
about 1000.  We want those to be consumed by this process as soon as they
are available on the SFTP server.  The sftp route does check, via the
nonBlockingSftpReadLockStrategy bean, that the file has not changed since
the last time it was seen (e.g. the second poll of a file will return true
from acquireExclusiveReadLock() if the last modified date and file size
have not changed).  The poll frequency is 60 seconds, so we expect a file
to sit on the SFTP server for 2-3 minutes and then be sent to the next
route.

What is happening is that when the SFTP server receives a lot of files the
Camel route slows down and works through the files very slowly.  I'm
trying to understand why it would slow down.

Btw, we can use mget to quickly get all the files from the SFTP server and
put them in the same folder that the Camel route uses as its destination
and that works fine.  So why can't the sftp endpoint do the same thing?

Also, we don't have any thread pools configured for these routes except for
one where we have a multicast route configured for parallel processing.  So
all the rest should be using the default Camel thread pool.  I did read
online that the sftp input endpoint is single threaded, so I'm wondering if
that has something to do with the problem?  I'm not sure how, though.  We do
want maximum concurrency with all the routes.

-Dave

Re: sftp endpoint is not as performant as expected

Posted by Claus Ibsen <cl...@gmail.com>.
I logged a ticket
https://issues.apache.org/jira/browse/CAMEL-9324

On Fri, Nov 13, 2015 at 10:20 PM, Preben.Asmussen <pr...@dr.dk> wrote:
> +1



-- 
Claus Ibsen
-----------------
http://davsclaus.com @davsclaus
Camel in Action 2: https://www.manning.com/ibsen2

Re: sftp endpoint is not as performant as expected

Posted by "Preben.Asmussen" <pr...@dr.dk>.
+1



--
View this message in context: http://camel.465427.n5.nabble.com/sftp-endpoint-is-not-as-performant-as-expected-tp5773654p5773879.html
Sent from the Camel - Users mailing list archive at Nabble.com.

Re: sftp endpoint is not as performant as expected

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

Yeah, the current changed read-lock AFAIR doesn't keep state between
polls. So we could look into one that does, and have it do a full scan of
all the files, then do change detection on all the files together to
work out which ones haven't changed. That way it can react faster
than it does currently.

Now that may require doing more file directory listings to gather all
those files and their timestamps / sizes to see which ones have changed,
instead of monitoring a single file one by one. Also it may mean that
files can be processed out of order, which matters if a file sort order
must be strictly followed.
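
For illustration, the full-scan idea above could be sketched in plain Java
roughly like this. It is only a sketch under assumptions: the names
SnapshotChangeDetector, FileStat and stableFiles are made up for this
example and are not Camel API, and the directory listing is passed in as a
plain map instead of being fetched over SFTP.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these names are illustrative and not part of Camel.
final class SnapshotChangeDetector {

    /** Size and last-modified time of one file, as seen in one directory listing. */
    static final class FileStat {
        final long size;
        final long lastModified;

        FileStat(long size, long lastModified) {
            this.size = size;
            this.lastModified = lastModified;
        }

        boolean sameAs(FileStat other) {
            return other != null && size == other.size && lastModified == other.lastModified;
        }
    }

    // State kept between polls: the full listing seen on the previous poll.
    private Map<String, FileStat> previous = new HashMap<>();

    /**
     * Compares the current listing with the previous one in a single pass and
     * returns the names whose size and timestamp did not change. The result is
     * sorted by name so the processing order stays deterministic.
     */
    List<String> stableFiles(Map<String, FileStat> current) {
        List<String> stable = new ArrayList<>();
        for (Map.Entry<String, FileStat> entry : current.entrySet()) {
            if (entry.getValue().sameAs(previous.get(entry.getKey()))) {
                stable.add(entry.getKey());
            }
        }
        Collections.sort(stable);
        // Remember this listing for the next poll; files that vanished are dropped.
        previous = new HashMap<>(current);
        return stable;
    }
}
```

With one listing per poll, every file that was unchanged across two
consecutive polls becomes eligible at once, and the explicit sort is one way
to keep a file order if that is required.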

On Thu, Nov 12, 2015 at 10:05 PM, David Hoffer <dh...@gmail.com> wrote:
> I'm rather new to Camel.  I'm assuming that if the SFTP source gets 900
> files then the configured
> GenericFileExclusiveReadLockStrategy#acquireExclusiveReadLock will get
> called 900 times (once per file) every polling cycle.
>
> We too have a custom GenericFileExclusiveReadLockStrategy class that does
> not block.  What ours does is maintain a map/cache of the
> lastModifiedTime and fileSize for each of the 900 files.  The call to
> acquireExclusiveReadLock() will return true as soon as neither of those
> values has changed since the last poll cycle.
>
> We have our delay/poll cycle set to 60 seconds.  I'm not sure why it is
> such a large value, but we are fine waiting 2 minutes to get each of the
> 900 files...and by the time 2 minutes go by we will likely have more files
> in the SFTP source.  E.g. if we have a constant delay in receiving each
> file of 2-3 minutes that is more than fine.  The problem is that on each
> cycle it only processes 3-5 files instead of the 900 that should have
> returned true from acquireExclusiveReadLock().
>
> What seems to be happening is that Camel is not calling the
> acquireExclusiveReadLock() method for each of the 900 files every 60
> seconds; rather it slows down and calls it for either just a few of those
> 900 files or none.
>
> Any ideas?
>
> -Dave
>
> Btw, I'm using Camel 2.8.2




Re: sftp endpoint is not as performant as expected

Posted by pmmerritt <pm...@gmail.com>.
I believe the 5 second check interval was introduced after 2.8.2, and I
don't see anything in the URI that would cause it either. We do explicitly
set maxMessagesPerPoll. According to the documentation it defaults to
unlimited when not set, so that likely isn't the issue either, but it may
be worth a shot.



--
View this message in context: http://camel.465427.n5.nabble.com/sftp-endpoint-is-not-as-performant-as-expected-tp5773654p5773862.html
Sent from the Camel - Users mailing list archive at Nabble.com.

Re: sftp endpoint is not as performant as expected

Posted by David Hoffer <dh...@gmail.com>.
I'm rather new to Camel.  I'm assuming that if the SFTP source gets 900
files then the configured
GenericFileExclusiveReadLockStrategy#acquireExclusiveReadLock will get
called 900 times (once per file) every polling cycle.

We too have a custom GenericFileExclusiveReadLockStrategy class that does
not block.  What ours does is maintain a map/cache of the
lastModifiedTime and fileSize for each of the 900 files.  The call to
acquireExclusiveReadLock() will return true as soon as neither of those
values has changed since the last poll cycle.
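
As a rough illustration of that per-file check (not our actual class, and
not the Camel interface), the logic could look something like the sketch
below. NonBlockingReadLockSketch, tryAcquire and release are hypothetical
names, and the file attributes are passed in directly instead of coming
from a GenericFile.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: stands in for a non-blocking read lock check; not Camel API.
final class NonBlockingReadLockSketch {

    // Cache of the {lastModified, size} pair seen for each file on its previous check.
    private final Map<String, long[]> lastSeen = new HashMap<>();

    /**
     * Returns true only if the file was seen before with the same
     * lastModified and size. Never sleeps, so the polling thread can move
     * straight on to the next file.
     */
    boolean tryAcquire(String fileName, long lastModified, long size) {
        long[] before = lastSeen.put(fileName, new long[] {lastModified, size});
        return before != null && before[0] == lastModified && before[1] == size;
    }

    /** Drops the cached entry once the file has been consumed. */
    void release(String fileName) {
        lastSeen.remove(fileName);
    }

    public static void main(String[] args) {
        NonBlockingReadLockSketch lock = new NonBlockingReadLockSketch();
        for (int poll = 1; poll <= 2; poll++) {
            int ready = 0;
            for (int i = 0; i < 900; i++) {
                // Same attributes on both polls: the files are no longer being written.
                if (lock.tryAcquire("file-" + i, 1000L, 42L)) {
                    ready++;
                }
            }
            System.out.println("poll " + poll + ": " + ready + " files ready");
        }
    }
}
```

In this sketch the first poll reports 0 files ready and the second reports
all 900, which is the behaviour I was expecting from the route.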

We have our delay/poll cycle set to 60 seconds.  I'm not sure why it is
such a large value, but we are fine waiting 2 minutes to get each of the
900 files...and by the time 2 minutes go by we will likely have more files
in the SFTP source.  E.g. if we have a constant delay in receiving each
file of 2-3 minutes that is more than fine.  The problem is that on each
cycle it only processes 3-5 files instead of the 900 that should have
returned true from acquireExclusiveReadLock().

What seems to be happening is that Camel is not calling the
acquireExclusiveReadLock() method for each of the 900 files every 60
seconds; rather it slows down and calls it for either just a few of those
900 files or none.

Any ideas?

-Dave

Btw, I'm using Camel 2.8.2



On Thu, Nov 12, 2015 at 12:43 PM, pmmerritt <pm...@gmail.com> wrote:

> We had a similar issue and traced it down to our usage of
> SftpChangedExclusiveReadLockStrategy, which has a default check interval of
> 5 seconds. That slows the polling down while it waits on files to finish
> being written, so the max you can do is about 1 file per 5 seconds. You can
> change the checkInterval so that it is faster; we ended up writing our own
> read lock strategy that does not block the polling thread.

Re: sftp endpoint is not as performant as expected

Posted by pmmerritt <pm...@gmail.com>.
We had a similar issue and traced it down to our usage of
SftpChangedExclusiveReadLockStrategy, which has a default check interval of
5 seconds. That slows the polling down while it waits on files to finish
being written, so the max you can do is about 1 file per 5 seconds. You can
change the checkInterval so that it is faster; we ended up writing our own
read lock strategy that does not block the polling thread.
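
To put a number on that, here is a back-of-the-envelope calculation in
Java. It assumes the polling thread waits one full check interval on each
file in turn, and it uses the 900 files mentioned elsewhere in this thread.
BlockingPollEstimate and secondsToDrain are made-up names for this example.

```java
// Hypothetical helper: estimates how long a blocking per-file check takes overall.
final class BlockingPollEstimate {

    /** Total seconds if the polling thread waits checkIntervalSeconds on each file in turn. */
    static long secondsToDrain(int fileCount, long checkIntervalSeconds) {
        return fileCount * checkIntervalSeconds;
    }

    public static void main(String[] args) {
        long seconds = secondsToDrain(900, 5);
        // Prints "4500 s = 75 min"
        System.out.println(seconds + " s = " + (seconds / 60) + " min");
    }
}
```

So at roughly 1 file per 5 seconds, 900 files would take about 75 minutes,
regardless of how quickly each individual transfer completes.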



--
View this message in context: http://camel.465427.n5.nabble.com/sftp-endpoint-is-not-as-performant-as-expected-tp5773654p5773780.html
Sent from the Camel - Users mailing list archive at Nabble.com.