Posted to users@camel.apache.org by Tom Duncalf <to...@tomduncalf.com> on 2015/06/01 18:10:15 UTC

Parallel processing/multiple file consumer threads with readLock=changed and long timeout?

Hi,

I am building a Camel route to consume files uploaded by FTP and then upload
them elsewhere. It would seem that "changed" is the most suitable readLock
strategy for this, and I would like to set a fairly large
readLockCheckInterval and readLockTimeout (e.g. 20s and 40s) to help prevent
incomplete files from users with slow/intermittent connections being picked
up by the route prematurely.
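
As a rough illustration, the settings described above might be combined into a File endpoint URI along the following lines. The directory is a hypothetical placeholder, and the sketch assumes that readLockCheckInterval and readLockTimeout are given in milliseconds, as in the Camel File component options.

```scala
object ChangedReadLockUri {
  // Hypothetical upload directory; replace with the real FTP root.
  val inboxDir = "/srv/ftp/inbox"

  // Assumed to be milliseconds: check every 20s, give up after 40s.
  val checkIntervalMs = 20000L
  val timeoutMs = 40000L

  // File endpoint URI using the "changed" read lock strategy.
  val uri: String =
    s"file:$inboxDir?readLock=changed" +
      s"&readLockCheckInterval=$checkIntervalMs" +
      s"&readLockTimeout=$timeoutMs"
}
```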

What I would like to achieve is for the File consumer not to be blocked from
picking up more files for the duration of the readLockTimeout. Currently, if
file1 is uploaded at t=0 and file2 at t=5, the consumer picks up file1 at t=0
and is then blocked until at least t=20 (assuming a 20s check interval), so
file2 will not be picked up until at least t=20 (and not processed until at
least t=40). Instead, I would like there to be (for example) more than one
consumer thread, so that while the first consumer thread has picked up file1
and is waiting until t=20, another consumer thread can still pick up file2 at
t=5 and wait until t=25.
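
The timing in this scenario can be worked through with a small model. This is a simplified sketch, not Camel's actual scheduling: it assumes arrivals are sorted, that each file needs exactly one check interval of waiting, and that the files do not change while being checked. The object and method names are made up for illustration.

```scala
object PickupTimeline {
  // Single consumer thread: a file's read-lock wait starts only after
  // the previous file's wait has finished.
  def sequential(arrivals: Seq[Int], checkInterval: Int): Seq[Int] = {
    val (_, releases) =
      arrivals.foldLeft((0, Vector.empty[Int])) { case ((free, acc), arrival) =>
        val release = math.max(free, arrival) + checkInterval
        (release, acc :+ release)
      }
    releases
  }

  // One waiter per file: every file's wait starts as soon as it arrives.
  def parallel(arrivals: Seq[Int], checkInterval: Int): Seq[Int] =
    arrivals.map(_ + checkInterval)
}
```

With arrivals at t=0 and t=5 and a 20s interval, sequential yields release times of 20 and 40, while parallel yields 20 and 25, matching the numbers in the description.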

None of the concurrency options available seem to cover this scenario - I
understand that I can decouple the file being consumed and the subsequent
processing using, for example, a SEDA queue, or I can split the processing
into multiple threads after a file has been processed by the File consumer
using the Threads DSL, but I can't see how to make the consumer itself
consume files and therefore create messages in a multi-threaded manner.

One workaround may be to spawn a new process for each FTP user and have it
consume from their home directory, but I would prefer to avoid this
additional complexity and it would not gracefully handle a situation where
one user uploads a large volume of files. One other option I can think of is
to start multiple instances of my route somehow (to create multiple
consumers), and then synchronise them with a shared repository for the
inProgressRepository, but I'm not sure if that would actually work.

Any input would be great - even if it is basic as I am new to Camel!

Thanks,
Tom




--
View this message in context: http://camel.465427.n5.nabble.com/Parallel-processing-multiple-file-consumer-threads-with-readLock-changed-and-long-timeout-tp5767753.html
Sent from the Camel - Users mailing list archive at Nabble.com.

Re: Parallel processing/multiple file consumer threads with readLock=changed and long timeout?

Posted by Tom Duncalf <to...@tomduncalf.com>.
Thanks Claus - those are both interesting ideas, I'll give it some more
thought!

Thanks,
Tom

Tom Duncalf / Software Developer
tomduncalf.com <http://www.tomduncalf.com> / @tomduncalf
<http://twitter.com/tomduncalf>


On 2 June 2015 at 10:04, Claus Ibsen-2 [via Camel] <
ml-node+s465427n5767782h93@n5.nabble.com> wrote:

> Hi
>
> You could either try to implement your own read lock, or alternatively
> try a stateful file filter. In the filter you can return false for
> files that haven't "waited long enough", which allows the consumer to
> advance to the next file instead of blocking.
>
> In fact, the read lock could also be stateful. By stateful I mean that
> you remember the file's last timestamp from the previous check, so you
> can figure out whether to pick up the file or not.
>
> We could consider adding stateful support to the out-of-the-box read
> lock, but it would probably require a few options to control the state
> cache, so that its size can be configured and so on.

Re: Parallel processing/multiple file consumer threads with readLock=changed and long timeout?

Posted by Tom Duncalf <to...@tomduncalf.com>.
Hi Claus,

I have prototyped a solution that avoids the File consumer blocking while the
"changed" read lock checks whether the size has changed. It works well in
initial testing, but it would be great to get feedback on whether the
approach is sound or whether there are potential problems with it (e.g. due
to Camel implementation details).

The solution is as follows:

1. I have decoupled the file consumer from the subsequent processing of the
message by splitting my route into two routes - one from the File consumer
directly to a SEDA queue, and the other from the SEDA queue to the
subsequent processing/upload of the file (with the possibility to have
multiple concurrent consumers for the queue).

2. I am then starting multiple instances of my File-to-Queue route with
(Scala) code like:

val threads = 10
// Register one File-to-Queue route per consumer "thread"; x is the thread number.
(0 until threads).foreach { x =>
  camelMain.addRouteBuilder(new FileSystemToQueueRoute(x))
}

All of the File consumer "threads" share the same instance of
MemoryIdempotentRepository as their inProgress repository, to prevent more
than one consumer picking up a given file at a time.
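
The claim-once behaviour that the shared repository provides can be sketched with plain JDK classes. This is only a stand-in to show the idea, not Camel's MemoryIdempotentRepository; the class and method names are hypothetical.

```scala
import java.util.concurrent.ConcurrentHashMap

// Stand-in for a shared in-progress repository: a claim succeeds for exactly
// one caller per file name until that name is released again.
class InProgressClaims {
  private val claimed = ConcurrentHashMap.newKeySet[String]()

  // Returns true only for the consumer that wins the claim.
  def tryClaim(fileName: String): Boolean = claimed.add(fileName)

  // Called when processing finishes (or fails) so the file can be claimed again.
  def release(fileName: String): Unit = { claimed.remove(fileName); () }
}
```

Every consumer would hold a reference to the same instance, so a second consumer that sees a file already claimed simply moves on to the next one.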

The "hack" here is that Camel doesn't allow multiple consumers for the same
endpoint, so I am making each File consumer's endpoint URI unique by giving
it a different initialDelay parameter, based on the thread number. My main
concern is that there may be a good reason why Camel doesn't allow multiple
consumers for the same endpoint, and that subverting it like this could
cause issues.
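
For illustration, the per-thread URIs might be generated roughly as follows. The directory, the registry bean name and the base delay of 1000 ms are assumptions made for this sketch and are not taken from the prototype; initialDelay and inProgressRepository are the File endpoint options referred to above.

```scala
object ConsumerUris {
  // Hypothetical directory and registry bean name, for illustration only.
  val inboxDir = "/srv/ftp/inbox"
  val repoBean = "sharedInProgress"

  // Each consumer gets a different initialDelay (assumed base of 1000 ms),
  // which is what makes its endpoint URI distinct from the others.
  def uriFor(threadNo: Int): String =
    s"file:$inboxDir?readLock=changed" +
      s"&initialDelay=${1000 + threadNo}" +
      s"&inProgressRepository=#$repoBean"

  val uris: Seq[String] = (0 until 10).map(uriFor)
}
```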

Any feedback on this approach would be greatly appreciated!

Thanks,
Tom

Tom Duncalf / Software Developer
tomduncalf.com <http://www.tomduncalf.com> / @tomduncalf
<http://twitter.com/tomduncalf>


On 2 June 2015 at 10:16, Tom Duncalf <to...@tomduncalf.com> wrote:

> Thanks Claus - those are both interesting ideas, I'll give it some more
> thought!
>
> Thanks,
> Tom

Re: Parallel processing/multiple file consumer threads with readLock=changed and long timeout?

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

You could either try to implement your own read lock, or alternatively
try a stateful file filter. In the filter you can return false for
files that haven't "waited long enough", which allows the consumer to
advance to the next file instead of blocking.

In fact, the read lock could also be stateful. By stateful I mean that
you remember the file's last timestamp from the previous check, so you
can figure out whether to pick up the file or not.
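
The stateful idea can be sketched independently of Camel as a small tracker that a custom filter or read lock might consult on each poll. Everything here is hypothetical (class name, method signature, the choice to compare both size and timestamp), and how it would be wired into Camel is left out.

```scala
import scala.collection.mutable

// Remembers, per file, the last observed (size, lastModified) pair and when
// that pair was first seen. Not thread-safe: intended for a single poll thread.
class StableFileTracker(minStableMs: Long) {
  private case class Seen(size: Long, lastModified: Long, sinceMs: Long)
  private val state = mutable.Map.empty[String, Seen]

  // Returns true once the file has been unchanged for at least minStableMs.
  // Never blocks: an unstable file is simply skipped until a later poll.
  def accept(name: String, size: Long, lastModified: Long, nowMs: Long): Boolean =
    state.get(name) match {
      case Some(s) if s.size == size && s.lastModified == lastModified =>
        val stable = nowMs - s.sinceMs >= minStableMs
        if (stable) state.remove(name) // forget the file once it is handed over
        stable
      case _ =>
        // First sighting, or the file changed: restart the clock.
        state.update(name, Seen(size, lastModified, nowMs))
        false
    }
}
```

Because accept returns immediately, a file that is still being uploaded does not hold up the files listed after it.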

We could consider adding stateful support to the out-of-the-box read
lock, but it would probably require a few options to control the state
cache, so that its size can be configured and so on.
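
One possible shape for such a size-limited state cache, using only the JDK, is an access-ordered LinkedHashMap that evicts its eldest entry. This is a sketch of the general technique, not an existing Camel option, and the class name is made up.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// A size-capped map for per-file state: once maxEntries is exceeded, the
// least recently accessed entry is evicted.
class BoundedStateCache[V](maxEntries: Int)
    extends JLinkedHashMap[String, V](16, 0.75f, true) {
  override protected def removeEldestEntry(eldest: JMap.Entry[String, V]): Boolean =
    size() > maxEntries
}
```

Note that this class is not thread-safe on its own, so sharing it between consumers would need external synchronisation.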


On Tue, Jun 2, 2015 at 10:55 AM, Tom Duncalf <to...@tomduncalf.com> wrote:
> Thanks Claus - unfortunately, there is a requirement to support existing
> client workflows, which are usually not able to upload a "done" file
> without additional development work on their end, and it seems that ProFTPD
> is not able to do this automatically.
>
> Are you aware of any way to prevent the file consumer blocking while
> waiting for the "changed" read lock or would another approach such as one
> Camel process per user directory be the recommended approach in this case?
>
> Thanks,
> Tom



-- 
Claus Ibsen
-----------------
Red Hat, Inc.
Email: cibsen@redhat.com
Twitter: davsclaus
Blog: http://davsclaus.com
Author of Camel in Action: http://www.manning.com/ibsen
hawtio: http://hawt.io/
fabric8: http://fabric8.io/

Re: Parallel processing/multiple file consumer threads with readLock=changed and long timeout?

Posted by Tom Duncalf <to...@tomduncalf.com>.
Thanks Claus - unfortunately, there is a requirement to support existing
client workflows, which are usually not able to upload a "done" file
without additional development work on their end, and it seems that ProFTPD
is not able to do this automatically.

Are you aware of any way to prevent the file consumer blocking while
waiting for the "changed" read lock or would another approach such as one
Camel process per user directory be the recommended approach in this case?

Thanks,
Tom

Tom Duncalf / Software Developer
tomduncalf.com <http://www.tomduncalf.com> / @tomduncalf
<http://twitter.com/tomduncalf>


On 2 June 2015 at 08:12, Claus Ibsen-2 [via Camel] <
ml-node+s465427n5767770h21@n5.nabble.com> wrote:

> Hi
>
> If possible, using done file names is IMHO a better strategy, though
> that would require the other party to write the done file when it
> uploads to the FTP server.

Re: Parallel processing/multiple file consumer threads with readLock=changed and long timeout?

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

If possible, using done file names is IMHO a better strategy, though
that would require the other party to write the done file when it
uploads to the FTP server.
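
The done-file idea itself is simple enough to sketch without Camel: a data file counts as ready only once a marker file sits next to it. The ".done" suffix convention and the names below are assumptions for illustration; in Camel this is normally configured through the File component's doneFileName option rather than hand-written.

```scala
import java.nio.file.{Files, Path}

object DoneFileCheck {
  // Hypothetical naming convention: "<data file name>.done" next to the data file.
  def doneMarkerFor(dataFile: Path): Path =
    dataFile.resolveSibling(dataFile.getFileName.toString + ".done")

  // A data file is ready only once the uploader has written its marker.
  def isReady(dataFile: Path): Boolean =
    Files.isRegularFile(dataFile) && Files.exists(doneMarkerFor(dataFile))
}
```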



-- 
Claus Ibsen
-----------------
Red Hat, Inc.
Email: cibsen@redhat.com
Twitter: davsclaus
Blog: http://davsclaus.com
Author of Camel in Action: http://www.manning.com/ibsen
hawtio: http://hawt.io/
fabric8: http://fabric8.io/