Posted to users@camel.apache.org by wmoussel <wm...@gmail.com> on 2010/02/22 09:04:52 UTC

File Consumer with dynamic list of files to poll

Hi,

Here's my initial issue. I have to poll for new files in a directory
containing a lot of files (5000 to 20000) (and I can't move them after
processing).

When using the idempotent option it takes way too much CPU (like 9% even when
there aren't any new files).

So I tried adding a filter so that the consumer would only take files less
than 2 days old. It doesn't change much in the end. When I looked into how
the filter works in Java I saw why it's still CPU-consuming...
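
As a rough illustration of the age check described above, here is a minimal
plain-Java sketch. It uses the JDK's java.io.FileFilter rather than Camel's own
filter interface, and the names RecentFileFilter and isRecent are made up for
this example. It shows why such a filter cannot save much: the filter is asked
about every entry, so each poll still touches every file in the directory.

```java
import java.io.File;
import java.io.FileFilter;
import java.time.Duration;

/** Illustrative only: accepts files modified within a given window. Not a Camel class. */
final class RecentFileFilter implements FileFilter {
    private final long maxAgeMillis;

    RecentFileFilter(Duration maxAge) {
        this.maxAgeMillis = maxAge.toMillis();
    }

    /** Pure age check, kept separate so it can be exercised without touching the disk. */
    static boolean isRecent(long lastModifiedMillis, long nowMillis, long maxAgeMillis) {
        return nowMillis - lastModifiedMillis < maxAgeMillis;
    }

    @Override
    public boolean accept(File file) {
        // Each call reads file metadata, so a directory of 20000 entries
        // still means at least 20000 metadata lookups on every poll.
        return file.isFile()
                && isRecent(file.lastModified(), System.currentTimeMillis(), maxAgeMillis);
    }
}
```

With the JDK alone this could be used as
new File(dir).listFiles(new RecentFileFilter(Duration.ofDays(2))), where dir is
whatever directory is being polled.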

So then I thought I would get recent file names with a Unix command like
find . -mtime -1 and use pollEnrich to poll specific files. The way I
understand pollEnrich, it will poll the whole directory and then
try to match the filename afterwards. That is also CPU-consuming since it gets
all the files as exchanges prior to the match. Am I wrong?
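
For comparison, a rough JDK-only counterpart of the find idea might look like
the sketch below. It is an assumption-laden illustration, not something taken
from Camel: RecentFileLister and modifiedSince are hypothetical names, the scan
is not recursive (unlike find), and the cutoff is passed in explicitly instead
of being derived from -mtime. Note that it still reads the modification time
of every entry, so it moves the cost out of the route rather than removing it.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: lists files in one directory modified at or after a cutoff. */
final class RecentFileLister {

    static List<Path> modifiedSince(Path dir, FileTime cutoff) throws IOException {
        List<Path> recent = new ArrayList<>();
        // DirectoryStream iterates lazily, so the full listing is never held in memory.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)
                        && Files.getLastModifiedTime(entry).compareTo(cutoff) >= 0) {
                    recent.add(entry);
                }
            }
        }
        return recent;
    }
}
```

The returned paths could then be handed to whatever step processes individual
files; how to turn them into file exchanges inside a route is exactly the open
question of this post.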

So I'm basically trying to invoke a FileName to GenericFileExchange
processor in the middle of my route. Is there a way to do this?

I'm open to any suggestion.

Thanks in advance :)

Wandrille
-- 
View this message in context: http://old.nabble.com/File-Consumer-with-dynamic-list-of-files-to-poll-tp27683938p27683938.html
Sent from the Camel - Users mailing list archive at Nabble.com.


Re: File Consumer with dynamic list of files to poll

Posted by wmoussel <wm...@gmail.com>.
Thanks,


Claus Ibsen-2 wrote:
> 
> On Mon, Feb 22, 2010 at 1:01 PM, Claus Ibsen <cl...@gmail.com>
> wrote:
>> On Mon, Feb 22, 2010 at 12:57 PM, wmoussel <wm...@gmail.com> wrote:
>>>
>>> I considered polling less often as well, but couldn't get CPU usage
>>> down enough at an acceptable interval...
>>>
>>> I'm trying PollingConsumerPollStrategy, but its begin method returns void,
>>> not a boolean, doesn't it?
>>>
>>
>> Ah yeah it is. I can see we haven't made that a boolean to allow you
>> to deny polling.
>> You can create a ticket in JIRA so we can enhance this so you can do
>> that in Camel 2.3.
>>
> 
> I have created the ticket
> https://issues.apache.org/activemq/browse/CAMEL-2492
> 

-- 
View this message in context: http://old.nabble.com/File-Consumer-with-dynamic-list-of-files-to-poll-tp27683938p27686496.html
Sent from the Camel - Users mailing list archive at Nabble.com.


Re: File Consumer with dynamic list of files to poll

Posted by Claus Ibsen <cl...@gmail.com>.
On Mon, Feb 22, 2010 at 1:01 PM, Claus Ibsen <cl...@gmail.com> wrote:
> On Mon, Feb 22, 2010 at 12:57 PM, wmoussel <wm...@gmail.com> wrote:
>>
>> I considered polling less often as well, but couldn't get CPU usage
>> down enough at an acceptable interval...
>>
>> I'm trying PollingConsumerPollStrategy, but its begin method returns void,
>> not a boolean, doesn't it?
>>
>
> Ah yeah it is. I can see we haven't made that a boolean to allow you
> to deny polling.
> You can create a ticket in JIRA so we can enhance this so you can do
> that in Camel 2.3.
>

I have created the ticket
https://issues.apache.org/activemq/browse/CAMEL-2492




-- 
Claus Ibsen
Apache Camel Committer

Author of Camel in Action: http://www.manning.com/ibsen/
Open Source Integration: http://fusesource.com
Blog: http://davsclaus.blogspot.com/
Twitter: http://twitter.com/davsclaus

Re: File Consumer with dynamic list of files to poll

Posted by Claus Ibsen <cl...@gmail.com>.
On Mon, Feb 22, 2010 at 12:57 PM, wmoussel <wm...@gmail.com> wrote:
>
> I considered polling less often as well, but couldn't get CPU usage
> down enough at an acceptable interval...
>
> I'm trying PollingConsumerPollStrategy, but its begin method returns void,
> not a boolean, doesn't it?
>

Ah yeah it is. I can see we haven't made that a boolean to allow you
to deny polling.
You can create a ticket in JIRA so we can enhance this so you can do
that in Camel 2.3.


Re: File Consumer with dynamic list of files to poll

Posted by wmoussel <wm...@gmail.com>.
I considered polling less often as well, but couldn't get CPU usage
down enough at an acceptable interval...

I'm trying PollingConsumerPollStrategy, but its begin method returns void,
not a boolean, doesn't it?




-- 
View this message in context: http://old.nabble.com/File-Consumer-with-dynamic-list-of-files-to-poll-tp27683938p27686162.html
Sent from the Camel - Users mailing list archive at Nabble.com.


Re: File Consumer with dynamic list of files to poll

Posted by Claus Ibsen <cl...@gmail.com>.
On Mon, Feb 22, 2010 at 9:04 AM, wmoussel <wm...@gmail.com> wrote:
>
> Hi,
>
> Here's my initial issue. I have to poll for new files in a directory
> containing a lot of files (5000 to 20000) (and I can't move them after
> processing).
>
> When using the idempotent option it takes way too much CPU (like 9% even when
> there aren't any new files).
>

How often are you going to poll for those files? You can probably increase
the delay so that it polls less frequently.

In any case, you somehow have to check all those 20000 files to see whether
or not you have processed them before.


> So I tried adding a filter so that the consumer would only take files less
> than 2 days old. It doesn't change much in the end. When I looked into how
> the filter works in Java I saw why it's still CPU-consuming...
>
> So then I thought I would get recent file names with a Unix command like
> find . -mtime -1 and use pollEnrich to poll specific files. The way I
> understand pollEnrich, it will poll the whole directory and then
> try to match the filename afterwards. That is also CPU-consuming since it gets
> all the files as exchanges prior to the match. Am I wrong?
>

pollEnrich is not suitable for this as it's meant for polling and
aggregating 1 resource at a time.

> So I'm basically trying to invoke a FileName to GenericFileExchange
> processor in the middle of my route. Is there a way to do this?
>
> I'm open to any suggestion.
>

The file component supports using a custom PollingConsumerPollStrategy
http://camel.apache.org/polling-consumer.html

You can then implement your own logic and return false in the begin
method when there are no new files since last time.
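
The "skip the poll when nothing is new" decision could be sketched in plain
Java as below. This is only the decision logic, under stated assumptions, and
not a Camel API: DirectoryChangeGate and shouldPoll are hypothetical names, and
as noted elsewhere in this thread the begin method returned void at the time,
so there was no hook yet to veto a poll. The sketch assumes the directory's own
modification time changes when files are added, which holds on many
filesystems but not all, and which does not detect an existing file being
rewritten in place.

```java
import java.util.function.LongSupplier;

/** Illustrative only: allows a poll only when the directory timestamp has changed. */
final class DirectoryChangeGate {
    private final LongSupplier dirLastModified;
    private long lastSeen = Long.MIN_VALUE;

    /** The supplier returns the directory's current last-modified time in milliseconds. */
    DirectoryChangeGate(LongSupplier dirLastModified) {
        this.dirLastModified = dirLastModified;
    }

    /** Returns true on the first call and whenever the timestamp differs from the last one seen. */
    synchronized boolean shouldPoll() {
        long current = dirLastModified.getAsLong();
        if (current == lastSeen) {
            return false; // nothing added or removed since the previous check
        }
        lastSeen = current;
        return true;
    }
}
```

A gate for a real directory could be built with
new DirectoryChangeGate(() -> new java.io.File(dir).lastModified()), where dir
is the polled directory. One metadata lookup per poll replaces a scan of all
20000 entries whenever nothing has changed.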


> Thanks in advance :)
>
> Wandrille