Posted to dev@camel.apache.org by Jon Anstey <ja...@gmail.com> on 2008/11/18 19:57:43 UTC

Deprecation of file consumer timestamp

The algorithm that checks whether a file should be consumed based on
timestamp has been deprecated for a while now (see
http://activemq.apache.org/camel/file.html). I've removed this on my local
branch only to realize that it introduces a bit of an ugly problem...
essentially since files will be processed always (modified or not) in the
case of noop=true or if a fault has been set, the same file will be
processed over and over again... not good!

The original intent of removing the timestamp checking was to simplify the
consumer. I think that in trying to get around this new issue we may make it
even more complicated!

I'm wondering if there is a simple solution to this that I'm just not seeing
yet or if maybe this issue was discussed before...

-- 
Cheers,
Jon

http://janstey.blogspot.com/

Re: Deprecation of file consumer timestamp

Posted by Hadrian Zbarcea <hz...@gmail.com>.
What if you restart Camel? Will the files be processed again? I assume
so, which is not good either.


On Nov 18, 2008, at 1:57 PM, Jon Anstey wrote:

> The algorithm that checks whether a file should be consumed based on
> timestamp has been deprecated for a while now (see
> http://activemq.apache.org/camel/file.html). I've removed this on my  
> local
> branch only to realize that it introduces a bit of an ugly problem...
> essentially since files will be processed always (modified or not)  
> in the
> case of noop=true or if a fault has been set, the same file will be
> processed over and over again... not good!
>
> The original intent of removing the timestamp checking was to  
> simplify the
> consumer. I think that in trying to get around this new issue we may  
> make it
> even more complicated!
>
> I'm wondering if there is a simple solution to this that I'm just  
> not seeing
> yet or if maybe this issue was discussed before...
>
> -- 
> Cheers,
> Jon
>
> http://janstey.blogspot.com/


Re: Deprecation of file consumer timestamp

Posted by Claus Ibsen <cl...@gmail.com>.
On Mon, Dec 1, 2008 at 11:58 AM, James Strachan
<ja...@gmail.com> wrote:
> 2008/12/1 Claus Ibsen <cl...@gmail.com>:
>> /Claus Ibsen
>> Apache Camel Committer
>> Blog: http://davsclaus.blogspot.com/
>>
>>
>>
>> On Mon, Dec 1, 2008 at 11:44 AM, James Strachan
>> <ja...@gmail.com> wrote:
>>> 2008/11/29 Claus Ibsen <cl...@gmail.com>:
>>>> Hi
>>>>
>>>> I am reworking the file component as the code needs to be polished to
>>>> be ready for new feature requests by end users.
>>>>
>>>> Having my fingers on the keyboard and reworking the code I do think we
>>>> should consider letting the idempotent consumer EIP pattern having a
>>>> first class interface for consumers to implement to support idempotent
>>>> right out-of-the-box. This is convenient for both the file and ftp
>>>> consumers to avoid re-consuming already processed files.
>>>>
>>>> Then we could allow very easy URI configuration for the file consumer
>>>> to enable the idempotent
>>>> from("file://inbox?idempotent=true").to("bean:processOrder");
>>>
>>> Great idea!
>>>
>>>> So I am proposing to either
>>>> a) add a new interface in org.apache.camel to cater for this
>>>> b) move the existing interface MessageIdRepository to org.apache.camel
>>>> c) option b but renaming the interface to a better name, IdempotentRepository
>>>>
>>>> Using the existing MessageIdRepository allows us to leverage existing
>>>> implementations such as the JpaMessageIdRepository so we can support a
>>>> persistent solution right out-of-the-box.
>>>
>>> Sounds good - how about IdempotentRepository being in the camel.spi package?
>> Yeah that is a great home for it ;)
>>
>> Maybe we should add a peek method to see if an id has been processed before
>>
>> contains will add it if missing, so we might need a "read-only peek method"
>>  boolean contains(String key);
>>
>>  boolean peek(String key);
>>
>> Anyone got a better name for peek?
>
> How about following the Set naming convention?
>
> boolean add(element) // Returns true if this set did not already
> contain the specified element.
>
> boolean contains(element) // for peek
+1. However, existing implementations must change from contains to add to
keep their current behavior. We should just state this in the release notes.
End users with custom implementations must change their code anyway, as
we are renaming and moving the interface.



>
> --
> James
> -------
> http://macstrac.blogspot.com/
>
> Open Source Integration
> http://fusesource.com/
>

Re: Deprecation of file consumer timestamp

Posted by James Strachan <ja...@gmail.com>.
2008/12/1 Claus Ibsen <cl...@gmail.com>:
> /Claus Ibsen
> Apache Camel Committer
> Blog: http://davsclaus.blogspot.com/
>
>
>
> On Mon, Dec 1, 2008 at 11:44 AM, James Strachan
> <ja...@gmail.com> wrote:
>> 2008/11/29 Claus Ibsen <cl...@gmail.com>:
>>> Hi
>>>
>>> I am reworking the file component as the code needs to be polished to
>>> be ready for new feature requests by end users.
>>>
>>> Having my fingers on the keyboard and reworking the code I do think we
>>> should consider letting the idempotent consumer EIP pattern having a
>>> first class interface for consumers to implement to support idempotent
>>> right out-of-the-box. This is convenient for both the file and ftp
>>> consumers to avoid re-consuming already processed files.
>>>
>>> Then we could allow very easy URI configuration for the file consumer
>>> to enable the idempotent
>>> from("file://inbox?idempotent=true").to("bean:processOrder");
>>
>> Great idea!
>>
>>> So I am proposing to either
>>> a) add a new interface in org.apache.camel to cater for this
>>> b) move the existing interface MessageIdRepository to org.apache.camel
>>> c) option b but renaming the interface to a better name, IdempotentRepository
>>>
>>> Using the existing MessageIdRepository allows us to leverage existing
>>> implementations such as the JpaMessageIdRepository so we can support a
>>> persistent solution right out-of-the-box.
>>
>> Sounds good - how about IdempotentRepository being in the camel.spi package?
> Yeah that is a great home for it ;)
>
> Maybe we should add a peek method to see if an id has been processed before
>
> contains will add it if missing, so we might need a "read-only peek method"
>  boolean contains(String key);
>
>  boolean peek(String key);
>
> Anyone got a better name for peek?

How about following the Set naming convention?

boolean add(element) // Returns true if this set did not already
contain the specified element.

boolean contains(element) // for peek
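
A minimal sketch of what that Set-style contract could look like, assuming
String keys as in the signatures above. The names IdempotentRepositorySketch
and MemoryIdempotentRepositorySketch are made up for illustration and are
not the actual Camel types:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the renamed interface; not the actual Camel API.
interface IdempotentRepositorySketch {
    // Records the key. Returns true if the key was not already present.
    boolean add(String key);

    // Read-only check (the "peek"); never modifies the repository.
    boolean contains(String key);
}

// In-memory implementation, only suitable for tests: state is lost on restart.
class MemoryIdempotentRepositorySketch implements IdempotentRepositorySketch {
    private final Set<String> keys = ConcurrentHashMap.newKeySet();

    @Override
    public boolean add(String key) {
        return keys.add(key);
    }

    @Override
    public boolean contains(String key) {
        return keys.contains(key);
    }
}
```

With this shape a consumer would call add(fileName) and skip the file when
it returns false, while contains(fileName) stays free of side effects.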

-- 
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://fusesource.com/

Re: Deprecation of file consumer timestamp

Posted by Claus Ibsen <cl...@gmail.com>.
/Claus Ibsen
Apache Camel Committer
Blog: http://davsclaus.blogspot.com/



On Mon, Dec 1, 2008 at 11:44 AM, James Strachan
<ja...@gmail.com> wrote:
> 2008/11/29 Claus Ibsen <cl...@gmail.com>:
>> Hi
>>
>> I am reworking the file component as the code needs to be polished to
>> be ready for new feature requests by end users.
>>
>> Having my fingers on the keyboard and reworking the code I do think we
>> should consider letting the idempotent consumer EIP pattern having a
>> first class interface for consumers to implement to support idempotent
>> right out-of-the-box. This is convenient for both the file and ftp
>> consumers to avoid re-consuming already processed files.
>>
>> Then we could allow very easy URI configuration for the file consumer
>> to enable the idempotent
>> from("file://inbox?idempotent=true").to("bean:processOrder");
>
> Great idea!
>
>> So I am proposing to either
>> a) add a new interface in org.apache.camel to cater for this
>> b) move the existing interface MessageIdRepository to org.apache.camel
>> c) option b but renaming the interface to a better name, IdempotentRepository
>>
>> Using the existing MessageIdRepository allows us to leverage existing
>> implementations such as the JpaMessageIdRepository so we can support a
>> persistent solution right out-of-the-box.
>
> Sounds good - how about IdempotentRepository being in the camel.spi package?
Yeah that is a great home for it ;)

Maybe we should add a peek method to see if an id has been processed before

contains will add it if missing, so we might need a "read-only peek method"
 boolean contains(String key);

 boolean peek(String key);

Anyone got a better name for peek?

> --
> James
> -------
> http://macstrac.blogspot.com/
>
> Open Source Integration
> http://fusesource.com/
>

Re: Deprecation of file consumer timestamp

Posted by James Strachan <ja...@gmail.com>.
2008/11/29 Claus Ibsen <cl...@gmail.com>:
> Hi
>
> I am reworking the file component as the code needs to be polished to
> be ready for new feature requests by end users.
>
> Having my fingers on the keyboard and reworking the code I do think we
> should consider letting the idempotent consumer EIP pattern having a
> first class interface for consumers to implement to support idempotent
> right out-of-the-box. This is convenient for both the file and ftp
> consumers to avoid re-consuming already processed files.
>
> Then we could allow very easy URI configuration for the file consumer
> to enable the idempotent
> from("file://inbox?idempotent=true").to("bean:processOrder");

Great idea!

> So I am proposing to either
> a) add a new interface in org.apache.camel to cater for this
> b) move the existing interface MessageIdRepository to org.apache.camel
> c) option b but renaming the interface to a better name, IdempotentRepository
>
> Using the existing MessageIdRepository allows us to leverage existing
> implementations such as the JpaMessageIdRepository so we can support a
> persistent solution right out-of-the-box.

Sounds good - how about IdempotentRepository being in the camel.spi package?
-- 
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://fusesource.com/

Re: Deprecation of file consumer timestamp

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

I am reworking the file component as the code needs to be polished to
be ready for new feature requests by end users.

Having my fingers on the keyboard and reworking the code, I think we
should consider giving the idempotent consumer EIP a first-class
interface that consumers can implement to support idempotency
right out-of-the-box. This would be convenient for both the file and ftp
consumers, to avoid re-consuming already processed files.

Then we could allow very easy URI configuration for the file consumer
to enable idempotency:
from("file://inbox?idempotent=true").to("bean:processOrder");

So I am proposing to either
a) add a new interface in org.apache.camel to cater for this
b) move the existing interface MessageIdRepository to org.apache.camel
c) option b but renaming the interface to a better name, IdempotentRepository

Using the existing MessageIdRepository allows us to leverage existing
implementations such as the JpaMessageIdRepository so we can support a
persistent solution right out-of-the-box.


/Claus Ibsen
Apache Camel Committer
Blog: http://davsclaus.blogspot.com/



On Tue, Nov 25, 2008 at 7:24 AM,  <ja...@gmail.com> wrote:
> Btw unit testing - where you want to process all files on startup -
> and never want to edit/delete them was the main motivation & use case
> for noop.
>
> We definitely need to support different strategies as there are many
> different use cases. Eg sometimes keeping a cache of all files
> processed won't scale due to huge number of files. Sometimes you want
> to process a file again if it is touched.
>
> I understand that sometimes timestamps are dodgy; but I would rather
> us support all use cases cleanly using different pluggable strategies
> than disable useful functionality (like testing! :-)
>
>
> On 19/11/2008, Gert Vanthienen <ge...@skynet.be> wrote:
>> L.S.,
>>
>> It almost sounds as if we need two separate different strategies that
>> can be configured on the file endpoint:
>> - one to determine which files need to be processed (the basic one just
>> takes all the files in a directory but we can build additional ones that
>> use a storage mechanism)
>> - another one (like we already have now) that determines what to do with
>> the file after a successful or failed exchange
>>
>> FWIW, I actually like the simple noop one for creating unit tests
>> because it allows you to just refer to the /src/test/resources folder in
>> your project instead of having to copy them to a work folder first.
>>
>> Regards,
>>
>> Gert
>>
>> Claus Ibsen wrote:
>>> Hi
>>>
>>> Oh I have thought that some end-users want FileConsumer to keep retrying
>>> to consume the same file over and over again if it could not be
>>> processed, so the postAction could have a 3rd option or we could have
>>> an option to set this feature (kinda like noop but only for when the
>>> file could not be processed)
>>>
>>>
>>>
>>> /Claus Ibsen
>>> Apache Camel Committer
>>> Blog: http://davsclaus.blogspot.com/
>>>
>>>
>>>
>>> On Wed, Nov 19, 2008 at 10:35 AM, Claus Ibsen <cl...@gmail.com>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> The store idea is good as it can be used for the idempotent consumer
>>>> as well so we can use it to persist as well, so it can survive
>>>> restarts. We need to allow it to be pluggable so users can use a
>>>> shared DB if they use grid, or maybe some of that fancy Terracotta
>>>> thing that distributes memory caches.
>>>>
>>>> But turning back to the file consumer. I really think the noop=true
>>>> options should be deprecated as well. The file is like an inbox where
>>>> if a file is dropped it is consumed once. After processing the file is
>>>> deleted or moved to another destination. Now with this "remember list"
>>>> we have a serious issue if the inbox receives file with the same name
>>>> but the content of the file is different. What if someone uploads a
>>>> file to an FTP server and the filename is always fixed (= the same).
>>>> Now we have a complex situation as we need to hash the file content to
>>>> be able to determine if the file is different, or not support it at
>>>> all.
>>>>
>>>> I am mostly keen to keep it simpler and as Hadrian said "keep it lean".
>>>>
>>>> So I am voting for:
>>>> a) to remove noop as well
>>>> b) to always delete or move file after processing (we should support
>>>> moving files to a different folder if exchange failed)
>>>>
>>>> Ad b)
>>>> We should support moving files using different pattern depending on
>>>> - exchange OK
>>>> - exchange Failed
>>>> I have thought about introducing some better URI options to express this
>>>>
>>>> Something along the lines of (think of better uri option names)
>>>> postAction=delete
>>>>
>>>> postAction=move
>>>> moveCompleteExpression=./done/${file:name}.bak
>>>> moveErrorExpression=./error/${date:now:yyyyMMdd}/${file:name}.error
>>>>
>>>> And we should have defaults as well, so if moveErrorExpression is
>>>> omitted it defaults to the completed move.
>>>>
>>>>
>>>> And then we could consider @deprecating all the other pre and postfix
>>>> URI option we have in favor of the power of the expression instead.
>>>>
>>>>
>>>>
>>>> But the list store is not wasted as we can use it for the idempotent
>>>> as well and for other areas.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> /Claus Ibsen
>>>> Apache Camel Committer
>>>> Blog: http://davsclaus.blogspot.com/
>>>>
>>>>
>>>>
>>>> On Wed, Nov 19, 2008 at 4:04 AM, Jon Anstey <ja...@gmail.com> wrote:
>>>>
>>>>> Hmmm... yeah, I like this suggestion. It may be just what we need here!
>>>>> Thanks!
>>>>>
>>>>> On Tue, Nov 18, 2008 at 4:11 PM, Gert Vanthienen
>>>>> <ge...@skynet.be> wrote:
>>>>>
>>>>>
>>>>>> Jon,
>>>>>>
>>>>>> How about if we enhance the file consumer to keep track of files that
>>>>>> have
>>>>>> already been processed instead of using a timestamp?  The timestamp
>>>>>> approach
>>>>>> is a bit error-prone (just touching the file by accident can set it off
>>>>>> again).
>>>>>> If we provide multiple implementations for the storage mechanism to
>>>>>> keep
>>>>>> this information, we can cover a lot of use cases (similar to the
>>>>>> message id
>>>>>> store for an idempotent consumer):
>>>>>> - an in-memory store for testing purposes
>>>>>> - a file-based implementation for basic production environments
>>>>>> - a database- or ldap-backed implementation for clustered environments,
>>>>>> where a file can arrive through multiple directories
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Gert
>>>>>>
>>>>>> Jon Anstey wrote:
>>>>>>
>>>>>>  The algorithm that checks whether a file should be consumed based on
>>>>>>
>>>>>>> timestamp has been deprecated for a while now (see
>>>>>>> http://activemq.apache.org/camel/file.html). I've removed this on my
>>>>>>> local
>>>>>>> branch only to realize that it introduces a bit of an ugly problem...
>>>>>>> essentially since files will be processed always (modified or not) in
>>>>>>> the
>>>>>>> case of noop=true or if a fault has been set, the same file will be
>>>>>>> processed over and over again... not good!
>>>>>>>
>>>>>>> The original intent of removing the timestamp checking was to simplify
>>>>>>> the
>>>>>>> consumer. I think that in trying to get around this new issue we may
>>>>>>> make
>>>>>>> it
>>>>>>> even more complicated!
>>>>>>>
>>>>>>> I'm wondering if there is a simple solution to this that I'm just not
>>>>>>> seeing
>>>>>>> yet or if maybe this issue was discussed before...
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>> --
>>>>> Cheers,
>>>>> Jon
>>>>>
>>>>> http://janstey.blogspot.com/
>>>>>
>>>>>
>>>
>>>
>>
>>
>
>
> --
> James
> -------
> http://macstrac.blogspot.com/
>
> Open Source Integration
> http://fusesource.com/
>

Re: Deprecation of file consumer timestamp

Posted by ja...@gmail.com.
Btw unit testing - where you want to process all files on startup
and never want to edit/delete them - was the main motivation & use case
for noop.

We definitely need to support different strategies as there are many
different use cases. Eg sometimes keeping a cache of all files
processed won't scale due to huge number of files. Sometimes you want
to process a file again if it is touched.

I understand that sometimes timestamps are dodgy; but I would rather
us support all use cases cleanly using different pluggable strategies
than disable useful functionality (like testing! :-)
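
To make two of those use cases concrete, here is a rough sketch of a
bounded store that also re-processes a file when its timestamp changes.
BoundedFileMemory and shouldProcess are hypothetical names, and the LRU
eviction is just one assumed way to cap memory; an evicted file would be
processed again, which is the trade-off for not growing without limit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Camel: remembers at most maxEntries files.
class BoundedFileMemory {
    private final Map<String, Long> lastSeen;

    BoundedFileMemory(final int maxEntries) {
        // Access-ordered LinkedHashMap that evicts the least recently used entry.
        this.lastSeen = new LinkedHashMap<String, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns true if the file is unknown (or was evicted), or if its
    // timestamp differs from the one recorded last time.
    synchronized boolean shouldProcess(String path, long lastModified) {
        Long previous = lastSeen.put(path, lastModified);
        return previous == null || previous != lastModified;
    }
}
```

A strategy that ignores timestamps entirely would simply drop the
lastModified comparison, so both behaviours can sit behind the same call.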


On 19/11/2008, Gert Vanthienen <ge...@skynet.be> wrote:
> L.S.,
>
> It almost sounds as if we need two separate different strategies that
> can be configured on the file endpoint:
> - one to determine which files need to be processed (the basic one just
> takes all the files in a directory but we can build additional ones that
> use a storage mechanism)
> - another one (like we already have now) that determines what to do with
> the file after a successful or failed exchange
>
> FWIW, I actually like the simple noop one for creating unit tests
> because it allows you to just refer to the /src/test/resources folder in
> your project instead of having to copy them to a work folder first.
>
> Regards,
>
> Gert
>
> Claus Ibsen wrote:
>> Hi
>>
>> Oh I have thought that some end-users want FileConsumer to keep retrying
>> to consume the same file over and over again if it could not be
>> processed, so the postAction could have a 3rd option or we could have
>> an option to set this feature (kinda like noop but only for when the
>> file could not be processed)
>>
>>
>>
>> /Claus Ibsen
>> Apache Camel Committer
>> Blog: http://davsclaus.blogspot.com/
>>
>>
>>
>> On Wed, Nov 19, 2008 at 10:35 AM, Claus Ibsen <cl...@gmail.com>
>> wrote:
>>
>>> Hi
>>>
>>> The store idea is good as it can be used for the idempotent consumer
>>> as well so we can use it to persist as well, so it can survive
>>> restarts. We need to allow it to be pluggable so users can use a
>>> shared DB if they use grid, or maybe some of that fancy Terracotta
>>> thing that distributes memory caches.
>>>
>>> But turning back to the file consumer. I really think the noop=true
>>> options should be deprecated as well. The file is like an inbox where
>>> if a file is dropped it is consumed once. After processing the file is
>>> deleted or moved to another destination. Now with this "remember list"
>>> we have a serious issue if the inbox receives file with the same name
>>> but the content of the file is different. What if someone uploads a
>>> file to an FTP server and the filename is always fixed (= the same).
>>> Now we have a complex situation as we need to hash the file content to
>>> be able to determine if the file is different, or not support it at
>>> all.
>>>
>>> I am mostly keen to keep it simpler and as Hadrian said "keep it lean".
>>>
>>> So I am voting for:
>>> a) to remove noop as well
>>> b) to always delete or move file after processing (we should support
>>> moving files to a different folder if exchange failed)
>>>
>>> Ad b)
>>> We should support moving files using different pattern depending on
>>> - exchange OK
>>> - exchange Failed
>>> I have thought about introducing some better URI options to express this
>>>
>>> Something along the lines of (think of better uri option names)
>>> postAction=delete
>>>
>>> postAction=move
>>> moveCompleteExpression=./done/${file:name}.bak
>>> moveErrorExpression=./error/${date:now:yyyyMMdd}/${file:name}.error
>>>
>>> And we should have defaults as well, so if moveErrorExpression is
>>> omitted it defaults to the completed move.
>>>
>>>
>>> And then we could consider @deprecating all the other pre and postfix
>>> URI option we have in favor of the power of the expression instead.
>>>
>>>
>>>
>>> But the list store is not wasted as we can use it for the idempotent
>>> as well and for other areas.
>>>
>>>
>>>
>>>
>>>
>>> /Claus Ibsen
>>> Apache Camel Committer
>>> Blog: http://davsclaus.blogspot.com/
>>>
>>>
>>>
>>> On Wed, Nov 19, 2008 at 4:04 AM, Jon Anstey <ja...@gmail.com> wrote:
>>>
>>>> Hmmm... yeah, I like this suggestion. It may be just what we need here!
>>>> Thanks!
>>>>
>>>> On Tue, Nov 18, 2008 at 4:11 PM, Gert Vanthienen
>>>> <ge...@skynet.be> wrote:
>>>>
>>>>
>>>>> Jon,
>>>>>
>>>>> How about if we enhance the file consumer to keep track of files that
>>>>> have
>>>>> already been processed instead of using a timestamp?  The timestamp
>>>>> approach
>>>>> is a bit error-prone (just touching the file by accident can set it off
>>>>> again).
>>>>> If we provide multiple implementations for the storage mechanism to
>>>>> keep
>>>>> this information, we can cover a lot of use cases (similar to the
>>>>> message id
>>>>> store for an idempotent consumer):
>>>>> - an in-memory store for testing purposes
>>>>> - a file-based implementation for basic production environments
>>>>> - a database- or ldap-backed implementation for clustered environments,
>>>>> where a file can arrive through multiple directories
>>>>>
>>>>> Regards,
>>>>>
>>>>> Gert
>>>>>
>>>>> Jon Anstey wrote:
>>>>>
>>>>>  The algorithm that checks whether a file should be consumed based on
>>>>>
>>>>>> timestamp has been deprecated for a while now (see
>>>>>> http://activemq.apache.org/camel/file.html). I've removed this on my
>>>>>> local
>>>>>> branch only to realize that it introduces a bit of an ugly problem...
>>>>>> essentially since files will be processed always (modified or not) in
>>>>>> the
>>>>>> case of noop=true or if a fault has been set, the same file will be
>>>>>> processed over and over again... not good!
>>>>>>
>>>>>> The original intent of removing the timestamp checking was to simplify
>>>>>> the
>>>>>> consumer. I think that in trying to get around this new issue we may
>>>>>> make
>>>>>> it
>>>>>> even more complicated!
>>>>>>
>>>>>> I'm wondering if there is a simple solution to this that I'm just not
>>>>>> seeing
>>>>>> yet or if maybe this issue was discussed before...
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>> --
>>>> Cheers,
>>>> Jon
>>>>
>>>> http://janstey.blogspot.com/
>>>>
>>>>
>>
>>
>
>


-- 
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://fusesource.com/

Re: Deprecation of file consumer timestamp

Posted by Jon Anstey <ja...@gmail.com>.
On Wed, Nov 19, 2008 at 2:57 PM, Claus Ibsen <cl...@gmail.com> wrote:

> Hi
>
> We have a ticket for a new feature to allow a sort of callback for
> rollback / commit when an exchange is finished. James suggested to use
> the UnitOfWork. So I guess if we could get that work started it would
> be possible using that to get a callback when the file producer has
> written the file and thus the delay could be removed.


Cool stuff.


>
>
> But when I started adding all these unit tests using sleep was the
> easiest to get going. I didn't envision there would be like 30 unit
> tests for the file component with all their 1-2 sec sleeps. Or even
> longer to allow it to pass on all boxes from XP to unix on fast and
> slow ones.


Don't worry about it. There are sleeps everywhere else too :) It kinda hurts
productivity to try and make tests perfect anyways... that said, I did
create a ticket for fixing this *eventually* (CAMEL-966). Maybe as a start
we could put in the support to make better tests without sleeps.


>
>
>
> /Claus Ibsen
> Apache Camel Committer
> Blog: http://davsclaus.blogspot.com/
>
>
>
> On Wed, Nov 19, 2008 at 4:31 PM, Jon Anstey <ja...@gmail.com> wrote:
> > Great ideas guys. I think I'll focus on the memory feature first though,
> so
> > it doesn't get too confusing ;)
> >
> > On the testability with noop bit: a big +1 from me for keeping anything
> that
> > would make testing easier! In particular, I'd like to improve the file
> > component tests by removing as many sleeps as possible. There are sleeps
> > everywhere in the Camel tests but I've found that the file tests have
> > particularly long sleeps which make for slow builds. Perhaps I'll get
> > around to putting in a file watcher assertion or something eventually.
> >
> > On Wed, Nov 19, 2008 at 7:22 AM, Claus Ibsen <cl...@gmail.com>
> wrote:
> >
> >> > FWIW, I actually like the simple noop one for creating unit tests
> because
> >> it
> >> > allows you to just refer to the /src/test/resources folder in your
> >> project
> >> > instead of having to copy them to a work folder first.
> >> Yeah here it has a good purpose. I guess James Strachan and others
> >> have used it in unit testing other components, to get some payload
> >> from a file.
> >>
> >> Maybe the postAction=noop should be supported ;)
> >>
> >> /Claus
> >>
> >
> >
> >
> > --
> > Cheers,
> > Jon
> >
> > http://janstey.blogspot.com/
> >
>



-- 
Cheers,
Jon

http://janstey.blogspot.com/

Re: Deprecation of file consumer timestamp

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

We have a ticket for a new feature to allow a sort of callback for
rollback / commit when an exchange is finished. James suggested using
the UnitOfWork. So I guess if we could get that work started, it would
be possible to use it to get a callback when the file producer has
written the file, and thus the delay could be removed.

But when I started adding all these unit tests, using sleep was the
easiest way to get going. I didn't envision there would be like 30 unit
tests for the file component with all their 1-2 sec sleeps. Or even
longer to allow them to pass on all boxes, from XP to unix, on fast and
slow ones.
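
Roughly what I have in mind for the tests, as a sketch only. The
ExchangeDoneCallback interface is hypothetical and just stands in for
whatever hook the UnitOfWork ends up exposing; the point is that the test
blocks on a latch instead of a fixed sleep:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical callback; a stand-in for the future commit / rollback hook.
interface ExchangeDoneCallback {
    void onDone(boolean failed);
}

// Lets a test wait until the exchange has finished, with an upper bound.
class LatchCallback implements ExchangeDoneCallback {
    private final CountDownLatch latch = new CountDownLatch(1);
    private volatile boolean failed;

    @Override
    public void onDone(boolean failed) {
        this.failed = failed;
        latch.countDown();
    }

    // Returns true only if the exchange completed in time and did not fail.
    boolean awaitSuccess(long timeout, TimeUnit unit) {
        try {
            return latch.await(timeout, unit) && !failed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

On a fast box the wait returns as soon as the callback fires, and on a slow
box the timeout can be generous without slowing down everyone else.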


/Claus Ibsen
Apache Camel Committer
Blog: http://davsclaus.blogspot.com/



On Wed, Nov 19, 2008 at 4:31 PM, Jon Anstey <ja...@gmail.com> wrote:
> Great ideas guys. I think I'll focus on the memory feature first though, so
> it doesn't get too confusing ;)
>
> On the testability with noop bit: a big +1 from me for keeping anything that
> would make testing easier! In particular, I'd like to improve the file
> component tests by removing as many sleeps as possible. There are sleeps
> everywhere in the Camel tests but I've found that the file tests have
> particularly long sleeps which make for slow builds. Perhaps I'll get
> around to putting in a file watcher assertion or something eventually.
>
> On Wed, Nov 19, 2008 at 7:22 AM, Claus Ibsen <cl...@gmail.com> wrote:
>
>> > FWIW, I actually like the simple noop one for creating unit tests because
>> it
>> > allows you to just refer to the /src/test/resources folder in your
>> project
>> > instead of having to copy them to a work folder first.
>> Yeah here it has a good purpose. I guess James Strachan and others
>> have used it in unit testing other components, to get some payload
>> from a file.
>>
>> Maybe the postAction=noop should be supported ;)
>>
>> /Claus
>>
>
>
>
> --
> Cheers,
> Jon
>
> http://janstey.blogspot.com/
>

Re: Deprecation of file consumer timestamp

Posted by Jon Anstey <ja...@gmail.com>.
Great ideas guys. I think I'll focus on the memory feature first though, so
it doesn't get too confusing ;)

On the testability with noop bit: a big +1 from me for keeping anything that
would make testing easier! In particular, I'd like to improve the file
component tests by removing as many sleeps as possible. There are sleeps
everywhere in the Camel tests but I've found that the file tests have
particularly long sleeps which make for slow builds. Perhaps I'll get
around to putting in a file watcher assertion or something eventually.
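
Something like the following is what I mean by a file watcher assertion.
It is only a sketch using plain java.io polling; FileAwait and awaitFile
are made-up names, not existing Camel test helpers:

```java
import java.io.File;
import java.util.concurrent.TimeUnit;

// Hypothetical test helper: wait for a file instead of sleeping blindly.
class FileAwait {
    // Polls every 50 ms until the file exists or the timeout elapses.
    // Returns true if the file exists when the method returns.
    static boolean awaitFile(File file, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() < deadline) {
            if (file.exists()) {
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return file.exists();
    }
}
```

A test would then assert awaitFile(expected, 5000) rather than sleeping for
a fixed two seconds, so it only waits as long as the slowest box needs.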

On Wed, Nov 19, 2008 at 7:22 AM, Claus Ibsen <cl...@gmail.com> wrote:

> > FWIW, I actually like the simple noop one for creating unit tests because
> it
> > allows you to just refer to the /src/test/resources folder in your
> project
> > instead of having to copy them to a work folder first.
> Yeah here it has a good purpose. I guess James Strachan and others
> have used it in unit testing other components, to get some payload
> from a file.
>
> Maybe the postAction=noop should be supported ;)
>
> /Claus
>



-- 
Cheers,
Jon

http://janstey.blogspot.com/

Re: Deprecation of file consumer timestamp

Posted by Claus Ibsen <cl...@gmail.com>.
> FWIW, I actually like the simple noop one for creating unit tests because it
> allows you to just refer to the /src/test/resources folder in your project
> instead of having to copy them to a work folder first.
Yeah here it has a good purpose. I guess James Strachan and others
have used it in unit testing other components, to get some payload
from a file.

Maybe the postAction=noop should be supported ;)

/Claus

Re: Deprecation of file consumer timestamp

Posted by Gert Vanthienen <ge...@skynet.be>.
L.S.,

It almost sounds as if we need two separate strategies that
can be configured on the file endpoint:
- one to determine which files need to be processed (the basic one just
takes all the files in a directory, but we can build additional ones that
use a storage mechanism)
- another one (like we already have now) that determines what to do with
the file after a successful or failed exchange
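
As a rough sketch of that split, with all type names invented for the
example (none of them are existing Camel interfaces):

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

// Strategy 1 (hypothetical): decides which files should be picked up.
interface FileSelectionStrategy {
    boolean shouldProcess(File file);
}

// Strategy 2 (hypothetical): decides what happens once the exchange is done.
interface FileCompletionStrategy {
    void onDone(File file, boolean failed);
}

// The basic selection strategy: take every file in the directory.
class AcceptAllFiles implements FileSelectionStrategy {
    @Override
    public boolean shouldProcess(File file) {
        return true;
    }
}

// A selection strategy backed by a simple in-memory store of seen paths.
// Note that it records the path when the file is selected, not on success.
class RememberProcessedFiles implements FileSelectionStrategy {
    private final Set<String> processed = new HashSet<>();

    @Override
    public synchronized boolean shouldProcess(File file) {
        // Set.add returns false when the path was already recorded.
        return processed.add(file.getAbsolutePath());
    }
}

// The noop completion strategy: leave the file where it is.
class NoopCompletion implements FileCompletionStrategy {
    @Override
    public void onDone(File file, boolean failed) {
        // Intentionally empty.
    }
}
```

Combining RememberProcessedFiles with NoopCompletion would keep the
convenient /src/test/resources setup without re-processing the same file
on every poll.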

FWIW, I actually like the simple noop one for creating unit tests 
because it allows you to just refer to the /src/test/resources folder in 
your project instead of having to copy them to a work folder first.

Regards,

Gert

Claus Ibsen wrote:
> Hi
>
> Oh I have thought that some end-users want FileConsumer to keep retrying
> to consume the same file over and over again if it could not be
> processed, so the postAction could have a 3rd option or we could have
> an option to set this feature (kinda like noop but only for when the
> file could not be processed)
>
>
>
> /Claus Ibsen
> Apache Camel Committer
> Blog: http://davsclaus.blogspot.com/
>
>
>
> On Wed, Nov 19, 2008 at 10:35 AM, Claus Ibsen <cl...@gmail.com> wrote:
>   
>> Hi
>>
>> The store idea is good as it can be used for the idempotent consumer
>> as well so we can use it to persist as well, so it can survive
>> restarts. We need to allow it to be pluggable so users can use a
>> shared DB if they use grid, or maybe some of that fancy Terracotta
>> thing that distributes memory caches.
>>
>> But turning back to the file consumer. I really think the noop=true
>> options should be deprecated as well. The file is like an inbox where
>> if a file is dropped it is consumed once. After processing the file is
>> deleted or moved to another destination. Now with this "remember list"
>> we have a serious issue if the inbox receives file with the same name
>> but the content of the file is different. What if someone uploads a
>> file to a FTP server and the filename is always fixed (= the same).
>> Now we have a complex situation as we need to hash the file content to
>> be able to determine if the file is different, or not support it at
>> all.
>>
>> I am mostly keen to keep it simpler and as Hadrian said "keep it lean".
>>
>> So I am voting for:
>> a) to remove noop as wel
>> b) to always delete or move file after processing (we should support
>> moving files to a different folder if exchange failed)
>>
>> Ad b)
>> We should support moving files using different pattern depending on
>> - exchange OK
>> - exchange Failed
>> I have though about introducing some better URI options to express this
>>
>> Something along the lines of (think of better uri option names)
>> postAction=delete
>>
>> postAction=move
>> moveCompleteExpression=./done/${file:name}.bak
>> moveErrorExpression=./error/${date:now:yyyyMMdd}/${file:name}.error
>>
>> And we should have defaults as well, so if moveErrorExpression is
>> omitted it defaults to the completed move.
>>
>>
>> And then we could consider @deprecating all the other pre and postfix
>> URI option we have in favor of the power of the expression instead.
>>
>>
>>
>> But the list store is not wasted as we can use it for the idempotent
>> as well and for other areas.
>>
>>
>>
>>
>>
>> /Claus Ibsen
>> Apache Camel Committer
>> Blog: http://davsclaus.blogspot.com/
>>
>>
>>
>> On Wed, Nov 19, 2008 at 4:04 AM, Jon Anstey <ja...@gmail.com> wrote:
>>     
>>> Hmmm... yeah, I like this suggestion. It may be just what we need here!
>>> Thanks!
>>>
>>> On Tue, Nov 18, 2008 at 4:11 PM, Gert Vanthienen
>>> <ge...@skynet.be>wrote:
>>>
>>>       
>>>> Jon,
>>>>
>>>> How about if we enhance the file consumer to keep track of files that have
>>>> already been processed instead of using a timestamp?  The timestamp approach
>>>> is a bit error-prone (just touching the file by accident can set it off
>>>> again).
>>>> If we provide multiple implementations for the storage mechanism to keep
>>>> this information, we can cover a lot of use cases (similar to the message id
>>>> store for an idempotent consumer):
>>>> - an in-memory store for testing purposes
>>>> - a file-based implementation for basic production environments
>>>> - a database- or ldap-backed implementation for clustered environments,
>>>> where a file can arrive through multiple directories
>>>>
>>>> Regards,
>>>>
>>>> Gert
>>>>
>>>> Jon Anstey schreef:
>>>>
>>>>  The algorithm that checks whether a file should be consumed based on
>>>>         
>>>>> timestamp has been deprecated for a while now (see
>>>>> http://activemq.apache.org/camel/file.html). I've removed this on my
>>>>> local
>>>>> branch only to realize that it introduces a bit of an ugly problem...
>>>>> essentially since files will be processed always (modified or not) in the
>>>>> case of noop=true or if a fault has been set, the same file will be
>>>>> processed over and over again... not good!
>>>>>
>>>>> The original intent of removing the timestamp checking was to simplify the
>>>>> consumer. I think that in trying to get around this new issue we may make
>>>>> it
>>>>> even more complicated!
>>>>>
>>>>> I'm wondering if there is a simple solution to this that I'm just not
>>>>> seeing
>>>>> yet or if maybe this issue was discussed before...
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>         
>>> --
>>> Cheers,
>>> Jon
>>>
>>> http://janstey.blogspot.com/
>>>
>>>       
>
>   


Re: Deprecation of file consumer timestamp

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

Oh, I have thought that some end users want FileConsumer to keep
retrying to consume the same file over and over again if it could not
be processed, so the postAction could have a 3rd option, or we could
have an option to enable this feature (kind of like noop, but only for
when the file could not be processed).



/Claus Ibsen
Apache Camel Committer
Blog: http://davsclaus.blogspot.com/



On Wed, Nov 19, 2008 at 10:35 AM, Claus Ibsen <cl...@gmail.com> wrote:
> Hi
>
> The store idea is good as it can be used for the idempotent consumer
> as well so we can use it to persist as well, so it can survive
> restarts. We need to allow it to be pluggable so users can use a
> shared DB if they use a grid, or maybe some of that fancy Terracotta
> thing that distributes memory caches.
>
> But turning back to the file consumer. I really think the noop=true
> option should be deprecated as well. The file is like an inbox where
> if a file is dropped it is consumed once. After processing, the file is
> deleted or moved to another destination. Now with this "remember list"
> we have a serious issue if the inbox receives a file with the same name
> but the content of the file is different. What if someone uploads a
> file to an FTP server and the filename is always fixed (= the same)?
> Now we have a complex situation as we need to hash the file content to
> be able to determine if the file is different, or not support it at
> all.
>
> I am mostly keen to keep it simpler and as Hadrian said "keep it lean".
>
> So I am voting for:
> a) to remove noop as well
> b) to always delete or move file after processing (we should support
> moving files to a different folder if exchange failed)
>
> Ad b)
> We should support moving files using different pattern depending on
> - exchange OK
> - exchange Failed
> I have thought about introducing some better URI options to express this
>
> Something along the lines of (think of better uri option names)
> postAction=delete
>
> postAction=move
> moveCompleteExpression=./done/${file:name}.bak
> moveErrorExpression=./error/${date:now:yyyyMMdd}/${file:name}.error
>
> And we should have defaults as well, so if moveErrorExpression is
> omitted it defaults to the completed move.
>
>
> And then we could consider @deprecating all the other pre and postfix
> URI options we have in favor of the power of the expression instead.
>
>
>
> But the list store is not wasted as we can use it for the idempotent
> as well and for other areas.
>
>
>
>
>
> /Claus Ibsen
> Apache Camel Committer
> Blog: http://davsclaus.blogspot.com/

Re: Deprecation of file consumer timestamp

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

The store idea is good, as it can be used for the idempotent consumer
as well, and we can use it to persist state so it can survive
restarts. We need to allow it to be pluggable so users can use a
shared DB if they use a grid, or maybe some of that fancy Terracotta
thing that distributes memory caches.

But turning back to the file consumer. I really think the noop=true
option should be deprecated as well. The file is like an inbox where
if a file is dropped it is consumed once. After processing, the file is
deleted or moved to another destination. Now with this "remember list"
we have a serious issue if the inbox receives a file with the same name
but the content of the file is different. What if someone uploads a
file to an FTP server and the filename is always fixed (= the same)?
Now we have a complex situation as we need to hash the file content to
be able to determine if the file is different, or not support it at
all.
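
To make the fixed-filename concern concrete: a remember list keyed on the
file name alone cannot tell a re-uploaded file with new content from one it
has already seen. One conceivable workaround is to key on the name plus a
digest of the content. The sketch below is only an illustration; the class
name FileFingerprint is hypothetical, and SHA-256 is an assumption made
here, not something proposed in this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: builds a key that changes when the content changes.
class FileFingerprint {

    // Returns "<file name>:<hex SHA-256 of the file content>".
    static String of(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), digest)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // reading the stream is what feeds the digest
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return file.getFileName() + ":" + hex;
    }
}
```

Note the cost: every poll has to read each candidate file in full, which is
part of why this makes the consumer more complex rather than leaner.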

I am mostly keen to keep it simpler and as Hadrian said "keep it lean".

So I am voting for:
a) to remove noop as well
b) to always delete or move file after processing (we should support
moving files to a different folder if exchange failed)

Ad b)
We should support moving files using different pattern depending on
- exchange OK
- exchange Failed
I have thought about introducing some better URI options to express this

Something along the lines of (think of better uri option names)
postAction=delete

postAction=move
moveCompleteExpression=./done/${file:name}.bak
moveErrorExpression=./error/${date:now:yyyyMMdd}/${file:name}.error

And we should have defaults as well, so if moveErrorExpression is
omitted it defaults to the completed move.
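
As a rough illustration of the idea (not an actual Camel expression
engine), the sketch below resolves only the two placeholders used in the
examples above, ${file:name} and ${date:now:<pattern>}, and applies the
proposed default: when no error expression is given, a failed exchange
falls back to the completed expression. The class and method names are
hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of resolving the proposed move expressions.
class MoveExpressionSketch {

    // Matches ${file:name} or ${date:now:<pattern>}; group 2 holds the date pattern, if any.
    private static final Pattern TOKEN =
            Pattern.compile("\\$\\{(file:name|date:now:([^}]+))\\}");

    static String resolve(String expression, String fileName, Date now) {
        Matcher m = TOKEN.matcher(expression);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = m.group(2) == null
                    ? fileName
                    : new SimpleDateFormat(m.group(2)).format(now);
            // quoteReplacement guards against '$' or '\' in file names
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // A missing error expression falls back to the completed expression.
    static String target(boolean failed, String completeExpr, String errorExpr,
                         String fileName, Date now) {
        String expr = (failed && errorExpr != null) ? errorExpr : completeExpr;
        return resolve(expr, fileName, now);
    }
}
```

For a file named report.csv, the completed expression above would resolve
to ./done/report.csv.bak.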


And then we could consider @deprecating all the other pre and postfix
URI options we have in favor of the power of the expression instead.



But the list store is not wasted as we can use it for the idempotent
as well and for other areas.





/Claus Ibsen
Apache Camel Committer
Blog: http://davsclaus.blogspot.com/



On Wed, Nov 19, 2008 at 4:04 AM, Jon Anstey <ja...@gmail.com> wrote:
> Hmmm... yeah, I like this suggestion. It may be just what we need here!
> Thanks!

Re: Deprecation of file consumer timestamp

Posted by Jon Anstey <ja...@gmail.com>.
Hmmm... yeah, I like this suggestion. It may be just what we need here!
Thanks!

On Tue, Nov 18, 2008 at 4:11 PM, Gert Vanthienen
<ge...@skynet.be>wrote:

> Jon,
>
> How about if we enhance the file consumer to keep track of files that have
> already been processed instead of using a timestamp?  The timestamp approach
> is a bit error-prone (just touching the file by accident can set it off
> again).
> If we provide multiple implementations for the storage mechanism to keep
> this information, we can cover a lot of use cases (similar to the message id
> store for an idempotent consumer):
> - an in-memory store for testing purposes
> - a file-based implementation for basic production environments
> - a database- or ldap-backed implementation for clustered environments,
> where a file can arrive through multiple directories
>
> Regards,
>
> Gert


-- 
Cheers,
Jon

http://janstey.blogspot.com/

Re: Deprecation of file consumer timestamp

Posted by Gert Vanthienen <ge...@skynet.be>.
Jon,

How about if we enhance the file consumer to keep track of files that 
have already been processed instead of using a timestamp?  The timestamp 
approach is a bit error-prone (just touching the file by accident can 
set it off again).
If we provide multiple implementations for the storage mechanism to keep 
this information, we can cover a lot of use cases (similar to the 
message id store for an idempotent consumer):
- an in-memory store for testing purposes
- a file-based implementation for basic production environments
- a database- or ldap-backed implementation for clustered environments, 
where a file can arrive through multiple directories
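
A minimal sketch of what such a pluggable store might look like, covering
the first two implementations only. All names here (ProcessedFileStore,
InMemoryProcessedFileStore, FileProcessedFileStore) are hypothetical and
not existing Camel classes; the key is assumed to be something like the
file's absolute path and must not contain line breaks.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical store of keys for files that have already been processed.
interface ProcessedFileStore {
    // Records the key; returns false if it was already present.
    boolean add(String key);
    boolean contains(String key);
}

// In-memory variant for tests: everything is forgotten on restart.
class InMemoryProcessedFileStore implements ProcessedFileStore {
    private final Set<String> keys = Collections.synchronizedSet(new HashSet<String>());
    public boolean add(String key) { return keys.add(key); }
    public boolean contains(String key) { return keys.contains(key); }
}

// File-backed variant: one key per line, reloaded on construction so it survives restarts.
class FileProcessedFileStore implements ProcessedFileStore {
    private final Path storeFile;
    private final Set<String> keys = new HashSet<>();

    FileProcessedFileStore(Path storeFile) throws IOException {
        this.storeFile = storeFile;
        if (Files.exists(storeFile)) {
            keys.addAll(Files.readAllLines(storeFile, StandardCharsets.UTF_8));
        }
    }

    public synchronized boolean add(String key) {
        if (!keys.add(key)) {
            return false;
        }
        try {
            Files.write(storeFile,
                    (key + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            keys.remove(key); // keep memory and disk consistent
            throw new UncheckedIOException(e);
        }
        return true;
    }

    public synchronized boolean contains(String key) {
        return keys.contains(key);
    }
}
```

A database- or LDAP-backed variant for clustered environments would
implement the same interface.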

Regards,

Gert

Jon Anstey wrote:
> The algorithm that checks whether a file should be consumed based on
> timestamp has been deprecated for a while now (see
> http://activemq.apache.org/camel/file.html). I've removed this on my local
> branch only to realize that it introduces a bit of an ugly problem...
> essentially since files will be processed always (modified or not) in the
> case of noop=true or if a fault has been set, the same file will be
> processed over and over again... not good!
>
> The original intent of removing the timestamp checking was to simplify the
> consumer. I think that in trying to get around this new issue we may make it
> even more complicated!
>
> I'm wondering if there is a simple solution to this that I'm just not seeing
> yet or if maybe this issue was discussed before...
>
>