
Posted to users@camel.apache.org by Christopher Hammack <ch...@fadedsky.com> on 2008/12/02 23:51:07 UTC

Concerns about File endpoint

In addition to the memory leak issue in Camel 1.5.0, I have a few other
concerns about the file consumer endpoint--some of which could be a
misunderstanding on my part:

1. When using the default move capability (moving a file to .camel) after
it has been picked up, the java.io.File object refers to the path BEFORE the
move, not after. So in order to actually read the file, my processing code
must have knowledge of which path the file was moved to. Is this
intentional?

2. Occasionally, especially when the system is under considerable load, the
java.io.File object that I get is not available in the moved location, which
generates a FileNotFoundException. When I check that location later on, the
file is in the correct location. Looking at the code, it seems like the
message should not be propagated back to me before the rename has
occurred, but that is apparently happening. Any thoughts?

The use case: a very large number of small files are being dropped into a
directory. This directory is then scanned by Camel's file endpoint. As the
files are discovered, they are moved to the .camel directory, and the
filename is put onto a JMS endpoint. A clustered set of Camel processors
then pull the filename off the endpoint, process the file, and then delete
it.

Any suggestions would be appreciated, as right now I'm "stranding" about one
out of every 1000 files because the processing checks for the file before
it has actually arrived in the moved location.

Thanks.
--
View this message in context: http://www.nabble.com/Concerns-about-File-endpoint-tp20802855s22882p20802855.html
Sent from the Camel - Users mailing list archive at Nabble.com.

Re: Concerns about File endpoint

Posted by Claus Ibsen <cl...@gmail.com>.

Hi Chris

I am doing the final testing of preMove on the 1.5.1 branch. It has
already been committed to trunk.

I also updated the wiki page for the file component to explain a bit
more about the file-based idempotent repository we have in Camel 2.0.
The file repo is basic, as it's just a persistence store for a
first-level cache, but it survives server restarts.

We can always improve later if needed.
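A minimal sketch of what such a basic file-backed store could look like, assuming one key per line (e.g. a file name without line breaks). An in-memory set acts as the first-level cache, and every new key is appended to a plain text file, so the set can be rebuilt after a restart. The class name SimpleFileIdempotentStore is hypothetical; this is not Camel's actual repository implementation or API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical file-backed idempotent store; NOT Camel's actual repository API.
final class SimpleFileIdempotentStore {
    private final Path storeFile;
    // First-level cache: every key seen so far, held in memory.
    private final Set<String> cache = new HashSet<>();

    SimpleFileIdempotentStore(Path storeFile) throws IOException {
        this.storeFile = storeFile;
        // Rebuild the cache from disk so that state survives a restart.
        if (Files.exists(storeFile)) {
            cache.addAll(Files.readAllLines(storeFile, StandardCharsets.UTF_8));
        }
    }

    /** Returns true if the key was new (and persists it), false if it was already seen. */
    synchronized boolean add(String key) throws IOException {
        if (!cache.add(key)) {
            return false;
        }
        try {
            // Append one line per key; the file only ever grows.
            Files.write(storeFile, Collections.singletonList(key), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            cache.remove(key); // keep memory and disk consistent if the write fails
            throw e;
        }
        return true;
    }

    synchronized boolean contains(String key) {
        return cache.contains(key);
    }
}
```

Note that the whole key set still lives in memory, which is exactly the scalability concern raised in this thread for volumes of hundreds of thousands of files per day.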

But if you need a really large idempotent repo, then there is the JPA one you can use.

Yeah, I am hoping we get some time soon to improve the JMS component in
2.0 with this exchange transfer, as it has been requested a few times and
it's good for internal routing that uses JMS queues to survive restarts.



/Claus Ibsen
Apache Camel Committer
Blog: http://davsclaus.blogspot.com/



On Thu, Dec 4, 2008 at 4:46 PM, Christopher Hammack
<ch...@fadedsky.com> wrote:
>
> I opened CAMEL-1148 on the preMove concept.
>
> I would be willing to test the transferExchange when you get that going.
> [...]

Re: Concerns about File endpoint

Posted by Christopher Hammack <ch...@fadedsky.com>.

I opened CAMEL-1148 on the preMove concept.

I would be willing to test the transferExchange when you get that going.

It's a good question on the file-based persistence.  I suppose you could do
something simple like writing a hidden file containing a Java-serialized
copy of the list, but I think, at least for our situation, scalability would
be an issue.  To perform well you'd need a data structure that you could
either search natively without loading it into memory, or accept the
limitation that all of the metadata must be in memory (e.g. a HashMap), but
that probably won't scale well.  I suppose you could use Derby (but that's
basically just JPA anyhow) or something from Hadoop or one of the
persistence technologies used in ActiveMQ.

For us, we bring in on the order of several hundred thousand to a million
files per day, so keeping track of that much metadata and trying to find
the new subset will probably be untenable on anything but a full-blown
database store.  That's why we came up with the preMove concept: it
allows us to use the actual filesystem to mark which files have been
scanned, and by using persistence in JMS we survive restarts.  I can see
situations with far fewer files where that could work, though.


Claus Ibsen-2 wrote:
> 
> Hi
> 
> Glad I could help. The preMove command is a very good idea. Please
> feel free to create a ticket about it in JIRA.
> [...]


Re: Concerns about File endpoint

Posted by Claus Ibsen <cl...@gmail.com>.

Hi

Glad I could help. The preMove command is a very good idea. Please
feel free to create a ticket about it in JIRA.

Yeah, the JMS component automatically deciding the javax.jms.XXX message
type is an issue I would also like to remedy for Camel 2.0.
There is a ticket about it.

However, you only want to send the java.io.File over JMS and not the
actual payload of the file (kinda like sending a pointer to the file).
I would expect that, by default, a java.io.File would have its content
loaded and sent over JMS. So you want to keep it as a java.io.File
object and that's it.

We have a ticket for sending the exchange itself, like we have for
camel-mina: transferExchange=true. That one would resolve the issue
you have, since you are using Camel as both sender and receiver of the
JMS queues. There are some tickets in JIRA for this. Feel free to
comment and vote for them.

We might get started on them pretty soon; then you could help test it
on your system.


The idempotent repository for the file consumer can be changed to a
JPA version, or you can implement your own. The JPA one will persist in a
DB and thus survive restarts. We are also planning a file-based repo.
In fact it's on my next todo list. Do you have any
suggestions / requirements for such a file-based repo?



/Claus Ibsen
Apache Camel Committer
Blog: http://davsclaus.blogspot.com/



On Wed, Dec 3, 2008 at 4:27 PM, Christopher Hammack
<ch...@fadedsky.com> wrote:
>
> Thanks for the clarification.  This does completely explain the situation,
> and applying a variant of your solution "a" seems to have things working
> much more reliably.
> [...]

Re: Concerns about File endpoint

Posted by Christopher Hammack <ch...@fadedsky.com>.

Thanks for the clarification.  This does completely explain the situation,
and applying a variant of your solution "a" seems to have things working
much more reliably.

However, I'd like to suggest that you add a "preMove" option, as it seems to
be pretty much a requirement for doing clustered SEDA-style processing from
a file endpoint.  I suppose you could use the new noop/idempotent
capabilities that you have added in 2.0, but it's also nice to have
separation between files that have yet to be discovered and files that are
in process.  I'm also a little leery of keeping state information like
that around, as the file endpoint's idempotent data presumably (maybe I'm
wrong?) doesn't persist across restarts, etc.
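A rough sketch of the preMove idea in plain Java, not Camel's implementation (the class and method names are hypothetical). On each scan, every regular file in the inbox is renamed into an in-progress folder first, and only the paths that were successfully moved are returned for publishing. Using Files.move with ATOMIC_MOVE assumes both folders are on the same file system.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "preMove": claim files by renaming them BEFORE publishing.
final class PreMoveScanner {
    private final Path inbox;
    private final Path inProgress;

    PreMoveScanner(Path inbox, Path inProgress) throws IOException {
        this.inbox = inbox;
        this.inProgress = inProgress;
        Files.createDirectories(inProgress);
    }

    /** Moves every regular file from the inbox into inProgress and returns the new paths. */
    List<Path> claimNewFiles() throws IOException {
        List<Path> claimed = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inbox)) {
            for (Path file : stream) {
                if (!Files.isRegularFile(file)) {
                    continue; // skips sub-folders such as the in-progress folder itself
                }
                Path target = inProgress.resolve(file.getFileName());
                try {
                    // A rename within one file system: the file is never half-moved.
                    Files.move(file, target, StandardCopyOption.ATOMIC_MOVE);
                    claimed.add(target); // only paths that now exist are handed out
                } catch (NoSuchFileException e) {
                    // another scanner claimed this file first; skip it
                }
            }
        }
        return claimed;
    }
}
```

Because the rename happens before anything is published, a downstream consumer never receives a path that does not exist yet, and the filesystem itself records which files have been scanned, so there is no idempotent state to persist.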

Also, is there a way to disable the implicit conversion of java.io.File to a
JMS BytesMessage?  That is also not desirable in this situation.
Putting gigabytes of byte data on JMS does not work out very well.  To get
around this we have to convert the java.io.File to a String prior to putting
it on JMS, and then convert it back to a java.io.File so we can give our
users the ability to use the Camel built-in transformers to transform to the
input method of their choice (byte[], InputStream, FileReader, java.io.File,
etc.).  This works, but it kind of takes away from the "cleanliness" of the
routes.
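A minimal sketch of that String round-trip, assuming two hypothetical helper methods called from bean endpoints on either side of the JMS queue. The explicit getAbsolutePath() matters: asking Camel's type converters for a String may read the file's contents rather than its path. The existence check on the consumer side is a design choice added here, not something from the thread.

```java
import java.io.File;

// Hypothetical helpers: send only the file's location over JMS,
// then rebuild a java.io.File on the consumer side.
final class FileLocationCodec {
    private FileLocationCodec() {
    }

    /** Producer side: the File becomes its absolute path, a small text payload. */
    static String toLocation(File file) {
        return file.getAbsolutePath();
    }

    /** Consumer side: the received path becomes a File again, ready for further conversion. */
    static File fromLocation(String location) {
        File file = new File(location);
        if (!file.isFile()) {
            // Fail loudly so the message can be redelivered instead of "stranding" the file.
            throw new IllegalStateException("File not (yet) present: " + location);
        }
        return file;
    }
}
```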

Thanks for your help; we've found Camel to be much superior in many ways to
another animal-named ESB product that we're attempting to migrate away from.

Claus Ibsen-2 wrote:
> 
> Hi
> 
> #1
> The move is executed *AFTER* the routing.
> [...]


Re: Concerns about File endpoint

Posted by Claus Ibsen <cl...@gmail.com>.

Hi

#1
The move is executed *AFTER* the routing.
The idea is that you process the file while it's in the target folder
(where it's dropped) and after processing you can move it to a backup
folder.

#2
Ah, I suspect the JMS consumer is faster than the file consumer. When you
send a JMS message with the filename pointing at the move folder, the JMS
consumer can, in some circumstances, get ahead of the file consumer and
try to read the file before it has actually been moved there. Hence #1.


What you could do to fix it
===========================
b) use Camel to route and move the file using pipes and filters
from("file://inbox?delete=true").to("file://moved").to("jms:queue:pleasegetthefilenow");
Note: however, this will read the file content and save it as a new
file; it's not a native file IO move operation.

a) move the file yourself and then afterwards send the JMS message.
from("file://inbox").to("bean:myFileMover").to("jms:queue:pleasegetthefilenow");

Using a POJO bean you can move the file yourself using File rename.
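A minimal sketch of such a POJO, assuming it is registered under the name myFileMover and that the target folder is supplied by configuration (both assumptions). It uses File.renameTo and returns the new absolute path, so the body sent on to the JMS endpoint is a small String pointing at a file that already exists.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical bean behind "bean:myFileMover"; the name and target folder are assumptions.
// Declare the class public when registering it as a real Camel bean.
final class MyFileMover {
    private final File targetDir;

    MyFileMover(File targetDir) {
        this.targetDir = targetDir;
    }

    /**
     * Renames the file into targetDir and returns the new absolute path.
     * The returned String becomes the message body sent to the JMS endpoint.
     */
    public String move(File file) throws IOException {
        targetDir.mkdirs(); // no-op if the folder already exists
        if (!targetDir.isDirectory()) {
            throw new IOException("Cannot create folder " + targetDir);
        }
        File target = new File(targetDir, file.getName());
        // renameTo is a native move; it returns false instead of throwing on failure.
        if (!file.renameTo(target)) {
            throw new IOException("Could not move " + file + " to " + target);
        }
        return target.getAbsolutePath();
    }
}
```

Since the consumer's own move is executed after routing (#1), it may need adjusting so it does not try to move a file the bean has already moved.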



/Claus Ibsen
Apache Camel Committer
Blog: http://davsclaus.blogspot.com/



On Tue, Dec 2, 2008 at 11:51 PM, Christopher Hammack
<ch...@fadedsky.com> wrote:
>
> In addition to the memory leak issue in Camel 1.5.0, I have a few other
> concerns about the file consumer endpoint--some of which could be a
> misunderstanding on my part:
> [...]