Posted to users@camel.apache.org by Peter Hilton <pe...@lunatech.com> on 2013/05/16 12:23:56 UTC

De-duplicate route file name?

I’m using Camel to process files, and need to save a copy of each incoming file to an 'archive' directory. In addition, if another file arrives with the same name as an earlier file, I must add this to the archive with a different file name and not overwrite an existing archive file.

Two questions:

1. Does Camel have something built-in to deal with duplicate messages this way? This differs from ‘Idempotent Consumer’ in that I don't want to skip duplicate files.

2. If there is nothing built in, what do I need to do to implement my own 'De-duplicating Archive' end point?

Note that I have to implement a specific (and weird) file naming strategy: repeatedly receiving 'foo.msg' must result in foo.msg, foo.msg-1, foo.msg-2, foo.msg-3, …

I'm using: Camel 2.11.0, Scala DSL, Scala 2.10.0, sbt 0.12.2, JDK 1.6.0_43, OSX 10.8.3

Peter


Re: De-duplicate route file name?

Posted by Peter Hilton <pe...@lunatech.com>.
Solved! I didn't find anything built in, so I did the following:

1. Add a custom Producer based on the File producer, that overrides createFileName to add a suffix when the file is a duplicate:

	class ArchiveProducer(endpoint: GenericFileEndpoint[File], operations: GenericFileOperations[File])
		extends GenericFileProducer[File](endpoint, operations) {

2. Add a custom endpoint that overrides createProducer to return my custom producer:

	class ArchiveEndpoint(uri: String, component: Component) extends FileEndpoint(uri, component) {

3. Add a custom component that creates my custom endpoint, by overriding buildFileEndpoint:

	class ArchiveComponent extends FileComponent {

4. Register my custom component when setting up Camel:

	camelContext = CamelContextBuilder()
	camelContext.addComponent("archive", new ArchiveComponent())

5. Change my file:// URIs to archive:// URIs to use my new component.

Peter




Re: De-duplicate route file name?

Posted by Claus Ibsen <cl...@gmail.com>.
On Fri, May 17, 2013 at 1:03 PM, Peter Hilton <pe...@lunatech.com> wrote:
> On 17 May 2013, at 09:33, Claus Ibsen <cl...@gmail.com> wrote:
>> You can use the move option on the Camel file consumer, and then use a
>> bean to calculate the file name. http://camel.apache.org/file2
>
> Claus, by the way, your comment led to an 'aha' moment - the kind that Manning talk about :)
>
> Being new to Camel, I hadn't realised the implications of the documentation for File Component’s 'move' property (the implication being that I can write my own code to calculate the file name):
>
>> move: Expression (such as File Language) used to dynamically set the filename when moving it after processing.
>
> Note that a later sentence on the same page seems to contradict the idea that I can use this to calculate the file name:
>
>> The move and preMove options should be a directory name, which can be either relative or absolute.
>

Ah yeah let me update the docs to make this more clear. Thanks for
spotting this.


>
> Anyway, your suggestion to use a bean prompted me to drill down (→ http://camel.apache.org/expression → http://camel.apache.org/bean-language) and realise that I can always implement an expression or predicate as a method in my own bean class.
>

Yes, any expression or predicate can just be a method on a bean. For a
predicate, the returned value is evaluated as a boolean, and a non-empty
value is considered true.

> For what it’s worth, I don’t yet see how I would discover that my bean must have a 'calculateFileName' method - whether Camel will pick this method because it's the only one, or whether it has to have that name, for example. I’ll save that for another day.
>

Yeah if there is only one method then that's picked up. You can also
tell Camel the name of the method to use

move=bean:myBean.myMethod

The bean selection logic is covered in CiA book chapter 4, or you can
find some details on the camel bean wiki pages, somewhere.

> Thanks again,
> Peter
>



--
Claus Ibsen
-----------------
www.camelone.org: The open source integration conference.

Red Hat, Inc.
FuseSource is now part of Red Hat
Email: cibsen@redhat.com
Web: http://fusesource.com
Twitter: davsclaus
Blog: http://davsclaus.com
Author of Camel in Action: http://www.manning.com/ibsen

Re: De-duplicate route file name?

Posted by Peter Hilton <pe...@lunatech.com>.
On 17 May 2013, at 09:33, Claus Ibsen <cl...@gmail.com> wrote:
> You can use the move option on the Camel file consumer, and then use a
> bean to calculate the file name. http://camel.apache.org/file2

Claus, by the way, your comment led to an 'aha' moment - the kind that Manning talk about :)

Being new to Camel, I hadn't realised the implications of the documentation for File Component’s 'move' property (the implication being that I can write my own code to calculate the file name):

> move: Expression (such as File Language) used to dynamically set the filename when moving it after processing.

Note that a later sentence on the same page seems to contradict the idea that I can use this to calculate the file name:

> The move and preMove options should be a directory name, which can be either relative or absolute.


Anyway, your suggestion to use a bean prompted me to drill down (→ http://camel.apache.org/expression → http://camel.apache.org/bean-language) and realise that I can always implement an expression or predicate as a method in my own bean class.

For what it’s worth, I don’t yet see how I would discover that my bean must have a 'calculateFileName' method - whether Camel will pick this method because it's the only one, or whether it has to have that name, for example. I’ll save that for another day.

Thanks again,
Peter


Re: De-duplicate route file name?

Posted by Peter Hilton <pe...@lunatech.com>.
Ah, cool - thanks. That's much less code than a custom component, endpoint and producer.

I discounted the 'move' option, because the default behaviour seemed to be to move the existing files to make room for the duplicate, instead of picking a different name for the duplicate.

Peter


On 17 May 2013, at 09:33, Claus Ibsen <cl...@gmail.com> wrote:
> You can use the move option on the Camel file consumer, and then use a
> bean to calculate the file name. http://camel.apache.org/file2
> 
> move=bean:myFileCalculatorBean
> 
> <bean id="myFileCalculatorBean" class=...


Re: De-duplicate route file name?

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

You can use the move option on the Camel file consumer, and then use a
bean to calculate the file name.
http://camel.apache.org/file2

move=bean:myFileCalculatorBean

<bean id="myFileCalculatorBean" class=...


And the bean has a single method

public String calculateFileName(@Header(Exchange.FILE_NAME) String existingName) {
  // check if the file exists, and if so, then do your - number trick to find a "free" name
  ...
}
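
A minimal sketch of the naming logic that the bean method above leaves elided, assuming the rule from the original question (foo.msg, foo.msg-1, foo.msg-2, ...). The class name ArchiveFileNamer and the method nextFreeName are hypothetical. The code is plain Java with no Camel imports; a bean method such as calculateFileName could delegate to it.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper; the names here are illustrative and not part of Camel. */
final class ArchiveFileNamer {

    /**
     * Returns name unchanged if it is not in existingNames; otherwise returns
     * the first of name-1, name-2, ... that is not in existingNames.
     */
    public static String nextFreeName(String name, Set<String> existingNames) {
        if (!existingNames.contains(name)) {
            return name;
        }
        int suffix = 1;
        while (existingNames.contains(name + "-" + suffix)) {
            suffix++;
        }
        return name + "-" + suffix;
    }

    /** Convenience overload that reads the existing names from a directory listing. */
    public static String nextFreeName(String name, File archiveDirectory) {
        String[] listing = archiveDirectory.list(); // null if the directory does not exist
        Set<String> existing = (listing == null)
                ? Collections.<String>emptySet()
                : new HashSet<String>(Arrays.asList(listing));
        return nextFreeName(name, existing);
    }
}
```

Checking for a free name and then writing the file is not atomic, so two files arriving at the same moment could still be given the same name; this sketch does not address that.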


On Thu, May 16, 2013 at 5:04 PM, Peter Hilton <pe...@lunatech.com> wrote:
> On 16 May 2013, at 12:23, Peter Hilton <pe...@lunatech.com> wrote:
>> I’m using Camel to process files, and need to save a copy of each incoming file to an 'archive' directory. In addition, if another file arrives with the same name as an earlier file, I must add this to the archive with a different file name and not overwrite an existing archive file.
>
> I'm currently hoping to do it like this:
>
>         wireTap("file:///archive?fileName=${date:now:yyyy/DDD}/${file:name}", archiveFileNameProcessor)
>
> … where archiveFileNameProcessor is a Processor that modifies the file name (to avoid overwriting existing files).
>
> How can I parse the URL in my processor to get today’s path, e.g. archive/2013/136? The processor needs to check this directory for existing files, in order to calculate the next file name in the sequence, for duplicate input file names.
>
> Peter
>




Re: De-duplicate route file name?

Posted by Peter Hilton <pe...@lunatech.com>.
On 16 May 2013, at 12:23, Peter Hilton <pe...@lunatech.com> wrote:
> I’m using Camel to process files, and need to save a copy of each incoming file to an 'archive' directory. In addition, if another file arrives with the same name as an earlier file, I must add this to the archive with a different file name and not overwrite an existing archive file.

I'm currently hoping to do it like this:

	wireTap("file:///archive?fileName=${date:now:yyyy/DDD}/${file:name}", archiveFileNameProcessor)

… where archiveFileNameProcessor is a Processor that modifies the file name (to avoid overwriting existing files).

How can I parse the URL in my processor to get today’s path, e.g. archive/2013/136? The processor needs to check this directory for existing files, in order to calculate the next file name in the sequence, for duplicate input file names.

Peter
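
On the question of finding today's archive path: a rough sketch, assuming the ${date:now:yyyy/DDD} pattern follows java.text.SimpleDateFormat conventions. Under that assumption a processor or bean could rebuild the same relative path itself instead of parsing the endpoint URI. ArchivePaths and relativePathFor are hypothetical names, not Camel API.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

/** Hypothetical helper; not part of Camel. */
final class ArchivePaths {

    /**
     * Formats a date as year/day-of-year, e.g. "2013/136" for 16 May 2013.
     * Uses the JVM's default time zone.
     */
    public static String relativePathFor(Date date) {
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        return new SimpleDateFormat("yyyy/DDD", Locale.ENGLISH).format(date);
    }
}
```

Resolving the result against the archive root, for example new File("/archive", ArchivePaths.relativePathFor(new Date())), would give the directory to scan for existing file names.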