Posted to users@camel.apache.org by wing-tung Leung <wi...@gmail.com> on 2011/08/12 16:51:57 UTC

From http4 via md5checksum to FTP: file cache or streaming?

Hello,

One of our routes pulls binary data from an HTTP service using
http4, and then uploads the binary to a remote FTP directory. It also
uses temporary files on the local filesystem behind the scenes, which
makes sense because the binary image data can grow up to 10 MB.

Now I want to add one extra step: MD5 checksum verification just after
the download. Because of the size, I prefer not to load all the data
into a byte array just to calculate a checksum, and use an
InputStream instead. At first sight this seems to work: in the
debugger, I can see that this input stream is a wrapper around the locally
cached file, and at the end of the method I return the original
input stream.
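A minimal sketch of the chunked checksum described above, using only the JDK's java.security.MessageDigest. The class and method names (StreamingMd5, md5Hex) are hypothetical and not part of Camel. Memory use stays at one 8 KB buffer whatever the payload size. Note that once the loop finishes, the stream is at end-of-stream, which would likely explain why a later step reading the same InputStream finds no data.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not a Camel API: computes an MD5 checksum by reading
// the stream in fixed-size chunks, so a 10 MB payload never has to sit in a
// single byte array.
final class StreamingMd5 {
    private StreamingMd5() {}

    /** Returns the lowercase hex MD5 of everything left in the stream; the stream ends up exhausted. */
    static String md5Hex(InputStream in) throws IOException {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 ships with standard JDKs, so this is not expected in practice
            throw new IllegalStateException(e);
        }
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            md5.update(buffer, 0, n); // only the first n bytes of the buffer are valid
        }
        return toHex(md5.digest());
    }

    /** Lowercase hex encoding, two characters per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```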

This is what the route currently looks like:
    <to uri="http4:/" />
    <to uri="bean:md5sum" />
    <to uri="ftp://{{attachment.ftp.location}}/?username={{attachment.ftp.user}}&amp;password={{attachment.ftp.password}}&amp;binary=true"/>

The processing bean method's signature:
    InputStream process(InputStream buffer,
                        @Header("mgws_file_md5sum") String expectedChecksum)


But now the route seems to hang. I assume returning the "used" input
stream is wrong, since the FTP component can't do anything useful with
it anymore. I basically see two options to fix this:
1 - use fancy stream interception with a custom HttpBinder for http4,
integrating the MD5 checksum calculation on the fly
2 - redirect to a temporary file explicitly, start a new route for the
checksum, reuse the same file for the FTP upload and clean up manually
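Option 1 boils down to updating the digest while the one downstream consumer reads the data, so the payload is read only once. The JDK's java.security.DigestInputStream does exactly that. In the sketch below a plain OutputStream stands in for that consumer; how this would be wired into http4 or the FTP producer is not shown, and the names OnTheFlyMd5, copyWithMd5 and verify are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of "checksum on the fly" using only the JDK. The OutputStream parameter
// stands in for whatever finally consumes the data (e.g. an upload); that is an
// assumption, not a Camel or http4 API.
final class OnTheFlyMd5 {
    private OnTheFlyMd5() {}

    /** Copies in to out and returns the lowercase hex MD5 of the bytes copied. Closes in. */
    static String copyWithMd5(InputStream in, OutputStream out) throws IOException {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        // Every read through 'digesting' also feeds the bytes into md5
        try (DigestInputStream digesting = new DigestInputStream(in, md5)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = digesting.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return toHex(md5.digest());
    }

    /** Throws if the computed checksum does not match the expected one (hex, case-insensitive). */
    static void verify(String actual, String expected) {
        if (expected == null || !expected.equalsIgnoreCase(actual)) {
            throw new IllegalStateException("MD5 mismatch: expected " + expected + " but got " + actual);
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

The trade-off of this design: the checksum is only known after the data has been consumed, so a mismatch would be detected after the upload rather than before it. Option 2 avoids that at the cost of a temporary file.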



Any other recommendations? I think this is quite a common use case, so
I guess more experienced Camel users may have some useful
advice for a novice like me.

Some pages I have been looking at:
http://camel.apache.org/http4.html
http://camel.apache.org/file2.html
http://camel.apache.org/stream-caching.html

(using Camel 2.6)

Thanks!

Tung

Re: From http4 via md5checksum to FTP: file cache or streaming?

Posted by wing-tung Leung <wi...@gmail.com>.
2011/8/12 Magnus Palmér <ma...@gmail.com>:
> I would make your md5bean return a file.
> I'm not sure whether the current Jira issue of the file not being cleaned up (deleted) until the JVM stops would then apply to you or not.

Well, the MD5 processor would probably need to dig into the
(Camel-specific) InputStream implementation that holds the reference
to the original file, which I don't like very much. Avoiding knowledge
of the specific implementation is possible by creating a new file
during the calculation of the MD5 checksum, but then I have the file
content twice on the local filesystem: once in the http4 cache, and
a second copy to pass to the rest of the route.

But I will try to dump the result from http4 into a file myself; I
assume I can pass that reference around easily.
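A sketch of dumping the download into a file yourself: the stream is written to a new temporary file while its MD5 is computed in the same pass, and the file's Path is returned only if the checksum matches. Assumptions: Java 7+ (java.nio.file), the expected value is the hex string from the mgws_file_md5sum header, and SpoolAndVerify/spoolVerified are hypothetical names, not Camel APIs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not a Camel API. Assumes Java 7+ (java.nio.file).
final class SpoolAndVerify {
    private SpoolAndVerify() {}

    /**
     * Copies the stream into a new temporary file, computing its MD5 in the same pass.
     * Returns the file if the checksum matches expectedMd5 (hex, case-insensitive);
     * otherwise deletes the file and throws. The caller must delete the returned
     * file once it is no longer needed (e.g. after the FTP upload).
     */
    static Path spoolVerified(InputStream in, String expectedMd5) throws IOException {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        Path tmp = Files.createTempFile("download-", ".bin");
        boolean ok = false;
        try {
            // Every byte written to the file has also passed through md5
            try (DigestInputStream digesting = new DigestInputStream(in, md5);
                 OutputStream out = Files.newOutputStream(tmp)) {
                byte[] buffer = new byte[8192];
                int n;
                while ((n = digesting.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            }
            String actual = toHex(md5.digest());
            if (expectedMd5 == null || !expectedMd5.equalsIgnoreCase(actual)) {
                throw new IOException("MD5 mismatch: expected " + expectedMd5 + " but got " + actual);
            }
            ok = true;
            return tmp;
        } finally {
            if (!ok) {
                Files.deleteIfExists(tmp); // do not leave rejected or partial downloads behind
            }
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

A bean built on this could return the file (path.toFile()) as the new body, in line with the "return a file" suggestion quoted above. Cleanup is deliberately explicit here: whoever runs the upload deletes the file afterwards, rather than relying on cleanup at JVM exit.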

Thanks for the tip.

Tung

Re: From http4 via md5checksum to FTP: file cache or streaming?

Posted by Magnus Palmér <ma...@gmail.com>.
I would make your md5bean return a file.
I'm not sure whether the current Jira issue of the file not being cleaned up (deleted) until the JVM stops would then apply to you or not.

Don't have the link to it right now.

-- 
Magnus Palmér
+46 736 845680
