Posted to users@camel.apache.org by Burkard Stephan <St...@visana.ch> on 2017/12/15 15:48:29 UTC

Peek into big file before splitter

Hi Camel users

I have a route that consumes big XML files (100 to 300 MB), splits them into chunks and processes the chunks.
 
from(pollingFileConsumer) 
    .routeId("bigFileConsumer")
    // here I would like to make a very basic check if the file looks as expected
    .split().tokenizeXML("XmlElementNameToSplitWith").streaming()
    // Continue to work with file chunk from splitter 
    .bean(whatever)


Before the splitter, I would like to check whether the file is in the expected format. To avoid reading the whole big file into memory, I would like to read only the first 2000 chars (it does not matter if that fragment is not valid XML) and look for some required specifics, such as the namespace, for example with a regex.

What is the best practice for doing this?

I could of course implement a bean and call it. But the message body is "java.io.File".
=> When I convert it into a String (implicitly, by annotating the method parameter with "@Body String body"), this converts the whole big file.
=> When I read the File through a stream, I would probably have to reset the stream so that Camel can read the File from the beginning.

Thanks for suggestions
Stephan



Re: Peek into big file before splitter

Posted by Burkard Stephan <St...@visana.ch>.
Hi Claus

Works perfectly, thanks a lot
Stephan


-----Original Message-----
From: Claus Ibsen [mailto:claus.ibsen@gmail.com]
Sent: Friday, 15 December 2017 18:03
To: users@camel.apache.org
Subject: Re: Peek into big file before splitter

Hi

The file consumer stores the file as a GenericFile, which is basically a pointer to a java.io.File.
So you ought to be fine using FileInputStream or InputStream as your bean parameter, and you can even close it after use.

As it's a bean parameter, Camel doesn't replace the existing message body with it, and the splitter should be able to work with the original message body as-is.

It would be a different story if the message body was an InputStream that was only readable once.
For that you can use stream caching from Camel.





--
Claus Ibsen
-----------------
http://davsclaus.com @davsclaus
Camel in Action 2: https://www.manning.com/ibsen2

Re: Peek into big file before splitter

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

The file consumer stores the file as a GenericFile, which is basically
a pointer to a java.io.File.
So you ought to be fine using FileInputStream or InputStream as your
bean parameter, and you can even close it after use.

As it's a bean parameter, Camel doesn't replace the existing message
body with it, and the splitter should be able to work with the
original message body as-is.
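
A minimal sketch of such a bean, to make the suggestion concrete. It is plain Java with no Camel imports, so the InputStream parameter relies on Camel's bean parameter binding to open the file. The class name XmlHeadCheck, the method names, the namespace http://example.com/orders and the UTF-8 encoding are all assumptions for illustration, not taken from the thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical bean: peeks at the start of a file and rejects unexpected content.
public class XmlHeadCheck {

    // Number of characters to inspect, as in the original question.
    private static final int PEEK_CHARS = 2000;

    // Assumed namespace, for illustration only; replace with the real one.
    private static final Pattern EXPECTED_NAMESPACE = Pattern.compile(
            "xmlns(?::\\w+)?\\s*=\\s*[\"']http://example\\.com/orders[\"']");

    // Reads at most maxChars characters from the stream (assumes UTF-8 content).
    public static String readHead(InputStream in, int maxChars) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        char[] buffer = new char[maxChars];
        int total = 0;
        while (total < maxChars) {
            int n = reader.read(buffer, total, maxChars - total);
            if (n < 0) {
                break; // end of stream reached before maxChars
            }
            total += n;
        }
        return new String(buffer, 0, total);
    }

    // Returns void on purpose and signals a bad file with an exception.
    // The stream is closed here; it is a separate stream from the message body.
    public void check(InputStream in) throws IOException {
        try (InputStream stream = in) {
            String head = readHead(stream, PEEK_CHARS);
            if (!EXPECTED_NAMESPACE.matcher(head).find()) {
                throw new IllegalArgumentException(
                        "Expected namespace not found in the first " + PEEK_CHARS + " chars");
            }
        }
    }
}
```

In the route this could be called before the splitter, for example with .bean(new XmlHeadCheck(), "check"). The method is void because, as far as I know, Camel's bean binding uses a non-void return value as the new message body, which would hide the file from the splitter.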

It would be a different story if the message body was an InputStream
that was only readable once.
For that you can use stream caching from Camel.
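
To illustrate the read-once concern in plain Java: the sketch below is not Camel's stream caching, only the standard java.io mark/reset mechanism, which lets a caller peek at the start of a stream and then rewind it. The class name StreamPeek and the UTF-8 decoding are assumptions for illustration.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: peek at the head of a stream without consuming it.
public class StreamPeek {

    // Reads up to maxBytes from the stream, then rewinds to where it started.
    public static String peek(BufferedInputStream in, int maxBytes) throws IOException {
        in.mark(maxBytes); // remember the current position for at least maxBytes
        byte[] buffer = new byte[maxBytes];
        int total = 0;
        while (total < maxBytes) {
            int n = in.read(buffer, total, maxBytes - total);
            if (n < 0) {
                break; // stream is shorter than maxBytes
            }
            total += n;
        }
        in.reset(); // the next reader starts from the marked position again
        // Assumes UTF-8; a multi-byte character cut off at the limit decodes
        // to a replacement character at the end of the string.
        return new String(buffer, 0, total, StandardCharsets.UTF_8);
    }
}
```

This only works while the peek stays within the mark limit, so it suits a small fixed-size header check and not re-reading a whole 300 MB body.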




