Posted to users@camel.apache.org by Ephemeris Lappis <ep...@gmail.com> on 2014/05/01 17:17:27 UTC

Writing big files : stream or file ?

Hello.

We have to produce rather large volumes of data and generate output
files in several steps, some of which use a splitter to process their
inputs. I've been looking at two approaches...

The first way is to write a file with the "append" mode set. That could be
a nice solution for writing batches of lines, but I suppose the file is
closed and reopened for each exchange, which may be quite a bad solution
when writing millions of lines one by one from a splitting loop.
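
To make that cost concrete, here is a minimal plain-Java sketch (not Camel's file producer) that models one exchange per line in append mode: every line opens the file, appends, and closes it again. The class name `PerLineAppend` is hypothetical, and the sketch assumes the file endpoint reopens the file per exchange, as the poster supposes.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/**
 * Plain-Java model (hypothetical, not Camel code) of writing each split line
 * as its own exchange in append mode: one open/append/close cycle per line.
 */
public class PerLineAppend {

    /** Appends each line with its own open/close cycle; returns the number of opens. */
    static int appendLineByLine(Path target, List<String> lines) throws IOException {
        int opens = 0;
        for (String line : lines) {
            // One open + close per line, like one producer call per exchange.
            try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
                opens++;
                out.write(line);
                out.newLine();
            }
        }
        return opens;
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("per-line", ".txt");
        int opens = appendLineByLine(target, List.of("a", "b", "c"));
        System.out.println(opens + " opens for " + Files.readAllLines(target).size() + " lines");
    }
}
```

With millions of split lines, the open count in this model equals the line count, which is the overhead described above.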

The second way is to use a stream, with "stream:file", but I haven't found
any way to actually control when the file is closed. FYI, we use Camel in
ServiceMix with an embedded 2.10, and the "closeOnDone" option is not
available. The "autoCloseCount" option seemed to be the beginning of a
solution, but as its value can't be set dynamically (using a property, for
example), it doesn't give full control over closing the file, and outputs
might stay open and make the following tasks fail.
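
The autoCloseCount limitation can be illustrated with a small plain-Java model. The class `AutoCloseCountWriter` is hypothetical and only mimics the behaviour described for the stream option (close after N messages, reopen on the next one, never close when the count is 0); it is not Camel's implementation. When the number of lines is not known in advance, the final partial batch leaves the file open.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical plain-Java model of a stream writer with a fixed auto-close
 * count: closed after every autoCloseCount lines, reopened on the next write.
 * A count of 0 never matches, so the file is never closed by the counter.
 */
public class AutoCloseCountWriter implements AutoCloseable {
    private final Path target;
    private final int autoCloseCount;
    private BufferedWriter out;   // null while the file is closed
    private int writtenSinceOpen;

    AutoCloseCountWriter(Path target, int autoCloseCount) {
        this.target = target;
        this.autoCloseCount = autoCloseCount;
    }

    void write(String line) throws IOException {
        if (out == null) {
            out = Files.newBufferedWriter(target, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            writtenSinceOpen = 0;
        }
        out.write(line);
        out.newLine();
        if (++writtenSinceOpen == autoCloseCount) {
            close(); // count reached: flush and release the file
        }
    }

    /** True while the file is held open (buffered data may not be flushed yet). */
    boolean isOpen() {
        return out != null;
    }

    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
            out = null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("auto-close", ".txt");
        AutoCloseCountWriter writer = new AutoCloseCountWriter(target, 1000);
        for (int i = 0; i < 2500; i++) {
            writer.write("line " + i);
        }
        // 2500 is not a multiple of 1000, so the last 500 lines leave the file open.
        System.out.println("still open: " + writer.isOpen());
        writer.close();
    }
}
```

In `main`, two full batches of 1000 are closed, but the trailing 500 lines keep the file open until something else closes it, which is the risk for the following tasks.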

Any ideas on how to write big files and control when they are closed?

Thanks in advance.

Regards.



--
View this message in context: http://camel.465427.n5.nabble.com/Writing-big-files-stream-or-file-tp5750742.html
Sent from the Camel - Users mailing list archive at Nabble.com.

Re: Writing big files : stream or file ?

Posted by Ephemeris Lappis <ep...@gmail.com>.
Hello.

In cases that produce about one million lines, grouping lines by 1000
still means the file is opened and closed 1000 times. Perhaps this is an
acceptable performance price...
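
The arithmetic above (one open/close per group, so ceil(1,000,000 / 1000) = 1000 cycles) can be checked with a plain-Java sketch of grouped appending. `GroupedAppend` and its methods are hypothetical names; the sketch assumes each group of lines arrives as one exchange and is appended in a single open/close, as the thread describes for grouping combined with fileExists=Append.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plain-Java model of appending lines in fixed-size groups. */
public class GroupedAppend {

    /** Open/close cycles needed for totalLines in groups of groupSize (ceiling division). */
    static long openCount(long totalLines, int groupSize) {
        return (totalLines + groupSize - 1) / groupSize;
    }

    /** Appends lines group by group; each group costs exactly one open/close. Returns the opens. */
    static int appendInGroups(Path target, List<String> lines, int groupSize) throws IOException {
        int opens = 0;
        for (int start = 0; start < lines.size(); start += groupSize) {
            List<String> group = lines.subList(start, Math.min(start + groupSize, lines.size()));
            try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
                opens++;
                for (String line : group) {
                    out.write(line);
                    out.newLine();
                }
            }
        }
        return opens;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(openCount(1_000_000, 1000)); // 1000, as estimated above
        Path target = Files.createTempFile("grouped", ".txt");
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            lines.add("line " + i);
        }
        System.out.println(appendInGroups(target, lines, 1000)); // 3 (1000 + 1000 + 500)
    }
}
```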

Idealist that I am, I'd say the best solution would be to open the file
once and close it at the end of the main exchange, when all the output is
done. This seems to be what the new stream option (closeOnDone) in Camel
2.11 is intended for, but it is not available in our customer's version.
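
As a rough model of the "open once, close at the end" behaviour wished for above, here is a plain-Java sketch. `CloseOnDoneWriter` is a hypothetical class, and the `last` flag stands in for whatever marks the end of the split; this is an assumption about how such an option might behave, not a description of Camel 2.11's closeOnDone implementation.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical plain-Java model: the file is opened on the first line of a
 * run, kept open across lines, and closed only after the line flagged as last.
 */
public class CloseOnDoneWriter {
    private final Path target;
    private BufferedWriter out; // null while no run is in progress
    private int opens;

    CloseOnDoneWriter(Path target) {
        this.target = target;
    }

    /** Writes one line; closes the file right after the line flagged as the last one. */
    void write(String line, boolean last) throws IOException {
        if (out == null) {
            out = Files.newBufferedWriter(target, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            opens++;
        }
        out.write(line);
        out.newLine();
        if (last) {
            out.close(); // flush and release the file for the following tasks
            out = null;
        }
    }

    int opens() { return opens; }

    boolean isOpen() { return out != null; }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("close-on-done", ".txt");
        CloseOnDoneWriter writer = new CloseOnDoneWriter(target);
        int total = 2500;
        for (int i = 0; i < total; i++) {
            writer.write("line " + i, i == total - 1);
        }
        System.out.println(writer.opens() + " open(s), still open: " + writer.isOpen());
    }
}
```

One design caveat: if a run fails before its last line, this model leaves the file open, so a real route would also need to close it on error.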

Ephemeris Lappis

On 02/05/2014 09:15, Claus Ibsen-2 [via Camel] wrote:
> Hi
>
> I wonder if using groupLines 1000 etc. to work on a bulk of lines at a
> time wouldn't already be fast enough with the fileExists=Append mode.

Re: Writing big files : stream or file ?

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

I wonder if using groupLines 1000 etc. to work on a bulk of lines at a
time wouldn't already be fast enough with the fileExists=Append mode.

-- 
Claus Ibsen
-----------------
Red Hat, Inc.
Email: cibsen@redhat.com
Twitter: davsclaus
Blog: http://davsclaus.com
Author of Camel in Action: http://www.manning.com/ibsen
hawtio: http://hawt.io/
fabric8: http://fabric8.io/

Re: Writing big files : stream or file ?

Posted by Ephemeris Lappis <ep...@gmail.com>.
Hello Claus.

I had seen this post before. Using an aggregator does indeed help reduce
the overhead of opening and closing the file, by buffering lines before
sending them to the file endpoint.

As we have only used Blueprint in our solution so far, I'd prefer not to
use Java code.

Do you know of any predefined aggregation strategy that just groups the
message bodies? In Camel 2.11 the abstract class
"org.apache.camel.processor.aggregate.AbstractListAggregationStrategy"
seems to help build a 'List<V>', but our ServiceMix only has Camel 2.10.

Is there any library that provides a simple aggregator "off the shelf"?
For example, one producing a List<Object> would be fine for our case!
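
For reference, the logic of such a "group bodies into a List<Object>" strategy is small. The sketch below is plain Java with a hypothetical class `BodyListAggregator`; it only models the contract of Camel aggregation strategies (the previous aggregate is null on the first call) and does not implement Camel's `AggregationStrategy` interface itself.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical plain-Java model (no Camel types) of an aggregation strategy
 * that groups message bodies into a List<Object>. As with Camel strategies,
 * the old aggregate is null on the first call.
 */
public class BodyListAggregator {

    static List<Object> aggregate(List<Object> oldAggregate, Object newBody) {
        // First call: start a new list; later calls: keep appending to it.
        List<Object> result = (oldAggregate == null) ? new ArrayList<>() : oldAggregate;
        result.add(newBody);
        return result;
    }

    public static void main(String[] args) {
        List<Object> batch = null;
        for (Object body : new Object[] {"line 1", "line 2", 3}) {
            batch = aggregate(batch, body);
        }
        System.out.println(batch); // [line 1, line 2, 3]
    }
}
```

Reusing the old list instead of copying it keeps each aggregation step constant-time, which matters when batches hold thousands of lines.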

Secondly, we've been talking about the file endpoint solution, but what
about streams?
According to my tests, streaming the split lines to the file does not
open and close it for each resulting exchange and works rather well,
except that the output file is not closed at the end of the main exchange.

Any ideas on this approach?

Thanks for your help.

Ephemeris Lappis

On 02/05/2014 06:55, claus.straube [via Camel] wrote:
> Hi,
>
> have a look at
> http://www.catify.com/2012/07/09/parsing-large-files-with-apache-camel/
> - perhaps this helps (it's also about writing big files).
>
> Best regards - Claus

Re: Writing big files : stream or file ?

Posted by Claus Straube <cl...@catify.com>.
Hi,

have a look at 
http://www.catify.com/2012/07/09/parsing-large-files-with-apache-camel/ 
- perhaps this helps (it's also about writing big files).

Best regards - Claus
