Posted to users@camel.apache.org by Gonzalo Vasquez <gv...@altiuz.cl> on 2012/11/09 16:01:45 UTC

Camel performance tuning

I'm running a route that basically adds a character per line to a plain text file, but it's taking too long, and it seems to be due to some kind of buffering issue when reading/writing from disk.

I'm processing a 5MB file (attached as DC_FACCL132_0000 MORA_1075_16-10-2012_19-09-47_15.txt.zip), with the corresponding XSL template (also attached).

It's taking forever to process such a file. I understand I'm tokenizing on line breaks, which could be the source of the problem as there are many lines in the file (48198 exactly), but when running jvisualvm (see attached images/snapshot) I can see the write operation is invoked 20386 times, which doesn't seem related to the line count. Is there an output buffer size that I can configure, or something like that?

This is the route:
		<camel:route id="pager" autoStartup="true">
			<camel:from
				uri="file:///tmp/in?charset=Windows-1252&amp;move=${file:parent}/../paged/${file:name.noext}.paged.ack&amp;preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}" />
			<camel:split streaming="true" parallelProcessing="false">
				<camel:tokenize token="\n" />
				<camel:to uri="bean:pager" />
				<camel:to
					uri="file:///tmp/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append" />
			</camel:split>
		</camel:route>

This is the referenced bean:

	<bean id="pager" class="cl.altiuz.reports.etl.TextProcessor">
		<property name="xsltPath"
			value="/Users/gonzalovasquez/Documents/workspace/altiuz-reports/reports-etl/xsl/pager.xsl" />
		<property name="param" value="C.*PAG.* 1" />
	</bean>

The Camel version is 2.10.1, and this happens on both OS X and MS Windows, so I think it isn't a platform-dependent problem but a configuration one.

Any ideas? Anything else that I should send?

Thanks!

Gonzalo Vásquez Sáez
Gerente Investigación y Desarrollo (R&D)
Altiuz Soluciones Tecnológicas de Negocios Ltda.
Av. Nueva Tajamar 555 Of. 802, Las Condes
(56-2) 335 2461
gvasquez@altiuz.cl
http://www.altiuz.cl
 



Re: Camel performance tuning

Posted by Gonzalo Vasquez <gv...@altiuz.cl>.
I think that should go on a wishlist, as there might be cases where such an option is a must, unlike my case, where the aggregator seems to do the job ;)


On 09-11-2012, at 13:52, Claus Ibsen <cl...@gmail.com> wrote:

> Hi
> 
> The file component forces each write to ensure it is persisted on disk.
> See the IOHelper.force method.
> 
> So when you do that many writes, it is bound to impact performance.
> 
> We could, though, make the force behavior configurable so people
> can turn it off.


Re: Camel performance tuning

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

The file component forces each write to ensure it is persisted on disk.
See the IOHelper.force method.

So when you do that many writes, it is bound to impact performance.

We could, though, make the force behavior configurable so people
can turn it off.




-- 
Claus Ibsen
-----------------
Red Hat, Inc.
FuseSource is now part of Red Hat
Email: cibsen@redhat.com
Web: http://fusesource.com
Twitter: davsclaus
Blog: http://davsclaus.com
Author of Camel in Action: http://www.manning.com/ibsen
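
As a rough illustration of why forcing every write is costly, the Java sketch below appends lines to a file and either forces the channel to disk after each line or only once at the end. It is not Camel's IOHelper code: the class name ForceWriteSketch and the method appendLines are made up for this example, and it relies only on java.nio's FileChannel.force. It also keeps the channel open across lines, which the route in this thread presumably does not. Timings will vary with disk and operating system.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; not part of Camel.
class ForceWriteSketch {

    /**
     * Appends every line to the target file.
     * If forceEachWrite is true, the channel is synced to disk after each line;
     * otherwise it is synced once after the last line.
     * Returns the number of force() calls that were made.
     */
    static int appendLines(Path target, List<String> lines, boolean forceEachWrite)
            throws IOException {
        int forces = 0;
        try (FileChannel channel = FileChannel.open(target,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            for (String line : lines) {
                ByteBuffer buf = ByteBuffer.wrap((line + "\n").getBytes(StandardCharsets.UTF_8));
                while (buf.hasRemaining()) {
                    channel.write(buf);
                }
                if (forceEachWrite) {
                    channel.force(true); // sync content and metadata to disk
                    forces++;
                }
            }
            if (!forceEachWrite) {
                channel.force(true); // a single sync for the whole batch
                forces++;
            }
        }
        return forces;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Collections.nCopies(2000, "some line of text");
        for (boolean forceEach : new boolean[] {true, false}) {
            Path tmp = Files.createTempFile("force-sketch", ".txt");
            long start = System.nanoTime();
            int forces = appendLines(tmp, lines, forceEach);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("forceEachWrite=" + forceEach
                    + " forces=" + forces + " elapsedMs=" + elapsedMs);
            Files.delete(tmp);
        }
    }
}
```

Later Camel releases document a forceWrites option on the file endpoint for turning this syncing off; whether it is available depends on the version in use, so check the file component documentation before relying on it.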

Re: Camel performance tuning

Posted by Gonzalo Vasquez <gv...@altiuz.cl>.
OK, understood, but remember I'm using a splitter.

So, should I just add the strategyRef parameter to the splitter, or perhaps wrap the "to" part within the splitter with an aggregator? Do both work? If so, are there any differences/pros/cons between them?

Thanks again to you all!

On 09-11-2012, at 13:55, Claus Straube <cl...@catify.com> wrote:

> Sounds simple, but take a look at: http://camel.apache.org/aggregator2.html ;)


Re: Camel performance tuning

Posted by Claus Straube <cl...@catify.com>.
Sounds simple, but take a look at: http://camel.apache.org/aggregator2.html ;)

On 09.11.2012 17:49, Gonzalo Vasquez wrote:
> That sounds much better, but how do I write that in spring DSL?

Re: Camel performance tuning

Posted by Gonzalo Vasquez <gv...@altiuz.cl>.
That sounds much better, but how do I write that in Spring DSL?


On 09-11-2012, at 13:43, Claus Straube <cl...@catify.com> wrote:

> Hi Gonzalo. Take a look at this: http://www.catify.com/2012/07/09/parsing-large-files-with-apache-camel/ Perhaps it solves your issue.
> 
> Best regards - Claus


Re: Camel performance tuning

Posted by Claus Straube <cl...@catify.com>.
Hi Gonzalo. Take a look at this:
http://www.catify.com/2012/07/09/parsing-large-files-with-apache-camel/
Perhaps it solves your issue.

Best regards - Claus

Re: Camel performance tuning

Posted by Gonzalo Vasquez <gv...@altiuz.cl>.
We tried values from 1 to 250 on the dev environment and ended up with 16 as the best number. That value has to be tuned per server.
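
To make the effect of the completion size concrete, here is a small plain-Java sketch (no Camel involved) that groups lines into batches of a given size, where each batch stands for one append to the output file. The class BatchConcatSketch and the method toBatches are hypothetical names for this illustration only; the real route uses the aggregator with a custom AggregationStrategy, which this does not reproduce.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; it mimics batching, not Camel's aggregator.
class BatchConcatSketch {

    /**
     * Concatenates the input lines into chunks of at most batchSize lines.
     * Each returned chunk corresponds to one write to the output file.
     */
    static List<String> toBatches(Iterable<String> lines, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int count = 0;
        for (String line : lines) {
            current.append(line).append('\n');
            if (++count == batchSize) {
                batches.add(current.toString());
                current.setLength(0);
                count = 0;
            }
        }
        if (count > 0) {
            batches.add(current.toString()); // flush the partial final batch
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 48198; i++) { // line count of the test file in this thread
            lines.add("line " + i);
        }
        for (int size : new int[] {1, 16, 250, 750}) {
            System.out.println("batchSize=" + size
                    + " appends=" + toBatches(lines, size).size());
        }
    }
}
```

For the 48198-line file, a batch size of 16 means 3013 appends, 250 means 193, and 750 means 65, instead of one append per line. Note the explicit flush of the last partial batch: in a route that completes on size alone, it is worth checking how the final incomplete group gets completed (for example with a completion timeout).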



On 12-11-2012, at 10:16, Claus Straube <cl...@catify.com> wrote:

> Have you tried a higher completion size? For us 750 was the best.
> 
> On 09.11.2012 19:59, Gonzalo Vasquez wrote:
>> Ok, I've included an aggregator in the splitter, as follows:
>> 
>> 		<camel:route id="pager" autoStartup="true">
>> 			<camel:from
>> 				uri="file:///tmp/in?charset=Windows-1252&amp;move=${file:parent}/../paged/${file:name.noext}.paged.ack&amp;preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}" />
>> 			<camel:log message="Iniciando paging" />
>> 			<camel:setHeader headerName="start">
>> 				<camel:simple>${date:now:mm}:${date:now:ss}.${date:now:SSS}</camel:simple>
>> 			</camel:setHeader>
>> 			<camel:split streaming="true" parallelProcessing="false">
>> 				<camel:tokenize token="\n" />
>> 				<!-- <camel:log message="${property.CamelSplitIndex}" /> -->
>> 				<camel:to uri="bean:pager" />
>> 				<camel:aggregate strategyRef="aggregatorStrategy">
>> 					<camel:correlationExpression>
>> 						<camel:simple>${file:name}</camel:simple>
>> 					</camel:correlationExpression>
>> 					<camel:completionSize>
>> 						<camel:constant>250</camel:constant>
>> 					</camel:completionSize>
>> 					<camel:to
>> 						uri="file:///tmp/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append" />
>> 				</camel:aggregate>
>> 			</camel:split>
>> 			<camel:log
>> 				message="Elapsed: ${header.start} - ${date:now:mm}:${date:now:ss}.${date:now:SSS}" />
>> 		</camel:route>
>> 
>> 
>> And the AggregationStrategy:
>> 
>> 	<bean id="aggregatorStrategy" class="cl.altiuz.reports.etl.ConcatAggregationStrategy" />
>> 
>> 
>> I've also added some headers & logging to calculate elapsed time.
>> 
>> Before adding the aggregator the elapsed time was about 30 seconds (for the 5MB test file), and now it is about half that (15 secs). I can clearly see the improvement, but it's not as much as I expected.
>> 
>> Any extra tips? I've included the custom AggregationStrategy I had to create, as all I needed was appending/concatenating body contents.
>> 
>> On 09-11-2012, at 15:09, Christian Müller <ch...@gmail.com> wrote:
>> 
>>> Using Hypersonic, Hadoop or Mongo for such a use case is "over engineering"
>>> the requirement and will end up in much more complicated solution - IMO.
>>> 
>>> Best,
>>> Christian
>>> 
>>> On Fri, Nov 9, 2012 at 6:57 PM, <Ra...@cognizant.com> wrote:
>>> 
>>>> You may also want to check out Hadoop and map reduce:
>>>> 
>>>> http://camel.apache.org/hdfs.html
>>>> 
>>>> with respect to points a and b.
>>>> 
>>>> You can have an index on the record and the “reduce” job can serialize on
>>>> the index.
>>>> 
>>>> *From:* Gonzalo Vasquez [mailto:gvasquez@altiuz.cl]
>>>> *Sent:* Friday, November 09, 2012 10:16 PM
>>>> *To:* users@camel.apache.org
>>>> *Subject:* Re: Camel performance tuning
>>>> 
>>>> Thanks for your answer, my comments:
>>>> 
>>>> a) A 5 MB file could be loaded into memory, but I have streaming enabled as
>>>> file size could be in the range of GB. Notwithstanding, I'll check what
>>>> Hypersonic & Mongo are, as I'm not aware of them.
>>>> 
>>>> b) Parallel processing is set to false, because records must preserve
>>>> order on the output file.
>>>> 
>>>> c) I don't see the point here.
>>>> 
>>>> d) See a).
>>>> 
>>>> e) What about async processing? There's no "long running process" here.
>>>> 
>>>> Thanks again.
>>>> 
>>>> On 09-11-2012, at 13:12, <Ra...@cognizant.com> wrote:
>>>> 
>>>> I am really new to Camel but here are some options you can try:
>>>> 
>>>> a) Can you load the 5 MB file into memory before splitting it? That
>>>> way IO will not be a problem. Probably put it in something like Hypersonic
>>>> or Mongo.
>>>> 
>>>> b) Why is parallel processing false? Are the records related to
>>>> each other? If set to true, you can take advantage of multiple cores.
>>>> 
>>>> c) Is it possible to first split the file into chunks and then
>>>> process the chunks independently?
>>>> 
>>>> d) Can you write into memory and flush at once?
>>>> 
>>>> e) Sync/Async: http://camel.apache.org/async.html
>>>> 
>>>> *From:* Gonzalo Vasquez [mailto:gvasquez@altiuz.cl]
>>>> *Sent:* Friday, November 09, 2012 8:32 PM
>>>> *To:* users@camel.apache.org
>>>> *Subject:* Camel performance tuning
>>>> 
>>>> 
>>>> 
>>>> I'm running a route that basically adds a character per line to a plain
>>>> text file, but it's taking to long, and it seems that it's due to some kind
>>>> of buffering issue when reading/writing from disk.
>>>> 
>>>> 
>>>> 
>>>> I'm processing a 5MB file (attached as DC_FACCL132_0000
>>>> MORA_1075_16-10-2012_19-09-47_15.txt.zip), with the corresponding XSL
>>>> template (also attached).
>>>> 
>>>> 
>>>> 
>>>> It's taking for ever to process such a file, I understand I'm tokenizing
>>>> on line breaks, which could be the source of the problem as there are many
>>>> lines in the file (48198 exactly), but when running jvisualvm (see attached
>>>> images/snapshot)I can see the writing op is invoked 20386 times, which seem
>>>> not related to the line count. Is there an output buffer size that I can
>>>> configure? Or something like that?
>>>> 
>>>> 
>>>> 
>>>> This is the route:
>>>> 
>>>> <camel:route id="pager" autoStartup="true">
>>>> 
>>>> <camel:from
>>>> 
>>>> uri="
>>>> file:///tmp/in?charset=Windows-1252&amp;move=${file:parent}/../paged/${file:name.noext}.paged.ack&amp;preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}
>>>> " />
>>>> 
>>>> <camel:split streaming="true" parallelProcessing="false">
>>>> 
>>>> <camel:tokenize token="\n" />
>>>> 
>>>> <camel:to uri="bean:pager" />
>>>> 
>>>> <camel:to
>>>> 
>>>> uri="
>>>> file:///tmp/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append
>>>> " />
>>>> 
>>>> </camel:split>
>>>> 
>>>> </camel:route>
>>>> 
>>>> 
>>>> 
>>>> This is the referenced bean:
>>>> 
>>>> 
>>>> 
>>>> <bean id="pager" class="cl.altiuz.reports.etl.TextProcessor">
>>>> 
>>>> <property name="xsltPath"
>>>> 
>>>> value=
>>>> "/Users/gonzalovasquez/Documents/workspace/altiuz-reports/reports-etl/xsl/pager.xsl"
>>>> />
>>>> 
>>>> <property name="param" value="C.*PAG.* 1" />
>>>> 
>>>> </bean>
>>>> 
>>>> 
>>>> 
>>>> Camel versión is 2,10.1, and happens both on OSX & MS Windows, so I think
>>>> isn't a platform dependent problem, but a configuration one.
>>>> 
>>>> 
>>>> 
>>>> Any ideas? Any thing else that I should send?
>>>> 
>>>> 
>>>> 
>>>> Thanks!
>>>> 
>>>> 
>>>> 
>>>> *Gonzalo Vásquez Sáez*
>>>> 
>>>> *Gerente Investigación y Desarrollo (R&D)*
>>>> *Altiuz* Soluciones Tecnológicas de Negocios Ltda.
>>>> Av. Nueva Tajamar 555 Of. 802, Las Condes
>>>> (56-2) 335 2461
>>>> *gvasquez@altiuz.c <gc...@altiuz.com>l*
>>>> 
>>>> *http://www.altiuz.cl*
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>>       This e-mail and any files transmitted with it are for the sole use
>>>> of the intended recipient(s) and may contain confidential and privileged
>>>> information. If you are not the intended recipient(s), please reply to the
>>>> sender and destroy all copies of the original message. Any unauthorized
>>>> review, use, disclosure, dissemination, forwarding, printing or copying of
>>>> this email, and/or any action taken in reliance on the contents of this
>>>> e-mail is strictly prohibited and may be unlawful.
>>>> 
>>>> 
>>> 
>>> 
>>> --
> 


Re: Camel performance tuning

Posted by Claus Straube <cl...@catify.com>.
Have you tried a higher completion size? For us 750 was the best.
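
To illustrate why a larger completion size tends to help, here is a minimal, Camel-free Java sketch. The class and method names are made up for illustration and are not part of Camel. It groups lines into batches and issues one write per batch, so the number of write calls drops roughly by the batch size. Note that the sketch flushes the last partial batch explicitly; as far as I know, a Camel aggregator configured only with completionSize keeps an incomplete final group until some other completion condition (for example a completionTimeout) fires, so that may be worth checking in the route.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration (not Camel code): batch lines before writing. */
final class BatchedAppend {

    /**
     * Writes every line to out, issuing one write call per batch of
     * completionSize lines (the last batch may be smaller).
     * Returns the number of write calls issued.
     */
    static int writeInBatches(List<String> lines, int completionSize, Writer out) {
        if (completionSize < 1) {
            throw new IllegalArgumentException("completionSize must be >= 1");
        }
        int writes = 0;
        int inBatch = 0;
        StringBuilder batch = new StringBuilder();
        try {
            for (String line : lines) {
                // Assumption: lines arrive without their terminator, so add it back.
                batch.append(line).append('\n');
                inBatch++;
                if (inBatch == completionSize) {
                    out.write(batch.toString());
                    writes++;
                    batch.setLength(0);
                    inBatch = 0;
                }
            }
            // Flush the final, possibly incomplete, batch.
            if (inBatch > 0) {
                out.write(batch.toString());
                writes++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return writes;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 48198; i++) {
            lines.add("line " + i);
        }
        System.out.println(writeInBatches(lines, 250, new StringWriter())); // 193
        System.out.println(writeInBatches(lines, 750, new StringWriter())); // 65
    }
}
```

For the 48198-line file from the original post, this gives 193 writes with a batch size of 250 and 65 writes with 750.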

On 09.11.2012 19:59, Gonzalo Vasquez wrote:
> Ok, I've included an aggregator in the splitter, as follows:
>
> 		<camel:route id="pager" autoStartup="true">
> 			<camel:from
> 				uri="file:///tmp/in?charset=Windows-1252&amp;move=${file:parent}/../paged/${file:name.noext}.paged.ack&amp;preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}" />
> 			<camel:log message="Iniciando paging" />
> 			<camel:setHeader headerName="start">
> 				<camel:simple>${date:now:mm}:${date:now:ss}.${date:now:SSS}</camel:simple>
> 			</camel:setHeader>
> 			<camel:split streaming="true" parallelProcessing="false">
> 				<camel:tokenize token="\n" />
> 				<!-- <camel:log message="${property.CamelSplitIndex}" /> -->
> 				<camel:to uri="bean:pager" />
> 				<camel:aggregate strategyRef="aggregatorStrategy">
> 					<camel:correlationExpression>
> 						<camel:simple>${file:name}</camel:simple>
> 					</camel:correlationExpression>
> 					<camel:completionSize>
> 						<camel:constant>250</camel:constant>
> 					</camel:completionSize>
> 					<camel:to
> 						uri="file:///tmp/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append" />
> 				</camel:aggregate>
> 			</camel:split>
> 			<camel:log
> 				message="Elapsed: ${header.start} - ${date:now:mm}:${date:now:ss}.${date:now:SSS}" />
> 		</camel:route>
>
>
> And the AggregationStrategy:
>
> 	<bean id="aggregatorStrategy" class="cl.altiuz.reports.etl.ConcatAggregationStrategy" />
>
>
> I've also added some headers & logging to calculate elapsed time.
>
> Before adding the aggregator, the elapsed time was about 30 seconds (for the 5MB test file); now it is about half that (15 seconds). I can clearly see the improvement, but it is not as large as I expected.
>
> Any extra tips? I've included the custom AggregationStrategy I had to create, as all I needed was to append/concatenate body contents.
>
>
>
> Gonzalo Vásquez Sáez
> Gerente Investigación y Desarrollo (R&D)
> Altiuz Soluciones Tecnológicas de Negocios Ltda.
> Av. Nueva Tajamar 555 Of. 802, Las Condes
> (56-2) 335 2461
> gvasquez@altiuz.cl
> http://www.altiuz.cl
>   
>
>
>
> On 09-11-2012, at 15:09, Christian Müller <ch...@gmail.com> wrote:
>
>> Using Hypersonic, Hadoop or Mongo for such a use case is "over engineering"
>> the requirement and will end up in a much more complicated solution - IMO.
>>
>> Best,
>> Christian
>>
>> On Fri, Nov 9, 2012 at 6:57 PM, <Ra...@cognizant.com> wrote:
>>
>>> You may also want to check out Hadoop and map reduce
>>>
>>>
>>>
>>> http://camel.apache.org/hdfs.html
>>>
>>>
>>>
>>> with respect to point a and b.
>>>
>>>
>>>
>>> You can have an index on the record and the “reduce” job can serialize on
>>> the index.
>>>
>>>
>>>
>>> *From:* Gonzalo Vasquez [mailto:gvasquez@altiuz.cl]
>>> *Sent:* Friday, November 09, 2012 10:16 PM
>>> *To:* users@camel.apache.org
>>> *Subject:* Re: Camel performance tuning
>>>
>>>
>>>
>>> Thanks for your answer, my comments:
>>>
>>>
>>>
>>> a) a 5M file could be loaded into memory, but I have streaming enabled as
>>> file size could be in the range of GB. Notwithstanding, I'll check what
>>> Hypersonic & Mongo are, as I'm not aware of them.
>>>
>>> b) Parallel processing is set to false, because records must preserve
>>> order on the output file
>>>
>>> c) Don't see the point here
>>>
>>> d) See a)
>>>
>>> e) what about async processing? There's no "long running process" here
>>>
>>>
>>>
>>> Thanks again.-
>>>
>>>
>>>
>>> *Gonzalo Vásquez Sáez*
>>>
>>> *Gerente Investigación y Desarrollo (R&D)*
>>> *Altiuz* Soluciones Tecnológicas de Negocios Ltda.
>>> Av. Nueva Tajamar 555 Of. 802, Las Condes
>>> (56-2) 335 2461
>>> *gvasquez@altiuz.cl*
>>>
>>> *http://www.altiuz.cl*
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 09-11-2012, at 13:12, <Ra...@cognizant.com> wrote:
>>>
>>>
>>>
>>>   I am really new to Camel but here are some options you can try
>>>
>>>
>>>
>>> a) Can you load the 5 MB file into memory before splitting it? That
>>> way IO will not be a problem. Perhaps put it in something like Hypersonic
>>> or Mongo.
>>>
>>> b) Why is parallel processing false? Are the records related to
>>> each other? If it were true, you could take advantage of multiple cores.
>>>
>>> c) Is it possible to first split the file into chunks and then
>>> process the chunks independently?
>>>
>>> d) Can you write into memory and flush at once?
>>>
>>> e) Sync/Async: http://camel.apache.org/async.html
>>>
>>>
>>>


Re: Camel performance tuning

Posted by Gonzalo Vasquez <gv...@altiuz.cl>.
Ok, I've included an aggregator in the splitter, as follows: 

		<camel:route id="pager" autoStartup="true">
			<camel:from
				uri="file:///tmp/in?charset=Windows-1252&amp;move=${file:parent}/../paged/${file:name.noext}.paged.ack&amp;preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}" />
			<camel:log message="Iniciando paging" />
			<camel:setHeader headerName="start">
				<camel:simple>${date:now:mm}:${date:now:ss}.${date:now:SSS}</camel:simple>
			</camel:setHeader>
			<camel:split streaming="true" parallelProcessing="false">
				<camel:tokenize token="\n" />
				<!-- <camel:log message="${property.CamelSplitIndex}" /> -->
				<camel:to uri="bean:pager" />
				<camel:aggregate strategyRef="aggregatorStrategy">
					<camel:correlationExpression>
						<camel:simple>${file:name}</camel:simple>
					</camel:correlationExpression>
					<camel:completionSize>
						<camel:constant>250</camel:constant>
					</camel:completionSize>
					<camel:to
						uri="file:///tmp/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append" />
				</camel:aggregate>
			</camel:split>
			<camel:log
				message="Elapsed: ${header.start} - ${date:now:mm}:${date:now:ss}.${date:now:SSS}" />
		</camel:route>


And the AggregationStrategy:

	<bean id="aggregatorStrategy" class="cl.altiuz.reports.etl.ConcatAggregationStrategy" />


I've also added some headers & logging to calculate elapsed time.

Before adding the aggregator, the elapsed time was about 30 seconds (for the 5MB test file); now it is about half that (15 seconds). I can clearly see the improvement, but it is not as large as I expected.

Any extra tips? I've included the custom AggregationStrategy I had to create, as all I needed was to append/concatenate body contents.
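
For readers without the attachment, a concatenating strategy boils down to something like the Java sketch below. This is a hypothetical stand-in, not the actual ConcatAggregationStrategy class (its source is not shown in this thread), and it deliberately avoids the Camel API so it can run on its own. In Camel the same logic would live in AggregationStrategy.aggregate(Exchange oldExchange, Exchange newExchange), where oldExchange is null for the first message of a group. The sketch assumes each body already carries its own line terminator.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a concatenating aggregation strategy (no Camel dependency). */
final class ConcatSketch {

    /**
     * Folds one new body into the accumulated one. As with Camel's
     * AggregationStrategy, the accumulated value is null for the first
     * item of a group.
     */
    static StringBuilder aggregate(StringBuilder accumulated, String newBody) {
        if (accumulated == null) {
            return new StringBuilder(newBody);
        }
        // Appending in place avoids copying the whole group on every line.
        return accumulated.append(newBody);
    }

    public static void main(String[] args) {
        List<String> bodies = Arrays.asList("line 1\n", "line 2\n", "line 3\n");
        StringBuilder group = null;
        for (String body : bodies) {
            group = aggregate(group, body);
        }
        System.out.print(group); // the three lines, in their original order
    }
}
```

Accumulating into a StringBuilder rather than concatenating Strings keeps the cost per line roughly constant, which matters more as the completion size grows.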



Gonzalo Vásquez Sáez
Gerente Investigación y Desarrollo (R&D)
Altiuz Soluciones Tecnológicas de Negocios Ltda.
Av. Nueva Tajamar 555 Of. 802, Las Condes
(56-2) 335 2461
gvasquez@altiuz.cl
http://www.altiuz.cl
 


Re: Camel performance tuning

Posted by Gonzalo Vasquez <gv...@altiuz.cl>.
Image files were attached; perhaps you can see them at the bottom of the message.

Gonzalo Vásquez Sáez
Gerente Investigación y Desarrollo (R&D)
Altiuz Soluciones Tecnológicas de Negocios Ltda.
Av. Nueva Tajamar 555 Of. 802, Las Condes
(56-2) 335 2461
gvasquez@altiuz.cl
http://www.altiuz.cl
 



On 09-11-2012, at 16:33, <Ra...@cognizant.com> wrote:

> No attachment
>  
> From: Gonzalo Vasquez [mailto:gvasquez@altiuz.cl] 
> Sent: Saturday, November 10, 2012 12:55 AM
> To: users@camel.apache.org
> Subject: Re: Camel performance tuning
>  
> Please see the attached image from the profiler, which shows the two methods that account for 80% of CPU time. I've also included the hotspots list again; IO is still first, but with only 13.9%.
>  
> Gonzalo Vásquez Sáez
> Gerente Investigación y Desarrollo (R&D)
> Altiuz Soluciones Tecnológicas de Negocios Ltda.
> Av. Nueva Tajamar 555 Of. 802, Las Condes
> (56-2) 335 2461
> gvasquez@altiuz.cl
> http://www.altiuz.cl
>  
> 
> 
>  


RE: Camel performance tuning

Posted by Ra...@cognizant.com.
No attachment

From: Gonzalo Vasquez [mailto:gvasquez@altiuz.cl]
Sent: Saturday, November 10, 2012 12:55 AM
To: users@camel.apache.org
Subject: Re: Camel performance tuning

Please see the attached image from the profiler, which shows the two methods that account for 80% of CPU time. I've also included the hotspots list again; IO is still first, but with only 13.9%.

Gonzalo Vásquez Sáez
Gerente Investigación y Desarrollo (R&D)
Altiuz Soluciones Tecnológicas de Negocios Ltda.
Av. Nueva Tajamar 555 Of. 802, Las Condes
(56-2) 335 2461
gvasquez@altiuz.cl
http://www.altiuz.cl



Re: Camel performance tuning

Posted by Gonzalo Vasquez <gv...@altiuz.cl>.
Please see the attached image from the profiler, which shows the two methods that account for 80% of CPU time. I've also included the hotspots list again; IO is still first, but with only 13.9%.

Gonzalo Vásquez Sáez
Gerente Investigación y Desarrollo (R&D)
Altiuz Soluciones Tecnológicas de Negocios Ltda.
Av. Nueva Tajamar 555 Of. 802, Las Condes
(56-2) 335 2461
gvasquez@altiuz.cl
http://www.altiuz.cl
 


On 09-11-2012, at 15:09, Christian Müller <ch...@gmail.com> wrote:

> Using Hypersonic, Hadoop or Mongo for such a use case is "over engineering"
> the requirement and will end up in a much more complicated solution - IMO.
> 
> Best,
> Christian
> 

Re: Camel performance tuning

Posted by Gonzalo Vásquez <gv...@altiuz.cl>.
Thanks Claus,

I'll go for the File Component enhancement, but on Monday, since what I'm trying to achieve is a generic and easy way to configure systems for the less skilled people within our company, so XML is the way to go for them.

I'll provide feedback on Monday, and possibly an enhanced component for the community.
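
For reference, the difference Claus describes in the quoted message below (reopening the output for every line versus reusing one stream for the whole file) can be sketched in plain Java as follows. This is only an illustration with made-up class and method names; it is not the pair of test cases Claus linked to, and it is not how the Camel file component is implemented.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of two ways to append lines to a file. */
final class AppendStyles {

    /** Slow style: the file is opened, appended to and closed for every line. */
    static void appendReopening(Path target, List<String> lines) {
        try {
            for (String line : lines) {
                Files.write(target, (line + "\n").getBytes(StandardCharsets.UTF_8),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Fast style: one buffered writer is reused for the entire processing. */
    static void appendReusingStream(Path target, List<String> lines) {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String line : lines) {
                out.write(line);
                out.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("paged", ".txt");
        appendReusingStream(target, Arrays.asList("first", "second"));
        System.out.print(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
        Files.delete(target);
    }
}
```

Both methods produce the same file content; the second simply avoids one open/close cycle per line and lets the writer's buffer decide when to hit the disk.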

Sent from my iPhone

On 10-11-2012, at 5:18, Claus Ibsen <cl...@gmail.com> wrote:

> On Fri, Nov 9, 2012 at 7:09 PM, Christian Müller
> <ch...@gmail.com> wrote:
>> Using Hypersonic, Hadoop or Mongo for such a use case is "over engineering"
>> the requirement and will end up in much more complicated solution - IMO.
> 
> Yeah it sure is. After all we are talking about appending data to a file.
> 
> You can just use the Java API as I have shown with the links for the 2
> test cases.
> The "fast" is much faster, as it reuses the same stream for the entire
> processing,
> and that is also how you would do it from java code, to iterate data
> and write to the file.
> 
> If you want this without doing any java code in the Camel DSL, it
> would need to enhance the file component
> to allow it to store a file stream on the exchange, and have it pass
> over to the next splitted message for re-use.
> It's doable, but a bit "hard" to do to support this use-case.
> 

Re: Camel performance tuning

Posted by Christian Müller <ch...@gmail.com>.
I added my example where I played with the file -> split -> file use case.

With your input file and without the xslt transformation, I got the
following results:

Case 1) -> took 18.128 seconds overall (average 2,814.754 messages per
second)

    <camel:route id="pager">
      <camel:from uri="file://src/test/data?charset=Windows-1252&amp;noop=true" />
      <camel:to uri="log:START" />
      <camel:split streaming="true" parallelProcessing="false">
        <camel:tokenize token="\n" />
        <camel:to uri="log:LINEPARSER?groupSize=1000" />
        <camel:to uri="file://target/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append" />
      </camel:split>
      <camel:to uri="log:END" />
    </camel:route>


Case 2) -> took 6.108 seconds overall (average 9,443.242 messages per
second)

    <camel:route id="pager">
      <camel:from uri="file://src/test/data?charset=Windows-1252&amp;noop=true" />
      <camel:to uri="log:START" />
      <camel:split streaming="true" parallelProcessing="false">
        <camel:tokenize token="\n" />
        <camel:to uri="log:LINEPARSER?groupSize=1000" />
        <camel:setHeader headerName="aggregationKey">
          <camel:constant>aggregationKey</camel:constant>
        </camel:setHeader>
        <camel:aggregate strategyRef="aggregatorStrategy" completionSize="1000" completionTimeout="2000">
          <camel:correlationExpression>
            <camel:simple>header.aggregationKey</camel:simple>
          </camel:correlationExpression>
          <camel:convertBodyTo type="String" />
          <camel:to uri="file://target/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append" />
        </camel:aggregate>
      </camel:split>
      <camel:to uri="log:END" />
    </camel:route>

The trick here is to use a StringBuilder to concatenate the strings
instead of plain string concatenation. It's much faster.
You have to check which completion size works best for you.

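import org.apache.camel.Exchange;
// Camel 2.x package of the AggregationStrategy interface
import org.apache.camel.processor.aggregate.AggregationStrategy;

public class MyAggregatorStrategy implements AggregationStrategy {

    @Override
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        String line = newExchange.getIn().getBody(String.class);
        if (oldExchange == null) {
            // First message of a batch: start a StringBuilder that later messages append to.
            newExchange.getIn().setBody(new StringBuilder(line));
            return newExchange;
        }

        // Later messages: append to the existing builder instead of creating new strings.
        oldExchange.getIn().getBody(StringBuilder.class).append(line);
        return oldExchange;
    }
}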
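
To illustrate why batching helps, here is a minimal plain-Java sketch of the same idea outside Camel: lines are collected in a StringBuilder and appended to the file once per batch rather than once per line. The class name BatchedAppender and its methods are hypothetical and are not part of Camel or of the test project above. The sketch adds the '\n' after each line itself.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Camel: buffers lines and appends them in batches.
class BatchedAppender {
    private final Path target;
    private final int batchSize;
    private final StringBuilder buffer = new StringBuilder();
    private int buffered = 0;
    private int writes = 0;

    BatchedAppender(Path target, int batchSize) {
        this.target = target;
        this.batchSize = batchSize;
    }

    // Adds one line; the file is only touched once batchSize lines are buffered.
    void add(String line) throws IOException {
        buffer.append(line).append('\n');
        if (++buffered >= batchSize) {
            flush();
        }
    }

    // Appends whatever is buffered with a single open/write/close cycle.
    void flush() throws IOException {
        if (buffered == 0) {
            return;
        }
        Files.write(target, buffer.toString().getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        buffer.setLength(0);
        buffered = 0;
        writes++;
    }

    // Number of times the file was actually opened and written.
    int writeCount() {
        return writes;
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("paged", ".txt");
        BatchedAppender appender = new BatchedAppender(out, 1000);
        for (int i = 0; i < 2500; i++) {
            appender.add("line " + i);
        }
        appender.flush(); // write the final partial batch
        System.out.println(appender.writeCount() + " writes for "
                + Files.readAllLines(out, StandardCharsets.UTF_8).size() + " lines");
    }
}
```

The batch size plays the same role as completionSize in the route: larger batches mean fewer file opens but more memory held per batch.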


Case 3) -> took 5.791 seconds overall (average 10,062.893 messages per
second)

    <camel:route id="pager">
      <camel:from uri="file://src/test/data?charset=Windows-1252&amp;noop=true" />
      <camel:to uri="log:START" />
      <camel:split streaming="true" parallelProcessing="false">
        <camel:tokenize token="\n" />
        <camel:to uri="log:LINEPARSER?groupSize=1000" />
        <camel:to uri="stream:file?fileName=target/paged/DC_FACCL132_0000 MORA_1075_16-10-2012_19-09-47_15.txt.paged&amp;encoding=utf8" />
      </camel:split>
      <camel:to uri="log:END" />
    </camel:route>

This route uses the camel-stream component, which reuses the underlying
output stream.
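
The difference between appending per message and reusing one output stream can be sketched in plain Java, without Camel. This is only an illustration of the principle, not the code of the file or stream components. The class and method names are made up for the example, and the timings it prints will vary by machine.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison class, not taken from Camel.
class AppendComparison {

    // Slow variant: open, append and close the file once per line.
    static void appendPerLine(Path target, List<String> lines) throws IOException {
        for (String line : lines) {
            try (BufferedWriter w = Files.newBufferedWriter(target, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
                w.write(line);
                w.write('\n');
            }
        }
    }

    // Fast variant: open the file once and reuse the same buffered stream for all lines.
    static void appendWithOneStream(Path target, List<String> lines) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(target, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String line : lines) {
                w.write(line);
                w.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 48198; i++) { // same line count as the file in this thread
            lines.add("line " + i);
        }
        Path slow = Files.createTempFile("slow", ".txt");
        Path fast = Files.createTempFile("fast", ".txt");

        long t0 = System.nanoTime();
        appendPerLine(slow, lines);
        long t1 = System.nanoTime();
        appendWithOneStream(fast, lines);
        long t2 = System.nanoTime();

        System.out.printf("per line: %d ms, one stream: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

Both variants produce the same file content; only the number of open/close cycles differs.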

Best,
Christian

Re: Camel performance tuning

Posted by Gonzalo Vasquez <gv...@altiuz.cl>.
Which is the Java class for the File Component? Can I just extend it and override specific methods, or do I have to copy it completely and modify the parts I need?

Gonzalo Vásquez Sáez
Research and Development Manager (R&D)
Altiuz Soluciones Tecnológicas de Negocios Ltda.
Av. Nueva Tajamar 555 Of. 802, Las Condes
(56-2) 335 2461
gvasquez@altiuz.cl
http://www.altiuz.cl
 




Re: Camel performance tuning

Posted by Claus Ibsen <cl...@gmail.com>.
On Fri, Nov 9, 2012 at 7:09 PM, Christian Müller
<ch...@gmail.com> wrote:
> Using Hypersonic, Hadoop or Mongo for such a use case is "over engineering"
> the requirement and will end up in much more complicated solution - IMO.
>

Yeah it sure is. After all we are talking about appending data to a file.

You can just use the Java API as I have shown with the links for the 2
test cases.
The "fast" one is much faster, as it reuses the same stream for the
entire processing,
which is also how you would do it from Java code: iterate over the data
and write to the file.

If you want this without writing any Java code in the Camel DSL, the
file component would need to be enhanced
to allow it to store a file stream on the exchange and pass it
over to the next split message for re-use.
It's doable, but a bit "hard" to do to support this use case.
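
As a rough illustration of what "store a file stream and pass it over to the next split message" means, here is a plain-Java sketch. It is not the file component's code and not a proposed patch. A Map stands in for state shared between the split messages, and the names StreamReuseSketch, WRITER_KEY, writeLine and finish are all hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for state shared between split messages.
class StreamReuseSketch {
    static final String WRITER_KEY = "sketch.reusedWriter"; // made-up property name

    // Called once per split message: opens the writer on first use, then reuses it.
    static void writeLine(Map<String, Object> shared, Path target, String line) throws IOException {
        BufferedWriter writer = (BufferedWriter) shared.get(WRITER_KEY);
        if (writer == null) {
            writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            shared.put(WRITER_KEY, writer);
        }
        writer.write(line);
        writer.write('\n');
    }

    // Called after the last split message: flushes and closes the stored writer.
    static void finish(Map<String, Object> shared) throws IOException {
        BufferedWriter writer = (BufferedWriter) shared.remove(WRITER_KEY);
        if (writer != null) {
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Object> shared = new HashMap<>();
        Path out = Files.createTempFile("reuse", ".txt");
        try {
            for (int i = 0; i < 5; i++) {
                writeLine(shared, out, "line " + i);
            }
        } finally {
            finish(shared); // the stream must be closed even if a message fails
        }
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8));
    }
}
```

The "hard" part hinted at above is the lifecycle: something has to know when the last message has been processed so that the stream is closed, which is why the sketch needs an explicit finish step.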


-- 
Claus Ibsen
-----------------
Red Hat, Inc.
FuseSource is now part of Red Hat
Email: cibsen@redhat.com
Web: http://fusesource.com
Twitter: davsclaus
Blog: http://davsclaus.com
Author of Camel in Action: http://www.manning.com/ibsen

Re: Camel performance tuning

Posted by Christian Müller <ch...@gmail.com>.
Using Hypersonic, Hadoop or Mongo for such a use case is "over-engineering"
the requirement and will end up in a much more complicated solution, IMO.

Best,
Christian


RE: Camel performance tuning

Posted by Ra...@cognizant.com.
You may also want to check out Hadoop and map reduce

http://camel.apache.org/hdfs.html

with respect to point a and b.

You can have an index on the record and the "reduce" job can serialize on the index.

From: Gonzalo Vasquez [mailto:gvasquez@altiuz.cl]
Sent: Friday, November 09, 2012 10:16 PM
To: users@camel.apache.org
Subject: Re: Camel performance tuning

Thanks for your answer, my comments:

a) A 5 MB file could be loaded into memory, but I have streaming enabled because file sizes could be in the GB range. Nevertheless, I'll check what Hypersonic and Mongo are, as I'm not familiar with them.
b) Parallel processing is set to false because records must preserve their order in the output file.
c) I don't see the point here.
d) See a).
e) What about async processing? There's no "long running process" here.

Thanks again.-

Gonzalo Vásquez Sáez
Research and Development Manager (R&D)
Altiuz Soluciones Tecnológicas de Negocios Ltda.
Av. Nueva Tajamar 555 Of. 802, Las Condes
(56-2) 335 2461
gvasquez@altiuz.cl
http://www.altiuz.cl

On 09-11-2012, at 13:12, <Ra...@cognizant.com> wrote:


I am really new to Camel, but here are some options you can try:

a) Can you load the 5 MB file into memory before splitting it? That way IO will not be a problem. You could probably put it in something like Hypersonic or Mongo.
b) Why is parallel processing false? Are the records related to each other? If it were true, you could take advantage of multiple cores.
c) Is it possible to first split the file into chunks and then process the chunks independently?
d) Can you write into memory and flush at once?
e) Sync/Async: http://camel.apache.org/async.html

From: Gonzalo Vasquez [mailto:gvasquez@altiuz.cl]
Sent: Friday, November 09, 2012 8:32 PM
To: users@camel.apache.org<ma...@camel.apache.org>
Subject: Camel performance tuning

I'm running a route that basically adds a character per line to a plain text file, but it's taking to long, and it seems that it's due to some kind of buffering issue when reading/writing from disk.

I'm processing a 5MB file (attached as DC_FACCL132_0000 MORA_1075_16-10-2012_19-09-47_15.txt.zip), with the corresponding XSL template (also attached).

It's taking for ever to process such a file, I understand I'm tokenizing on line breaks, which could be the source of the problem as there are many lines in the file (48198 exactly), but when running jvisualvm (see attached images/snapshot)I can see the writing op is invoked 20386 times, which seem not related to the line count. Is there an output buffer size that I can configure? Or something like that?

This is the route:
<camel:route id="pager" autoStartup="true">
<camel:from
uri="file:///tmp/in?charset=Windows-1252&amp;move=${file:parent}/../paged/${file:name.noext}.paged.ack&amp;preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}<file:///\\tmp\in?charset=Windows-1252&amp;move=$%7bfile:parent%7d/../paged/$%7bfile:name.noext%7d.paged.ack&amp;preMove=$%7bfile:name.noext%7d-$%7bdate:now:yyyyMMddHHmmssSSS%7d.$%7bfile:ext%7d>" />
<camel:split streaming="true" parallelProcessing="false">
<camel:tokenize token="\n" />
<camel:to uri="bean:pager" />
<camel:to
uri="file:///tmp/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append" />
</camel:split>
</camel:route>

This is the referenced bean:

<bean id="pager" class="cl.altiuz.reports.etl.TextProcessor">
<property name="xsltPath"
value="/Users/gonzalovasquez/Documents/workspace/altiuz-reports/reports-etl/xsl/pager.xsl" />
<property name="param" value="C.*PAG.* 1" />
</bean>

Camel version is 2.10.1, and this happens on both OS X and MS Windows, so I think it isn't a platform-dependent problem but a configuration one.

Any ideas? Is there anything else that I should send?

Thanks!

Gonzalo Vásquez Sáez
Gerente Investigación y Desarrollo (R&D)
Altiuz Soluciones Tecnológicas de Negocios Ltda.
Av. Nueva Tajamar 555 Of. 802, Las Condes
(56-2) 335 2461
gvasquez@altiuz.cl
http://www.altiuz.cl


This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful.


Re: Camel performance tuning

Posted by Gonzalo Vasquez <gv...@altiuz.cl>.
Thanks for your answer. My comments:

a) A 5 MB file could be loaded into memory, but I have streaming enabled because file sizes could be in the GB range. That said, I'll look into what Hypersonic and Mongo are, as I'm not familiar with them.
b) Parallel processing is set to false because records must preserve their order in the output file.
c) I don't see the point here.
d) See a).
e) What about async processing? There's no "long-running process" here.

Thanks again.

Gonzalo Vásquez Sáez
Gerente Investigación y Desarrollo (R&D)
Altiuz Soluciones Tecnológicas de Negocios Ltda.
Av. Nueva Tajamar 555 Of. 802, Las Condes
(56-2) 335 2461
gvasquez@altiuz.cl
http://www.altiuz.cl
 



On 09-11-2012, at 13:12, <Ra...@cognizant.com> wrote:

> I am really new to Camel but here are some options you can try
>  
> a)      Can you load the 5 MB file into memory before splitting it? That way I/O will not be a problem. You could perhaps put it in something like Hypersonic or Mongo.
> b)      Why is parallel processing false? Are the records related to each other? If it is set to true, you can take advantage of multiple cores.
> c)      Is it possible to first split the file into chunks and then process the chunks independently?
> d)      Can you write into memory and flush at once?
> e)      Sync/Async: http://camel.apache.org/async.html
>  
> From: Gonzalo Vasquez [mailto:gvasquez@altiuz.cl] 
> Sent: Friday, November 09, 2012 8:32 PM
> To: users@camel.apache.org
> Subject: Camel performance tuning
>  
> I'm running a route that basically adds a character per line to a plain text file, but it's taking too long, and it seems that it's due to some kind of buffering issue when reading/writing from disk.
>  
> I'm processing a 5MB file (attached as DC_FACCL132_0000 MORA_1075_16-10-2012_19-09-47_15.txt.zip), with the corresponding XSL template (also attached).
>  
> It's taking forever to process such a file. I understand I'm tokenizing on line breaks, which could be the source of the problem as there are many lines in the file (48198 exactly), but when running jvisualvm (see attached images/snapshot) I can see the write operation is invoked 20386 times, which seems unrelated to the line count. Is there an output buffer size that I can configure, or something like that?
>  
> This is the route:
> <camel:route id="pager" autoStartup="true">
> <camel:from
> uri="file:///tmp/in?charset=Windows-1252&amp;move=${file:parent}/../paged/${file:name.noext}.paged.ack&amp;preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}" />
> <camel:split streaming="true" parallelProcessing="false">
> <camel:tokenize token="\n" />
> <camel:to uri="bean:pager" />
> <camel:to
> uri="file:///tmp/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append" />
> </camel:split>
> </camel:route>
>  
> This is the referenced bean:
>  
> <bean id="pager" class="cl.altiuz.reports.etl.TextProcessor">
> <property name="xsltPath"
> value="/Users/gonzalovasquez/Documents/workspace/altiuz-reports/reports-etl/xsl/pager.xsl" />
> <property name="param" value="C.*PAG.* 1" />
> </bean>
>  
> Camel version is 2.10.1, and this happens on both OS X and MS Windows, so I think it isn't a platform-dependent problem but a configuration one.
>  
> Any ideas? Is there anything else that I should send?
>  
> Thanks!
>  
> Gonzalo Vásquez Sáez
> Gerente Investigación y Desarrollo (R&D)
> Altiuz Soluciones Tecnológicas de Negocios Ltda.
> Av. Nueva Tajamar 555 Of. 802, Las Condes
> (56-2) 335 2461
> gvasquez@altiuz.cl
> http://www.altiuz.cl
>  
>  
> 
> 
> 


RE: Camel performance tuning

Posted by Ra...@cognizant.com.
I am really new to Camel but here are some options you can try


a)      Can you load the 5 MB file into memory before splitting it? That way I/O will not be a problem. You could perhaps put it in something like Hypersonic or Mongo.
b)      Why is parallel processing false? Are the records related to each other? If it is set to true, you can take advantage of multiple cores.
c)      Is it possible to first split the file into chunks and then process the chunks independently?
d)      Can you write into memory and flush at once?
e)      Sync/Async: http://camel.apache.org/async.html
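
As a rough illustration of suggestions c) and d) above, the sketch below buffers processed lines in memory and appends them to the output file one batch at a time, so the original line order is kept. It is plain Java, not Camel configuration; the class name BatchingLineWriter, its methods, and the batch size are hypothetical and are not part of Camel or of the route in this thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: collects lines in memory and appends them in batches.
public class BatchingLineWriter implements AutoCloseable {
    private final Path target;
    private final int batchSize;
    private final StringBuilder buffer = new StringBuilder();
    private int buffered; // lines currently held in memory
    private int writes;   // number of physical append operations so far

    public BatchingLineWriter(Path target, int batchSize) {
        this.target = target;
        this.batchSize = batchSize;
    }

    // Adds one line; triggers a single append once the batch is full.
    public void add(String line) throws IOException {
        buffer.append(line).append('\n');
        buffered++;
        if (buffered >= batchSize) {
            flush();
        }
    }

    // Appends everything buffered so far in one write, then clears the buffer.
    public void flush() throws IOException {
        if (buffered == 0) {
            return;
        }
        Files.write(target, buffer.toString().getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        buffer.setLength(0);
        buffered = 0;
        writes++;
    }

    public int getWrites() {
        return writes;
    }

    // Writes out any remaining partial batch.
    @Override
    public void close() throws IOException {
        flush();
    }
}
```

Because only one batch is held in memory at a time, this approach should still work for very large inputs, at the cost of one file open per batch instead of one per line.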

From: Gonzalo Vasquez [mailto:gvasquez@altiuz.cl]
Sent: Friday, November 09, 2012 8:32 PM
To: users@camel.apache.org
Subject: Camel performance tuning

I'm running a route that basically adds a character per line to a plain text file, but it's taking too long, and it seems that it's due to some kind of buffering issue when reading/writing from disk.

I'm processing a 5MB file (attached as DC_FACCL132_0000 MORA_1075_16-10-2012_19-09-47_15.txt.zip), with the corresponding XSL template (also attached).

It's taking forever to process such a file. I understand I'm tokenizing on line breaks, which could be the source of the problem as there are many lines in the file (48198 exactly), but when running jvisualvm (see attached images/snapshot) I can see the write operation is invoked 20386 times, which seems unrelated to the line count. Is there an output buffer size that I can configure, or something like that?

This is the route:
<camel:route id="pager" autoStartup="true">
<camel:from
uri="file:///tmp/in?charset=Windows-1252&amp;move=${file:parent}/../paged/${file:name.noext}.paged.ack&amp;preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}" />
<camel:split streaming="true" parallelProcessing="false">
<camel:tokenize token="\n" />
<camel:to uri="bean:pager" />
<camel:to
uri="file:///tmp/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append" />
</camel:split>
</camel:route>

This is the referenced bean:

<bean id="pager" class="cl.altiuz.reports.etl.TextProcessor">
<property name="xsltPath"
value="/Users/gonzalovasquez/Documents/workspace/altiuz-reports/reports-etl/xsl/pager.xsl" />
<property name="param" value="C.*PAG.* 1" />
</bean>

Camel version is 2.10.1, and this happens on both OS X and MS Windows, so I think it isn't a platform-dependent problem but a configuration one.

Any ideas? Is there anything else that I should send?

Thanks!

Gonzalo Vásquez Sáez
Gerente Investigación y Desarrollo (R&D)
Altiuz Soluciones Tecnológicas de Negocios Ltda.
Av. Nueva Tajamar 555 Of. 802, Las Condes
(56-2) 335 2461
gvasquez@altiuz.cl
http://www.altiuz.cl



Re: Camel performance tuning

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

Okay, I have added some tests to Camel to measure the performance of such a use case.
https://svn.apache.org/repos/asf/camel/trunk/camel-core/src/test/java/org/apache/camel/component/file/stress/

This test uses the Camel file producer
https://svn.apache.org/repos/asf/camel/trunk/camel-core/src/test/java/org/apache/camel/component/file/stress/FileProducerAppendManyMessagesTest.java

And this uses java code by reusing the same output stream for the writes
https://svn.apache.org/repos/asf/camel/trunk/camel-core/src/test/java/org/apache/camel/component/file/stress/FileProducerAppendManyMessagesFastTest.java

The latter is much faster, as it uses the same stream for all the writes.
The former is slow because, for each write, a new stream is obtained and
positioned at the end of the file, and so forth.
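
To illustrate the difference described above, here is a minimal stand-alone sketch in plain Java. It is not the code of the linked tests and does not use Camel; the class and method names are made up for illustration, and the line count is simply the one mentioned in the original question. One method reopens the file in append mode for every line, the other opens it once and reuses the same writer.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

public class AppendComparison {

    // Slow pattern: open the file in append mode again for every single line.
    public static void appendReopening(Path target, List<String> lines) throws IOException {
        for (String line : lines) {
            try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
                out.write(line);
                out.write('\n');
            }
        }
    }

    // Fast pattern: open the file once and reuse the same buffered stream.
    public static void appendReusingStream(Path target, List<String> lines) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String line : lines) {
                out.write(line);
                out.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // 48198 is the line count of the file from the original question.
        List<String> lines = Collections.nCopies(48198, "sample line");
        Path slow = Files.createTempFile("reopen", ".txt");
        Path fast = Files.createTempFile("reuse", ".txt");

        long t0 = System.nanoTime();
        appendReopening(slow, lines);
        long t1 = System.nanoTime();
        appendReusingStream(fast, lines);
        long t2 = System.nanoTime();

        System.out.println("reopen per line: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("single stream:   " + (t2 - t1) / 1_000_000 + " ms");

        Files.delete(slow);
        Files.delete(fast);
    }
}
```

Both methods produce the same file content; only the number of open/close cycles differs. Actual timings depend on the disk and operating system.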




On Fri, Nov 9, 2012 at 4:01 PM, Gonzalo Vasquez <gv...@altiuz.cl> wrote:
> I'm running a route that basically adds a character per line to a plain text
> file, but it's taking too long, and it seems that it's due to some kind of
> buffering issue when reading/writing from disk.
>
> I'm processing a 5MB file (attached as DC_FACCL132_0000
> MORA_1075_16-10-2012_19-09-47_15.txt.zip), with the corresponding XSL
> template (also attached).
>
> It's taking forever to process such a file. I understand I'm tokenizing on
> line breaks, which could be the source of the problem as there are many
> lines in the file (48198 exactly), but when running jvisualvm (see attached
> images/snapshot) I can see the write operation is invoked 20386 times, which
> seems unrelated to the line count. Is there an output buffer size that I can
> configure, or something like that?
>
> This is the route:
> <camel:route id="pager" autoStartup="true">
> <camel:from
> uri="file:///tmp/in?charset=Windows-1252&amp;move=${file:parent}/../paged/${file:name.noext}.paged.ack&amp;preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}"
> />
> <camel:split streaming="true" parallelProcessing="false">
> <camel:tokenize token="\n" />
> <camel:to uri="bean:pager" />
> <camel:to
> uri="file:///tmp/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append"
> />
> </camel:split>
> </camel:route>
>
> This is the referenced bean:
>
> <bean id="pager" class="cl.altiuz.reports.etl.TextProcessor">
> <property name="xsltPath"
> value="/Users/gonzalovasquez/Documents/workspace/altiuz-reports/reports-etl/xsl/pager.xsl"
> />
> <property name="param" value="C.*PAG.* 1" />
> </bean>
>
> Camel version is 2.10.1, and this happens on both OS X and MS Windows, so I
> think it isn't a platform-dependent problem but a configuration one.
>
> Any ideas? Is there anything else that I should send?
>
> Thanks!
>
> Gonzalo Vásquez Sáez
> Gerente Investigación y Desarrollo (R&D)
> Altiuz Soluciones Tecnológicas de Negocios Ltda.
> Av. Nueva Tajamar 555 Of. 802, Las Condes
> (56-2) 335 2461
> gvasquez@altiuz.cl
> http://www.altiuz.cl
>



-- 
Claus Ibsen
-----------------
Red Hat, Inc.
FuseSource is now part of Red Hat
Email: cibsen@redhat.com
Web: http://fusesource.com
Twitter: davsclaus
Blog: http://davsclaus.com
Author of Camel in Action: http://www.manning.com/ibsen