Posted to users@camel.apache.org by Vic <vi...@seznam.cz> on 2013/07/18 04:28:14 UTC

Message Processing Performance while splitting

I am processing a big file line by line with Camel. The average throughput
is 30k messages per second. When I do the same in Java using a
BufferedReader, the average throughput is 500k messages per second. I am
processing the same file in both cases. This is a significant performance
loss. Am I doing something wrong in Camel?

Camel route:

from("file:C:/Test?fileName=test_file.txt&noop=true")
.split().tokenize("\n").streaming()
.to("log:INFO?groupSize=10000");

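Java code:

import java.io.BufferedReader;
import java.io.FileReader;

// try-with-resources closes the reader even if readLine() throws
try (BufferedReader br = new BufferedReader(new FileReader("C:/Test/test_file.txt"))) {
	long count = 0;
	long start = System.currentTimeMillis();
	while (br.readLine() != null) {
		count++;
		if (count % 10000 == 0) {
			// clamp elapsed time to 1 ms so the first report cannot divide by zero
			long elapsed = Math.max(1, System.currentTimeMillis() - start);
			long msgPerSecond = 1000 * count / elapsed;
			System.out.println(msgPerSecond);
		}
	}
}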



--
View this message in context: http://camel.465427.n5.nabble.com/Message-Processing-Performance-while-splitting-tp5735824.html
Sent from the Camel - Users mailing list archive at Nabble.com.

Re: Message Processing Performance while splitting

Posted by Willem Jiang <wi...@gmail.com>.

When you add more processors to the route, Camel needs to do a lot of
additional work to handle each message.
If you want good performance, you should use Java code to do the loop
in a single method.
But this could be a good way for us to find the performance hot spots
that Camel introduces.
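
As a rough illustration of the "loop in a single method" suggestion, the sketch below reads every line from a Reader in one tight loop and computes the throughput afterwards. The names SingleMethodLoop, countLines and linesPerSecond are made up for this example and are not part of Camel. The assumption is that such a method would be called once per file (for instance from a bean or processor) instead of creating one exchange per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class SingleMethodLoop {

    // Reads all lines from the source in one loop and returns how many were read.
    public static long countLines(Reader source) {
        long count = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            while (br.readLine() != null) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    // Lines per second; elapsed time is clamped to 1 ms to avoid dividing by zero.
    public static long linesPerSecond(long count, long elapsedMillis) {
        return 1000 * count / Math.max(1, elapsedMillis);
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // A StringReader stands in for the real file in this sketch.
        long count = countLines(new StringReader("Test Line\nTest Line\nTest Line\n"));
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(count + " lines, about " + linesPerSecond(count, elapsed) + " lines/s");
    }
}
```

The trade-off is that the per-line routing features of Camel (EIPs, error handling per message) no longer apply inside the loop.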


Willem Jiang

Red Hat, Inc.
FuseSource is now part of Red Hat
Web: http://www.fusesource.com | http://www.redhat.com
Blog: http://willemjiang.blogspot.com (http://willemjiang.blogspot.com/) (English)
      http://jnn.iteye.com (http://jnn.javaeye.com/) (Chinese)
Twitter: willemjiang
Weibo: 姜宁willem


On Mon, Jul 22, 2013 at 4:29 AM, Claus Ibsen <cl...@gmail.com> wrote:

> Hi
>
> You can't really compare the two approaches.
> In the pure Java code you just have a loop in a single method,
> which is as fast as you can go.
> When you use Camel routes and the EIPs, a lot more goes on under the
> hood.

Re: Message Processing Performance while splitting

Posted by Claus Ibsen <cl...@gmail.com>.

Hi

You can't really compare the two approaches.
In the pure Java code you just have a loop in a single method,
which is as fast as you can go.
When you use Camel routes and the EIPs, a lot more goes on under the hood.


On Sun, Jul 21, 2013 at 10:12 PM, Viktor Kubinec <vi...@seznam.cz> wrote:
> Thanks Willem.
> [...]



-- 
Claus Ibsen
-----------------
Red Hat, Inc.
Email: cibsen@redhat.com
Twitter: davsclaus
Blog: http://davsclaus.com
Author of Camel in Action: http://www.manning.com/ibsen

Re: Message Processing Performance while splitting

Posted by Viktor Kubinec <vi...@seznam.cz>.

Thanks Willem.

I've created a custom splitter that uses BufferedReader, just as you
suggested. It has improved the performance a bit, but it is still much slower
(10-20x) than the Java code that I posted. I have spent a bit more time
investigating this problem. The code (with documentation on how to use it)
is available here:

https://github.com/Viktor-Kubinec/Camel

I am using Apache Camel version 2.11.1. My test file is ~1 GB and contains
100M rows with the message "Test Line" in each row. I have two test
scenarios:

1. SplitTest - I just read a big file containing short lines, split it line by
line with my custom splitter, and log the progress (every 50k-th message).

2. Same as 1, but I've added 10 processors that do nothing (because I
observed that adding processors reduces performance).

Results:

Java code: 1.5-2M lines per second - no profiling done

1. 150k lines per second; here are some hot spots (from VisualVM):

java.lang.Class.getSimpleName()                               (57.3%)
org.apache.camel.processor.MulticastProcessor.doAggregate()   (39.2%)
java.io.BufferedReader.readLine()                             (3.1%)
...

2. 72k lines per second:

org.apache.camel.processor.MulticastProcessor.doAggregate()   (70.5%)
java.lang.Class.getSimpleName()                               (28.8%)
java.io.BufferedReader.readLine()                             (0.3%)
...

I have saved snapshots from VisualVM; they are available in the GitHub
repository (link above): SplitTest.nps, SplitTestWithProcessors.nps. More
details are available there.

This was done on my laptop. I've read that VisualVM has some effect on
performance (~5%). java.lang.Class.getSimpleName() is being called in the
constructor of org.apache.camel.impl.DefaultUnitOfWork. In the first scenario
only 3.1% of thread time was spent in the readLine() method. In the second
scenario it was only 0.3%. That is very low.
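
Since the profile points at Class.getSimpleName() being called once per message, one generic mitigation is to compute each name once and reuse it. The sketch below shows only that caching idea in plain Java. SimpleNameCache and simpleName are hypothetical names. This is not Camel's DefaultUnitOfWork code, and whether it helps on a given JVM is an assumption to verify with the profiler.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SimpleNameCache {

    // One entry per class; filled lazily the first time a class is seen.
    private static final ConcurrentMap<Class<?>, String> NAMES = new ConcurrentHashMap<>();

    // Returns the simple name, calling Class.getSimpleName() at most once per class
    // in the common case instead of once per message.
    public static String simpleName(Class<?> type) {
        return NAMES.computeIfAbsent(type, Class::getSimpleName);
    }

    public static void main(String[] args) {
        // Simulates several "messages" asking for the same class name.
        for (int i = 0; i < 3; i++) {
            System.out.println(simpleName(StringBuilder.class));
        }
    }
}
```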

Is there a way I can optimize this?

Thanks,

Viktor





--
View this message in context: http://camel.465427.n5.nabble.com/Message-Processing-Performance-while-splitting-tp5735824p5735979.html
Sent from the Camel - Users mailing list archive at Nabble.com.

Re: Message Processing Performance while splitting

Posted by Willem jiang <wi...@gmail.com>.

Hi,

Camel uses java.util.Scanner to split the input stream on the token "\n",
so it makes sense that it is slower than using a BufferedReader to read the file.

You can read the file yourself by implementing a custom splitter, just as the ZipFile support does [1].

[1] https://issues.apache.org/jira/browse/CAMEL-6139
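
A custom splitter along these lines usually boils down to an Iterator that hands out one line at a time. The sketch below is a plain-Java illustration, not the code from CAMEL-6139. LineIterator is a made-up name, and wiring it into a route (for example, by having the split expression return the iterator) is an assumption to check against the Camel version in use.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class LineIterator implements Iterator<String>, Closeable {

    private final BufferedReader reader;
    private String next;   // line read ahead by hasNext(), or null if none is buffered
    private boolean done;  // true once the end of the input has been reached

    public LineIterator(Reader source) {
        this.reader = new BufferedReader(source);
    }

    @Override
    public boolean hasNext() {
        if (next == null && !done) {
            try {
                next = reader.readLine();
                if (next == null) {
                    done = true;
                    reader.close(); // release the underlying stream at end of input
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return next != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String line = next;
        next = null;
        return line;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }

    public static void main(String[] args) {
        // A StringReader stands in for the real file in this sketch.
        Iterator<String> lines = new LineIterator(new StringReader("a\nb\nc"));
        while (lines.hasNext()) {
            System.out.println(lines.next());
        }
    }
}
```

Only one line is held in memory at a time, which matches the streaming behaviour of the original route.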

--  
Willem Jiang





