Posted to users@servicemix.apache.org by "vs.souza" <vs...@gmail.com> on 2012/02/15 14:29:10 UTC

Performance issues to load csv file

Hello fellows,

I created a bundle to load a CSV file, validate some information, and send it
to an ActiveMQ queue. The problem is that my CSV file has 30,000,000
records (about 1.2 GB), and my bundle is taking about 12 hours to load
750,000 items to the queue. My ServiceMix is 4.4.3 and I am loading the file
using Camel and Bindy. Below I post my camel-context.xml and my Bindy
class:

camel-context.xml:

<?xml version="1.0" encoding="UTF-8"?>

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:camel="http://camel.apache.org/schema/spring"
       xsi:schemaLocation="
       http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
       http://camel.apache.org/schema/spring http://camel.apache.org/schema/spring/camel-spring.xsd">

  <bean id="bindyDataFormat"
        class="org.apache.camel.dataformat.bindy.csv.BindyCsvDataFormat">
    <constructor-arg value="com.test.integration.camel.spring.poc.file"/>
  </bean>

  <camel:camelContext xmlns="http://camel.apache.org/schema/spring">
    <camel:package>com.test.integration.camel.spring.poc</camel:package>
    <camel:route>
      <camel:from uri="file:/home/jedimaster/Java-Env/Sandbox/From?delete=true"/>
      <camel:log message="Started unmarshalling file ${file:name} at ${date:now:hh:MM:ss.SSS}..."/>
      <camel:split streaming="true">
        <camel:tokenize token="\n"/>
        <camel:unmarshal ref="bindyDataFormat"/>
        <camel:to uri="activemq:filemove-events"/>
      </camel:split>
      <camel:log message="Finished unmarshalling file ${file:name} at ${date:now:hh:MM:ss.SSS}..."/>
    </camel:route>
  </camel:camelContext>

</beans>

bindy bean:

package com.test.integration.camel.spring.poc.file;

import java.io.Serializable;

import org.apache.camel.dataformat.bindy.annotation.CsvRecord;
import org.apache.camel.dataformat.bindy.annotation.DataField;

@CsvRecord(separator=",", quote="\"")
public class CSVEventRecordBean implements Serializable {

	private static final long serialVersionUID = -8806841912643394977L;

	@DataField(pos=1)
	private String eventDate;
	
	@DataField(pos=2)
	private String userId;
	
	@DataField(pos=3)
	private String systemId;

	public String getEventDate() {
		return eventDate;
	}

	public void setEventDate(String eventDate) {
		this.eventDate = eventDate;
	}

	public String getUserId() {
		return userId;
	}

	public void setUserId(String userId) {
		this.userId = userId;
	}

	public String getSystemId() {
		return "Bean Generated: " + systemId;
	}

	public void setSystemId(String systemId) {
		this.systemId = systemId;
	}
	
}


How can I improve the performance considerably? Do you have any
suggestions?

Cheers.

--
View this message in context: http://servicemix.396122.n5.nabble.com/Performance-issues-to-load-csv-file-tp5486044p5486044.html
Sent from the ServiceMix - User mailing list archive at Nabble.com.

Re: Performance issues to load csv file

Posted by "vs.souza" <vs...@gmail.com>.
I'll try the ActiveMQ forums to check if there is something to optimize and make
it faster. Thanks for the help through all this time, my friend.


Re: Performance issues to load csv file

Posted by "ajs6f@virginia.edu" <aj...@virginia.edu>.
It seems unlikely to me. KahaDB is optimized just for the task it performs. A general-purpose SQL database would almost certainly be slower.

It may not be that persistence is the bottleneck, however. If you're sure that the MQ section of your workflow is where you're hitting the speedbump, I suggest taking this to the ActiveMQ mailing list, where you'll be putting your issue in front of specialists.
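If you do want to experiment with the store itself rather than replace it, the default KahaDB adapter is set up in the broker XML. Just as a sketch (the broker name, directory, and sync flag here are illustrative, not taken from your installation):

```xml
<!-- Illustrative broker fragment: the default KahaDB store with one
     tuning knob. enableJournalDiskSyncs="false" speeds up journal
     writes at the cost of durability if the machine crashes. -->
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="localhost">
  <persistenceAdapter>
    <kahaDB directory="${activemq.data}/kahadb" enableJournalDiskSyncs="false"/>
  </persistenceAdapter>
</broker>
```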

---
A. Soroka
Software & Systems Engineering
Online Library Environment
the University of Virginia Library





Re: Performance issues to load csv file

Posted by "vs.souza" <vs...@gmail.com>.
Actually I need the persistence feature. Do you think it would help to
configure an external DB (like MySQL) for persistence?

Thanks for the help up to the moment, my friend.


Re: Performance issues to load csv file

Posted by "ajs6f@virginia.edu" <aj...@virginia.edu>.
One step you may wish to consider would be to go to a "no persistence" mode for your broker:

https://activemq.apache.org/persistence.html#Persistence-DisablingPersistence

This would mean that your running broker wouldn't need to touch the disk at all, which could only speed up your task. On the other hand, it may not help that much, and it means that you lose the security and assurance of a persisted message store. If your workflow has other ways to afford itself those qualities, it may be worth a try.
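For example, a minimal sketch of a non-persistent broker configuration (the broker name and connector URI below are placeholders, not from your setup):

```xml
<!-- Hypothetical activemq-broker.xml fragment: persistent="false"
     disables the message store entirely, so messages live only in
     memory and are lost on broker restart. -->
<broker xmlns="http://activemq.apache.org/schema/core"
        brokerName="localhost" persistent="false">
  <transportConnectors>
    <transportConnector name="openwire" uri="tcp://0.0.0.0:61616"/>
  </transportConnectors>
</broker>
```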

---
A. Soroka
Software & Systems Engineering
Online Library Environment
the University of Virginia Library





Re: Performance issues to load csv file

Posted by "vs.souza" <vs...@gmail.com>.
Yes... and the default configuration for the broker that comes embedded with
ServiceMix 4.4.0.



Re: Performance issues to load csv file

Posted by "ajs6f@virginia.edu" <aj...@virginia.edu>.
Are you using the default persistence configuration (KahaDB)?

---
A. Soroka
Software & Systems Engineering
Online Library Environment
the University of Virginia Library





Re: Performance issues to load csv file

Posted by "vs.souza" <vs...@gmail.com>.
The problem is ActiveMQ's performance in receiving the data. I tested it
without JMS and it took 8 seconds to unmarshal 100,000 records from the CSV
file into memory. :-S

Any ideas about how to optimize my broker?

cheers.


Re: Performance issues to load csv file

Posted by "vs.souza" <vs...@gmail.com>.
What a problem.

Adding the parallelProcessing attribute to the split didn't make a difference.
Below I show the time it took to send 10,000 records to the queue in both
cases:

10,000 records without parallelProcessing:
13:54:50.787
14:04:35.646

10,000 records with parallelProcessing:
15:00:28.563
15:10:22.766

Any other ideas? Could it be a performance issue on the JMS side?

Cheers.


Re: Performance issues to load csv file

Posted by "vs.souza" <vs...@gmail.com>.
Thanks for the tip, my friend. I will try that and come back here with more
information about what happened.

Thanks again.


Re: Performance issues to load csv file

Posted by "ajs6f@virginia.edu" <aj...@virginia.edu>.
Just a shot in the dark here, but in the Camel Splitter doc:

https://camel.apache.org/splitter.html

parallelProcessing is marked as off by default. In other words, could it be that each record is being processed sequentially, and you are therefore not taking any advantage of multithreading? Perhaps you could try

<camel:split streaming="true" parallelProcessing="true">

and see if there is any improvement. If there is, you could then take advantage of the executorServiceRef property to set up a custom thread pool and really tune your use of processing resources.
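Just as a sketch of that last idea (the profile id and pool sizes below are invented for illustration; tune them to your hardware):

```xml
<!-- Hypothetical custom thread pool for the splitter, referenced
     from the split via executorServiceRef. -->
<camel:camelContext xmlns="http://camel.apache.org/schema/spring">
  <camel:threadPoolProfile id="splitterPool" poolSize="8" maxPoolSize="16"/>
  <camel:route>
    <camel:from uri="file:/home/jedimaster/Java-Env/Sandbox/From?delete=true"/>
    <camel:split streaming="true" parallelProcessing="true"
                 executorServiceRef="splitterPool">
      <camel:tokenize token="\n"/>
      <camel:unmarshal ref="bindyDataFormat"/>
      <camel:to uri="activemq:filemove-events"/>
    </camel:split>
  </camel:route>
</camel:camelContext>
```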

---
A. Soroka
Software & Systems Engineering
Online Library Environment
the University of Virginia Library



