Posted to users@camel.apache.org by vcheruvu <vi...@macquarie.com> on 2010/02/01 23:58:56 UTC

Performance - Camel JPA

I have written an application using camel-jpa to extract batches of 10,000
records from an old table, transform each entity object (25 fields) into
another entity object (15 fields), and persist it to a new table using
JpaProducer. I give the application 1 GB of memory at startup.

I have noticed that it takes a good 30 minutes to complete 40,000 records;
basically, only 22 records are processed per second. I assumed it should be
much faster, more like 60 records per second. Is it a reasonable suspicion
that 22 records per second is slow? The JPA insert (storing the transformed
entity in the new table) and update (marking each row in the old table as
processed) are taking too long. I have also noticed that the Camel server (a
daemon that constantly reads updates and persists them to the new table)
gets slower after 20,000 records. I am using batch processing, where 10,000
records are committed per batch. The transformation logic (bean injection is
used for translation) takes essentially no time; it completes in
microseconds.
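The transformer bean itself is nothing fancy, just a field-by-field copy; a
minimal sketch with hypothetical entity and field names (the real objects
have 25 and 15 fields):

```java
public class OrderTransformer {

    // Hypothetical, slimmed-down stand-ins for the real entities.
    public static class OldEntity {
        public String id;
        public String customer;
        public int processed;
    }

    public static class NewEventEntity {
        public String id;
        public String customer;
    }

    // The method referenced by bean:transformerBean?method=transformOrder.
    // Pure field copies, no I/O, so it runs in microseconds.
    public NewEventEntity transformOrder(OldEntity old) {
        NewEventEntity out = new NewEventEntity();
        out.id = old.id;
        out.customer = old.customer;
        return out;
    }
}
```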

I have ensured that indexes are in place for both the old table and the new
table. There is no need for a second-level cache in this scenario. I use a
UUID to generate the unique key when inserting a new record. Yet the
application still takes 30 minutes for 40,000 records.
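For reference, the arithmetic behind that rate, as a trivial check:

```java
public class Rate {
    // Records per second, given a batch size and elapsed minutes.
    static int perSecond(int records, int minutes) {
        return records / (minutes * 60);
    }

    public static void main(String[] args) {
        // 40,000 records in 30 minutes
        System.out.println(perSecond(40_000, 30)); // prints 22
    }
}
```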

I have looked at Hibernate JPA performance tuning and applied the changes
described in the second paragraph above. Are there any optimizations I
could make in Camel?


Is there JDO component support planned for the next Camel release?
-- 
View this message in context: http://old.nabble.com/Performance---Camel-JPA-tp27412920p27412920.html
Sent from the Camel - Users mailing list archive at Nabble.com.


Re: Performance - Camel JPA

Posted by Claus Ibsen <cl...@gmail.com>.
Hi


On Thu, Feb 4, 2010 at 3:26 AM, vcheruvu <vi...@macquarie.com> wrote:
>
> We needed near-realtime extraction of data from the old table, persisted
> into a new table in a different database for downstream processing, so we
> are using Camel and Java as a solution to get something going for now.
>
> Yes, using direct JDBC is ideal. However, I still have to map rows to
> objects, and I thought I would write a lousy mapping and make mistakes
> the JPA contributors have already worked through. So why re-invent the
> wheel? I am using Hibernate JPA (an ORM), which applies best practice to
> map each row's fields to object fields. Loading 40000 entities is not the
> issue; I found that the problem is that inserting the transformed
> entities takes too long. This is because of the Camel route config shown
> below:

:) ORM is not ideal for ETL work. When you talk about 40000 rows, that is
hardly anything. Try millions or even more; then you may have to use a
different strategy than Java and an ORM.

Hibernate et al. are optimized for applications built on top of the
database, not for bulk loading millions of rows into tables.
But you have done your due diligence, and if 5 min to load 40000 rows
meets your demand, that is fine. And if other engineers in your team
can understand and maintain the code you wrote, that is great.




>
> <route>
>                <from uri="jpa:com.OldEntity?consumer.query=select x from OldEntity x
> where
> x.processed=0&amp;maximumResults=1000&amp;consumeDelete=false&amp;delay=3000&amp;consumeLockEntity=false&amp;consumer.fixedDelay=true"/>
>                <to uri="bean:transformerBean?method=transformOrder"/>
>                <convertBodyTo type="com.NewEventEntity"/>
>                <to uri="jpa:com.NewEventEntity"/>
>        </route>
>
> Each entity loaded by the JpaConsumer is channeled through the
> transformation, and the transformed entity is persisted by the
> JpaProducer. This is single-threaded: it waits for all 1000 entities to
> complete, and then a batch update is committed against the old table to
> mark the processed flag. This forces the JpaConsumer to wait until all
> 1000 entities have been processed before it polls for the next 1000.
> Another issue is that I used only one database connection. I thought I
> could increase the speed of inserting the new entities.
>
>
> So I have split the original route; see the modified version below:
>
> <route>
> <!-- this route gets 1000 entities and transforms them to newEntity -->
>                <from uri="jpa:com.OldEntity?consumer.query=select x from com.oldEntity x
> where
> x.processed=0&amp;maximumResults=1000&amp;consumeDelete=false&amp;delay=3000&amp;consumeLockEntity=false&amp;consumer.fixedDelay=true"/>
>                <to uri="bean:transformerBean?method=transformOrder"/>
> <!-- call another route: essentially, send the entity to a queue for
> further processing -->
>                <to
> uri="vm:storeNewEntity?size=10000&amp;timeout=1000000&amp;concurrentConsumers=100"></to>
> </route>
>
>
>  <route>
> <!-- queue size is 10000 and there are 100 threads that work off the queue
> to insert new entities. -->
>        <from
> uri="vm:storeNewEntity?size=10000&amp;timeout=1000000&amp;concurrentConsumers=100"/>
>        <to uri="jpa:com.mbl.entity.NewEventEntity"/>
>    </route>
>
> I have also added c3p0 for database connection pool in persistence.xml
>
>          <property name="hibernate.c3p0.min_size" value="10"/>
>      <property name="hibernate.c3p0.max_size" value="100"/>
>      <property name="hibernate.c3p0.timeout" value="60"/>
>      <property name="hibernate.c3p0.max_statements" value="50"/>
>      <property name="hibernate.c3p0.idle_test_period" value="10000"/>
>
> I only had to make configuration changes, and performance improved
> significantly: 40,000 entities now complete in under 5 minutes, i.e.
> about 133 records per second. I believe there is still room for
> improvement; instead of using JpaProducer, I should call a stored
> procedure to insert into the new table.
>
>
> My conclusion: the JpaConsumer and the translation are fine; the problem
> was the JPA insert.
>

Good that you got a solution you like.



>
> Claus Ibsen-2 wrote:
>>
>> On Tue, Feb 2, 2010 at 6:30 AM, Kevin Jackson <fo...@gmail.com> wrote:
>>> Hi,
>>> [snip]
>>>
>>>> I have ensured that indexes are in place for both the old table and
>>>> the new table. There is no need for a second-level cache in this
>>>> scenario. I use a UUID to generate the unique key when inserting a new
>>>> record. Yet the application still takes 30 minutes for 40,000 records.
>>>
>>> Indexes on the new table are going to hurt your insert performance.
>>> For large data loads, have you tried:
>>> 1 - push data into a table with no ref integrity (a load table) and no
>>> indexes
>>> 2 - asynchronously (after all the data has been loaded into the load
>>> table), call a stored procedure that copies the data from load to the
>>> real table
>>> 3 - after store proc has run, truncate the load table
>>>
>>> Kev
>>>
>>
>> Yeah, I do not think JPA fits well with ETL kind of work.
>> http://en.wikipedia.org/wiki/Extract,_transform,_load
>>
>> There are a zillion other ways to load a lot of data into a database,
>> and using an ORM will never be very fast.
>>
>> Try googling a bit with your database name and ETL etc. And/or talk to
>> DB specialists in your organization.
>>
>> If you need to write hand-crafted SQL queries you may want to use Spring
>> JDBC or iBatis etc. Sometimes it's just easier to use Spring JDBC, as
>> it's a handy little library.
>>
>> --
>> Claus Ibsen
>> Apache Camel Committer
>>
>> Author of Camel in Action: http://www.manning.com/ibsen/
>> Open Source Integration: http://fusesource.com
>> Blog: http://davsclaus.blogspot.com/
>> Twitter: http://twitter.com/davsclaus
>>
>>
>
> --
> View this message in context: http://old.nabble.com/Performance---Camel-JPA-tp27412920p27446740.html
> Sent from the Camel - Users mailing list archive at Nabble.com.
>
>



-- 
Claus Ibsen
Apache Camel Committer

Author of Camel in Action: http://www.manning.com/ibsen/
Open Source Integration: http://fusesource.com
Blog: http://davsclaus.blogspot.com/
Twitter: http://twitter.com/davsclaus

Re: Performance - Camel JPA

Posted by vcheruvu <vi...@macquarie.com>.
We needed near-realtime extraction of data from the old table, persisted
into a new table in a different database for downstream processing, so we
are using Camel and Java as a solution to get something going for now.

Yes, using direct JDBC is ideal. However, I still have to map rows to
objects, and I thought I would write a lousy mapping and make mistakes the
JPA contributors have already worked through. So why re-invent the wheel? I
am using Hibernate JPA (an ORM), which applies best practice to map each
row's fields to object fields. Loading 40000 entities is not the issue; I
found that the problem is that inserting the transformed entities takes too
long. This is because of the Camel route config shown below:

<route>
		<from uri="jpa:com.OldEntity?consumer.query=select x from OldEntity x
where
x.processed=0&amp;maximumResults=1000&amp;consumeDelete=false&amp;delay=3000&amp;consumeLockEntity=false&amp;consumer.fixedDelay=true"/>
		<to uri="bean:transformerBean?method=transformOrder"/>
		<convertBodyTo type="com.NewEventEntity"/>
		<to uri="jpa:com.NewEventEntity"/>
	</route>

Each entity loaded by the JpaConsumer is channeled through the
transformation, and the transformed entity is persisted by the JpaProducer.
This is single-threaded: it waits for all 1000 entities to complete, and
then a batch update is committed against the old table to mark the
processed flag. This forces the JpaConsumer to wait until all 1000 entities
have been processed before it polls for the next 1000. Another issue is
that I used only one database connection. I thought I could increase the
speed of inserting the new entities.


So I have split the original route; see the modified version below:

<route>
<!-- this route gets 1000 entities and transforms them to newEntity -->
		<from uri="jpa:com.OldEntity?consumer.query=select x from com.oldEntity x
where
x.processed=0&amp;maximumResults=1000&amp;consumeDelete=false&amp;delay=3000&amp;consumeLockEntity=false&amp;consumer.fixedDelay=true"/>
		<to uri="bean:transformerBean?method=transformOrder"/>
<!-- call another route: essentially, send the entity to a queue for
further processing -->
		<to
uri="vm:storeNewEntity?size=10000&amp;timeout=1000000&amp;concurrentConsumers=100"></to>
</route>


 <route>
<!-- queue size is 10000 and there are 100 threads that work off the queue
to insert new entities. -->
    	<from
uri="vm:storeNewEntity?size=10000&amp;timeout=1000000&amp;concurrentConsumers=100"/>
    	<to uri="jpa:com.mbl.entity.NewEventEntity"/>
    </route>

I have also added c3p0 for database connection pool in persistence.xml

      <property name="hibernate.c3p0.min_size" value="10"/>
      <property name="hibernate.c3p0.max_size" value="100"/>
      <property name="hibernate.c3p0.timeout" value="60"/>
      <property name="hibernate.c3p0.max_statements" value="50"/>
      <property name="hibernate.c3p0.idle_test_period" value="10000"/>

I only had to make configuration changes, and performance improved
significantly: 40,000 entities now complete in under 5 minutes, i.e. about
133 records per second. I believe there is still room for improvement;
instead of using JpaProducer, I should call a stored procedure to insert
into the new table.
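In route terms, that idea could look something like the sketch below. This
is only a sketch: the sqlBuilderBean and its method are hypothetical, and
it assumes the camel-jdbc component with a DataSource registered under the
name myDataSource.

```
<route>
    <from uri="vm:storeNewEntity?size=10000&amp;timeout=1000000&amp;concurrentConsumers=100"/>
    <!-- hypothetical bean that renders the entity into a stored-procedure call string -->
    <to uri="bean:sqlBuilderBean?method=toStoredProcCall"/>
    <!-- camel-jdbc executes the message body as SQL against the registered DataSource -->
    <to uri="jdbc:myDataSource"/>
</route>
```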


My conclusion: the JpaConsumer and the translation are fine; the problem
was the JPA insert.


Claus Ibsen-2 wrote:
> 
> On Tue, Feb 2, 2010 at 6:30 AM, Kevin Jackson <fo...@gmail.com> wrote:
>> Hi,
>> [snip]
>>
>>> I have ensured that indexes are in place for both the old table and
>>> the new table. There is no need for a second-level cache in this
>>> scenario. I use a UUID to generate the unique key when inserting a new
>>> record. Yet the application still takes 30 minutes for 40,000 records.
>>
>> Indexes on the new table are going to hurt your insert performance.
>> For large data loads, have you tried:
>> 1 - push data into a table with no ref integrity (a load table) and no
>> indexes
>> 2 - asynchronously (after all the data has been loaded into the load
>> table), call a stored procedure that copies the data from load to the
>> real table
>> 3 - after store proc has run, truncate the load table
>>
>> Kev
>>
> 
> Yeah, I do not think JPA fits well with ETL kind of work.
> http://en.wikipedia.org/wiki/Extract,_transform,_load
>
> There are a zillion other ways to load a lot of data into a database,
> and using an ORM will never be very fast.
>
> Try googling a bit with your database name and ETL etc. And/or talk to
> DB specialists in your organization.
>
> If you need to write hand-crafted SQL queries you may want to use Spring
> JDBC or iBatis etc. Sometimes it's just easier to use Spring JDBC, as
> it's a handy little library.
> 
> -- 
> Claus Ibsen
> Apache Camel Committer
> 
> Author of Camel in Action: http://www.manning.com/ibsen/
> Open Source Integration: http://fusesource.com
> Blog: http://davsclaus.blogspot.com/
> Twitter: http://twitter.com/davsclaus
> 
> 

-- 
View this message in context: http://old.nabble.com/Performance---Camel-JPA-tp27412920p27446740.html
Sent from the Camel - Users mailing list archive at Nabble.com.


Re: Performance - Camel JPA

Posted by Claus Ibsen <cl...@gmail.com>.
On Tue, Feb 2, 2010 at 6:30 AM, Kevin Jackson <fo...@gmail.com> wrote:
> Hi,
> [snip]
>
>> I have ensured that indexes are in place for both the old table and the
>> new table. There is no need for a second-level cache in this scenario. I
>> use a UUID to generate the unique key when inserting a new record. Yet
>> the application still takes 30 minutes for 40,000 records.
>
> Indexes on the new table are going to hurt your insert performance.
> For large data loads, have you tried:
> 1 - push data into a table with no ref integrity (a load table) and no indexes
> 2 - asynchronously (after all the data has been loaded into the load
> table), call a stored procedure that copies the data from load to the
> real table
> 3 - after store proc has run, truncate the load table
>
> Kev
>

Yeah, I do not think JPA fits well with ETL kind of work.
http://en.wikipedia.org/wiki/Extract,_transform,_load

There are a zillion other ways to load a lot of data into a database,
and using an ORM will never be very fast.

Try googling a bit with your database name and ETL etc. And/or talk to
DB specialists in your organization.

If you need to write hand-crafted SQL queries you may want to use Spring
JDBC or iBatis etc. Sometimes it's just easier to use Spring JDBC, as
it's a handy little library.

-- 
Claus Ibsen
Apache Camel Committer

Author of Camel in Action: http://www.manning.com/ibsen/
Open Source Integration: http://fusesource.com
Blog: http://davsclaus.blogspot.com/
Twitter: http://twitter.com/davsclaus

Re: Performance - Camel JPA

Posted by Kevin Jackson <fo...@gmail.com>.
Hi,
[snip]

> I have ensured that indexes are in place for both the old table and the
> new table. There is no need for a second-level cache in this scenario. I
> use a UUID to generate the unique key when inserting a new record. Yet
> the application still takes 30 minutes for 40,000 records.

Indexes on the new table are going to hurt your insert performance.
For large data loads, have you tried:
1 - push data into a table with no ref integrity (a load table) and no indexes
2 - asynchronously (after all the data has been loaded into the load
table), call a stored procedure that copies the data from load to the
real table
3 - after store proc has run, truncate the load table

Kev
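The three steps above amount to SQL along these lines (table and column
names are hypothetical, and the statements are only assembled here, not
executed; in a real loader step 1 would run via JDBC addBatch/executeBatch):

```java
public class LoadTableSketch {
    // 1 - bulk insert into an index-free, constraint-free load table
    static final String LOAD =
        "INSERT INTO new_event_load (id, payload) VALUES (?, ?)";
    // 2 - one set-based copy from the load table into the real, indexed table
    static final String COPY =
        "INSERT INTO new_event SELECT id, payload FROM new_event_load";
    // 3 - clear the load table once the copy has committed
    static final String CLEANUP = "TRUNCATE TABLE new_event_load";

    public static void main(String[] args) {
        System.out.println(LOAD);
        System.out.println(COPY);
        System.out.println(CLEANUP);
    }
}
```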