Posted to derby-user@db.apache.org by vodarus vodarus <vo...@gmail.com> on 2008/06/11 10:31:26 UTC

Speed of using Derby DB

Hello.

I tried Derby and compared it with Oracle. I thought that Derby could match
Oracle's performance on simple procedures.

Purpose of the benchmark: use Derby as a local DB and get better performance
for local data manipulation.

DB schema:

create table TESTBIG
(
    CLIENT       int not null,
    ORDER_ID     int not null,
    ORDER_AMOUNT int not null
);

alter table TESTBIG add constraint TESTBIG_PK primary key (CLIENT,
ORDER_ID);

create table TESTTOTALS
(
    CLIENT       int not null,
    CLIENT_TOTAL int
);

alter table TESTTOTALS add constraint TESTTOTALS_PK primary key (CLIENT);

We populate the TESTBIG table with 1,000,000 rows, then a stored procedure
calculates TESTTOTALS: CLIENT_TOTAL is the sum of all ORDER_AMOUNT values for
that CLIENT.
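
(The loader code itself was not posted. Purely as an illustration, a minimal
JDBC batch-insert loader might look like the sketch below; the database URL,
client distribution and batch size are assumptions, not values from the
original test.)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PopulateTestBig {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        // Hypothetical embedded database URL.
        Connection conn = DriverManager.getConnection("jdbc:derby:testdb");
        conn.setAutoCommit(false);
        PreparedStatement ins = conn.prepareStatement(
                "INSERT INTO TESTBIG (CLIENT, ORDER_ID, ORDER_AMOUNT) VALUES (?, ?, ?)");
        for (int i = 1; i <= 1000000; i++) {
            ins.setInt(1, (i % 25000) + 1); // assumed client distribution
            ins.setInt(2, i);               // unique order id keeps the PK happy
            ins.setInt(3, i % 100);         // arbitrary amount
            ins.addBatch();
            if (i % 10000 == 0) {           // flush the batch periodically
                ins.executeBatch();
            }
        }
        ins.executeBatch();
        conn.commit();
        ins.close();
        conn.close();
    }
}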

I wrote the stored procedure for Derby in Java:

static public void calculateTotal() {

    int totalAmount = 0;
    int lastClient = 0;

    try {
        Connection connection =
                DriverManager.getConnection("jdbc:default:connection");
        Statement s = connection.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        ResultSet rs = s.executeQuery(
                "SELECT CLIENT, ORDER_ID, ORDER_AMOUNT FROM TESTBIG");

        PreparedStatement updateData = connection.prepareStatement(
                "UPDATE testtotals SET " +
                " client_total = client_total + ? " +
                " WHERE client = ?");
        PreparedStatement insertData = connection.prepareStatement(
                "INSERT INTO testtotals " +
                " (client, client_total) " +
                " VALUES (?, ?) ");

        while (rs.next()) {
            int client = rs.getInt(1);
            int order_amount = rs.getInt(3);

            if (lastClient == 0) {
                lastClient = client;
                totalAmount = 0;
            }

            if (lastClient != client) {

                // System.out.println("MERGE amount" + lastClient + ":" + totalAmount);
                updateData.setInt(1, totalAmount);
                updateData.setInt(2, lastClient);
                int sqlRowCount = updateData.executeUpdate();

                if (sqlRowCount == 0) {
                    insertData.setInt(1, lastClient);
                    insertData.setInt(2, totalAmount);
                    sqlRowCount = insertData.executeUpdate();
                }

                lastClient = client;
                totalAmount = order_amount;
            } else {
                totalAmount = totalAmount + order_amount;
            }
        }

        updateData.setInt(1, totalAmount);
        updateData.setInt(2, lastClient);
        int sqlRowCount = updateData.executeUpdate();

        if (sqlRowCount == 0) {
            insertData.setInt(1, lastClient);
            insertData.setInt(2, totalAmount);
            sqlRowCount = insertData.executeUpdate();
        }

        rs.close();
        s.close();
        connection.commit();
        connection.close();
    } catch (SQLException ex) {
        Logger.getLogger(CalculateTotal.class.getName()).log(Level.SEVERE, null, ex);
    }
}
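
(The DDL that registers this procedure was not posted. In Derby a Java
procedure is normally declared roughly as in the sketch below; the database
URL, the package name "derbytest" and the procedure name are assumptions.)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RegisterProcedure {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        Connection conn = DriverManager.getConnection("jdbc:derby:testdb");
        Statement ddl = conn.createStatement();
        // One-time registration so the procedure can be invoked with
        // CALL calculateTotal(), as in the test harness later in the thread.
        ddl.execute(
            "CREATE PROCEDURE CALCULATETOTAL() " +
            "PARAMETER STYLE JAVA LANGUAGE JAVA MODIFIES SQL DATA " +
            "EXTERNAL NAME 'derbytest.CalculateTotal.calculateTotal'");
        ddl.close();
        conn.close();
    }
}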


It takes up to 12 seconds to calculate TESTTOTALS. An Oracle PL/SQL procedure
with the same algorithm needs 1.5 seconds.

*How can I improve performance? Or is Derby this slow because of Java / JVM
issues?*

Thanks.

Re: Speed of using Derby DB

Posted by vodarus vodarus <vo...@gmail.com>.
2008/6/19 <de...@segel.com>:

>   I haven't looked at Derby's code, but just because you haven't
> 'committed' the transaction doesn't mean that Derby isn't writing the
> records.
>
> Without a commit, depending on your isolation level, other users won't see
> or feel the effects of your data. The transactions are stored in a log so
> that you can rollback the transaction, the data is still getting written.
>
>  (A simple test would be to write a record of a known length, X number of
> times, on a machine where the JVM has a set restriction size. If you think
> everything is in memory and not being written to disk, then after Y number
> of iterations, you will fill up the heap and go boom.)
>
>  But Derby can be different so what do I know?
>
Hi.

I don't know how it works, but it seems that Derby eats a lot of CPU and
cannot use the HDD at full speed.

It seems to me that the CPU is the bottleneck for the application, maybe
because the data is transformed into a ResultSet and all manipulation happens
at a high level, not close to the data storage.


Is there any way for Derby to run an algorithm that works with data at the
lowest levels of the DBMS, as other DBMSs like MySQL / PostgreSQL / Oracle do?

Re: Speed of using Derby DB

Posted by vodarus vodarus <vo...@gmail.com>.
>
> Maybe that's your hardware, operating system or JVM version?
> I was using Derby 10.4.1.3 on Solaris 10 and Java SE 6 on a dual CPU (AMD
> 2.4 GHz) machine.


I checked Derby 10.4.1.3 with JDK 5u15, JDK 6u6 and JDK 7 build 28 (the
latest right now).
Using 10.4.1.3 seems to be about 1 second faster, but 9.5 seconds is the best
time.

Switching between JDK 5 -> 6 -> 7 did not improve speed.


>
> But if 6 seconds is still too much, and you can't use another algorithm, I
> don't think I have anything more that can help you.


There are 100 algorithms, and they cannot be written in pure SQL. That is the
main reason why I write stored procedures instead of pure SQL.


> I don't think it is correct to blame this on Java. There are other database
> products using Java, and for all I know they might be faster for the
> specific case we are discussing.

I thought that even if Java took 2 seconds where PL/SQL takes 1, Java could
scale up without license payments, so the Java TCO could become lower than
Oracle's.


> Also, I'm not sure how comparable the Java stored procedure code and the
> PL/SQL are.
> Could you post the latter?

PL/SQL code + timing:

set term off
CREATE OR REPLACE PROCEDURE test1 AS
    iTotalAmt   NUMBER;
/*
    we could simply have written NUMBER here, but I wanted to show the tight
    integration of the procedural code with the data structures
*/
    iLastClient testtotals.client%TYPE;
BEGIN
    FOR cc IN ( SELECT client, order_amount
                  FROM testbig
                 ORDER BY client)
    LOOP
        If iLastClient IS NULL Then
            iLastClient := cc.Client;
            iTotalAmt   := 0;
        End If;

        If iLastClient != cc.Client Then
            UPDATE testtotals SET
                 client_total = iTotalAmt
             WHERE client = iLastClient;
            If SQL%ROWCOUNT = 0 Then
                INSERT INTO testtotals
                       (client, client_total)
                VALUES (iLastClient, iTotalAmt);
            End If;
            iLastClient := cc.Client;
            iTotalAmt   := 0;
        Else
            iTotalAmt := iTotalAmt + cc.order_amount;
        End If;
    END LOOP;
    COMMIT;
END;
/
-- The actual test run, executed three times
DELETE FROM testtotals WHERE client < 64
/
COMMIT
/
set timing on term on
PROMPT Start
EXECUTE test1

set timing off term off
DELETE FROM testtotals WHERE client < 64
/
COMMIT
/
set timing on term on
EXECUTE test1

set timing off term off
DELETE FROM testtotals WHERE client < 64
/
COMMIT
/
set timing on term on
EXECUTE test1

set timing off term off
DROP PROCEDURE test1

/
set term on
PROMPT Done

The code is not exactly the same and it has logical errors, but those errors
do not change the timing.

>
> Does anyone know if (or rather how well maybe?) Oracle is capable of
> optimizing the PL/SQL?

I think that PL/SQL works with data at a lower level (a level nearer to the
storage) than Derby does.

Re: Speed of using Derby DB

Posted by Kristian Waagan <Kr...@Sun.COM>.
vodarus vodarus wrote:
[ snip ]
> 
>     I could also get down to these times by using a HashMap to store the
>     intermediate totals in 'calculateTotalCommon'. This does of course
>     use more memory and might cause trouble if you don't know the number
>     of clients in your table (i.e. whether you need 25 thousand or 100
>     million entries in the map).
> 
> also not applicable, because data amount is 10-100 times more than RAM.

Note that you only need to keep the intermediate results in memory, and 
you don't have to calculate everything before you write data back into 
the database.
In this case, it is the number of unique clients that matters.

I agree it is still a non-optimal solution though.

> 
> 
>     It seems what happens is that the log buffer goes full. By
>     increasing the log buffer, I was able to get a little better
>     performance. As always with tuning, it's about balance and
>     tradeoffs. If your IO system is really good, maybe you can run with
>     a big log buffer and get better performance. However, the effect you
>     see from this also depends on how often you have commits (then the
>     complete buffer is flushed anyway, at least in Derby).
> 
>     So, in short, experiment with the following, using either the
>     "insert into..." query or your client code modified to somehow store
>     the totals in memory:
>      a) Log buffer size
>      b) Page cache size (and JVM heap)
>      c) Page size
> 
>     One of my attempts looked like this:
>     java -Xmx512M -Dderby.storage.pageSize=32768
>     -Dderby.storage.logBufferSize=524288
>     -Dderby.storage.pageCacheSize=2500 -cp .:${JDB10413} derbytest.FatTest
> 
> 
> this show 9,5 sec time. Best is Oracle with 1,5 sec.
> also is this parameters work with old database or you need to recreate 
> database to get this parameters work?

Except for the pageSize options, the options take effect when you start 
Derby.

Note that what I posted was just an example. You might still have to
experiment to find the best fit for your environment / data.

> 
> 
> 
>     Using your original test code I haven't been able to get lower than
>     around 5 seconds (best), the average being somewhere around 6 seconds.
> 
> i could get 9,5 sec best. what another improvements can you do with 
> application?

Maybe that's your hardware, operating system or JVM version?
I was using Derby 10.4.1.3 on Solaris 10 and Java SE 6 on a dual CPU 
(AMD 2.4 GHz) machine.

But if 6 seconds is still too much, and you can't use another algorithm, 
I don't think I have anything more that can help you.

> 
> 
> 
>     As always, you have to do your own tests on your own system to see
>     if it is good enough for your use :)
> 
> :)
> 
> 
>     Often there are other things to consider besides performance, for
>     instance installation and ease of use.
> 
> :) yes, but performance issue is very important. more important than 
> installation.
> Because with Oracle analysis will take near 100 (algorithms) * 100 (data 
> is 100 time more than in experiment) * 1,5 sec = 4.17 hours each. So 
> near 6 data parts a day per server.
> 
> Java: analysis 100 * 100 * 9,5 = 26,4 hours each. So less than ONE data 
> part a day per server. So company need to buy and use 6 time more 
> servers than it use now. Also disadvantages is Java don't have integrate 
> SQL into language, so SQL can be validated at runtime, not compile time 
> (unlike PL/SQL).
> 

I don't think it is correct to blame this on Java. There are other 
database products using Java, and for all I know they might be faster 
for the specific case we are discussing.

Also, I'm not sure how comparable the Java stored procedure code and the 
PL/SQL are.
Could you post the latter?
Does anyone know if (or rather how well maybe?) Oracle is capable of 
optimizing the PL/SQL?



regards,
-- 
Kristian

> 
>     Does anyone have any ideas on other possible tunings?
> 
>     -- 
>     Kristian
> 
> 


Re: Speed of using Derby DB

Posted by vodarus vodarus <vo...@gmail.com>.
>
> Just to be sure, you did recreate the tables?
> In any case, the page size would mostly help pull data in faster and that
> doesn't matter for this test.
>
>
>> What is the "Øysteins approach "? Can you write steps to get 2.4 seconds
>> time?
>>
>
> Øysteins approach is using the query "insert into testtotals select client,
> sum(order_amount) from testbig group by client;".
> As you state, this is not what you want in your case and it might not be
> applicable.

No, this is a test for non-pure-SQL functions only.

>
> I could also get down to these times by using a HashMap to store the
> intermediate totals in 'calculateTotalCommon'. This does of course use more
> memory and might cause trouble if you don't know the number of clients in
> your table (i.e. whether you need 25 thousand or 100 million entries in the
> map).

Also not applicable, because the data volume is 10-100 times larger than RAM.


It seems what happens is that the log buffer goes full. By increasing the
> log buffer, I was able to get a little better performance. As always with
> tuning, it's about balance and tradeoffs. If your IO system is really good,
> maybe you can run with a big log buffer and get better performance. However,
> the effect you see from this also depends on how often you have commits
> (then the complete buffer is flushed anyway, at least in Derby).
>
> So, in short, experiment with the following, using either the "insert
> into..." query or your client code modified to somehow store the totals in
> memory:
>  a) Log buffer size
>  b) Page cache size (and JVM heap)
>  c) Page size
>
> One of my attempts looked like this:
> java -Xmx512M -Dderby.storage.pageSize=32768
> -Dderby.storage.logBufferSize=524288 -Dderby.storage.pageCacheSize=2500 -cp
> .:${JDB10413} derbytest.FatTest


This shows a time of 9.5 seconds; the best is Oracle with 1.5 seconds.
Also, do these parameters work with an existing database, or do you need to
recreate the database for them to take effect?

>
>
> Using your original test code I haven't been able to get lower than around
> 5 seconds (best), the average being somewhere around 6 seconds.

I could get 9.5 seconds at best. What other improvements can you make to the
application?

>
>
> As always, you have to do your own tests on your own system to see if it is
> good enough for your use :)

:)

>
> Often there are other things to consider besides performance, for instance
> installation and ease of use.

:) Yes, but the performance issue is very important, more important than
installation.
With Oracle, the analysis will take about 100 (algorithms) * 100 (the data is
100 times more than in the experiment) * 1.5 sec = 4.17 hours each, so about
6 data sets a day per server.

With Java: 100 * 100 * 9.5 sec = 26.4 hours each, so less than ONE data set a
day per server. The company would need to buy and run 6 times more servers
than it uses now. Another disadvantage is that Java does not integrate SQL
into the language, so SQL can only be validated at runtime, not at compile
time (unlike PL/SQL).

>
> Does anyone have any ideas on other possible tunings?
>
> --
> Kristian
>

Re: Speed of using Derby DB

Posted by Kristian Waagan <Kr...@Sun.COM>.
vodarus vodarus wrote:
> Also interesting that DROP and CREATE <table> seems to be faster than 
> DELETE FROM <table>
> 
> it depends on internal architecture.
> by the way... Is Derby have blocking engine (like MySQL/MyISAM or MS SQL 
> 2k) or versionning (like PostgreSQL or Oracle)?

Derby uses locking.
Row and table level locking are supported.


-- 
Kristian

Re: Speed of using Derby DB

Posted by vodarus vodarus <vo...@gmail.com>.
It is also interesting that DROP and CREATE <table> seem to be faster than
DELETE FROM <table>.

I suppose it depends on the internal architecture.
By the way, does Derby have a locking engine (like MySQL/MyISAM or MS SQL
2000) or versioning (like PostgreSQL or Oracle)?

Re: Speed of using Derby DB

Posted by Kristian Waagan <Kr...@Sun.COM>.
vodarus vodarus wrote:
>     Hi,
> 
>     By using Øysteins approach I was able to get the time down to 2.4
>     seconds on my machine, on which the client [1] and stored procedure
>     code took around 12 seconds. The best I could get on the latter,
>     tweaking page cache size and page size, was around 8 seconds.
> 
>     By cheating and removing some durability guarantees, I got down to a
>     best time (not quite stable) of 1.5 seconds using Øysteins suggestion.
> 
>     I was surprised of the high disk activity seen when running the
>     code. Lots of writes are taking place, which I did not quite expect
>     for Øysteins query. But I do not know the implementation or the
>     algorithm being used.
> 
>     There also seem to be some overhead invoking a stored procedure, as
>     the client [1] code is faster. This would of course look different
>     if the network JDBC driver was used, as you wouldn't have to
>     transfer the data over the wire.
> 
>     To me it seems what takes most of the time is updating the result table.
> 
>     So in short, no fresh ideas! Anyone else?
>     I didn't try using batches for the updated though.
> 
> 
>     PS: Note that your pageSize setting is invalid (must be one of 4096,
>     8192, 16384, or 32768) and Derby will silently ignore it and use the
>     default...
> 
> 
>     -- 
>     Kristian
> 
> 
>     [1] Note that client in this case still refers to the embedded
>     driver, but the code composing the stored procedure is invoked from
>     the driver side instead of "inside" the database.
> 
> Hello )))
> 
> I set pageSize to 32768, but result time seems near 11-12 sec.

Just to be sure, you did recreate the tables?
In any case, the page size would mostly help pull data in faster and 
that doesn't matter for this test.

> 
> What is the "Øysteins approach "? Can you write steps to get 2.4 seconds 
> time?

Øystein's approach is to use the query "insert into testtotals select 
client, sum(order_amount) from testbig group by client;".
As you state, this is not what you want in your case and it might not be 
applicable.

I could also get down to these times by using a HashMap to store the 
intermediate totals in 'calculateTotalCommon'. This does of course use 
more memory and might cause trouble if you don't know the number of 
clients in your table (i.e. whether you need 25 thousand or 100 million 
entries in the map).
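
(A rough sketch of that HashMap variant is shown below. 'calculateTotalCommon'
itself was not posted, so the method shape is illustrative only; it assumes
TESTTOTALS is emptied before each run, as in the test harness.)

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;

public class CalculateTotalInMemory {
    // Accumulate per-client totals in memory, then write them out once.
    public static void calculateTotal(Connection conn) throws SQLException {
        Map<Integer, Integer> totals = new HashMap<Integer, Integer>();

        Statement s = conn.createStatement();
        ResultSet rs = s.executeQuery("SELECT CLIENT, ORDER_AMOUNT FROM TESTBIG");
        while (rs.next()) {
            int client = rs.getInt(1);
            int amount = rs.getInt(2);
            Integer old = totals.get(client);
            totals.put(client, old == null ? amount : old.intValue() + amount);
        }
        rs.close();
        s.close();

        // Write every total back in one batched, single-commit pass.
        PreparedStatement ins = conn.prepareStatement(
                "INSERT INTO TESTTOTALS (CLIENT, CLIENT_TOTAL) VALUES (?, ?)");
        for (Map.Entry<Integer, Integer> e : totals.entrySet()) {
            ins.setInt(1, e.getKey().intValue());
            ins.setInt(2, e.getValue().intValue());
            ins.addBatch();
        }
        ins.executeBatch();
        ins.close();
        conn.commit();
    }
}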

> 
> Thanks.
> 
> 
> PS "To me it seems what takes most of the time is updating the result 
> table." But what is the problem there? I commit data at the end, so DBMS 
> should not do any writes ...

It seems what happens is that the log buffer goes full. By increasing 
the log buffer, I was able to get a little better performance. As always 
with tuning, it's about balance and tradeoffs. If your IO system is 
really good, maybe you can run with a big log buffer and get better 
performance. However, the effect you see from this also depends on how 
often you have commits (then the complete buffer is flushed anyway, at 
least in Derby).

So, in short, experiment with the following, using either the "insert 
into..." query or your client code modified to somehow store the totals 
in memory:
  a) Log buffer size
  b) Page cache size (and JVM heap)
  c) Page size

One of my attempts looked like this:
java -Xmx512M -Dderby.storage.pageSize=32768 
-Dderby.storage.logBufferSize=524288 -Dderby.storage.pageCacheSize=2500 
-cp .:${JDB10413} derbytest.FatTest
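
(The same storage settings can also be put into a derby.properties file in
the system directory instead of being passed as -D flags; as noted above, the
pageSize value only affects tables created after it is set, while the other
two take effect when Derby boots. Using the same values:)

derby.storage.pageSize=32768
derby.storage.logBufferSize=524288
derby.storage.pageCacheSize=2500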

Using your original test code I haven't been able to get lower than 
around 5 seconds (best), the average being somewhere around 6 seconds.


As always, you have to do your own tests on your own system to see if it 
is good enough for your use :)
Often there are other things to consider besides performance, for 
instance installation and ease of use.


Does anyone have any ideas on other possible tunings?


-- 
Kristian

RE: Speed of using Derby DB

Posted by de...@segel.com.
 

I haven’t looked at Derby’s code, but just because you haven’t ‘committed’
the transaction doesn’t mean that Derby isn’t writing the records.

Without a commit, depending on your isolation level, other users won’t see
or feel the effects of your data. The transactions are stored in a log so
that you can roll back the transaction, but the data is still getting written.

 

(A simple test would be to write a record of a known length, X number of
times, on a machine where the JVM has a restricted heap size. If you think
everything is in memory and not being written to disk, then after Y
iterations you will fill up the heap and go boom.)

 

But Derby can be different so what do I know?

 


From: vodarus vodarus [mailto:vodarus@gmail.com] 
Sent: Thursday, June 19, 2008 3:43 AM
To: Derby Discussion
Subject: Re: Speed of using Derby DB

 

Hi,

By using Øysteins approach I was able to get the time down to 2.4 seconds on
my machine, on which the client [1] and stored procedure code took around 12
seconds. The best I could get on the latter, tweaking page cache size and
page size, was around 8 seconds.

By cheating and removing some durability guarantees, I got down to a best
time (not quite stable) of 1.5 seconds using Øysteins suggestion.

I was surprised of the high disk activity seen when running the code. Lots
of writes are taking place, which I did not quite expect for Øysteins query.
But I do not know the implementation or the algorithm being used.

There also seem to be some overhead invoking a stored procedure, as the
client [1] code is faster. This would of course look different if the
network JDBC driver was used, as you wouldn't have to transfer the data over
the wire.

To me it seems what takes most of the time is updating the result table.

So in short, no fresh ideas! Anyone else?
I didn't try using batches for the updated though.


PS: Note that your pageSize setting is invalid (must be one of 4096, 8192,
16384, or 32768) and Derby will silently ignore it and use the default...


-- 
Kristian


[1] Note that client in this case still refers to the embedded driver, but
the code composing the stored procedure is invoked from the driver side
instead of "inside" the database.

Hello )))

I set pageSize to 32768, but result time seems near 11-12 sec. 


What is the "Øysteins approach "? Can you write steps to get 2.4 seconds
time?

Thanks.


PS "To me it seems what takes most of the time is updating the result
table." But what is the problem there? I commit data at the end, so DBMS
should not do any writes ...


Re: Speed of using Derby DB

Posted by vodarus vodarus <vo...@gmail.com>.
>
> Hi,
>
> By using Øysteins approach I was able to get the time down to 2.4 seconds
> on my machine, on which the client [1] and stored procedure code took around
> 12 seconds. The best I could get on the latter, tweaking page cache size and
> page size, was around 8 seconds.
>
> By cheating and removing some durability guarantees, I got down to a best
> time (not quite stable) of 1.5 seconds using Øysteins suggestion.
>
> I was surprised of the high disk activity seen when running the code. Lots
> of writes are taking place, which I did not quite expect for Øysteins query.
> But I do not know the implementation or the algorithm being used.
>
> There also seem to be some overhead invoking a stored procedure, as the
> client [1] code is faster. This would of course look different if the
> network JDBC driver was used, as you wouldn't have to transfer the data over
> the wire.
>
> To me it seems what takes most of the time is updating the result table.
>
> So in short, no fresh ideas! Anyone else?
> I didn't try using batches for the updated though.
>
>
> PS: Note that your pageSize setting is invalid (must be one of 4096, 8192,
> 16384, or 32768) and Derby will silently ignore it and use the default...
>
>
> --
> Kristian
>
>
> [1] Note that client in this case still refers to the embedded driver, but
> the code composing the stored procedure is invoked from the driver side
> instead of "inside" the database.

Hello )))

I set pageSize to 32768, but the resulting time is still around 11-12 seconds.

What is "Øystein's approach"? Can you describe the steps to get the
2.4-second time?

Thanks.


PS: "To me it seems what takes most of the time is updating the result
table." But what is the problem there? I commit the data at the end, so the
DBMS should not do any writes ...

Re: Speed of using Derby DB

Posted by Kristian Waagan <Kr...@Sun.COM>.
vodarus vodarus wrote:
> Any fresh ideas colleagues?
> 
> How can i improve derby speed?

Hi,

By using Øysteins approach I was able to get the time down to 2.4 
seconds on my machine, on which the client [1] and stored procedure code 
took around 12 seconds. The best I could get on the latter, tweaking 
page cache size and page size, was around 8 seconds.

By cheating and removing some durability guarantees, I got down to a 
best time (not quite stable) of 1.5 seconds using Øysteins suggestion.
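
(Which setting was used for the "cheating" is not stated. One documented way
to remove durability guarantees in Derby is derby.system.durability=test,
which skips log syncs and is meant for testing only; an assumed invocation,
modelled on the tuning command line quoted elsewhere in this thread, would
be:)

java -Xmx512M -Dderby.system.durability=test -cp .:${JDB10413} derbytest.FatTest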

I was surprised of the high disk activity seen when running the code. 
Lots of writes are taking place, which I did not quite expect for 
Øysteins query. But I do not know the implementation or the algorithm 
being used.

There also seems to be some overhead in invoking a stored procedure, as the
client [1] code is faster. This would of course look different if the 
network JDBC driver was used, as you wouldn't have to transfer the data 
over the wire.

To me it seems what takes most of the time is updating the result table.

So in short, no fresh ideas! Anyone else?
I didn't try using batches for the updates, though.
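
(A sketch of what batching the writes could look like is below. It assumes
TESTTOTALS is emptied before each run, as in the test harness, so every
client group ends in an INSERT that can simply be added to a batch; it also
assumes the scan returns rows grouped by CLIENT. Whether this actually helps
was not measured.)

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CalculateTotalBatched {
    public static void calculateTotal(Connection connection) throws SQLException {
        Statement s = connection.createStatement();
        ResultSet rs = s.executeQuery("SELECT CLIENT, ORDER_AMOUNT FROM TESTBIG");
        PreparedStatement insertData = connection.prepareStatement(
                "INSERT INTO TESTTOTALS (CLIENT, CLIENT_TOTAL) VALUES (?, ?)");

        int lastClient = 0;
        int totalAmount = 0;
        int pending = 0;
        while (rs.next()) {
            int client = rs.getInt(1);
            int amount = rs.getInt(2);
            if (lastClient == 0) {
                lastClient = client;
            }
            if (client != lastClient) {
                // Finished a client group: queue its total instead of
                // executing the insert immediately.
                insertData.setInt(1, lastClient);
                insertData.setInt(2, totalAmount);
                insertData.addBatch();
                if (++pending == 1000) {      // assumed batch size
                    insertData.executeBatch();
                    pending = 0;
                }
                lastClient = client;
                totalAmount = amount;
            } else {
                totalAmount += amount;
            }
        }
        // Flush the final group and any remaining batched inserts.
        insertData.setInt(1, lastClient);
        insertData.setInt(2, totalAmount);
        insertData.addBatch();
        insertData.executeBatch();

        rs.close();
        s.close();
        insertData.close();
        connection.commit();
    }
}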


PS: Note that your pageSize setting is invalid (must be one of 4096, 
8192, 16384, or 32768) and Derby will silently ignore it and use the 
default...


-- 
Kristian


[1] Note that client in this case still refers to the embedded driver, 
but the code composing the stored procedure is invoked from the driver 
side instead of "inside" the database.

Re: Speed of using Derby DB

Posted by vodarus vodarus <vo...@gmail.com>.
Any fresh ideas, colleagues?

How can I improve Derby's speed?

Re: Speed of using Derby DB

Posted by vodarus vodarus <vo...@gmail.com>.
>
> For derby the default is autocommit=true, which means it will do a
> synchronous log commit write for every statement.  My reading of the app and
> the
> call to commit at the end seems to indicate you want to only commit
> at the end.  Depending on the data distribution (ie. how many times you
> call executeUpdate this may be a big issue.  When you run the test do
> you see a lot of idle time or is 1 cpu at 100% for 12-14 secs? The
> suggestion for using aggregates seems like a better way to do the app.
>
> What kind of page cache does oracle have when you compare?  Are you
> interested in performance of a query when no page is in cache or when
> all pages are in cache?  This size table is bigger than the default
> derby page cache (1000 pages) so it basically is going to be an I/O test of
> how fast
> stuff can be read from disk each time.  If you want to try a cached
> test try setting derby page cache bigger - probably 10,000 pages will
> fit that table (probably less - didn't do exact math).
>
Hello.

You can see the example code now; conn.setAutoCommit(false); has been added
there.

derby.properties:
derby.storage.pageCacheSize=100000
derby.storage.pageSize=32000
#derby.language.logStatementText=true

The app is run with:
java -Xms100m -Xmx1000m -cp dist/derbytest.jar derbytest.FatTest
So memory is sufficient for the app.

Software & hardware:

Core 2 Duo 2 GHz, 2 GB RAM, MS Win XP SP2.
java version "1.5.0_15"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_15-b04)
Java HotSpot(TM) Client VM (build 1.5.0_15-b04, mixed mode)

derby version - 10.3.2.1

Oracle XE with default configuration.

Re: Speed of using Derby DB

Posted by Mike Matrigali <mi...@sbcglobal.net>.
vodarus vodarus wrote:
> Hello.
> 
> I tried to use Derby and compare it with Oracle. I thought that Derby 
> can have the same performance as Oracle on easy procedures.
> 
> Purpose of bench-mark test: use Derby as local db and get better 
> performance for local data-manipulations.
> 
> DB schema:
> 
> create table TESTBIG
> (
>     CLIENT       int not null,
>     ORDER_ID     int not null,
>     ORDER_AMOUNT int not null
> );
> 
> alter table TESTBIG add constraint TESTBIG_PK primary key (CLIENT, 
> ORDER_ID);
>    
> create table TESTTOTALS
> (
>     CLIENT       int not null,
>     CLIENT_TOTAL int
> );
> 
> alter table TESTTOTALS add constraint TESTTOTALS_PK primary key (CLIENT);
> 
> We populating TESTBIG table with 1 000 000 rows, then stored procedure 
> calculates TESTTOTAL: CLIENT_TOTAL is SUMM of all ORDER_AMOUNT for that 
> CLIENT.
> 
> i wrote stored procedure for Derby in Java language:
> 
> static public void calculateTotal() {
> 
>         int totalAmount = 0;
>         int lastClient = 0;
> 
>         try {
>             Connection connection = 
> DriverManager.getConnection("jdbc:default:connection");
>             Statement s = 
> connection.createStatement(ResultSet.TYPE_FORWARD_ONLY, 
> ResultSet.CONCUR_READ_ONLY);
>             ResultSet rs = s.executeQuery("SELECT CLIENT, ORDER_ID, 
> ORDER_AMOUNT FROM TESTBIG");
> 
>             PreparedStatement updateData = 
> connection.prepareStatement("UPDATE testtotals SET " +
>                     " client_total = client_total + ? " +
>                     " WHERE client = ?");
>             PreparedStatement insertData = 
> connection.prepareStatement("INSERT INTO testtotals " +
>                     " (client, client_total) " +
>                     " VALUES (?, ?) ");
> 
>             while (rs.next()) {
>                 int client = rs.getInt(1);
>                 int order_amount = rs.getInt(3);
> 
>                 if (lastClient == 0) {
>                     lastClient = client;
>                     totalAmount = 0;
>                 }
> 
>                 if (lastClient != client) {
>                    
>                     // System.out.println("MERGE amount" + lastClient + 
> ":" + totalAmount);
>                     updateData.setInt(1, totalAmount);
>                     updateData.setInt(2, lastClient);
>                     int sqlRowCount = updateData.executeUpdate();
> 
>                     if (sqlRowCount == 0) {
>                         insertData.setInt(1, lastClient);
>                         insertData.setInt(2, totalAmount);
>                         sqlRowCount = insertData.executeUpdate();
>                     }
> 
>                     lastClient = client;
>                     totalAmount = order_amount;
>                 } else {
>                     totalAmount = totalAmount + order_amount;
>                 }
>             }
> 
>             updateData.setInt(1, totalAmount);
>             updateData.setInt(2, lastClient);
>             int sqlRowCount = updateData.executeUpdate();
> 
>             if (sqlRowCount == 0) {
>                 insertData.setInt(1, lastClient);
>                 insertData.setInt(2, totalAmount);
>                 sqlRowCount = insertData.executeUpdate();
>             }
> 
> 
>             rs.close();
>             s.close();
>             connection.commit();
>             connection.close();
>         } catch (SQLException ex) {
>             
> Logger.getLogger(CalculateTotal.class.getName()).log(Level.SEVERE, null, 
> ex);
>         }
>     }
> 
> 
> it takes up to 12 seconds to calculate TESTTOTALS. Oracle PL/SQL 
> procedure with the same algorithm need 1,5 second.
> 
> *How can i improve performance? Or Derby is so slow because of Java / 
> JVM issues???*
> 
> Thanks

For Derby the default is autocommit=true, which means it will do a
synchronous log commit write for every statement.  My reading of the app and
the call to commit at the end seems to indicate you want to commit only at
the end.  Depending on the data distribution (i.e. how many times you call
executeUpdate) this may be a big issue.  When you run the test, do you see a
lot of idle time, or is one CPU at 100% for the 12-14 secs?  The suggestion
to use aggregates seems like a better way to do the app.

What kind of page cache does Oracle have when you compare?  Are you
interested in the performance of a query when no page is in the cache, or
when all pages are in the cache?  This table is bigger than the default Derby
page cache (1000 pages), so it is basically going to be an I/O test of how
fast the data can be read from disk each time.  If you want to try a cached
test, try setting the Derby page cache bigger - probably 10,000 pages will
fit that table (probably less - didn't do the exact math).
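
(A sketch combining those two suggestions, turning autocommit off for the run
and enlarging the page cache before the engine boots; the cache size is the
estimate above, and the database URL is an assumption.)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RunWithBiggerCache {
    public static void main(String[] args) throws Exception {
        // Must be set before the embedded engine boots in this JVM;
        // could also go into derby.properties.
        System.setProperty("derby.storage.pageCacheSize", "10000");

        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        Connection conn = DriverManager.getConnection("jdbc:derby:testdb");
        conn.setAutoCommit(false);          // no synchronous commit per statement
        Statement s = conn.createStatement();
        s.execute("DELETE FROM TESTTOTALS");
        long begin = System.currentTimeMillis();
        s.execute("CALL calculateTotal()");
        conn.commit();                      // commit once at the end
        System.out.println("Timing " + (System.currentTimeMillis() - begin));
        s.close();
        conn.close();
    }
}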


Re: Speed of using Derby DB

Posted by vodarus vodarus <vo...@gmail.com>.
>
> If you are using Derby in embedded mode, I do not think there will be a
> performance advantage of using a stored procedure over executing the code
> directly in the application.  It may even be faster to skip the stored
> procedure.  It would be interesting if you could try it out and report the
> results.
>

I changed this function and executed it from the Java application directly,
not as a stored procedure.

The results look about the same: around 12-14 seconds.

Re: Speed of using Derby DB

Posted by Øystein Grøvlen <Oy...@Sun.COM>.
vodarus vodarus wrote:
> I thought tt using stored procedure will avoid transferring data from 
> DataEngine to Application.

That cost is very small when using the embedded driver, where both are
executing in the same VM.

> Using Java + Oracle will be slower than usage of PL/SQL code.

But Java + Oracle implies client/server and transfer of data between 
different processes.

> Application will spend a lot of resources to transfer data from DBMS to 
> external code and back.
> 
> Do you mean usage of Java application itself and avoid usage of stored 
> procedure?

Yes.

> 
> 
>         PS does derby / javaDB have it's own forum?
> 
> 
>     Do you mean outside this mailing list?
> 
> 
> Yes. I think forum is more visual tool for communication than mail-list.

(I would think that depends on your mail client.)

Nabble and several other sites provide a way to access the mailing list
as a forum. http://www.nabble.com/Apache-Derby-Users-f93.html

-- 
Øystein

Re: Speed of using Derby DB

Posted by vodarus vodarus <vo...@gmail.com>.
> PS:
>> It *may* be that the reason for the slow execution is outdated index
>> cardinality statistics. Check out "CALL
>> SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE" either on nabble or in the derby
>> docs.
>>
>
> Vodarus,
>
> Did you get around to try out the tip from Thomas?
>
> If you first create the index and then load the data, the statistics can be
> badly outdated. If you first created the table, loaded the data and finally
> created the index you should be fine though.
>
> Just curious, as we have seen reports of severe performance degradations
> because of this issue, which causes the optimizer to choose a bad plan.
>
> regards,
>
Hello.

I created an example application. The whole app is merged into one file, so
anyone in this topic can run it and get the same results.

The field final private WORK_TYPE work_type should be set to
DATA_FROM_SCRATCH on the first run. Then use DATA_DROP_AND_CREATE to recreate
the data, or DATA_USE_OLD to reuse the old data.

Anyone can run the app and get the same results as I did. I am interested in
the speed of the calculateTotalStored or calculateTotalClient methods ONLY!


PS: This app is for testing purposes only, so code reusability and similar
concerns are not important.

Re: Speed of using Derby DB

Posted by vodarus vodarus <vo...@gmail.com>.
>
>
> Did you get around to try out the tip from Thomas?
>
Now you can see the app example. I use a full-table scan in the SP, so how
could indexes improve this?

Re: Speed of using Derby DB

Posted by Kristian Waagan <Kr...@Sun.COM>.
Thomas Nielsen wrote:
> vodarus vodarus wrote:
>>         PS does derby / javaDB have it's own forum?
>>
>>
>>     Do you mean outside this mailing list?
>>
>>
>> Yes. I think forum is more visual tool for communication than mail-list.
>>
> 
> If so, you can read the list using nabble (www.nabble.com) :)
> 
> Cheers,
> Thomas
> 
> PS:
> It *may* be that the reason for the slow execution is outdated index 
> cardinality statistics. Check out "CALL 
> SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE" either on nabble or in the 
> derby docs.

Vodarus,

Did you get around to trying out the tip from Thomas?

If you first create the index and then load the data, the statistics can 
be badly outdated. If you first created the table, loaded the data and 
finally created the index you should be fine though.

Just curious, as we have seen reports of severe performance degradations 
because of this issue, which causes the optimizer to choose a bad plan.



regards,
-- 
Kristian


> 


Re: Speed of using Derby DB

Posted by Thomas Nielsen <Th...@Sun.COM>.
vodarus vodarus wrote:
>         PS does derby / javaDB have it's own forum?
> 
> 
>     Do you mean outside this mailing list?
> 
> 
> Yes. I think forum is more visual tool for communication than mail-list.
> 

If so, you can read the list using nabble (www.nabble.com) :)

Cheers,
Thomas

PS:
It *may* be that the reason for the slow execution is outdated index 
cardinality statistics. Check out "CALL 
SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE" either on nabble or in the 
derby docs.
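
(For the tables in this thread, such a call would look roughly like the
sketch below; 'APP' is Derby's default schema and is an assumption, and the
three flags request purge, defragment and truncate.)

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

public class CompressTestBig {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        Connection conn = DriverManager.getConnection("jdbc:derby:testdb");
        CallableStatement cs = conn.prepareCall(
                "CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE(?, ?, ?, ?, ?)");
        cs.setString(1, "APP");      // schema name (assumed)
        cs.setString(2, "TESTBIG");  // table to compress
        cs.setShort(3, (short) 1);   // purge deleted rows
        cs.setShort(4, (short) 1);   // defragment rows
        cs.setShort(5, (short) 1);   // truncate free space at the end
        cs.execute();
        cs.close();
        conn.close();
    }
}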

-- 
Thomas Nielsen

Re: Speed of using Derby DB

Posted by vodarus vodarus <vo...@gmail.com>.
2008/6/11 Øystein Grøvlen <Oy...@sun.com>:

>
> If you are using Derby in embedded mode, I do not think there will be a
> performance advantage of using a stored procedure over executing the code
> directly in the application.  It may even be faster to skip the stored
> procedure.  It would be interesting if you could try it out and report the
> results.


I thought that using a stored procedure would avoid transferring data from
the data engine to the application.
Using Java + Oracle would be slower than using PL/SQL code, because the
application would spend a lot of resources transferring data from the DBMS to
external code and back.

Do you mean using the Java application itself and avoiding the stored
procedure?


>  PS does derby / javaDB have it's own forum?
>>
>>
> Do you mean outside this mailing list?


Yes. I think a forum is a more visual tool for communication than a mailing
list.

Re: Speed of using Derby DB

Posted by Øystein Grøvlen <Oy...@Sun.COM>.
> And what tools can you use to analyse big amount of relation-data? SQL 
> can not be used in many cases. So people usually use stored procedures 
> in RDBMS - PL/SQL in Oracle. I have idea: replase Oracle with Java 
> application. And that Java application have RDBMS inside (like Derby in 
> Embedded mode).
> 

If you are using Derby in embedded mode, I do not think there will be a 
performance advantage of using a stored procedure over executing the 
code directly in the application.  It may even be faster to skip the 
stored procedure.  It would be interesting if you could try it out and 
report the results.

> PS does derby / javaDB have it's own forum?
> 

Do you mean outside this mailing list?

-- 
Øystein

Re: Speed of using Derby DB

Posted by vodarus vodarus <vo...@gmail.com>.
Hello.

2008/6/11 Øystein Grøvlen <Oy...@sun.com>:

> vodarus vodarus wrote:
>
>> it takes up to 12 seconds to calculate TESTTOTALS. Oracle PL/SQL procedure
>> with the same algorithm need 1,5 second.
>>
>> *How can i improve performance? Or Derby is so slow because of Java / JVM
>> issues???*
>>
>> Thanks.
>>
>
> I do not have any experience with performance of stored procedures, but
> note that Derby has a pretty high overhead for query compilation since it
> compiles queries directly into Java byte code.  Hence, comparison on single
> executions of statements will usual not be in favor of Derby. However, if
> same statements are executed many times efficiency improves (both due to
> reuse of compiled plans and hot-spot compilation of the Java code).


I run the stored procedure several times:
for (int i = 0; i < 5; i++) {
    s = conn.createStatement();

    sql = "DELETE FROM testtotals";
    s.execute(sql);

    begin = new Date();
    sql = "CALL calculateTotal()";
    s.execute(sql);
    end = new Date();
    System.out.println("Timing " + (end.getTime() - begin.getTime()));
    conn.commit();
    System.out.println("Committed the transaction");
}


> For you particular problem, why use stored procedures when you can do the
> same with the following query:
>
> insert into testtotals
> select client, sum(order_amount) from testbig group by client;


Because it's just an example. Think of some function that cannot be replaced
with pure SQL. Of course SQL can be used in this test, but the test's purpose
is to check the speed of server-side analysis of a large amount of data.

And what tools can you use to analyse a large amount of relational data? SQL
cannot be used in many cases, so people usually use stored procedures in the
RDBMS (PL/SQL in Oracle). My idea is to replace Oracle with a Java
application, and that Java application has an RDBMS inside (like Derby in
embedded mode).


PS: Does Derby / JavaDB have its own forum?

Re: Speed of using Derby DB

Posted by Øystein Grøvlen <Oy...@Sun.COM>.
vodarus vodarus wrote:
> it takes up to 12 seconds to calculate TESTTOTALS. Oracle PL/SQL 
> procedure with the same algorithm need 1,5 second.
> 
> *How can i improve performance? Or Derby is so slow because of Java / 
> JVM issues???*
> 
> Thanks.

I do not have any experience with performance of stored procedures, but 
note that Derby has a pretty high overhead for query compilation since 
it compiles queries directly into Java byte code.  Hence, comparison on 
single executions of statements will usually not be in favor of Derby.
However, if the same statements are executed many times, efficiency improves
(both due to reuse of compiled plans and hot-spot compilation of the 
Java code).

For your particular problem, why use stored procedures when you can do
the same with the following query:

insert into testtotals
select client, sum(order_amount) from testbig group by client;
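
(As a sketch, this is how that query could be run from JDBC with a single
commit; other posts in the thread call it "Øystein's approach". The database
URL is an assumption.)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AggregateTotals {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        Connection conn = DriverManager.getConnection("jdbc:derby:testdb");
        conn.setAutoCommit(false);
        Statement s = conn.createStatement();
        s.execute("DELETE FROM TESTTOTALS");   // start from an empty result table
        long begin = System.currentTimeMillis();
        s.executeUpdate(
                "INSERT INTO TESTTOTALS (CLIENT, CLIENT_TOTAL) " +
                "SELECT CLIENT, SUM(ORDER_AMOUNT) FROM TESTBIG GROUP BY CLIENT");
        conn.commit();
        System.out.println("Timing " + (System.currentTimeMillis() - begin));
        s.close();
        conn.close();
    }
}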

-- 
Øystein

Re: Speed of using Derby DB

Posted by vodarus vodarus <vo...@gmail.com>.
>
> If you want a better comparison, write a Java Stored Procedure in Oracle
> and then compare the time. Even then you will have differences that will
> effect your performance.
>

I do not need a "Java stored procedure in Oracle". I need a fast application
that will analyse a large amount of relational data. This analysis cannot be
done via SQL, so it is necessary to use some server-side stored procedure.

When I use Java with Derby, the performance drop is 12x for one iteration,
and I need to run 100 analyses for each batch of data, so the total cost
grows accordingly. It would be better to buy Oracle than to use Derby and buy
more computers.

In this comparison PL/SQL wins because it has better syntax AND better
performance.

But maybe I have done something wrong, and together we can fix it :)

Re: Speed of using Derby DB

Posted by vodarus vodarus <vo...@gmail.com>.
2008/6/11 <de...@segel.com>:

>  Hi,
>
> You really can't compare Derby against Oracle.
>
> They are two different beasts.
>
> You can't embed Oracle in your application. You can embed Derby.
>
> You spend $$$$ on an Oracle solution. You spend $ on Derby. TANSTAAFL!
>
Yes, I thought that Derby could win this comparison if it worked at the same
speed. I have a really simple procedure, so all the performance cost should
be in I/O (also, I have more RAM than the database takes up on the HDD).

> Oracle has a paid support staff of developers. Derby? Their support staff
> is paid by IBM and SUN. Ooops! IBM stopped supporting Cloudscape, no? Or
> else community members support Derby outside of their $Dayjob.
>
> Oracle big, written in C/C++ etc.
>
> Derby? 100% Java, small.
>
>
>
> If you want a better comparison, write a Java Stored Procedure in Oracle
> and then compare the time. Even then you will have differences that will
> effect your performance.
>
It's easier to write code in PL/SQL than in Java. Of course you can try to
use SQLJ, but...

Let's compare:
PL/SQL code:

            UPDATE testtotals SET
                 client_total = iTotalAmt
             WHERE client = iLastClient;

the same code in Java:

PreparedStatement update = connection.prepareStatement(
        "UPDATE testtotals SET " +
        " client_total = client_total + ? " +
        " WHERE client = ?");
update.setInt(1, totalAmount);
update.setInt(2, lastClient);
int sqlRowCount = update.executeUpdate();

Also, there is no support for SQL in Java, so SQL lives in Strings without
any checks. SQLJ could help here, but Java refused the use of a preprocessor,
so IDEs do not understand the SQLJ dialect.

> Oracle can take advantage of partitioning database tables, certain caching,
> and of course query optimization will vary.
>

I am comparing only the speed of data analysis, and only Oracle PL/SQL
versus Java. I prefer Java, but it has many disadvantages when you need to
work with a large amount of data.

RE: Speed of using Derby DB

Posted by de...@segel.com.
Hi,

 

You really can't compare Derby against Oracle.

 

They are two different beasts.

 

You can't embed Oracle in your application. You can embed Derby.

You spend $$$$ on an Oracle solution. You spend $ on Derby. TANSTAAFL!

 

Oracle has a paid support staff of developers. Derby? Their support staff is
paid by IBM and SUN. Ooops! IBM stopped supporting Cloudscape, no? Or else
community members support Derby outside of their $Dayjob.

 

Oracle is big, written in C/C++, etc.

Derby? 100% Java, small.

 

If you want a better comparison, write a Java stored procedure in Oracle and
then compare the time. Even then you will have differences that will affect
your performance.

Oracle can take advantage of partitioning database tables, certain caching,
and of course query optimization will vary.

 

I'm not sure if  Informix's Standard Engine supported SPL, but if it did, it
would be a closer comparison, although SE is C based and should perform
faster.

 

HTH

 

-G


From: vodarus vodarus [mailto:vodarus@gmail.com] 
Sent: Wednesday, June 11, 2008 3:31 AM
To: derby-user@db.apache.org
Subject: Speed of using Derby DB

 

Hello.

I tried to use Derby and compare it with Oracle. I thought that Derby can
have the same performance as Oracle on easy procedures. 

Purpose of bench-mark test: use Derby as local db and get better performance
for local data-manipulations.

DB schema:

create table TESTBIG
( 
    CLIENT       int not null, 
    ORDER_ID     int not null, 
    ORDER_AMOUNT int not null 
);

alter table TESTBIG add constraint TESTBIG_PK primary key (CLIENT,
ORDER_ID);
    
create table TESTTOTALS 
( 
    CLIENT       int not null, 
    CLIENT_TOTAL int 
);

alter table TESTTOTALS add constraint TESTTOTALS_PK primary key (CLIENT);

We populating TESTBIG table with 1 000 000 rows, then stored procedure
calculates TESTTOTAL: CLIENT_TOTAL is SUMM of all ORDER_AMOUNT for that
CLIENT.

i wrote stored procedure for Derby in Java language:

static public void calculateTotal() {

        int totalAmount = 0;
        int lastClient = 0;

        try {
            Connection connection =
DriverManager.getConnection("jdbc:default:connection");
            Statement s =
connection.createStatement(ResultSet.TYPE_FORWARD_ONLY,
ResultSet.CONCUR_READ_ONLY);
            ResultSet rs = s.executeQuery("SELECT CLIENT, ORDER_ID,
ORDER_AMOUNT FROM TESTBIG");

            PreparedStatement updateData =
connection.prepareStatement("UPDATE testtotals SET " +
                    " client_total = client_total + ? " +
                    " WHERE client = ?");
            PreparedStatement insertData =
connection.prepareStatement("INSERT INTO testtotals " +
                    " (client, client_total) " +
                    " VALUES (?, ?) ");

            while (rs.next()) {
                int client = rs.getInt(1);
                int order_amount = rs.getInt(3);

                if (lastClient == 0) {
                    lastClient = client;
                    totalAmount = 0;
                }

                if (lastClient != client) {
                    
                    // System.out.println("MERGE amount" + lastClient + ":"
+ totalAmount);
                    updateData.setInt(1, totalAmount);
                    updateData.setInt(2, lastClient);
                    int sqlRowCount = updateData.executeUpdate();

                    if (sqlRowCount == 0) {
                        insertData.setInt(1, lastClient);
                        insertData.setInt(2, totalAmount);
                        sqlRowCount = insertData.executeUpdate();
                    }

                    lastClient = client;
                    totalAmount = order_amount;
                } else {
                    totalAmount = totalAmount + order_amount;
                }
            }

            updateData.setInt(1, totalAmount);
            updateData.setInt(2, lastClient);
            int sqlRowCount = updateData.executeUpdate();

            if (sqlRowCount == 0) {
                insertData.setInt(1, lastClient);
                insertData.setInt(2, totalAmount);
                sqlRowCount = insertData.executeUpdate();
            }


            rs.close();
            s.close();
            connection.commit();
            connection.close();
        } catch (SQLException ex) {
 
Logger.getLogger(CalculateTotal.class.getName()).log(Level.SEVERE, null,
ex);
        }
    }


it takes up to 12 seconds to calculate TESTTOTALS. Oracle PL/SQL procedure
with the same algorithm need 1,5 second.

How can i improve performance? Or Derby is so slow because of Java / JVM
issues???

Thanks.