Posted to derby-user@db.apache.org by Steve Pannier <st...@qlogic.com> on 2007/07/12 17:50:51 UTC

Performance question

I have a question regarding Derby performance:

We will be developing a Java application to collect and store statistics
and counters from devices, so that we can then graph the performance of
these devices.  We need to store statistics on a per-second basis for up
to several months at a time, and every second we need to be able to
store stats for up to 400 devices.  Plus, each device can have up to 50
statistics/counters ... every second.  So you can see there will be
large amounts of data being stored.
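(For scale: 400 devices x 50 counters comes to 20,000 values per
second, and at one 400-row batch per second that works out to roughly
34.6 million rows per day.)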

We are hoping Derby will meet our needs as it is free, has a small
footprint, and is Java-based.  I have been testing Derby to see how it
performs storing large amounts of data in this environment, and I've
seen a slight degradation of performance over time.  I'm using Derby
10.2.2.0.  I track the amount of time it takes for each iteration, with
one "iteration" being an insert of 400 records.  My last test was 1
million iterations, and the average time for each iteration slowly
increased over the life of the test.  (I've attached a file showing
average times.)

I did read the "Tuning Derby" manual, and applied the following tuning
changes while running my tests (a sketch of the resulting insert loop
appears after the list):
- Set autocommit to false (I programmatically do a commit
  every 2000 inserts)
- Changed derby.storage.pageSize to 16384
- Changed derby.storage.pageCacheSize to 5000
- Use a PreparedStatement for my inserts
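
To make the setup concrete, here is a minimal sketch of that insert
loop. It is illustrative rather than my exact code: the connection URL,
the row values, and the iteration count are placeholders, and only a
few of the STATS columns are filled in.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class InsertLoop {
        public static void main(String[] args) throws Exception {
            // Load the embedded driver (required before JDBC 4 auto-loading).
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            Connection conn =
                DriverManager.getConnection("jdbc:derby:statsdb;create=true");
            conn.setAutoCommit(false);  // commit manually, not per statement

            PreparedStatement ps = conn.prepareStatement(
                "insert into STATS (query_id, create_time, switch, port, " +
                "longProp1) values (?, ?, ?, ?, ?)");

            int sinceCommit = 0;
            for (long i = 0; i < 400L * 1000000; i++) {
                ps.setInt(1, 1);
                ps.setLong(2, i / 400);                  // synthetic timestamp
                ps.setString(3, "switch-" + (i % 400));  // one row per device
                ps.setInt(4, (int) (i % 400));
                ps.setLong(5, i);
                ps.executeUpdate();
                if (++sinceCommit == 2000) {             // commit every 2000 inserts
                    conn.commit();
                    sinceCommit = 0;
                }
            }
            conn.commit();
            ps.close();
            conn.close();
        }
    }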

My table is called "stats", and is defined as follows:

    "create table STATS (" +
    "query_id int not null," +
    "create_time bigint not null," +
    "switch char(20), port int, " +
    "intProp1 int, intProp2 int," +
    "intProp3 int, intProp4 int," +
    "intProp5 int, intProp6 int," +
    "intProp7 int, intProp8 int," +
    "strProp1 char(20), strProp2 char(20)," +
    "strProp3 char(20), strProp4 char(20)," +
    "strProp5 char(20), strProp6 char(20)," +
    "strProp7 char(20), strProp8 char(20)," +
    "longProp1 bigint, longProp2 bigint," +
    "longProp3 bigint, longProp4 bigint," +
    "longProp5 bigint, longProp6 bigint," +
    "longProp7 bigint, longProp8 bigint," +
    "longProp9 bigint, longProp10 bigint," +
    "longProp11 bigint, longProp12 bigint," +
    "longProp13 bigint, longProp14 bigint," +
    "longProp15 bigint, longProp16 bigint," +
    "longProp17 bigint, longProp18 bigint," +
    "longProp19 bigint, longProp20 bigint," +
    "longProp21 bigint, longProp22 bigint," +
    "longProp23 bigint, longProp24 bigint," +
    "longProp25 bigint, longProp26 bigint," +
    "longProp27 bigint, longProp28 bigint," +
    "longProp29 bigint, longProp30 bigint," +
    "longProp31 bigint, longProp32 bigint," +
    "PRIMARY KEY ( create_time, switch, port ))"

Does anyone have any suggestions or ideas as to why I'm seeing the
performance degrade over time?  Is there a better way of defining my
"stats" table?  Any other tuning parameters I can apply?

Thanks in advance!
 
Steve


RE: Performance question

Posted by Steve Pannier <st...@qlogic.com>.
Bryan,

Thanks for your reply.

>> > footprint, and is Java-based.  I have been testing Derby to see how it
>> > performs storing large amounts of data in this environment, and I've
>> > seen a slight degradation of performance over time.  I'm using Derby
>> 
>> Hi Steve, I think this is very interesting data.
>> 
>> Can you run your experiment out farther, say to 2M, 5M, or 10M rows?

My last test of 1 million "iterations" inserted 400M records to my
database (since each iteration is 400 inserts).  Did you mean to suggest
re-running the test with 2M, 5M, or 10M *iterations* as opposed to
*rows*?

>> 
>> Can you characterize the overall behavior of the machine during the
>> experiment? What is the CPU load like? What is the disk load like?
>> How does the Derby memory profile look? Do any of these things appear
>> to change as the experiment goes on?

I didn't watch the CPU load over the full test run, which lasted the
entire weekend.  But I did see (from 'top') that the load was at ~82%
for the first several minutes after starting the test.  One note: my
test program doesn't store data on a per-second basis (i.e. it doesn't
sleep between iterations).  It just inserts the data as fast as the
system will allow.  (This is not a requirement, though - I just do it
this way to get lots of data into the database as fast as possible,
since we want to run retrieval tests also.)  So higher CPU loads are
expected during the test runs.

As for the Derby memory profile - how would I go about measuring that?

And not being much of a system-monitoring guru, do you have any tips on
how to monitor the CPU/disk/memory profile over the life of the
experiment?

>> 
>> Lastly, can you try your experiment with Derby 10.3, which is in beta
>> testing now?  There were a lot of performance changes in 10.3, although
>> nothing that was specifically targeted at exactly your scenario.
>> 
>> I think there could be a lot of possible explanations of the behavior
>> you're seeing.
>> 
>> On the other hand, starting with a completely fresh system and seeing
>> only an 8% slowdown as it gets dirty is, overall, pretty good, don't
>> you think?

Well, the slowdown I saw was minimal over the one weekend that I ran
the test.  And if that were the maximum continuous period over which we
had to store data, it would be acceptable.  But our requirement is to
store/save statistical data for up to 13 months (sorry, I didn't make
that clear in my original post), so if the performance continues to
degrade over that time period, it would be problematic.

I was hoping that there would be a tuning parameter I could tweak -
maybe bump up pageSize or pageCacheSize or set a different parameter
that would make a difference.

>> 
>> thanks,
>> 
>> bryan

One other comment about the tests I'm running: their purpose is to test
the maximum limits of our data storage requirements, so they are not
meant to simulate normal usage of our application.  But we do need to
know what the limits are, and whether we can even meet the requirements
set forth by our marketing group - hence the tests.

I hope to be able to continue running these tests.  But we have other
projects that keep pulling me away from this one.  If time permits, I'll
be able to try some of your suggestions.

Regards.
 
Steve


Re: Performance question

Posted by Bryan Pendleton <bp...@amberpoint.com>.
> footprint, and is Java-based.  I have been testing Derby to see how it
> performs storing large amounts of data in this environment, and I've
> seen a slight degradation of performance over time.  I'm using Derby

Hi Steve, I think this is very interesting data.

Can you run your experiment out farther, say to 2M, 5M, or 10M rows?

Can you characterize the overall behavior of the machine during the
experiment? What is the CPU load like? What is the disk load like?
How does the Derby memory profile look? Do any of these things appear
to change as the experiment goes on?

Lastly, can you try your experiment with Derby 10.3, which is in beta
testing now? There were a lot of performance changes in 10.3, although
nothing that was specifically targeted at exactly your scenario.

I think there could be a lot of possible explanations of the behavior
you're seeing.

On the other hand, starting with a completely fresh system and seeing
only an 8% slowdown as it gets dirty is, overall, pretty good, don't you think?

thanks,

bryan


Re: Performance question

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Steve Pannier <st...@qlogic.com> writes:

> I have a question regarding Derby performance:
>
> We will be developing a Java application to collect and store statistics
> and counters from devices, so that we can then graph the performance of
> these devices.  We need to store statistics on a per-second basis for up
> to several months at a time, and every second we need to be able to
> store stats for up to 400 devices.  Plus, each device can have up to 50
> statistics/counters ... every second.  So you can see there will be
> large amounts of data being stored.
>
> We are hoping Derby will meet our needs as it is free, has a small
> footprint, and is Java-based.  I have been testing Derby to see how it
> performs storing large amounts of data in this environment, and I've
> seen a slight degradation of performance over time.  I'm using Derby
> 10.2.2.0.  I track the amount of time it takes for each iteration, with
> one "iteration" being an insert of 400 records.  My last test was 1
> million iterations, and the average time for each iteration slowly
> increased over the life of the test.  (I've attached a file showing
> average times.)
>
> I did read the "Tuning Derby" manual, and applied the following tuning
> changes while running my tests:
> - Set autocommit to false (I programmatically do a commit
>   every 2000 inserts)
> - Changed derby.storage.pageSize to 16384
> - Changed derby.storage.pageCacheSize to 5000
> - Use a PreparedStatement for my inserts

Sounds like sensible settings. I'm not sure about the pageSize property,
though. Derby performs some auto-tuning of the page size for each table,
so you might want to re-run the test without that property (and also
increase the pageCacheSize to compensate for the smaller pages).
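
As an illustration, the derby.properties for such a re-run might look
like this (20000 is just one way to keep roughly the same ~80 MB of
cache as 5000 pages of 16 KB, assuming Derby picks the 4 KB default):

    # derby.properties - no derby.storage.pageSize line, so Derby
    # auto-tunes the page size per table.  Enlarge the page cache to
    # compensate: 20000 x 4 KB is about the same memory as 5000 x 16 KB.
    derby.storage.pageCacheSize=20000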

> My table is called "stats", and is defined as follows:
>
>     "create table STATS (" +
>     "query_id int not null," +
>     "create_time bigint not null," +
>     "switch char(20), port int, " +
>     "intProp1 int, intProp2 int," +
>     "intProp3 int, intProp4 int," +
>     "intProp5 int, intProp6 int," +
>     "intProp7 int, intProp8 int," +
>     "strProp1 char(20), strProp2 char(20)," +
>     "strProp3 char(20), strProp4 char(20)," +
>     "strProp5 char(20), strProp6 char(20)," +
>     "strProp7 char(20), strProp8 char(20)," +
>     "longProp1 bigint, longProp2 bigint," +
>     "longProp3 bigint, longProp4 bigint," +
>     "longProp5 bigint, longProp6 bigint," +
>     "longProp7 bigint, longProp8 bigint," +
>     "longProp9 bigint, longProp10 bigint," +
>     "longProp11 bigint, longProp12 bigint," +
>     "longProp13 bigint, longProp14 bigint," +
>     "longProp15 bigint, longProp16 bigint," +
>     "longProp17 bigint, longProp18 bigint," +
>     "longProp19 bigint, longProp20 bigint," +
>     "longProp21 bigint, longProp22 bigint," +
>     "longProp23 bigint, longProp24 bigint," +
>     "longProp25 bigint, longProp26 bigint," +
>     "longProp27 bigint, longProp28 bigint," +
>     "longProp29 bigint, longProp30 bigint," +
>     "longProp31 bigint, longProp32 bigint," +
>     "PRIMARY KEY ( create_time, switch, port ))"
>
> Does anyone have any suggestions or ideas as to why I'm seeing the
> performance degrade over time?

My first guess is that the disk is causing the slowdown. Disks normally
write faster to lower cylinder numbers than to higher cylinder
numbers. Since your test inserts (I think) several gigabytes, the
database might grow into slower areas of the disk.

Another thing that could negatively influence the performance is that
the primary key index is getting bigger as more rows are inserted. Derby
organizes its indexes as B-trees, and as the number of rows in the table
increases, the B-tree grows higher and the cost of inserting new rows
grows logarithmically.
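
A rough back-of-the-envelope illustration of why that growth is gentle
(the fanout figure is an assumption, not a measurement):

    height of B-tree ~ log_f(n), where f = keys per index page
    with f ~ 100:
        n = 1 million rows    ->  height ~ 3
        n = 400 million rows  ->  height ~ 4-5

So each insert touches only one or two more index pages at 400 million
rows than it did at 1 million.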

> Is there a better way of defining my
> "stats" table? Any other tuning parameters I can apply?

If you have the luxury of owning two disks, this update-intensive load
probably benefits from putting the database and the transaction log on
separate disks. You would have to create the database with a URL like
this:

  jdbc:derby:/disk1/derby/db;create=true;logDevice=/disk2/derby/log
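
In code that would be something like this minimal sketch (the mount
points are just the example paths from the URL above):

    // Create the database on one disk with its transaction log on another.
    // Note: the logDevice attribute only takes effect when the database
    // is created; Derby remembers the location afterwards.
    Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
    Connection conn = DriverManager.getConnection(
        "jdbc:derby:/disk1/derby/db;create=true;logDevice=/disk2/derby/log");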

You could also try to have multiple threads inserting rows in
parallel. That might give you better CPU utilization since there will be
threads ready to work while other threads are blocked waiting for disk
I/O.
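
A minimal sketch of that approach, assuming one connection per worker
thread (the thread count, row values, and the columns used are all
illustrative):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class ParallelInsert {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            Thread[] workers = new Thread[4];   // thread count is illustrative
            for (int t = 0; t < workers.length; t++) {
                final int id = t;
                workers[t] = new Thread(new Runnable() {
                    public void run() {
                        try {
                            // Each thread gets its own connection; they all
                            // share the same embedded engine.
                            Connection conn = DriverManager.getConnection(
                                "jdbc:derby:statsdb");
                            conn.setAutoCommit(false);
                            PreparedStatement ps = conn.prepareStatement(
                                "insert into STATS (query_id, create_time, " +
                                "switch, port) values (?, ?, ?, ?)");
                            for (int i = 0; i < 100000; i++) {
                                ps.setInt(1, id);
                                ps.setLong(2, i);                 // synthetic timestamp
                                ps.setString(3, "switch-" + id);  // distinct per thread,
                                ps.setInt(4, id);                 // so keys never collide
                                ps.executeUpdate();
                                if (i % 2000 == 1999) conn.commit();
                            }
                            conn.commit();
                            ps.close();
                            conn.close();
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                });
                workers[t].start();
            }
            for (Thread w : workers) {
                w.join();
            }
        }
    }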

-- 
Knut Anders