Posted to derby-user@db.apache.org by Jonas Ahlinder <jo...@digitalroute.com> on 2008/10/16 16:16:58 UTC

performance issue

Hi.

We are developing a function to store session information, for use in an HA environment.
However, we are not reaching the throughput we want.

Since it's for persistence, we need to use autocommit on all operations.
The table has two indexes and a blob of data.
The first index is a char(20) and the second is a timestamp.

I put transaction log and data on separate disks.
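For reference, the database is created roughly as in the sketch below. This is illustrative only: the paths, database name, and table/column names are placeholders, not our real setup; logDevice is the Derby connection attribute that puts the transaction log on the second disk.

    // Sketch only: paths and all names are placeholders.
    // logDevice keeps Derby's transaction log on a separate disk from the data.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class CreateSessionDb {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            Connection conn = DriverManager.getConnection(
                "jdbc:derby:/disk1/sessiondb;create=true;logDevice=/disk2/sessionlog");
            Statement s = conn.createStatement();
            // A table shaped like the one described above:
            // a char(20) key, a timestamp, and a blob, with indexes on the first two.
            s.executeUpdate("CREATE TABLE SESSIONS ("
                    + "SESSION_ID CHAR(20), "
                    + "LAST_TOUCHED TIMESTAMP, "
                    + "DATA BLOB)");
            s.executeUpdate("CREATE INDEX SESSIONS_ID_IDX ON SESSIONS(SESSION_ID)");
            s.executeUpdate("CREATE INDEX SESSIONS_TS_IDX ON SESSIONS(LAST_TOUCHED)");
            s.close();
            conn.close();
        }
    }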

The first issue is that on a desktop machine (running Vista) with two 7.2k rpm SATA disks I get over 900 tps, while on a server (running RHEL 5) with two 15k rpm SAS disks I get around 250 tps.
I realise this might not have anything to do with Derby, but running iozone tells me that the server has _a lot_ faster IO.
I'm of course very interested in performance tuning ideas for the HP Smart Array P400i as well.
We need to get it to work properly in the server environment.

The other issue, and perhaps more related to Derby, is that the timestamp index won't stop growing.
After 24 hours it had grown from around 100 MB to over 1 GB, and performance obviously dropped massively as a result.
I have tried running in-place compression on the table, which didn't get me any space back; the call is sketched below.
What tests do you need me to run to be able to say what might be wrong?
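Sketch of the in-place call, using Derby's standard system procedure (APP and SESSIONS are placeholder names, not our real schema):

    // Sketch of the in-place compress call; APP and SESSIONS are placeholders.
    static void inPlaceCompress(java.sql.Connection conn) throws java.sql.SQLException {
        java.sql.CallableStatement cs = conn.prepareCall(
            "CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE(?, ?, ?, ?, ?)");
        cs.setString(1, "APP");       // schema name
        cs.setString(2, "SESSIONS");  // table name
        cs.setShort(3, (short) 1);    // purge deleted rows
        cs.setShort(4, (short) 1);    // defragment rows
        cs.setShort(5, (short) 1);    // truncate the end of the file
        cs.execute();
        cs.close();
    }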

Regards

/Jonas Ahlinder

RE: performance issue

Posted by Jonas Ahlinder <jo...@digitalroute.com>.
I've made sure both are now running the server HotSpot VM; no significant difference.
However, I ran with -Xprof: Windows spends about 27% of its time in RandomAccessFile.writeBytes, while Linux spends about 86% there, so I guess we have a good candidate for the cause of the performance hit.
I've tried different IO schedulers now, but that doesn't seem to make much difference (I'll do some more thorough benchmarking though).
Could it be that ext3 just isn't as efficient as NTFS in this particular situation?

-----Original Message-----
From: Peter Ondruška [mailto:peter.ondruska@gmail.com]
Sent: Thursday, October 16, 2008 8:02 PM
To: Derby Discussion
Subject: Re: performance issue

Windows on a desktop will by default cache writes to disk, whereas your
Linux system may not. Also, you may be running the client HotSpot JVM on
Windows while Linux runs the server HotSpot JVM.


Re: performance issue

Posted by Peter Ondruška <pe...@gmail.com>.
Windows on a desktop will by default cache writes to disk, whereas your
Linux system may not. Also, you may be running the client HotSpot JVM on
Windows while Linux runs the server HotSpot JVM.


RE: performance issue

Posted by Jonas Ahlinder <jo...@digitalroute.com>.
Hi.

On Windows I use 1.6.0_10 and on Linux 1.6.0_10 RC2 (the final version was released today, I think; I haven't gotten around to installing it yet).
As for parameters, I only use -Xmx512m, since the database is pretty big.
And I just realised there are some more details I should have mentioned about the database.

The table is 500,000 rows which are created _before_ the actual work begins; these are empty rows that we run updates against to delete/insert/update the information in them.
This was meant to stop the database from growing, something that obviously didn't work.
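Roughly, each operation is then an update against one of these pre-created rows, along the lines of this sketch (table and column names are placeholders, not our real schema):

    // Sketch of the update-in-place pattern; SESSIONS, SESSION_ID, LAST_TOUCHED
    // and DATA are placeholder names.
    static void touchSession(java.sql.Connection conn, String sessionId,
                             byte[] sessionBlob) throws java.sql.SQLException {
        java.sql.PreparedStatement ps = conn.prepareStatement(
            "UPDATE SESSIONS SET LAST_TOUCHED = ?, DATA = ? WHERE SESSION_ID = ?");
        ps.setTimestamp(1, new java.sql.Timestamp(System.currentTimeMillis()));
        ps.setBytes(2, sessionBlob);  // the serialized session state (blob column)
        ps.setString(3, sessionId);   // the char(20) key
        ps.executeUpdate();           // autocommit is on, so this commits immediately
        ps.close();
    }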

________________________________________
From: Peter Ondruška [peter.ondruska@gmail.com]
Sent: Thursday, October 16, 2008 4:23 PM
To: Derby Discussion
Subject: Re: performance issue

Hello, speed also depends on the JVM. What version and JVM parameters are you using?


Re: performance issue

Posted by Peter Ondruška <pe...@gmail.com>.
Hello, speed also depends on the JVM. What version and JVM parameters are you using?


RE: performance issue

Posted by Jonas Ahlinder <jo...@digitalroute.com>.
Hello.

I do believe that both the desktop and the server have disk caching turned on, but I will definitely confirm this in the morning.
Also, I'm aware that this impacts durability.
________________________________________
From: Dag.Wanvik@Sun.COM [Dag.Wanvik@Sun.COM]
Sent: Thursday, October 16, 2008 5:38 PM
To: Derby Discussion
Subject: Re: performance issue

Jonas Ahlinder <jo...@digitalroute.com> writes:

> The first issue is that on a desktop machine (running Vista) with
> two 7.2k rpm SATA disks I get over 900 tps, while on a server
> (running RHEL 5) with two 15k rpm SAS disks, I get around 250 tps.

Could it be that the desktop machine has disk caching enabled,
whereas the server disks do not? For durability you would want it
switched off.

Dag


Re: performance issue

Posted by Øystein Grøvlen <Oy...@Sun.COM>.

Jonas Ahlinder wrote:
> The benchmark client is single-threaded at the moment.
> To run it multi-threaded, some sort of locking will most likely have to be implemented (which will be done as soon as we can confirm the performance is OK).
> I have tried running more threads, and it does seem to give better performance, but the current state of the client doesn't really allow for reliable test results.
> With autocommit on, and with the disk running at 100% usage (and quite a bit of wait queue, at least on Linux), do you think multiple threads will really help?
> And the CPU (4 cores) seems to run about 50% wait and 50% idle, which seems rather weird to me, but I guess it's mostly waiting for IO.

Which disk is running at 100% usage? The data disk or the log disk?
If it is the log disk that is saturated, having multiple threads may
help throughput, because it then becomes possible to commit multiple
transactions per disk write (group commit).
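A minimal sketch of such a multi-threaded client, purely illustrative (the JDBC URL and table/column names are placeholders; each worker gets its own connection so commits can overlap):

    // Illustrative sketch: N threads doing autocommitted updates concurrently,
    // so Derby can group-commit several transactions per log sync.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class MultiThreadedBench {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            Thread[] workers = new Thread[8];
            for (int t = 0; t < workers.length; t++) {
                final int id = t;
                workers[t] = new Thread(new Runnable() {
                    public void run() {
                        try {
                            Connection conn =
                                DriverManager.getConnection("jdbc:derby:sessiondb");
                            conn.setAutoCommit(true);
                            PreparedStatement ps = conn.prepareStatement(
                                "UPDATE SESSIONS SET LAST_TOUCHED = CURRENT_TIMESTAMP "
                                + "WHERE SESSION_ID = ?");
                            for (int i = 0; i < 10000; i++) {
                                ps.setString(1, "session-" + id + "-" + (i % 1000));
                                ps.executeUpdate(); // commits; concurrent commits share log syncs
                            }
                            conn.close();
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                });
                workers[t].start();
            }
            for (Thread w : workers) {
                w.join();
            }
        }
    }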

--
Øystein


RE: performance issue

Posted by Jonas Ahlinder <jo...@digitalroute.com>.
Hello all involved.

The growing index problem is "solved": running SYSCS_UTIL.SYSCS_COMPRESS_TABLE in sequential mode (not SYSCS_INPLACE_COMPRESS_TABLE, which didn't do much of anything) does shrink the file size on disk.
However, it takes far too long to run.
We will have to solve this in some entirely different manner.
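For the record, the call that did reclaim the space was along these lines (APP and SESSIONS are placeholder names):

    // Sketch of the full (non-in-place) compress in sequential mode.
    static void compressTable(java.sql.Connection conn) throws java.sql.SQLException {
        java.sql.CallableStatement cs = conn.prepareCall(
            "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE(?, ?, ?)");
        cs.setString(1, "APP");      // schema name
        cs.setString(2, "SESSIONS"); // table name
        cs.setShort(3, (short) 1);   // non-zero = rebuild one index at a time (slower, less memory)
        cs.execute();
        cs.close();
    }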

The performance issue is still open, although I have a theory that I would be very happy for you to either poke holes in or agree with.
My theory is that Java on Windows isn't 100% durable when making disk commits.
That is, there is a cache or buffer somewhere (OS/controller/disk) that doesn't get flushed right away when Derby commits (a sync is what Derby does, AFAIK), and that's the reason Windows is a lot faster, while Linux/Solaris gives a more realistic picture of transaction performance.
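One Derby-independent way I can think of to test the theory is a crude microbenchmark along these lines; it assumes Derby's log writes behave like RandomAccessFile in "rwd" mode, which is my understanding rather than a confirmed fact:

    // Crude sketch: time synchronous writes the way (I believe) Derby's log
    // does them. If one platform is implausibly fast, something is caching.
    import java.io.RandomAccessFile;

    public class SyncWriteBench {
        public static void main(String[] args) throws Exception {
            RandomAccessFile raf = new RandomAccessFile("synctest.dat", "rwd");
            byte[] block = new byte[4096];
            int n = 1000;
            long start = System.currentTimeMillis();
            for (int i = 0; i < n; i++) {
                raf.seek(0);
                raf.write(block); // "rwd" mode: each write must reach the device
            }
            long elapsed = Math.max(System.currentTimeMillis() - start, 1);
            System.out.println(n + " synced writes in " + elapsed + " ms ("
                    + (n * 1000L / elapsed) + " writes/s)");
            raf.close();
        }
    }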

/Jonas


Re: performance issue

Posted by Craig L Russell <Cr...@Sun.COM>.
For something that large, it's probably best to file a JIRA and upload  
the code as a patch. You can decide whether the uploaded code can be  
licensed under the Apache license (e.g. if you don't mind it becoming  
a test case) or not.

Craig

On Oct 19, 2008, at 3:17 AM, Jonas Ahlinder wrote:

> We would be willing to share the code.
> What would be the best way to do this?
> There are 869 lines of code; do I just paste it in a mail to the list?

Craig L Russell
Architect, Sun Java Enterprise System http://db.apache.org/jdo
408 276-5638 mailto:Craig.Russell@sun.com
P.S. A good JDO? O, Gasp!


Re: performance issue

Posted by Kathey Marsden <km...@sbcglobal.net>.
Jonas Ahlinder wrote:
> We would be willing to share the code.
> What would be the best way to do this?
>   

Probably the best thing to do is to open a JIRA issue and attach the
code.
See http://db.apache.org/derby/DerbyBugGuidelines.html

Thanks

Kathey



RE: performance issue

Posted by Jonas Ahlinder <jo...@digitalroute.com>.
We would be willing to share the code.
What would be the best way to do this?
There are 869 lines of code; do I just paste it in a mail to the list?


Re: performance issue

Posted by Bryan Pendleton <bp...@amberpoint.com>.
 > I have tried running more threads, and it does seem to give
 > better performance, but the current state of the client
 > doesn't really allow for reliable test results.
 > With autocommit on, and with the disk running at 100% usage
 > (and quite a bit of wait queue, at least on Linux), do
 > you think multiple threads will really help?

Yes.

But, before changing your benchmark, it seems better to try
to understand the results you are getting.

Are you willing/able to share your benchmark with the community?

thanks,

bryan


RE: performance issue

Posted by Jonas Ahlinder <jo...@digitalroute.com>.
The benchmark client is single-threaded at the moment.
To run it multi-threaded, some sort of locking will most likely have to be implemented (which will be done as soon as we can confirm the performance is OK).
I have tried running more threads, and it does seem to give better performance, but the current state of the client doesn't really allow for reliable test results.
With autocommit on, and with the disk running at 100% usage (and quite a bit of wait queue, at least on Linux), do you think multiple threads will really help?
And the CPU (4 cores) seems to run about 50% wait and 50% idle, which seems rather weird to me, but I guess it's mostly waiting for IO.


Re: performance issue

Posted by Bryan Pendleton <bp...@amberpoint.com>.
>> The first issue is that on a desktop machine (running Vista) with
>> two 7.2k rpm SATA disks I get over 900 tps, while on a server
>> (running RHEL 5) with two 15k rpm SAS disks, I get around 250 tps.

Is your benchmark client multi-threaded? Or single-threaded?

During the run(s) are your machine(s) CPU-bound? Or disk-bound?

thanks,

bryan

Re: performance issue

Posted by "Dag H. Wanvik" <Da...@Sun.COM>.
Jonas Ahlinder <jo...@digitalroute.com> writes:

> The first issue is that on a desktop machine (running Vista) with
> two 7.2k rpm SATA disks I get over 900 tps, while on a server
> (running RHEL 5) with two 15k rpm SAS disks, I get around 250 tps.

Could it be that the desktop machine has disk caching enabled,
whereas the server disks do not? For durability you would want it
switched off.

Dag