Posted to derby-dev@db.apache.org by Suresh Thalamati <ts...@Source-Zone.org> on 2004/12/04 01:36:24 UTC

Aligned vs. non-aligned log writes

Hi all,

I am trying to find out whether there would be any performance
improvement for insert/delete/update operations from modifying the
logging system to do writes aligned on sector boundaries (512 bytes).
This could possibly be done by grouping log records into 4K/8K pages.
The current system writes to the log files as follows:

1. Write a file header of 24 bytes.
2. Write the 32K log buffer when it is full, or whenever a log flush is
   required because of a commit or a data page flush, in which case the
   amount of data written can be less than 32K.
3. Write a zero at the end of the log file on a log switch.
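The write sequence above can be sketched as follows. This is a hypothetical simulation, not Derby's logging code. The class and method names are invented, `"rws"` mode stands in for a synchronous log flush, and the end-of-file zero is assumed to be a 4-byte int. The sketch records the file offset at which each write starts, so the effect of the 24-byte header is visible.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the log write pattern described above (not Derby code).
final class LogWritePattern {
    static final int HEADER_BYTES = 24;            // log file header size from the post
    static final int LOG_BUFFER_BYTES = 32 * 1024; // full log buffer size from the post
    static final int SECTOR_BYTES = 512;

    // Writes the header, fullBuffers full buffers and an end marker; returns each write's start offset.
    static List<Long> writeLogFile(File file, int fullBuffers) throws IOException {
        List<Long> offsets = new ArrayList<>();
        // "rws": every write is forced to disk before write() returns.
        try (RandomAccessFile raf = new RandomAccessFile(file, "rws")) {
            raf.setLength(0);
            offsets.add(raf.getFilePointer());
            raf.write(new byte[HEADER_BYTES]);         // 1. file header
            byte[] buffer = new byte[LOG_BUFFER_BYTES];
            for (int i = 0; i < fullBuffers; i++) {
                offsets.add(raf.getFilePointer());
                raf.write(buffer);                     // 2. one full log buffer
            }
            offsets.add(raf.getFilePointer());
            raf.writeInt(0);                           // 3. zero at end of file (assumed 4-byte int)
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("logpattern", ".dat");
        f.deleteOnExit();
        for (long offset : writeLogFile(f, 3)) {
            System.out.println(offset + " sector-aligned: " + (offset % SECTOR_BYTES == 0));
        }
    }
}
```

Only the header write starts on a sector boundary. Every later write starts 24 bytes past one (offsets 24, 32792, 65560, 98328).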

Because of the initial 24 bytes written to the file, writes are never
aligned on sector boundaries, even if the log buffers are written to disk
only when they are full. I wrote a simple Java program that simulates the
way the Derby logging system does writes, to measure the performance
difference between aligned and non-aligned writes. What I found is that on
Windows 2000 there is a substantial gain only when the buffer size is 64K
(30.8984375 ms for aligned writes vs. 48.8984375 ms for non-aligned
writes). I am not sure why there is no substantial improvement for other
buffer sizes.
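The source of `allignWritePerfTest` is not included in this thread, so the following is only a guess at the shape of such a test. It times synchronous writes of one buffer size, written either at offset 0 (aligned) or after a 24-byte header (non-aligned). The `"rws"` mode, the iteration count and the use of `System.nanoTime` are assumptions, and `AlignedWriteBench` is a hypothetical name.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical aligned vs. non-aligned write timing (the thread's test source is not shown).
final class AlignedWriteBench {

    // Average milliseconds per forced write of bufferKb KB, after a header of headerBytes.
    static double avgWriteMillis(File file, int headerBytes, int bufferKb, int writes)
            throws IOException {
        if (writes <= 0) {
            throw new IllegalArgumentException("writes must be positive");
        }
        byte[] buffer = new byte[bufferKb * 1024];
        try (RandomAccessFile raf = new RandomAccessFile(file, "rws")) {
            raf.setLength(0);
            raf.write(new byte[headerBytes]);   // 0 -> aligned, 24 -> Derby-like header
            long start = System.nanoTime();
            for (int i = 0; i < writes; i++) {
                raf.write(buffer);              // forced to disk in "rws" mode
            }
            return (System.nanoTime() - start) / 1_000_000.0 / writes;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("alignbench", ".dat");
        f.deleteOnExit();
        for (int kb : new int[] {512, 64, 8, 1}) {
            double aligned = avgWriteMillis(f, 0, kb, 20);
            double nonAligned = avgWriteMillis(f, 24, kb, 20);
            System.out.printf("%d | %.3f | %.3f%n", kb, aligned, nonAligned);
        }
    }
}
```

A single run like this is noisy. The numbers also depend heavily on the operating system and on whether the disk's write cache is enabled.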

Platform:
Windows 2000, memory 756 MB, CPU 1200 MHz
Disk: 4200 rpm (write cache disabled)

$ java allignWritePerfTest xyz.dat 128
(Buffer-Size(K) | Aligned Write(msec) | Non-Aligned Write(msec))

512 |  243.7890625  |  259.828125
256 |  110.234375  |  131.28125
128 |  52.890625  |  66.03125
64 |  30.8984375  |  48.8984375
32 |  19.3984375  |  21.359375
16 |  18.078125  |  26.3671875
12 |  17.0546875  |  19.8671875
10 |  17.2109375  |  19.09375
8 |  15.1015625  |  27.2265625
6 |  16.0390625  |  16.6640625
4 |  16.3515625  |  16.6640625
2 |  14.7890625  |  23.9453125
1 |  18.7734375  |  15.8828125


It would be great if someone could run this test on other platforms and
post the results to the list.

Any comments or suggestions?

Thanks
-suresh.




Re: Aligned vs. non-aligned log writes

Posted by Andrew McIntyre <fu...@nonintuitive.com>.
On Dec 3, 2004, at 4:36 PM, Suresh Thalamati wrote:

> It would be great if  some one can run this test on other platforms and
> post the results to the list.

AIX 5.2, IBM JDK 1.4.2, 1 GHz POWER4: only marginal differences in
times. Full results:
(Buffer-Size(K) | Aligned Write(msec) | Non-Aligned Write(msec))

512 |  15.3046875  |  15.9609375
256 |  10.7109375  |  11.578125
128 |  8.328125  |  8.640625
64 |  7.234375  |  7.4296875
32 |  6.59375  |  6.78125
16 |  6.328125  |  6.4921875
12 |  6.3515625  |  6.40625
10 |  6.3125  |  6.359375
8 |  6.1875  |  6.359375
6 |  6.2421875  |  6.3125
4 |  6.1171875  |  6.265625
2 |  6.078125  |  6.2109375
1 |  6.0703125  |  6.171875

Solaris 9, Sun JDK 1.4.2_03, Sparc II class CPU of unknown speed:
significant increase in speed for certain values, especially 4, 8 and 16.
Full results:

512 |  578.3046875  |  586.796875
256 |  289.1171875  |  297.1484375
128 |  144.1875  |  153.21875
64 |  72.09375  |  81.3359375
32 |  36.0859375  |  44.6171875
16 |  18.046875  |  26.7109375
12 |  17.7265625  |  22.09375
10 |  18.1484375  |  19.8046875
8 |  9.1953125  |  17.59375
6 |  13.0546875  |  15.25
4 |  8.7265625  |  13.171875
2 |  8.5625  |  10.7578125
1 |  8.4765625  |  9.6484375

And just for fun, Mac OS X 10.3.5, JDK 1.4.2, rwd mode (not rws, of
course), 466 MHz G4: aligned writes are faster in most cases, sometimes by
a large margin, except at the 512K buffer size, where there is a significant
performance penalty. Full results:

512 |  70.5859375  |  42.453125
256 |  9.8203125  |  22.1328125
128 |  3.6953125  |  7.0390625
64 |  1.28125  |  4.28125
32 |  0.3515625  |  1.8828125
16 |  0.1171875  |  0.109375
12 |  0.09375  |  0.1015625
10 |  0.0703125  |  0.078125
8 |  0.0546875  |  0.0625
6 |  0.2265625  |  0.046875
4 |  0.03125  |  0.0390625
2 |  0.03125  |  0.03125
1 |  0.03125  |  0.0234375

andrew


Re: Aligned vs. non-aligned log writes

Posted by Daniel John Debrunner <dj...@debrunners.com>.

Mike Matrigali wrote:


>>On Suse Linux, with an old Dell machine (2cpu 733Mhz Pentium) I see no
>>real difference between aligned and non-aligned writes.
>>I used Sun's 1.4.2 & 1.5.0 and IBM's 1.4.2 vms

Suse Linux 9.0
Controller - Dell AIC-7899P/m
Disk - Fujitsu MAG3091MP - SCSI - 9 GB - (just installed, so no
space/fragmentation issues, I hope)
FileSystem - reiserfs

Dan.


Re: Aligned vs. non-aligned log writes

Posted by RPost <rp...@pacbell.net>.
Yes, the concept exists on Linux.

You can control a disk's write cache with the hdparm utility (for
example, hdparm -W0 turns it off).

----- Original Message ----- 
From: "Mike Matrigali" <mi...@sbcglobal.net>
To: "Derby Development" <de...@db.apache.org>
Sent: Thursday, December 09, 2004 10:04 AM
Subject: Re: alligned Vs Non Alligned log writes ..


> If possible could the people submitting the results let us know
> the type of disk on the machine.  At least scsi vs. ide.  Speed of
> disk would be nice too.  And at least on windows machines if
> "write cache" is enabled for the disk or not - I don't know if this
> concept exists on linux..
>
> Daniel John Debrunner wrote:
>
> >
> >
> >>>>It would be great if  some one can run this test on other platforms and
> >>>>post the results to the list.
> >
> >
> > On Suse Linux, with an old Dell machine (2cpu 733Mhz Pentium) I see no
> > real difference between aligned and non-aligned writes.
> > I used Sun's 1.4.2 & 1.5.0 and IBM's 1.4.2 vms
> >
> > Dan.


Re: Aligned vs. non-aligned log writes

Posted by Mike Matrigali <mi...@sbcglobal.net>.
If possible, could the people submitting results let us know
the type of disk on the machine: at least SCSI vs. IDE. The speed of
the disk would be nice too. And, at least on Windows machines, whether
"write cache" is enabled for the disk or not. I don't know if this
concept exists on Linux.

Daniel John Debrunner wrote:

> 
> 
>>>>It would be great if  some one can run this test on other platforms and
>>>>post the results to the list.
> 
> 
> On Suse Linux, with an old Dell machine (2cpu 733Mhz Pentium) I see no
> real difference between aligned and non-aligned writes.
> I used Sun's 1.4.2 & 1.5.0 and IBM's 1.4.2 vms
> 
> Dan.

Re: Aligned vs. non-aligned log writes

Posted by Suresh Thalamati <ts...@Source-Zone.org>.
Daniel John Debrunner wrote:

>
> >>It would be great if  some one can run this test on other platforms and
> >>post the results to the list.
>
>
> On Suse Linux, with an old Dell machine (2cpu 733Mhz Pentium) I see no
> real difference between aligned and non-aligned writes.
> I used Sun's 1.4.2 & 1.5.0 and IBM's 1.4.2 vms
>
> Dan.


I also ran the test on Linux; there does not seem to be any major
difference between aligned and non-aligned writes.

FYI:
OS: SuSE Linux (2.4.21-202-smp4G)
CPU: 2 x 731 MHz Pentium III processors
Memory: 256 MB
Disk: SCSI 10600 rpm (Tagged Queuing enabled)
File system: reiserfs (rw)
JVM: Classic VM (build 1.4.2, J2RE 1.4.2 IBM build cxia32142-20040926 (JIT enabled: jitc))
Write cache enabled: NO

(Buffer-Size(K) | Aligned Write(msec) | Non-Aligned Write(msec)) | gain %

512 |  47.3818359375  |  49.673828125  |  4.837280240730436%
256 |  35.9814453125  |  37.6083984375  |  4.521644727914236%
128 |  28.541015625  |  35.5166015625  |  24.44056661876411%
64 |  27.0712890625  |  26.033203125  |  -3.8346380000721467%
32 |  25.2734375  |  25.2626953125  |  -0.042503863987633395%
16 |  24.853515625  |  24.9521484375  |  0.3968565815324183%
12 |  24.6181640625  |  24.7978515625  |  0.7298980522829197%
10 |  24.9287109375  |  25.6162109375  |  2.757864222196106%
8 |  24.677734375  |  24.6181640625  |  -0.2413929560743965%
6 |  24.591796875  |  24.7333984375  |  0.5758081169089032%
4 |  24.51171875  |  24.654296875  |  0.5816733067729084%
2 |  24.4208984375  |  24.548828125  |  0.5238533210701004%
1 |  24.466796875  |  24.490234375  |  0.09579308693222639%
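The gain column appears to be the extra time taken by the non-aligned write, relative to the aligned write. The formula below is inferred from the table's numbers, not taken from the test program's source. It does reproduce the 512K and 64K rows. `AlignmentGain` is a hypothetical name.

```java
// Sketch of how the "gain %" column appears to be computed (inferred from the table).
final class AlignmentGain {
    // Percentage by which the non-aligned time exceeds the aligned time.
    static double gainPercent(double alignedMs, double nonAlignedMs) {
        return (nonAlignedMs - alignedMs) / alignedMs * 100.0;
    }

    public static void main(String[] args) {
        // 512K row from the SuSE Linux run: about 4.837
        System.out.println(gainPercent(47.3818359375, 49.673828125));
        // 64K row: negative, because the aligned write was slower here (about -3.835)
        System.out.println(gainPercent(27.0712890625, 26.033203125));
    }
}
```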


Thanks
-suresht


Re: Aligned vs. non-aligned log writes

Posted by Daniel John Debrunner <dj...@debrunners.com>.


>> It would be great if  some one can run this test on other platforms and
>> post the results to the list.

On SuSE Linux, with an old Dell machine (2 x 733 MHz Pentium CPUs), I see no
real difference between aligned and non-aligned writes.
I used Sun's 1.4.2 and 1.5.0 and IBM's 1.4.2 VMs.

Dan.


Re: Aligned vs. non-aligned log writes

Posted by Sunitha Kambhampati <sk...@Yngvi.Org>.
>It would be great if  some one can run this test on other platforms and
> post the results to the list.

Here's what I got by running the test on my T40 laptop (Intel Pentium M 1.6 GHz, 4200 rpm IDE disk (write cache disabled), 1 GB RAM, Windows 2000, IBM 1.4.2 JVM):

$java allignWritePerfTest2 temp.dat 1024

Ordered by speed gain; the maximum gain is seen at 64K.
(Buffer-Size(K) | gain % | Aligned Write(msec) | Non-Aligned Write(msec))

64  |  73.88%  |  18.75683594  |  32.61523438
128 |  44.38%  |  35.23632813  |  50.87402344
256 |  22.58%  |  70.88378906  |  86.89257813
512 |  10.35%  |  142.5976563  |  157.3544922
4   |  7.44%   |  16.02929688  |  17.22265625
32  |  5.14%   |  16.72363281  |  17.58398438
8   |  3.52%   |  15.82421875  |  16.38085938
16  |  0.25%   |  15.86230469  |  15.90136719
2   |  0.06%   |  15.68652344  |  15.69628906
6   |  -0.41%  |  16.47851563  |  16.41015625
12  |  -1.72%  |  16.41992188  |  16.13671875
1   |  -6.42%  |  15.82324219  |  14.80664063
10  |  -6.48%  |  17.95605469  |  16.79199219

-- we get a slowdown for buffers < 4K
-- best gain at 64K (73%)
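Ordering the results by gain, as in the table above, amounts to computing the same relative difference for each row and sorting in descending order. Below is a minimal sketch. `GainRanking` and its `Row` type are hypothetical names, and the sample rows are copied from the table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class GainRanking {
    // One result row: buffer size and average times in milliseconds.
    static final class Row {
        final int bufferKb;
        final double alignedMs;
        final double nonAlignedMs;

        Row(int bufferKb, double alignedMs, double nonAlignedMs) {
            this.bufferKb = bufferKb;
            this.alignedMs = alignedMs;
            this.nonAlignedMs = nonAlignedMs;
        }

        // Relative extra time of the non-aligned write, in percent.
        double gainPercent() {
            return (nonAlignedMs - alignedMs) / alignedMs * 100.0;
        }
    }

    // Returns a new list ordered from the largest to the smallest gain.
    static List<Row> byGainDescending(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingDouble(Row::gainPercent).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(1, 15.82324219, 14.80664063),
                new Row(8, 15.82421875, 16.38085938),
                new Row(64, 18.75683594, 32.61523438));
        for (Row r : byGainDescending(rows)) {
            System.out.printf("%d | %.2f%%%n", r.bufferKb, r.gainPercent());
        }
    }
}
```

With these three rows, the 64K row comes first (73.88%) and the 1K row last (-6.42%).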

Sunitha. 


Re: Aligned vs. non-aligned log writes

Posted by Jan Hlavatý <hl...@code.cz>.
Suresh Thalamati wrote:
> Thanks Jan. The numbers you got seems to show real performance gain with
> aligned writes.
> Test run by Dan & Sunitha & myself does not seem to show similar gain. I
> wonder if  this
> is anything to do with your disk being RAID ?
> 
> --what JVM are you running the test on  ?
> --is Write cache enabled/disabled  for the disk ?
> --what is the cache size on the disk ?

I had the write cache turned on for the IDE disks. I turned it off
using hdparm -W0 and got the following numbers:
java -cp . allignWritePerfTest2 kuk.bin 128

512 |  99.984375  |  103.078125  |  3.09423347398031%
256 |  53.5546875  |  53.1171875  |  -0.8169219547775413%
128 |  20.2421875  |  31.328125  |  54.76649942107295%
64 |  10.0  |  18.359375  |  83.59375%
32 |  9.265625  |  9.4296875  |  1.7706576728499215%
16 |  8.859375  |  9.0  |  1.5873015873015817%
12 |  8.8125  |  8.8984375  |  0.9751773049645323%
10 |  8.75  |  8.828125  |  0.8928571428571388%
8 |  8.578125  |  8.7265625  |  1.7304189435336923%
6 |  8.5546875  |  8.7109375  |  1.8264840182648356%
4 |  8.4375  |  8.8359375  |  4.7222222222222285%
2 |  8.390625  |  8.8359375  |  5.307262569832403%
1 |  8.4140625  |  9.1015625  |  8.170844939647168%

Overall it is a lot slower than before (about 3 times).

There still seems to be a performance gain for 64K and 128K, though. For
other sizes there now seems to be no significant gain.

JVM: Java HotSpot(TM) Client VM (build 1.5.0-b64, mixed mode, sharing).
I have 2 IDE drives like this (hdparm -I):
/dev/hde:

ATA device, with non-removable media
        Model Number:       WDC WD800JB-00CRA1
        Serial Number:      WD-WMA8E2112592
        Firmware Revision:  17.07W17
Standards:
        Supported: 5 4 3 2
        Likely used: 6
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:  156301488
        device size with M = 1024*1024:       76319 MBytes
        device size with M = 1000*1000:       80026 MBytes (80 GB)
Capabilities:
        LBA, IORDY(can be disabled)
        bytes avail on r/w long: 40     Queue depth: 1
        Standby timer values: spec'd by Standard, with device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 16
        Recommended acoustic management value: 128, current value: 254
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    READ BUFFER cmd
           *    WRITE BUFFER cmd
           *    Host Protected Area feature set
           *    Look-ahead
                Write cache
           *    Power Management feature set
                Security Mode feature set
           *    SMART feature set
           *    Device Configuration Overlay feature set
                Automatic Acoustic Management feature set
                SET MAX security extension
           *    DOWNLOAD MICROCODE cmd
           *    SMART self-test
           *    SMART error logging
Security:
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
        not     supported: enhanced erase
HW reset results:
        CBLID- above Vih
        Device num = 0 determined by CSEL
Checksum: correct


There is an ext3 filesystem on top of an LVM2 logical volume on top of a software RAID 1 partition.

Jan

Re: Aligned vs. non-aligned log writes

Posted by Suresh Thalamati <ts...@Source-Zone.org>.

Jan Hlavatý wrote:

>Suresh Thalamati wrote:
>  
>
>>It would be great if  some one can run this test on other platforms and
>>post the results to the list.
>>    
>>
>
>I have added display of percentual difference to make it easier to look at.
>Results vary a lot, so dont take it for absolute.
>
>Heres what I got on my Linux (FC3, Celeron 800, 512MB, 2 IDE DISK RAID1 ext3 fs)
>
>  
>

Thanks Jan. The numbers you got seem to show a real performance gain with
aligned writes. The tests run by Dan, Sunitha and myself do not seem to
show a similar gain. I wonder if this has anything to do with your disk
being RAID?

-- What JVM are you running the test on?
-- Is the write cache enabled or disabled for the disk?
-- What is the cache size on the disk?


Thanks
-suresht


Re: Aligned vs. non-aligned log writes

Posted by Jan Hlavatý <hl...@code.cz>.
Suresh Thalamati wrote:
> 
> It would be great if  some one can run this test on other platforms and
> post the results to the list.

I have added a display of the percentage difference to make the results easier to read.
Results vary a lot, so don't take them as absolute.

Here's what I got on my Linux box (FC3, Celeron 800, 512MB, 2 IDE disks in RAID 1, ext3 fs)

[root@lin ~]# java -cp . allignWritePerfTest2 kuk.bin 128
(Buffer-Size(K) | Aligned Write(msec) | Non-Aligned Write(msec)) | gain %

512 |  21.0390625  |  27.2265625 |   29.409580393613084%
512 |  20.4921875  |  26.609375  |   29.85131528783836%
512 |  20.40625    |  28.46875   |   39.50995405819296%
512                             avg ~ 33%

256 |  10.1328125  |  18.3359375 |   80.95605242868157%
256 |  10.234375   |  20.375     |   99.08396946564886%
256 |  10.0546875  |  19.3046875 |   91.996891996892%
256                             avg ~ 91%

128 |   5.3515625  |  12.96875   |  142.33576642335765%
128 |   5.1640625  |  13.6640625 |  164.59909228441757%
128 |   5.28125    |  13.125     |  148.5207100591716%
128                             avg ~151%

 64 |   2.578125   |  10.515625  |  307.8787878787879%
 64 |   2.53125    |  10.625     |  319.75308641975306%
 64 |   2.5703125  |  10.375     |  303.64741641337383%
64                              avg ~310%

 32 |   1.453125   |   2.1796875 |   50.0%
 32 |   1.3828125  |   1.8828125 |   36.15819209039549%
 32 |   1.4609375  |   1.8984375 |   29.946524064171115%
32                              avg ~ 38%

 16 |   1.1796875  |   3.015625  |  155.6291390728477%
 16 |   1.1328125  |   2.8515625 |  151.72413793103448%
 16 |   1.109375   |   2.8671875 |  158.45070422535213%
16                             avg ~155%

 12 |   1.1328125  |   2.8125    |  148.27586206896552%
 12 |   0.921875   |   2.421875  |  162.71186440677968%
 12 |   0.8046875  |   2.40625   |  199.02912621359224%
12                             avg ~169%

 10 |   2.171875   |   2.4453125 |   12.589928057553962%
 10 |   2.1796875  |   2.1484375 |   -1.4336917562723954%
 10 |   2.0625     |   2.4296875 |   17.803030303030297%
10                             avg ~  9%

  8 |   0.5078125  |   2.125     |  318.46153846153845%
  8 |   0.859375   |   1.8515625 |  115.45454545454547%
  8 |   0.4921875  |   2.3828125 |  384.12698412698415%
8                              avg ~272%

  6 |   1.6796875  |   2.6640625 |   58.6046511627907%
  6 |   2.8203125  |   2.859375  |    1.3850415512465304%
  6 |   1.28125    |   2.0703125 |   61.585365853658544%
6                              avg ~ 40%

  4 |   0.984375   |   1.6171875 |   64.28571428571428%
  4 |   1.015625   |   2.0859375 |  105.38461538461539%
  4 |   0.84375    |   1.515625  |   79.62962962962962%
4                              avg ~ 82%

  2 |   0.671875   |   0.7109375 |    5.813953488372093%
  2 |   1.0078125  |   0.6328125 |  -37.2093023255814%
  2 |   0.6640625  |   0.7890625 |   18.82352941176471%
2                              avg ~ -4%

  1 |   0.4921875  |   0.3046875 |  -38.095238095238095%
  1 |   0.25       |   0.296875  |   18.75%
  1 |   0.25       |   0.3046875 |   21.875%
1                               avg ~ +1%


Ordered by speed gain:

64   avg ~310%
8    avg ~272%
12   avg ~169%
16   avg ~155%
128  avg ~151%
256  avg ~ 91%
4    avg ~ 82%
6    avg ~ 40%
32   avg ~ 38%
512  avg ~ 33%
10   avg ~  9%
1    avg ~ +1%
2    avg ~ -4%
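The per-size summary lines are plain arithmetic means of the three per-run gain percentages. Recomputing them from the runs is a quick way to check those lines. The helper name `RunAverages.meanGain` is hypothetical.

```java
// Recomputes a per-size average from the individual run gains (hypothetical helper).
final class RunAverages {
    // Arithmetic mean of per-run gain percentages.
    static double meanGain(double... runGainsPercent) {
        if (runGainsPercent.length == 0) {
            throw new IllegalArgumentException("need at least one run");
        }
        double sum = 0.0;
        for (double g : runGainsPercent) {
            sum += g;
        }
        return sum / runGainsPercent.length;
    }

    public static void main(String[] args) {
        // Per-run gains copied from the 256K and 1K rows above.
        System.out.printf("256K: %.1f%%%n",
                meanGain(80.95605242868157, 99.08396946564886, 91.996891996892));
        System.out.printf("1K:   %.1f%%%n",
                meanGain(-38.095238095238095, 18.75, 21.875));
    }
}
```

For the 256K runs this gives about 90.7%, and for the 1K runs about +0.8%. Averaging percentages weights every run equally. Comparing the mean aligned and mean non-aligned times directly would give a slightly different figure.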


Here it looks like:
1) it's a lot faster for some reason (you've got a slow disk ;))
2) we can get an actual slowdown for buffer sizes that are not a multiple of 4K
3) we get a slowdown for buffers < 4K
4) we get a speedup for sizes > 4K that are multiples of 4K
   (Linux uses 4K for everything)
5) the best gains are at 64K and 8K (about 4 times faster)


Jan

Re: Aligned vs. non-aligned log writes

Posted by Jeremy Boynes <jb...@apache.org>.
Mike Matrigali wrote:
> 
> Has anyone used the advanced NIO interfaces in java?  We thought
> that those should give us increased I/O throughput for things like
> our log file.  When we looked at it about a year ago, our tests
> didn't show the improvement we were looking for.  Anyone have
> opinions on what is the fastest way to write blocks of data ranging
> from 32k and down, and requiring sync to disk for each write?

Have you looked at what HOWL is doing?

http://howl.objectweb.org/

--
Jeremy

Re: Aligned vs. non-aligned log writes

Posted by Suresh Thalamati <ts...@Source-Zone.org>.
Mike Matrigali wrote:

>If java ever provides a way to directly queue I/O straight from
>our buffer to disk with no intermediate data copy then it may
>be important to use a page based log scheme.  For now it looks
>like the stream interfaces being used and the JVM's optimization
>of those interfaces are working ok.  A nice property of the current
>log which we don't take advantage of is that since it is a stream,
>we could dynamically change the size of the block of data we
>write, increasing it as log activity increases.  This sort of
>happens with group commit where we will write less than a block
>if a commit happens - but we never consider growing the buffers
>bigger than the boot time size.
>
>  
>
From the numbers, it does look like performance is better only in some
cases (64K). Considering the overhead involved in making writes aligned,
such as performing extra writes to handle partially filled pages in short
transactions, implementing aligned log writes might improve long-transaction
performance, but short-transaction performance is likely to degrade.

I agree that it may not be a good idea to modify the logging system to
perform aligned writes, at least for now.
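The padding overhead Suresh describes can be estimated with simple arithmetic. Assume log records are grouped into fixed-size pages and every forced write must cover whole pages. Then a short transaction's commit pays for writing out the rest of its last page. The sketch below makes that assumption with 4K pages. The names and the 300-byte commit size are hypothetical, not measurements from this thread.

```java
// Hypothetical page-padding arithmetic for a page-aligned log.
final class PagePadding {
    // Bytes needed to pad logBytes up to the next page boundary.
    static int paddingBytes(int logBytes, int pageSize) {
        if (logBytes < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("logBytes >= 0 and pageSize > 0 required");
        }
        int remainder = logBytes % pageSize;
        return remainder == 0 ? 0 : pageSize - remainder;
    }

    // Bytes physically written when every forced write must cover whole pages.
    static int alignedWriteBytes(int logBytes, int pageSize) {
        return logBytes + paddingBytes(logBytes, pageSize);
    }

    public static void main(String[] args) {
        int page = 4 * 1024;
        int shortCommit = 300; // hypothetical size of a short transaction's log records
        System.out.println(alignedWriteBytes(shortCommit, page)); // 4096
        System.out.println(paddingBytes(shortCommit, page));      // 3796
    }
}
```

In a scheme like this, the next short commit would typically have to rewrite the same partially filled page. Those are the extra writes that could make short transactions slower.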


Thanks
-suresht


Re: Aligned vs. non-aligned log writes

Posted by Mike Matrigali <mi...@sbcglobal.net>.
Originally I assumed that aligned writing would increase the
log I/O throughput over the current use of buffered streaming
files. My assumption was that, on average, every I/O we did would
partially cross at least 2 blocks and result in 2 I/Os for
every I/O we issued.
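Mike's assumption can be checked with a little arithmetic on block boundaries. The sketch counts how many blocks a write touches and how many of those are only partially covered. It says nothing about how the OS, JVM or disk actually issues the resulting I/O. `BlockSpan` is a hypothetical name, and the 512-byte block size is the sector size from the original post.

```java
// Counts the blocks touched by a write and how many are only partially covered.
final class BlockSpan {
    // Number of blockSize-byte blocks touched by a write of length bytes at offset.
    static long blocksTouched(long offset, long length, int blockSize) {
        if (length <= 0) {
            return 0;
        }
        long first = offset / blockSize;
        long last = (offset + length - 1) / blockSize;
        return last - first + 1;
    }

    // Number of touched blocks that the write covers only partially (0, 1 or 2).
    static int partialBlocks(long offset, long length, int blockSize) {
        if (length <= 0) {
            return 0;
        }
        boolean headPartial = offset % blockSize != 0;
        boolean tailPartial = (offset + length) % blockSize != 0;
        if (blocksTouched(offset, length, blockSize) == 1) {
            return (headPartial || tailPartial) ? 1 : 0;
        }
        return (headPartial ? 1 : 0) + (tailPartial ? 1 : 0);
    }

    public static void main(String[] args) {
        // A 32K log buffer written right after the 24-byte header vs. at offset 0.
        System.out.println(blocksTouched(24, 32 * 1024, 512) + " " + partialBlocks(24, 32 * 1024, 512)); // 65 2
        System.out.println(blocksTouched(0, 32 * 1024, 512) + " " + partialBlocks(0, 32 * 1024, 512));   // 64 0
    }
}
```

A non-aligned 32K write touches one extra sector and leaves both end sectors partially covered. Whether that costs a second physical I/O is exactly the question the measurements in this thread address.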

Given the performance results, I now think that if the system
is doing a normal forced I/O when requested (i.e. no write cache
enabled and no memory-backed caching controller), then it is
not surprising we don't see much difference. What I believe
is happening is that if 2 I/Os are indeed getting issued,
the OS/JVM is queuing both I/Os at basically the same time and
then waits for both of them to complete before returning. For most
recent disks, my guess is that these 2 I/Os are likely
to go into a single sector and get forced in a single I/O, given
the relatively small buffer sizes we use in the log.

If java ever provides a way to directly queue I/O straight from
our buffer to disk with no intermediate data copy then it may
be important to use a page based log scheme.  For now it looks
like the stream interfaces being used and the JVM's optimization
of those interfaces are working ok.  A nice property of the current
log which we don't take advantage of is that since it is a stream,
we could dynamically change the size of the block of data we
write, increasing it as log activity increases.  This sort of
happens with group commit where we will write less than a block
if a commit happens - but we never consider growing the buffers
bigger than the boot time size.
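Derby does not currently do this, so the following is purely a hypothetical policy to illustrate the idea. It doubles the write size when the buffer fills before any flush is requested (heavy log activity). It halves the size back toward the boot-time size when commit-driven flushes are mostly empty. The thresholds, the cap and all names are invented for the example.

```java
// Hypothetical adaptive sizing policy for log writes (not something Derby implements).
final class AdaptiveLogBufferSize {
    private final int bootSize;
    private final int maxSize;
    private int currentSize;

    AdaptiveLogBufferSize(int bootSize, int maxSize) {
        if (bootSize <= 0 || maxSize < bootSize) {
            throw new IllegalArgumentException("need 0 < bootSize <= maxSize");
        }
        this.bootSize = bootSize;
        this.maxSize = maxSize;
        this.currentSize = bootSize;
    }

    int currentSize() {
        return currentSize;
    }

    // The buffer filled before any flush was requested: heavy activity, so grow (capped).
    void onBufferFull() {
        currentSize = (int) Math.min((long) maxSize, currentSize * 2L);
    }

    // A commit forced a flush of bytesWritten bytes; mostly-empty flushes suggest light load.
    void onForcedFlush(int bytesWritten) {
        if (bytesWritten < currentSize / 4) {
            currentSize = Math.max(bootSize, currentSize / 2);
        }
    }

    public static void main(String[] args) {
        AdaptiveLogBufferSize size = new AdaptiveLogBufferSize(32 * 1024, 256 * 1024);
        size.onBufferFull();
        size.onBufferFull();
        System.out.println(size.currentSize()); // 131072
        size.onForcedFlush(1000);
        System.out.println(size.currentSize()); // 65536
    }
}
```

Because the log is a stream, changing the write size between flushes would not affect the on-disk format. The choice of a policy like this would still need measurement.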

Has anyone used the advanced NIO interfaces in Java? We thought
that those should give us increased I/O throughput for things like
our log file. When we looked at them about a year ago, our tests
didn't show the improvement we were looking for. Does anyone have
opinions on the fastest way to write blocks of data of 32K and
smaller, requiring a sync to disk for each write?
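For reference, a minimal NIO version of a synchronous log writer might look like the sketch below. It uses a direct `ByteBuffer`, `FileChannel.write` and `FileChannel.force(false)`, which forces file content but not necessarily metadata, comparable to `"rwd"` mode. The class name and buffering policy are hypothetical, and `FileChannel.open(Path, ...)` requires Java 7 or later. Whether a direct buffer actually avoids a copy, and whether this beats the stream-based path, depends on the JVM and OS. The earlier tests mentioned above did not show the hoped-for improvement.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical NIO log writer: buffer records, then write and force on each flush.
final class NioLogWriter implements AutoCloseable {
    private final FileChannel channel;
    private final ByteBuffer buffer;

    NioLogWriter(Path path, int bufferBytes) throws IOException {
        channel = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
        buffer = ByteBuffer.allocateDirect(bufferBytes); // direct: may let the JVM skip a copy
    }

    // Adds a record, flushing first if it does not fit in the remaining space.
    void append(byte[] record) throws IOException {
        if (record.length > buffer.capacity()) {
            throw new IllegalArgumentException("record larger than log buffer");
        }
        if (record.length > buffer.remaining()) {
            flushAndSync();
        }
        buffer.put(record);
    }

    // Writes all buffered bytes and forces file content (not metadata) to disk.
    void flushAndSync() throws IOException {
        buffer.flip();
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
        channel.force(false);
        buffer.clear();
    }

    @Override
    public void close() throws IOException {
        try {
            flushAndSync();
        } finally {
            channel.close();
        }
    }
}
```

In this sketch a commit would call `flushAndSync()` directly. Group commit would let several transactions' records share a single call.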