You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-commits@db.apache.org by mi...@apache.org on 2005/06/13 17:18:53 UTC

svn commit: r190415 - /incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/services/io /incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/types /incubator/derby/code/trunk/java/engine/org/apache/derby/impl/jdbc

Author: mikem
Date: Mon Jun 13 08:18:51 2005
New Revision: 190415

URL: http://svn.apache.org/viewcvs?rev=190415&view=rev
Log:
Fix for DERBY-302, committing on behalf of 

DERBY-302 - Insert on Clob of 500k of data using streams takes long time. 
It takes 3 mins on a sun jvm1.4.2 and 20 seconds with ibm jvm 1.4.2. 
The following fix improves the performance of insert into a 500k blob from 20 
seconds to around 1 second.  Note that by changing just the test program 
time was reduced from 3 minutes to avg around 20 seconds.

Currently in derby,  for an insert on a clob using setCharacterStream what will
happen is , the entire stream will be materialized into a char array and sent 
to store for the insert.  ( we should not have to stream here. I will file 
another jira issue for this and put in all information I learnt)

Given this is how inserts for large clobs are happening, the performance issue 
analysis is as follows:
--  profiler run shows that most time is spent in SQLChar.readExternal which 
is where the materialization into a char array for the user's input stream 
happens.  The growth of this array happens gradually till the entire stream 
is materialized into the array.  Below code snippet shows by how much the 
array is grown each time when it realizes it has to read more bytes from the 
stream.

The dvd hierarchy for clob is  -  SQLClob ( dvd) extends SQLVarChar 
extends SQLChar.

So in SQLChar.readExternal
........
int growby = in.available();
if(growby < 64)
growby = 64
and then an allocation and an arraycopy to the new allocated array.

--  In the code snippet,  'in' is the wrapper around the user's stream which is
ReaderToUTF8Stream.   ReaderToUTF8Stream extends InputStream and does not 
override available() method . As per the spec, InputStream.available() 
returns 0.

-- Thus each time, the array growth is by 64 bytes which is obviously not 
performant.  so for a 500k clob insert, this would mean allocation & 
arraycopy steps happen  ~8000 times.

-- The ReaderToUTF8Stream that has the user's stream reads from the stream and
does the utf8 conversion and puts it in a 4k array.  I think it is reasonable 
to have a 32k buffer to store this information for clobs.

Although I think there seems to be more possible optimizations in this area,  
I prefer the incremental approach too :)  so this patch  is a first step 
towards fixing the insert clob performance in the current system.

Fix includes:
-- enhanced the way the array was grown to keep the original  64 bytes for 
char ( seems reasonable given the upper limit for char) but override it to 
have  4k for varchar and clobs.

-- OVERRIDE AVAILABLE() IN READERTOUTF8STREAM TO RETURN A BETTER ESTIMATE OF 
HOW MANY BYTES CAN BE READ. 


Modified:
    incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/services/io/LimitReader.java
    incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/types/SQLChar.java
    incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/types/SQLVarchar.java
    incubator/derby/code/trunk/java/engine/org/apache/derby/impl/jdbc/ReaderToUTF8Stream.java

Modified: incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/services/io/LimitReader.java
URL: http://svn.apache.org/viewcvs/incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/services/io/LimitReader.java?rev=190415&r1=190414&r2=190415&view=diff
==============================================================================
--- incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/services/io/LimitReader.java (original)
+++ incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/services/io/LimitReader.java Mon Jun 13 08:18:51 2005
@@ -119,6 +119,16 @@
 		limitInPlace = true;
 		return;
 	}
+    
+    /**
+     * return limit of the stream that can be read without throwing
+     * EOFException
+     * @return the remaining bytes left to be read from the stream
+     */
+    public final int getLimit()
+    {
+        return remainingBytes;
+    }
 
 	/**
 		Clear any limit set by setLimit. After this call no limit checking

Modified: incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/types/SQLChar.java
URL: http://svn.apache.org/viewcvs/incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/types/SQLChar.java?rev=190415&r1=190414&r2=190415&view=diff
==============================================================================
--- incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/types/SQLChar.java (original)
+++ incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/types/SQLChar.java Mon Jun 13 08:18:51 2005
@@ -86,6 +86,18 @@
 	extends DataType implements StringDataValue, StreamStorable
 {
 
+    /**
+     * threshold, that decides when we return space back to the VM
+     * see getString() where it is used
+     */
+    protected final static int RETURN_SPACE_THRESHOLD = 4096;
+    
+    /**
+     * when we know that the array needs to grow by at least
+     * one byte, it is not performant to grow by just one byte
+     * instead this amount is used to provide a reasonable growby size.
+     */
+    private final static int GROWBY_FOR_CHAR = 64;
 	/**
 		Static array that can be used for blank padding.
 	*/
@@ -335,7 +347,7 @@
 				// data is stored in the char[] array
 
 				value = new String(rawData, 0, len);
-				if (len > 4096) {
+				if (len > RETURN_SPACE_THRESHOLD) {
 					// free up this char[] array to reduce memory usage
 					rawData = null;
 					rawLength = -1;
@@ -641,6 +653,11 @@
         int utflen = in.readUnsignedShort();
 
         int requiredLength;
+        // minimum amount that is reasonable to grow the array
+        // when we know the array needs to growby at least one
+        // byte but we dont want to grow by one byte as that
+        // is not performant
+        int minGrowBy = growBy();
         if (utflen != 0)
         {
             // the object was not stored as a streaming column 
@@ -654,8 +671,8 @@
             // OR
             // The original string was a 0 length string.
             requiredLength = in.available();
-            if (requiredLength < 64)
-                requiredLength = 64;
+            if (requiredLength < minGrowBy)
+                requiredLength = minGrowBy;
         }
 
         char str[];
@@ -707,12 +724,26 @@
             if (strlen >= arrayLength) // the char array needs to be grown 
             {
                 int growby = in.available();
-
-                // We know at the array needs to be grown by at least one.
+                // We know that the array needs to be grown by at least one.
                 // However, even if the input stream wants to block on every
                 // byte, we don't want to grow by a byte at a time.
-                if (growby < 64)
-                    growby = 64;
+                // Note, for large data (clob > 32k), it is performant
+                // to grow the array by atleast 4k rather than a small amount
+                // Even better maybe to grow by 32k but then may be
+                // a little excess(?) for small data. 
+                // hopefully in.available() will give a fair
+                // estimate of how much data can be read to grow the 
+                // array by larger and necessary chunks.
+                // This performance issue due to 
+                // the slow growth of this array was noticed since inserts
+                // on clobs was taking a really long time as
+                // the array here grew previously by 64 bytes each time 
+                // till stream was drained.  (Derby-302)
+                // for char, growby 64 seems reasonable, but for varchar
+                // clob 4k or 32k is performant and hence
+                // growBy() is override correctly to ensure this
+                if (growby < minGrowBy)
+                    growby = minGrowBy;
 
                 int newstrlength = arrayLength + growby;
                 char oldstr[] = str;
@@ -804,6 +835,18 @@
         cKey = null;
     }
 
+    /**
+     * returns the reasonable minimum amount by 
+     * which the array can grow . See readExternal. 
+     * when we know that the array needs to grow by at least
+     * one byte, it is not performant to grow by just one byte
+     * instead this amount is used to provide a resonable growby size.
+     * @return minimum reasonable growby size
+     */
+    protected int growBy()
+    {
+        return GROWBY_FOR_CHAR;  //seems reasonable for a char
+    }
 	/**
 	 * @see Storable#restoreToNull
 	 *

Modified: incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/types/SQLVarchar.java
URL: http://svn.apache.org/viewcvs/incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/types/SQLVarchar.java?rev=190415&r1=190414&r2=190415&view=diff
==============================================================================
--- incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/types/SQLVarchar.java (original)
+++ incubator/derby/code/trunk/java/engine/org/apache/derby/iapi/types/SQLVarchar.java Mon Jun 13 08:18:51 2005
@@ -182,4 +182,17 @@
 	{
 		return TypeId.VARCHAR_PRECEDENCE;
 	}
+    
+    /**
+     * returns the reasonable minimum amount by 
+     * which the array can grow . See readExternal. 
+     * when we know that the array needs to grow by at least
+     * one byte, it is not performant to grow by just one byte
+     * instead this amount is used to provide a resonable growby size.
+     * @return minimum reasonable growby size
+     */
+    protected final int growBy()
+    {
+        return RETURN_SPACE_THRESHOLD;  //seems reasonable for a varchar or clob 
+    }
 }

Modified: incubator/derby/code/trunk/java/engine/org/apache/derby/impl/jdbc/ReaderToUTF8Stream.java
URL: http://svn.apache.org/viewcvs/incubator/derby/code/trunk/java/engine/org/apache/derby/impl/jdbc/ReaderToUTF8Stream.java?rev=190415&r1=190414&r2=190415&view=diff
==============================================================================
--- incubator/derby/code/trunk/java/engine/org/apache/derby/impl/jdbc/ReaderToUTF8Stream.java (original)
+++ incubator/derby/code/trunk/java/engine/org/apache/derby/impl/jdbc/ReaderToUTF8Stream.java Mon Jun 13 08:18:51 2005
@@ -43,11 +43,14 @@
 	private int blen;
 	private boolean eof;
 	private boolean multipleBuffer;
+    // buffer to hold the data read from stream 
+    // and converted to UTF8 format
+    private final static int BUFSIZE = 32768;
 
 	ReaderToUTF8Stream(LimitReader reader)
 	{
 		this.reader = reader;
-		buffer = new byte[4096];
+		buffer = new byte[BUFSIZE];
 		blen = -1;
 	}
 
@@ -196,5 +199,20 @@
 		reader.close();
 	}
 
+    /**
+     * Return an optimized version of bytes available to read from 
+     * the stream 
+     * Note, it is not exactly per java.io.InputStream#available()
+     */
+    public final int available()
+    {
+       int remainingBytes = reader.getLimit();
+       // this object buffers BUFSIZE bytes that can be read 
+       // and when that is finished it reads the next available bytes
+       // from the reader object 
+       // reader.getLimit() returns the remaining bytes available
+       // on this stream
+       return (BUFSIZE > remainingBytes ? remainingBytes : BUFSIZE);
+    }
 }