Posted to derby-dev@db.apache.org by "Tomohito Nakayama (JIRA)" <de...@db.apache.org> on 2006/03/02 11:47:51 UTC
[jira] Updated: (DERBY-326) Improve streaming of large objects for
network server and client
[ http://issues.apache.org/jira/browse/DERBY-326?page=all ]
Tomohito Nakayama updated DERBY-326:
------------------------------------
Attachment: DERBY-326_6.patch
I upload DERBY-326_6.patch.
---
Description of patch :
* Remove the processing that expanded data from the blob/clob InputStream into memory before sending it to the client.
* Implement layer B streaming in the Network Server.
* As noted at the start of this issue, almost the whole of org.apache.derby.impl.drda.DDMWriter#writeScalarStream was rewritten.
Here, "almost" means the code was not written from scratch, but was reworked.
A notable point is the following:
The original code used the variable "bytesToRead" to track both the remaining amount of data to send and the remaining room in the DSS segment.
This variable "bytesToRead" has now been removed.
The new code instead has a variable "spareDssLength" to track the remaining room in the DSS segment.
* Add a call to ensureLength in writeScalarStream to ensure an appropriate buffer size.
* Move comment in java/drda/org/apache/derby/impl/drda/DDMWriter.java about client driver implementation
to java/client/org/apache/derby/client/net/Request.java.
* Modify org.apache.derby.impl.drda.EXTDTAInputStream to stream the InputStream retrieved from the ResultSet directly.
* The source stream is read twice: first to see whether it is empty, and second to stream it.
* To keep a reference to a valid stream, EXTDTAInputStream also holds references to the ResultSet and the Blob/Clob.
* Modify the master result file for blobclob4BLOB.
* Now, as with the embedded driver's result, a deadlock occurs in clobTest92.
* A different expected exception occurs in a negative test in blobclob4BLOB.
* Added assertion code.
* Added a negative test that kills the streaming.
* Other improvements over DERBY-326_5.patch are as follows:
* Reuse objects referenced from instance variables of ReEncodedInputStream.
* Modified so that the InputStream is not opened lazily, as it was before.
Testing :
* Executed derbyall and found no errors except for those listed at http://www.multinet.no/~solberg/public/Apache/index.html
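The layer B streaming and the "remaining room in the DSS segment" bookkeeping described above can be sketched roughly as follows. This is a simplified illustration only, not the actual DDMWriter code: the 2-byte length header with a high continuation bit follows the general DSS convention, but the constants, class name, and method shape here are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class LayerBStreamSketch {
    // Maximum payload per DSS segment; the 2-byte header is counted in
    // the length field, so payload + 2 must fit in 15 bits (0x7FFF).
    static final int MAX_SEGMENT = 0x7FFF - 2; // 32765 bytes

    // Copy 'in' to 'out' as a chain of length-prefixed segments without
    // knowing the total length in advance. The high bit of the 2-byte
    // header marks "another segment follows" (continuation).
    static void writeScalarStream(InputStream in, OutputStream out)
            throws IOException {
        byte[] buf = new byte[MAX_SEGMENT];
        int n = in.read(buf);
        if (n < 0) n = 0; // empty stream: emit a single empty segment
        while (true) {
            // Read ahead one segment to learn whether this is the last one.
            byte[] next = new byte[MAX_SEGMENT];
            int m = in.read(next);
            boolean last = (m < 0);
            int header = (n + 2) | (last ? 0 : 0x8000);
            out.write((header >> 8) & 0xFF);
            out.write(header & 0xFF);
            out.write(buf, 0, n);
            if (last) break;
            buf = next;
            n = m;
        }
    }
}
```

Note that the read-ahead is what lets the writer set the continuation bit correctly without ever knowing the total length up front, which is the point of the SQLAM level 7 enhancement.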
---
I have not measured the performance result yet.
I will report it later.
> Improve streaming of large objects for network server and client
> ----------------------------------------------------------------
>
> Key: DERBY-326
> URL: http://issues.apache.org/jira/browse/DERBY-326
> Project: Derby
> Type: Improvement
> Components: Network Server, Network Client, Performance
> Reporter: Kathey Marsden
> Assignee: Tomohito Nakayama
> Attachments: ClobTest.zip, DERBY-326.patch, DERBY-326_2.patch, DERBY-326_3.patch, DERBY-326_4.patch, DERBY-326_5.patch, DERBY-326_5_indented.patch, DERBY-326_6.patch, ReEncodedInputStream.java.modifiedForLongRun
>
> Currently the stream writing methods in network server and client require a length parameter. This means that we have to get the length of the stream before sending it. For example, in network server in EXTDTAInputStream we have to use getString() and getBytes() instead of getCharacterStream() and getBinaryStream() so that we can get the length.
> SQLAM Level 7 provides for the enhanced LOB processing to allow streaming without indicating the length, so, the writeScalarStream methods in
> network server DDMWriter.java and network client Request.java can be changed to not require a length.
> Code inspection of these methods seems to indicate that while the length is never written it is used heavily in generating the DSS. One strange thing is that it appears on error, the stream is padded out to full length with zeros, but an actual exception is never sent. Basically I think perhaps these methods need to be rewritten from scratch based on the spec requirements for lobs.
> After the writeScalarStream methods have been changed, then EXTDTAInputStream can be changed to properly stream LOBs. See TODO tags in this file for more info. I am guessing similar optimizations are available in the client as well, but am not sure where that code is.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
Re: [jira] Updated: (DERBY-326) Improve streaming of large objects
for network server and client
Posted by TomohitoNakayama <to...@basil.ocn.ne.jp>.
Hello Bryan.
Thank you for reading the patch.
I will answer your questions...
> 1) In DDMWriter.writeScalarStream(), at about line 702, there is the
<snip>
> Do you know how to make that code be triggered?
> 2) In DRDAConnThread.writeEXTDTA(), at about line 7487, there is the
<snip>
> Do you know how to make writeEXTDTA() be called with an object other
> than an EXTDTAInputStream?
I don't know the answer to either of them.
I just read the callee modules and respected the original construction.
I think removing code that may no longer be used carries some risk ....
At the least, I think we would need to run derbyall with some additional
sanity-checking code before removing them.
> 3) I had two thoughts about the code which wraps the various stream
> objects
> in BufferedInputStreams in order to use the mark() and reset()
> methods:
> a) It seems that there are two different places where we perform the
> wrapping of the streams: once in DDMWriter.writeScalarStream, and
> once in EXTDTAInputStream.openInputStreamAgain(). That seemed a bit
> unfortunate, and I was wondering if you thought we could arrange
> things so that there was exactly one place where we did this stream
> wrapping.
I see. I will reconsider how to accomplish this.
> b) It seemed that the streams that were being wrapped were all streams
> of our own implementation, such as the ReEncodedInputStream, and
> the
> streams which are returned by Blob.getBinaryStream and by
> Clob.getAsciiStream(). Rather than wrapping all of these streams in
> BufferedInputStreams, could we not just implement the mark() and
> reset() methods on our streams, and avoid the extra overhead of the
> stream wrapping?
Hmm....
I don't want to write a mark/reset implementation beyond the one already
provided by java.io.BufferedInputStream ....
Wrapping our streams when they are created would be worth considering ...
> 4) I didn't really understand the method openInputStreamAgain() in
> EXTDTAInputStream. It seems that, for a Blob or Clob, we open the
> stream to the Blob/Clob, then read a single byte from the stream just
> to see if we will get end-of-file or not, and then if we didn't get
> end-of-file, we close and re-open the stream.
>
> Am I understanding this code properly? If so, it seems rather awkward.
> Three ideas presented themselves to me:
>
> a) Perhaps there is a method that we can call on the Blob/Clob object
> to figure out if it is null or not, other than reading the first
> byte from the stream?
> b) Or, if we have to read the first byte, maybe we could just hang on
> to that first byte in a local buffer, rather than having to close
> and re-open the stream in order to be able to re-read that byte.
> c) Or, since we are going to need to have a markSupported() stream
> anyway, perhaps we could wait until we have constructed the
> markSupported
> stream, and then use the mark/reset support to peek at the first
> byte
> to see if the Blob/Clob is empty or not.
About a): I found that the length() method of Blob/Clob has a performance
problem, because it reads through all of the stream to return the length.
About b): that sounds like what markSupported() provides ....
So c) would be the best answer.
//Considering this together with 3)-b), wrapping the stream with
BufferedInputStream when it is created seems to be the right way ...
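Idea c) — using mark/reset to peek at the first byte without consuming it — can be sketched like this. A minimal illustration, assuming the stream has been wrapped in a java.io.BufferedInputStream (or otherwise supports mark/reset); the method name isEmptyStream matches the helper mentioned elsewhere in the thread, but the body is my own sketch.

```java
import java.io.IOException;
import java.io.InputStream;

public class PeekSketch {
    // Returns true if 'in' has no data, without consuming anything.
    // 'in' must support mark/reset, e.g. via BufferedInputStream.
    static boolean isEmptyStream(InputStream in) throws IOException {
        in.mark(1);                       // remember position; 1-byte lookahead
        boolean empty = (in.read() < 0);  // EOF on the very first read?
        in.reset();                       // rewind so the caller sees everything
        return empty;
    }
}
```

This avoids the close-and-reopen dance entirely: the same stream instance is handed onward after the peek.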
> 5) It seems that the streaming implementation in DDMWriter always does an
> actual I/O (that is, calls sendBytes()) for each 32K segment. While
> that
> is certainly correct, I found myself wondering how much of a
> performance
> impact it was having for *very* large data transfers. If users are
> using
> Derby to manage blobs which are 10's or 100's of megabytes in size,
> then
> I wonder if there are still more performance benefits that could be
> realized by batching up multiple 32K DSS Layer B segments and then
> sending
> them with a single call to sendBytes(). For example, maybe we could
> only do a hard I/O call on each 4th or 8th segment, and give the
> network
> a chance to work with 128K or 256K worth of bytes at a time.
>
> I don't know if this would make a real difference or not, and it would
> obviously remove some of the memory reduction benefit of the rest
> of your
> changes, but it occurred to me and I wanted to mention it.
I think it would be nice to place a buffer, with a configurable flush
point, in front of the stream before sending it.
> 6) Lastly, it would be nice to add a comment to the method
> EXTDTAInputStream.initInputStream(), specifically to explain the
> meaning of the return value from that method, since it's a bit subtle.
I see.
I will add comments for them.
Now I'm running the performance measurements.
They will take some more time.
Then I will start on 1)-6).
Best regards.
--
/*
Tomohito Nakayama
tomonaka@basil.ocn.ne.jp
tomohito@rose.zero.ad.jp
tmnk@apache.org
Naka
http://www5.ocn.ne.jp/~tomohito/TopPage.html
*/
Re: [jira] Updated: (DERBY-326) Improve streaming of large objects
for network server and client
Posted by Bryan Pendleton <bp...@amberpoint.com>.
Tomohito Nakayama (JIRA) wrote:
> I upload DERBY-326_6.patch.
I spent a while working through this patch. It was very interesting
and I have learned a lot. Thank you for all the hard work on this patch.
The patch applied cleanly to the trunk and I had no troubles building it.
I created a number of test situations by hand and stepped through them
carefully in the debugger, and I was not able to construct any tests
which failed.
Also, while I would not call this a thorough performance test, I did
some experimentation with BLOB and CLOB objects in the 20 megabyte
range, and the results were dramatic:
- before your patch, my test program was taking about 75 seconds
- after your patch, my test program was taking 6 seconds
So that is a wonderful result! Hopefully your other performance tests
are as successful.
I do have some comments, but they are more along the lines of questions,
since as I said above I could not find any scenarios which actually broke
with your changes. Hopefully you can take a few minutes and look through
them, to help me learn some more about the code.
The questions are included below.
thanks,
bryan
1) In DDMWriter.writeScalarStream(), at about line 702, there is the
code:
if (length == 0)
return;
I tried for a while to cause that code to be executed, but I could not.
Do you know how to make that code be triggered? It seemed to me that
the test for isEmptyStream() in DRDAConnThread.writeFdocaVal() might be
short-circuiting this case and if that is true, perhaps we could remove
that "if" statement from DDMWriter.
2) In DRDAConnThread.writeEXTDTA(), at about line 7487, there is the
code:
else if (o instanceof byte[]) {
This code is very interesting, because it seems to be the only place
that might be calling DDMWriter.writeScalarStream() with a length
value other than -1. But once again, I tried for a while to cause that
code to be executed, and could not. I thought that it would fire if
I used LONG VARCHAR FOR BIT DATA, but even when I used that data type,
I could not cause the code at line 7487 to be executed.
From what I can tell, LONG VARCHAR FOR BIT DATA is limited to 32700
bytes max, and values of that type are always transmitted in-line, not
as externalized data.
Do you know how to make writeEXTDTA() be called with an object other
than an EXTDTAInputStream?
If not, then do you know how to cause DDMWriter.writeScalarStream() to
be called with a length other than -1?
I guess my basic point here is that we're doing some major surgery on
DDMWriter.writeScalarStream(), and if the only path that we ever take
through the code is the (length == -1) case, then perhaps we should
consider removing the code which tries to use the known-in-advance length.
3) I had two thoughts about the code which wraps the various stream objects
in BufferedInputStreams in order to use the mark() and reset() methods:
a) It seems that there are two different places where we perform the
wrapping of the streams: once in DDMWriter.writeScalarStream, and
once in EXTDTAInputStream.openInputStreamAgain(). That seemed a bit
unfortunate, and I was wondering if you thought we could arrange
things so that there was exactly one place where we did this stream
wrapping.
b) It seemed that the streams that were being wrapped were all streams
of our own implementation, such as the ReEncodedInputStream, and the
streams which are returned by Blob.getBinaryStream and by
Clob.getAsciiStream(). Rather than wrapping all of these streams in
BufferedInputStreams, could we not just implement the mark() and
reset() methods on our streams, and avoid the extra overhead of the
stream wrapping?
4) I didn't really understand the method openInputStreamAgain() in
EXTDTAInputStream. It seems that, for a Blob or Clob, we open the
stream to the Blob/Clob, then read a single byte from the stream just
to see if we will get end-of-file or not, and then if we didn't get
end-of-file, we close and re-open the stream.
Am I understanding this code properly? If so, it seems rather awkward.
Three ideas presented themselves to me:
a) Perhaps there is a method that we can call on the Blob/Clob object
to figure out if it is null or not, other than reading the first
byte from the stream?
b) Or, if we have to read the first byte, maybe we could just hang on
to that first byte in a local buffer, rather than having to close
and re-open the stream in order to be able to re-read that byte.
c) Or, since we are going to need to have a markSupported() stream
anyway, perhaps we could wait until we have constructed the markSupported
stream, and then use the mark/reset support to peek at the first byte
to see if the Blob/Clob is empty or not.
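Idea b) — holding on to the first byte instead of closing and reopening the stream — also has a ready-made form in the JDK: java.io.PushbackInputStream lets you read a byte and then "unread" it. A small sketch, with the class and method names being my own illustration:

```java
import java.io.IOException;
import java.io.PushbackInputStream;

public class PushbackSketch {
    // Reads one byte to test for EOF, then pushes it back so the
    // caller still sees the complete stream.
    static boolean isEmpty(PushbackInputStream in) throws IOException {
        int first = in.read();
        if (first < 0) {
            return true;       // EOF immediately: the stream is empty
        }
        in.unread(first);      // put the byte back into the stream
        return false;
    }
}
```

The default one-byte pushback buffer is exactly enough for this test, so no extra buffering overhead is incurred.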
5) It seems that the streaming implementation in DDMWriter always does an
actual I/O (that is, calls sendBytes()) for each 32K segment. While that
is certainly correct, I found myself wondering how much of a performance
impact it was having for *very* large data transfers. If users are using
Derby to manage blobs which are 10's or 100's of megabytes in size, then
I wonder if there are still more performance benefits that could be
realized by batching up multiple 32K DSS Layer B segments and then sending
them with a single call to sendBytes(). For example, maybe we could
only do a hard I/O call on each 4th or 8th segment, and give the network
a chance to work with 128K or 256K worth of bytes at a time.
I don't know if this would make a real difference or not, and it would
obviously remove some of the memory reduction benefit of the rest of your
changes, but it occurred to me and I wanted to mention it.
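The batching idea in question 5 can be demonstrated with a plain java.io.BufferedOutputStream sized to hold several 32K segments, so the underlying sink only sees one large write per batch. This is a toy sketch to show the effect, not Derby code; the counting sink and all names are invented for the illustration.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BatchedSendSketch {
    static final int SEGMENT = 32 * 1024;

    // Stand-in for the network socket: counts how many times it is
    // actually written to (each count models one hard I/O call).
    static class CountingSink extends ByteArrayOutputStream {
        int writes;
        @Override
        public synchronized void write(byte[] b, int off, int len) {
            writes++;
            super.write(b, off, len);
        }
    }

    // Sends 'segments' 32K blocks through a buffer holding 'batch'
    // segments; returns how many hard writes the sink observed.
    static int sendBatched(int segments, int batch) throws IOException {
        CountingSink sink = new CountingSink();
        OutputStream out = new BufferedOutputStream(sink, SEGMENT * batch);
        byte[] seg = new byte[SEGMENT];
        for (int i = 0; i < segments; i++) {
            out.write(seg);     // one logical DSS segment
        }
        out.flush();            // push out whatever is still buffered
        return sink.writes;
    }
}
```

With a batch of 4, eight segments reach the sink in far fewer writes than with a batch of 1, which is exactly the trade-off discussed above: fewer system calls at the cost of more buffered memory.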
6) Lastly, it would be nice to add a comment to the method
EXTDTAInputStream.initInputStream(), specifically to explain the
meaning of the return value from that method, since it's a bit subtle.