Posted to derby-dev@db.apache.org by "Kristian Waagan (JIRA)" <ji...@apache.org> on 2008/08/06 16:16:44 UTC

[jira] Reopened: (DERBY-3769) Make LOBStoredProcedure on the server side smarter about the read buffer size

     [ https://issues.apache.org/jira/browse/DERBY-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kristian Waagan reopened DERBY-3769:
------------------------------------


The fix isn't sufficient for Clob.

Currently the buffer threshold is expressed in characters, but it seems it has to be expressed in bytes. Since we are transferring data over the wire as UTF-8 (is this always the case?), a solution might be to always assume 3 bytes per character. The fix would then be to introduce a separate threshold for Clobs:
  MAX_CLOB_RETURN_LENGTH = MAX_RETURN_LENGTH / 3

The resulting buffer size will be smaller than necessary in most cases (i.e. when the Clob contains only characters that can be represented in ASCII). In my opinion, that is an acceptable tradeoff compared to resetting the Clob stream and skipping data frequently on the server (see DERBY-3766).
I'm sure a more sophisticated optimization can be implemented later.
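
To make the arithmetic behind the proposed threshold concrete, here is a minimal sketch. It assumes standard UTF-8 as the wire encoding (the comment above leaves this as an open question) and uses the 32'672 limit from the issue description. The class name and the repeat/utf8Length helpers are hypothetical and not part of Derby.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ClobThresholdSketch {
    // Byte limit quoted in the issue description (Limits.DB2_VARCHAR_MAXWIDTH).
    static final int MAX_RETURN_LENGTH = 32672;
    // Proposed character limit for Clobs, assuming at most 3 bytes per char.
    static final int MAX_CLOB_RETURN_LENGTH = MAX_RETURN_LENGTH / 3;

    // Hypothetical helper: builds a string of n copies of c.
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    // Encoded size in standard UTF-8 (assumed here as the wire encoding).
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = repeat('a', MAX_CLOB_RETURN_LENGTH);
        String euro = repeat('\u20AC', MAX_CLOB_RETURN_LENGTH); // 3 bytes per char
        System.out.println("char limit:  " + MAX_CLOB_RETURN_LENGTH);
        System.out.println("ASCII bytes: " + utf8Length(ascii));
        System.out.println("worst bytes: " + utf8Length(euro));
    }
}
```

Under these assumptions the character limit is 10'890. A Clob chunk made entirely of 3-byte characters encodes to 32'670 bytes, which fits under the limit, while an all-ASCII chunk uses only 10'890 bytes, a third of the available space.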

> Make LOBStoredProcedure on the server side smarter about the read buffer size
> -----------------------------------------------------------------------------
>
>                 Key: DERBY-3769
>                 URL: https://issues.apache.org/jira/browse/DERBY-3769
>             Project: Derby
>          Issue Type: Improvement
>          Components: Network Server
>    Affects Versions: 10.3.3.0, 10.4.1.3, 10.5.0.0
>            Reporter: Kristian Waagan
>            Assignee: Kristian Waagan
>             Fix For: 10.4.2.0, 10.5.0.0
>
>         Attachments: derby-3769-1a-buffer_size_adjustment.diff, derby-3769-1b-buffer_size_adjustment.diff
>
>
> Derby has a max length for VARBINARY and VARCHAR, which is 32'672 bytes or characters (see Limits.DB2_VARCHAR_MAXWIDTH).
> When working with LOBs represented by locators, using a read buffer larger than the max value causes the server to process far more data than necessary.
> Say the read buffer is 33'000 bytes, and these bytes are requested by the client. This request ends up in LOBStoredProcedure.BLOBGETBYTES.
> Assume the stream position is 64'000, and this is where we want to read from. The following happens:
>  a) BLOBGETBYTES instructs EmbedBlob to read 33'000 bytes, advancing the stream position to 97'000.
>  b) Derby fetches/receives the 33'000 bytes, but can only send 32'672. The rest of the data (328 bytes) is discarded.
>  c) The client receives the 32'672 bytes, recalculates the position and length arguments and sends another request.
>  d) BLOBGETBYTES(locator, 96672, 328) is executed. EmbedBlob detects that the stream position has advanced too far, so it resets the stream to position zero and skips/reads until position 96'672 has been reached.
>  e) The remaining 328 bytes are sent to the client.
> This issue deals with points b) and d), by avoiding the need to reset the stream.
> Points a) and e) are also problematic if a large number of bytes are going to be read, say hundreds of megabytes, but that's another issue.
> It is unfortunate that using 32 K (32 * 1024) as the buffer size is almost the worst case; 32'768 - 32'672 = 96 bytes.
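
The sequence a) to e) in the quoted description can be sketched as a toy simulation. This is not Derby code: ForwardStream is a hypothetical stand-in for a forward-only stream that must restart from position zero to move backwards, and the per-request read cap is one assumed way of avoiding the reset, not necessarily what the attached patches do.

```java
class LobReadSimulation {
    // Maximum number of bytes that can be sent per request (from the issue).
    static final int MAX_RETURN_LENGTH = 32672;

    // Hypothetical forward-only stream; counts bytes read or skipped.
    static final class ForwardStream {
        long pos;        // current stream position
        long processed;  // total bytes read or skipped so far
        int resets;      // number of times the stream was restarted

        ForwardStream(long startPos) { pos = startPos; }

        void read(long from, long len) {
            if (from < pos) {   // cannot seek backwards: restart from zero
                pos = 0;
                resets++;
            }
            processed += (from - pos) + len; // skip up to 'from', then read 'len'
            pos = from + len;
        }
    }

    // The client wants 'total' bytes starting at 'start'; the server reads at
    // most 'maxReadPerRequest' bytes per call and sends at most MAX_RETURN_LENGTH.
    static ForwardStream fetch(long start, long total, long maxReadPerRequest) {
        ForwardStream stream = new ForwardStream(start);
        long offset = start;
        long end = start + total;
        while (offset < end) {
            long requested = end - offset;
            long toRead = Math.min(requested, maxReadPerRequest);
            stream.read(offset, toRead);
            offset += Math.min(toRead, MAX_RETURN_LENGTH); // excess is discarded
        }
        return stream;
    }

    public static void main(String[] args) {
        ForwardStream uncapped = fetch(64000, 33000, Long.MAX_VALUE);
        ForwardStream capped = fetch(64000, 33000, MAX_RETURN_LENGTH);
        System.out.println("uncapped: " + uncapped.processed + " bytes, "
                + uncapped.resets + " reset(s)");
        System.out.println("capped:   " + capped.processed + " bytes, "
                + capped.resets + " reset(s)");
    }
}
```

In this model the uncapped run processes 130'000 bytes with one reset to deliver 33'000 bytes (33'000 for the first read, then 96'672 skipped plus 328 read after the reset), whereas the capped run processes exactly 33'000 bytes with no reset.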

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.