Posted to derby-dev@db.apache.org by "Kristian Waagan (JIRA)" <ji...@apache.org> on 2010/05/14 13:54:43 UTC

[jira] Created: (DERBY-4661) Reduce size of encoding buffer for short character values

Reduce size of encoding buffer for short character values
---------------------------------------------------------

                 Key: DERBY-4661
                 URL: https://issues.apache.org/jira/browse/DERBY-4661
             Project: Derby
          Issue Type: Improvement
          Components: JDBC
    Affects Versions: 10.7.0.0
         Environment: Inserts using setXStream(int, Reader/InputStream, int/long) for short values on character columns
            Reporter: Kristian Waagan
            Assignee: Kristian Waagan
            Priority: Minor


When inserting character values, Derby converts from Java char to an on-disk encoding of UTF-8. To do this, the user stream is read and the resulting bytes after conversion are placed in a "translation buffer". The default size of the buffer is 32 KB. When inserting a lot of short values, the pressure on the Java garbage collector is unnecessarily high, and the allocation/GC also causes somewhat higher CPU usage.

The effect of this issue can easily be reduced by sizing the buffer to fit the value in the appropriate cases.
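
As a rough illustration of the sizing idea (a sketch, not the code in the patch): a Java char encodes to at most three bytes in UTF-8, so a value whose length n is known up front needs at most 3n bytes of translation buffer. The class and method names below are hypothetical; the 32 KB default is the figure quoted above. The sketch ignores the stream header and any minimum buffer size.

```java
import java.nio.charset.StandardCharsets;

public class TranslationBufferSizing {
    /** Default translation buffer size from the issue description (32 KB). */
    public static final int DEFAULT_BUFFER_SIZE = 32 * 1024;
    /** Conservative upper bound on encoded bytes per Java char. */
    public static final int MAX_BYTES_PER_CHAR = 3;

    /** Hypothetical helper: worst-case buffer size for a value of known length. */
    public static int bufferSizeFor(int charLength) {
        long worstCase = (long) charLength * MAX_BYTES_PER_CHAR;
        return (int) Math.min(DEFAULT_BUFFER_SIZE, worstCase);
    }

    public static void main(String[] args) {
        String shortValue = "a short value";
        int sized = bufferSizeFor(shortValue.length());
        int encoded = shortValue.getBytes(StandardCharsets.UTF_8).length;
        // The encoded form always fits in the sized buffer, which is far
        // smaller than the default for short values.
        System.out.println("encoded bytes: " + encoded);
        System.out.println("sized buffer:  " + sized);
        System.out.println("default:       " + DEFAULT_BUFFER_SIZE);
    }
}
```

For a 13-character value this allocates 39 bytes instead of 32768, which is where the reduced allocation and GC pressure would come from.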

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-4661) Reduce size of encoding buffer for short character values

Posted by "Kristian Waagan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kristian Waagan updated DERBY-4661:
-----------------------------------

    Issue & fix info: [Patch Available]




[jira] Resolved: (DERBY-4661) Reduce size of encoding buffer for short character values

Posted by "Kristian Waagan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kristian Waagan resolved DERBY-4661.
------------------------------------

    Issue & fix info:   (was: [Patch Available])
       Fix Version/s: 10.7.0.0
          Resolution: Fixed




[jira] Closed: (DERBY-4661) Reduce size of encoding buffer for short character values

Posted by "Kristian Waagan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kristian Waagan closed DERBY-4661.
----------------------------------


Closing issue.




[jira] Updated: (DERBY-4661) Reduce size of encoding buffer for short character values

Posted by "Kristian Waagan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kristian Waagan updated DERBY-4661:
-----------------------------------

    Attachment: derby-4661-1a-reduce_encoding_bz.diff
                derby-4661-1a-reduce_encoding_bz.stat

Attaching patch 1a.

* iapi/types/ReaderToUTF8Stream
The actual fix. Note the special case when the value length is zero. To avoid the issue for shorter header lengths in the future, I used Math.max instead of handling valueLength == 0 specifically.

* other classes
Added StreamHeaderGenerator.getMaxHdrLength() and made SQLClob use it.
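
The zero-length special case noted for ReaderToUTF8Stream above can be sketched roughly as follows. This is an illustrative reading, not the code in the patch: the class name, the parameter names, and the header length and floor used in main are all assumptions. The point is that a floor applied with Math.max covers the empty value and also any other combination of header and value length that would come out too small, whereas a check for valueLength == 0 covers only the one case.

```java
public class MinimumBufferSize {
    /** Default translation buffer size from the issue description (32 KB). */
    public static final int DEFAULT_BUFFER_SIZE = 32 * 1024;

    /**
     * Hypothetical sizing rule: header plus worst-case payload (three bytes
     * per char), never less than minSize and never more than the default.
     */
    public static int bufferSize(int valueLength, int maxHeaderLength, int minSize) {
        long wanted = (long) maxHeaderLength + 3L * valueLength;
        long floored = Math.max(minSize, wanted);
        return (int) Math.min(DEFAULT_BUFFER_SIZE, floored);
    }

    public static void main(String[] args) {
        int assumedHeaderLength = 5; // assumption for illustration only
        int assumedFloor = 8;        // assumption for illustration only
        for (int len : new int[] {0, 1, 100, 100000}) {
            System.out.println(len + " chars -> "
                    + bufferSize(len, assumedHeaderLength, assumedFloor) + " bytes");
        }
    }
}
```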

Some simple tests showed a performance improvement of around 30%. Real-world workloads will not see such a gain, but the fix may somewhat help heavily loaded servers where users insert small data values using the streaming classes.

Regression tests passed.
Patch ready for review.




[jira] Updated: (DERBY-4661) Reduce size of encoding buffer for short character values

Posted by "Kristian Waagan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kristian Waagan updated DERBY-4661:
-----------------------------------

    Fix Version/s: 10.5.3.1
                   10.6.1.1

Backported to the 10.5 branch with revision 951363.
Backported to the 10.6 branch with revision 951362.




[jira] Commented: (DERBY-4661) Reduce size of encoding buffer for short character values

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867505#action_12867505 ] 

Knut Anders Hatlen commented on DERBY-4661:
-------------------------------------------

This looks like a useful improvement. A couple of minor comments:

I think the code would be easier to follow if some of the magic numbers were replaced by constants or expressions. In particular:

  - the number 10922 could be replaced by bz/3. That would make it clearer where the number came from, and it would reduce the likelihood of bugs being introduced if we change the default buffer size later.

  - using a constant (e.g., READ_BUFFER_RESERVATION) instead of the number 6 in ReaderToUTF8Stream's constructor and in fillBuffer() would make the relation between those methods easier to see.

Perhaps it would be better to call the new method in StreamHeaderGenerator getMaxHeaderLength()?

The call to LimitReader.setLimit() in the lengthless constructor was made redundant by the patch. Should it be removed?
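
The first two suggestions could look roughly like the sketch below. The name READ_BUFFER_RESERVATION and the values 10922 and 6 come from the comment above; the class, the other constant names, and the helper method are hypothetical and do not reproduce ReaderToUTF8Stream.

```java
public class NamedConstantsSketch {
    /** Default translation buffer size (32 KB per the issue description). */
    public static final int DEFAULT_BUFFER_SIZE = 32 * 1024;

    /** Replaces the literal 6; shared by every place that needs it. */
    public static final int READ_BUFFER_RESERVATION = 6;

    /** Replaces the literal 10922; follows the buffer size if it changes. */
    public static final int SMALL_VALUE_THRESHOLD = DEFAULT_BUFFER_SIZE / 3;

    /** Hypothetical check for values that qualify for a smaller buffer. */
    public static boolean isSmallValue(int charLength) {
        return charLength < SMALL_VALUE_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(SMALL_VALUE_THRESHOLD); // prints 10922
    }
}
```

Deriving the threshold from the buffer size means a later change to the default cannot leave a stale 10922 behind.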




[jira] Updated: (DERBY-4661) Reduce size of encoding buffer for short character values

Posted by "Kristian Waagan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kristian Waagan updated DERBY-4661:
-----------------------------------

    Attachment: derby-4661-1b-reduce_encoding_bz.diff

Attached patch 1b.

Thanks for reviewing the patch, Knut Anders.

I have addressed your suggestions, and I also made a few other changes (size logic, fixed two typos).
Regression tests passed (suites.All).

Patch ready for a second review.




[jira] Commented: (DERBY-4661) Reduce size of encoding buffer for short character values

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871060#action_12871060 ] 

Knut Anders Hatlen commented on DERBY-4661:
-------------------------------------------

Thanks for making these changes, Kristian. The 1b patch looks good to me.

I assume this part of the patch was unintended, though:

Property changes on: java/engine/org/apache/derby/iapi/types
___________________________________________________________________
Added: svn:ignore
   + .ReaderToUTF8Stream.java.swp




[jira] Updated: (DERBY-4661) Reduce size of encoding buffer for short character values

Posted by "Kristian Waagan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kristian Waagan updated DERBY-4661:
-----------------------------------

    Attachment: derby-4661-1b-reduce_encoding_bz.diff

Yes, that change wasn't intended (my IDE did that for me).

Uploaded a new rev of patch 1b, and committed it to trunk with revision 948069.

Thanks,

