Posted to derby-dev@db.apache.org by "Kristian Waagan (JIRA)" <ji...@apache.org> on 2008/12/04 17:08:44 UTC
[jira] Updated: (DERBY-3934) Improve performance of reading modified Clobs
[ https://issues.apache.org/jira/browse/DERBY-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kristian Waagan updated DERBY-3934:
-----------------------------------
Attachment: derby-3934-3a-clobupdreader_utf8reader.stat
derby-3934-3a-clobupdreader_utf8reader.diff
'derby-3934-3a-clobupdreader_utf8reader.diff' makes the handling of
StoreStreamClob and TemporaryClob consistent.
The following files are touched (all in derby.impl.jdbc):
*** EmbedClob
Updated call to ClobUpdatableReader. The change of the position argument is
intentional.
*** TemporaryClob
Replaced the ClobUpdatableReader returned by getReader with a UTF8Reader.
Internal handling of TemporaryClob should deal with changing contents
specifically, or create a ClobUpdatableReader where required.
Note also the use of the new CharacterStreamDescriptor class. This piece of
code will probably be changed later on, when there is more information about
the stream available. For instance, caching byte/char positions allows us to skip
directly to the byte position through the underlying file API. This way, we
don't have to decode all the raw bytes to skip the correct number of chars.
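
A minimal sketch of the byte/char position caching idea, for illustration only. It is not code from the patch: PositionCache and its methods are hypothetical names, and the sketch assumes every char is encoded in 1 to 3 bytes (as in Java's modified UTF-8), with the data already in a byte array rather than behind a file API.

```java
/** Hypothetical one-entry cache pairing a char position with its byte position. */
final class PositionCache {
    private long cachedCharPos = 0;  // chars consumed at the cached point
    private long cachedBytePos = 0;  // bytes consumed at the cached point

    /**
     * Returns the byte offset of the given 0-based char position in data.
     * Decoding starts at the cached position when the target is at or after it,
     * and at the start of the data otherwise.
     */
    long bytePositionOf(byte[] data, long targetCharPos) {
        long charPos = 0;
        long bytePos = 0;
        if (targetCharPos >= cachedCharPos) {
            charPos = cachedCharPos;
            bytePos = cachedBytePos;
        }
        while (charPos < targetCharPos && bytePos < data.length) {
            int lead = data[(int) bytePos] & 0xFF;
            // Assumption: 1 to 3 bytes per char; the lead byte tells the length.
            int len = lead < 0x80 ? 1 : (lead < 0xE0 ? 2 : 3);
            bytePos += len;
            charPos++;
        }
        // Remember where we ended up, so a later forward request is cheap.
        cachedCharPos = charPos;
        cachedBytePos = bytePos;
        return bytePos;
    }
}
```

With a cache like this, a request for a position at or after the cached one only inspects the bytes in between, while a request for an earlier position falls back to scanning from the start. That is why it only helps certain access patterns.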
*** ClobUpdatableReader
More or less rewritten. It now uses the new methods exposed by InternalClob to
detect changes in the underlying Clob content. Note that this class doesn't
handle repositioning, only detection of changes and forwarding of read/skip
calls.
Note the lazy initialization of the underlying reader.
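
The change detection and lazy initialization described above could look roughly like the sketch below. All names are hypothetical: ClobSource stands in for the relevant part of InternalClob (the actual method names are not given here), and reopening the reader at the current char position when a change is detected is an assumption of the sketch. Like the class described, it is not thread safe.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical stand-in for the parts of InternalClob this sketch needs. */
interface ClobSource {
    long getUpdateCount();                               // changes on every modification
    Reader openReader(long charPos) throws IOException;  // reader positioned at charPos (0-based)
}

/** In-memory source, only here to make the sketch runnable. */
final class StringClobSource implements ClobSource {
    private final StringBuilder content;
    private long updateCount = 0;
    StringClobSource(String s) { content = new StringBuilder(s); }
    void setCharAt(int pos, char c) { content.setCharAt(pos, c); updateCount++; }
    public long getUpdateCount() { return updateCount; }
    public Reader openReader(long charPos) throws IOException {
        Reader r = new StringReader(content.toString());
        r.skip(charPos);
        return r;
    }
}

/** Forwards read/skip calls and reopens the underlying reader if the source changed. */
final class ChangeDetectingReader extends Reader {
    private final ClobSource source;
    private Reader delegate;        // created on first use (lazy initialization)
    private long seenUpdateCount;   // update count when delegate was opened
    private long charPos = 0;       // chars read or skipped so far

    ChangeDetectingReader(ClobSource source) { this.source = source; }

    private Reader current() throws IOException {
        long now = source.getUpdateCount();
        if (delegate == null || now != seenUpdateCount) {
            if (delegate != null) delegate.close();
            delegate = source.openReader(charPos);  // positioning is left to the source
            seenUpdateCount = now;
        }
        return delegate;
    }

    @Override public int read(char[] buf, int off, int len) throws IOException {
        int n = current().read(buf, off, len);
        if (n > 0) charPos += n;
        return n;
    }

    @Override public long skip(long n) throws IOException {
        long skipped = current().skip(n);
        charPos += skipped;
        return skipped;
    }

    @Override public void close() throws IOException {
        if (delegate != null) delegate.close();
    }
}
```

Reading two chars from "abcdef", changing the char at index 3 to 'X', and reading two more chars yields "ab" followed by "cX": the second read notices the new update count and continues from the modified content.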
WARNING: There is one thing missing, which is proper synchronization. Access to
store will be synchronized in other locations, but this class is not thread
safe. I haven't decided yet whether to synchronize on the reader object or the
root connection. I think the latter is the best choice. Does anyone know
anything about the cost of taking locks on the same object multiple times?
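
As background for the locking question: Java monitors are reentrant, so a thread that already holds the lock on an object can enter further synchronized blocks on the same object without blocking. The sketch below only demonstrates that property and says nothing about the cost. SyncedPosition is a hypothetical class, and the plain Object stands in for the root connection.

```java
/** Hypothetical reader-like class that synchronizes on a shared lock object. */
final class SyncedPosition {
    private final Object rootLock;  // stands in for the root connection
    private long position = 0;

    SyncedPosition(Object rootLock) { this.rootLock = rootLock; }

    long skip(long n) {
        synchronized (rootLock) {
            // advance() takes the same monitor again; this nested
            // acquisition by the same thread does not block.
            return advance(n);
        }
    }

    private long advance(long n) {
        synchronized (rootLock) {
            position += n;
            return position;
        }
    }
}
```

A caller that already holds the lock (for instance because store access is synchronized further up the call chain) can still call skip() safely from the same thread.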
*** StoreStreamClob
Replaced old UTF8Reader constructor with the new one. Again, this code needs
to be updated when more information about the stream is available. This is to
allow UTF8Reader to perform better.
*** UTF8Reader
Added a new constructor, using the new CharacterStreamDescriptor class.
Removed one constructor.
Retrofitted the second old constructor to use CharacterStreamDescriptor. This
will be removed when the calling code has been updated.
The old method calculating the buffer size will also be removed.
Stopped referencing PositionedStoreStream, using PositionedStream interface
instead. This allows the positioning logic to be used for both store streams
and LOBInputStream streams.
The reader has been prepared to be able to deal with multiple data offsets,
i.e. handling several store stream formats. For instance, the current
implementation has an offset of two bytes, whereas the planned new one will
have an offset of at least five bytes. LOBInputStream has an offset of zero
bytes (no header information).
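
To illustrate the idea of a per-stream data offset, here is a small sketch. StreamLayout is a hypothetical name and not the CharacterStreamDescriptor API; it only shows how a reader could skip a header of a given length before decoding, using the offsets mentioned above (two bytes for the current store format, zero for LOBInputStream).

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical descriptor: how many header bytes precede the character data. */
final class StreamLayout {
    final int dataOffset;

    StreamLayout(int dataOffset) { this.dataOffset = dataOffset; }

    /** Skips the header so that the next byte read is the first data byte. */
    void skipHeader(InputStream in) throws IOException {
        long remaining = dataOffset;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip() may legally return 0; use read() to tell EOF apart.
                if (in.read() == -1) {
                    throw new EOFException("stream shorter than its header");
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
    }
}
```

With the offset carried alongside the stream, the same reader code can handle several store stream formats without hard-coding a header length.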
From now on, position aware streams are not closed as early as before, because
we might have to go backwards in the stream. Streams that can only move forwards
are closed as soon as possible (as before).
Tests are running, and about 3/4 finished. No errors so far. I will post final
results later.
Patch ready for review.
The plan going forward
----------------------
After patch 3a is in, I plan to do the following:
1) Implement TemporaryClob.getInternalReader().
This will dramatically improve the Clob.getSubString performance for
modified Clobs.
2) I will consider adding a simple byte/char position cache.
The point of this is to be able to skip to a given byte position without
having to decode bytes into chars. This is a mechanism that will only help
certain access patterns, but it should come with a very low overhead.
3) Continue working with the new Clob format.
When it is in place, care must be taken to utilize the new stream
information where possible. The primary one is returning the length through
Clob.length(). A second opportunity is using the length information to take
decisions in the byte/char position cache.
This work is mostly related to DERBY-3907.
I'm using the simple Clob regression tests in my work, and it has already
revealed a bug :) I had forgotten to include the known byte length in the
CharacterStreamDescriptor, which caused UTF8Reader to allocate a buffer that
was way too big (8K instead of 100 bytes).
The last step in my LOB work will be to write a simple report documenting the
improvements.
> Improve performance of reading modified Clobs
> ---------------------------------------------
>
> Key: DERBY-3934
> URL: https://issues.apache.org/jira/browse/DERBY-3934
> Project: Derby
> Issue Type: Improvement
> Components: JDBC
> Affects Versions: 10.5.0.0
> Reporter: Kristian Waagan
> Assignee: Kristian Waagan
> Attachments: derby-3934-1a-clob_replace_test.diff, derby-3934-2a-intclob_new_methods.diff, derby-3934-3a-clobupdreader_utf8reader.diff, derby-3934-3a-clobupdreader_utf8reader.stat
>
>
> The performance of reading modified Clobs is poor, which is demonstrated by running a test program selecting a 10 MB Clob and then getting the contents using getSubString:
> - unmodified Clob (StoreStreamClob) : ~1 300 ms
> - modified Clob (TemporaryClob): ~156 000 ms
> In this case, the Clob was modified by changing the first character.
> A number of subtasks will be created to handle the various issues, which will be related to both performance and code cleanup.
> For a brief overview, see http://www.nabble.com/Suggestion-for-improving-ClobUpdatableReader-and-related-code-to20308303.html