Posted to derby-dev@db.apache.org by Kristian Waagan <Kr...@Sun.COM> on 2008/11/03 19:49:32 UTC
Suggestion for improving ClobUpdatableReader and related code

Hello,

While investigating the LOB code, it occurred to me that
ClobUpdatableReader and some related code could be changed for
two main reasons: performance and code readability/simplification. I'll
focus on the latter in this mail.

At the moment, the updatable reader functionality is tightly coupled to 
TemporaryClob and has special handling for the various internal Clob 
representations. I believe the functionality can be provided efficiently 
on a more general level, and the responsibilities can also be more 
clearly separated.
Below I try to outline a solution to this problem. I would like some
high-level feedback on whether the suggestion is sound or not.
(Please ask if anything is unclear; the description omits some details
to keep it short.)

 * Responsibilities
   - positioning: handled by/through UTF8Reader.
   - detecting modifications and handling them: ClobUpdatableReader.

 * New classes/interfaces
   - PositionedStream (generalization of PositionedStoreStream): extends
InputStream and adds getPosition() and reposition(long). The idea here is
to exploit the fact that TemporaryClob is directly addressable (by byte
position, *not* by char position).
   - CharacterStreamDescriptor: a class containing information about a
byte stream representing characters. It will be passed to UTF8Reader so
that the reader can configure itself appropriately (current byte/char
position, byte/char length, whether the stream is bufferable/positionAware,
max char length, dataOffset).
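A rough Java sketch of the two proposed types, only to make the idea concrete. The method bodies, the in-memory ByteArrayPositionedStream (standing in for TemporaryClob's byte store), the descriptor's constructor signature, the field comments and the UNKNOWN constant are assumptions for illustration, not Derby code.

```java
import java.io.IOException;
import java.io.InputStream;

/** Sketch: an InputStream that can report and change its byte position. */
abstract class PositionedStream extends InputStream {
    /** Byte offset of the next byte to be read (bytes, not chars). */
    public abstract long getPosition();

    /** Moves the stream to the given byte offset. */
    public abstract void reposition(long requestedPos) throws IOException;
}

/** Hypothetical in-memory stand-in for a directly addressable byte store. */
class ByteArrayPositionedStream extends PositionedStream {
    private final byte[] data;
    private int pos;

    ByteArrayPositionedStream(byte[] data) {
        this.data = data;
    }

    @Override
    public int read() {
        // Returns the next byte as 0..255, or -1 at end of data.
        return pos < data.length ? (data[pos++] & 0xFF) : -1;
    }

    @Override
    public long getPosition() {
        return pos;
    }

    @Override
    public void reposition(long requestedPos) {
        if (requestedPos < 0 || requestedPos > data.length) {
            throw new IllegalArgumentException("bad position: " + requestedPos);
        }
        pos = (int) requestedPos;
    }
}

/** Sketch: what a reader needs to know about a byte stream of encoded chars. */
final class CharacterStreamDescriptor {
    static final long UNKNOWN = -1; // hypothetical marker for "not known"

    final long curBytePos;       // current byte position in the stream
    final long curCharPos;       // char position corresponding to curBytePos
    final long byteLength;       // data length in bytes, or UNKNOWN
    final long charLength;       // data length in chars, or UNKNOWN
    final boolean bufferable;    // may the reader read ahead and buffer?
    final boolean positionAware; // is the stream a PositionedStream?
    final long maxCharLength;    // maximum allowed length in chars
    final long dataOffset;       // bytes preceding the char data (e.g. a header)

    CharacterStreamDescriptor(long curBytePos, long curCharPos, long byteLength,
            long charLength, boolean bufferable, boolean positionAware,
            long maxCharLength, long dataOffset) {
        this.curBytePos = curBytePos;
        this.curCharPos = curCharPos;
        this.byteLength = byteLength;
        this.charLength = charLength;
        this.bufferable = bufferable;
        this.positionAware = positionAware;
        this.maxCharLength = maxCharLength;
        this.dataOffset = dataOffset;
    }
}
```

For the UTF-8 bytes of "a\u00e9" (0x61 0xC3 0xA9), the second character starts at byte 1 and ends at byte 2. reposition(2) is equally legal but lands in the middle of it, which is why mapping char positions to byte positions stays with the reader.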

 * Changes
   - InternalClob: add isReleased() and getUpdateCount() to support the
updatable reader functionality. The former is used to check whether the
internal representation has changed, and the latter to detect content
modifications.
   - ClobUpdatableReader: will be simplified, practically rewritten.
   - UTF8Reader: new constructors and other minor changes. One notable
change is that it will no longer be this class's responsibility to read
the encoded length information in the streams from store. I'm hoping
this can be moved to a utility class to avoid duplicating that code.
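A minimal sketch of how a simplified updatable reader might use the two new InternalClob methods. Everything beyond isReleased() and getUpdateCount() is hypothetical: the getReader(long) helper, the Supplier used to reach the Clob's current internal representation, the toy StringInternalClob, and the choice to reopen the underlying reader at the saved char position.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Supplier;

/** Hypothetical subset of InternalClob: only what the reader needs. */
interface InternalClob {
    /** True once this representation has been released (replaced). */
    boolean isReleased();

    /** Changes whenever the Clob content is modified. */
    long getUpdateCount();

    /** Hypothetical helper: a new reader starting at a 1-based char position. */
    Reader getReader(long charPos) throws IOException;
}

/** Toy in-memory InternalClob, only for exercising the reader below. */
class StringInternalClob implements InternalClob {
    private final StringBuilder content;
    private long updateCount;

    StringInternalClob(String initial) {
        content = new StringBuilder(initial);
    }

    /** Overwrites chars from 1-based position pos and bumps the update count. */
    void setString(long pos, String str) {
        int start = (int) (pos - 1);
        content.replace(start, start + str.length(), str);
        updateCount++;
    }

    public boolean isReleased() { return false; }

    public long getUpdateCount() { return updateCount; }

    public Reader getReader(long charPos) {
        return new StringReader(content.substring((int) (charPos - 1)));
    }
}

/** Sketch of a simplified updatable reader built on the two new methods. */
class UpdatableReaderSketch extends Reader {
    private final Supplier<InternalClob> owner; // hypothetical: yields the current representation
    private InternalClob clob;
    private long seenUpdateCount;
    private Reader delegate;
    private long charPos; // 1-based position of the next char to return

    UpdatableReaderSketch(Supplier<InternalClob> owner, long startPos) throws IOException {
        this.owner = owner;
        this.charPos = startPos;
        reopen(owner.get());
    }

    /** Drops the old reader and opens a new one at the saved char position. */
    private void reopen(InternalClob current) throws IOException {
        if (delegate != null) {
            delegate.close();
        }
        clob = current;
        seenUpdateCount = current.getUpdateCount();
        delegate = current.getReader(charPos);
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        InternalClob current = owner.get();
        // Representation released/replaced, or content modified since the last read?
        if (clob.isReleased() || current != clob
                || current.getUpdateCount() != seenUpdateCount) {
            reopen(current);
        }
        int n = delegate.read(buf, off, len);
        if (n > 0) {
            charPos += n;
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }
}
```

In this sketch the wrapper only detects changes and never positions anything itself; on a change it asks for a fresh reader at its current char position, in line with the split of responsibilities listed above.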

If I don't get any pushback on this suggestion, I will create a parent 
Jira issue (probably describing the performance problem) and a set of 
subtasks under it. The diff for my current prototype patch is around
1200 lines.


-- 
Kristian