You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by Knut Anders Hatlen <Kn...@Sun.COM> on 2006/01/05 14:12:02 UTC

Re: [jira] Updated: (DERBY-212) Optimize some specific methods in Network Server to improve performance

Hi Bryan,

Thank you very much for reviewing the patch. See my answers below.

Bryan Pendleton <bp...@amberpoint.com> writes:

> Knut Anders Hatlen (JIRA) wrote:
>> I have uploaded a patch (DERBY-212-parsePKGNAMCSN.diff) which changes
>> the following files:
>
> Hi Knut,
>
> I thought I'd take a look at your changes, since I've recently
> been studying a lot of this same code. I hope these comments
> are useful to you.
>
> In general, I think the changes are very good and will be a
> definite improvement. DEATH TO STRING TOKENIZERS!!!
>
> My comments below are very much "nits" and
> mostly have to do with some readability and consistency issues.
>
> thanks,
>
> bryan
>
>
> 1) Why is StmtKey an inner class? It feels like it is as much of
> a first-class object as Pkgnamcsn or DRDAString. I think you should
> consider elevating it to the same level as those other classes.

I felt that StmtKey had no meaning outside the Database class and
therefore made it an inner class. I also found the concept of StmtKey
(which originally was a substring of pkgnamcsn) a bit strange and
wanted to hide the details inside the Database class instead of
exposing them in a separate class. DRDAString is also used in one
class only (DRDAConnThread), but I felt that that class represented a
more general concept than StmtKey. Additionally, I didn't find a very
good name for the class (PkgnamcsnWithoutConsistencyToken?), and
thought that it was ok to call it something simple like StmtKey only
if it was an inner class. I'll have another look at it and see if I
should change it.

> 2) One of my general feelings is that performance improvements
> have to be careful to balance the tradeoff between the
> increased complexity that they add to the code. I think you've
> done a very good job of this, but I'm somewhat troubled by the
> new class DRDAString. It seems to me that there are a few ways
> in which it's slightly awkward, and I wonder whether a slightly
> simpler version of this class would accomplish most of your
> goals but be a little less awkward. In particular:
>
>    - Having to maintain a separate 'length' field seems to add
>      a fair amount of "bulk" to DRDAString.java, all to handle
>      the one case of calling setBytes() with a new string which
>      is shorter in length than the current string. How often
>      does this actually happen, in practice? How bad do you think
>      it would be to just always allocate a new buffer in setBytes(),
>      even when it's a new shorter buffer. That would mean we
>      didn't have to have a separate "length" field.

I don't remember having seen any of the DRDAString objects being
modified after the first call to parsePKGNAMCSN() within a session, so
I don't think we lose anything by simplifying it.

>    - The "autotrim" feature feels, frankly, like a wart. It
>      doesn't feel like it belongs with this class. I think that
>      this functionality belongs either in the DDMReader.readString()
>      method, or in the DRDAConnThread.parsePKGNAMCSN() method.

I agree. I'll try to find a better place for it.

>    - I also found the "cachedManager" and "cachedBuffer" feature
>      to be pretty complicated. I thought that "cachedString"
>      was OK, but the other two caching fields seem like quite a
>      bit of complexity. How does this functionality actually
>      get used in practice? That is, under what circumstances is
>      getBuffer(CcsidManager) called such that the passed-in
>      codeset manager is different than the original codeset
>      manager for this string?

It's not used since the only CcsidManager implemented (currently) is
EbcdicCcsidManager. The only case in which this functionality would be
used is when the encoding used upstream is not the same as the
encoding downstream. This is allowed by the DRDA spec, but I don't
think it's going to be implemented in Derby anytime soon. I'll
simplify this as well.

> 3) I like the Pkgnamcsn.java class very much. My only comment is
> that I think all the "getter" methods should use the standard
> DRDA acronyms. So I think getRdbnam(), getRdbcolid() and
> getPkgid() are fine, but I wish it was:
>    - getPkgcnstkn(), not getPkgcnstknStr(), and
>    - I wish the javadoc comment for getSecnumber() said
>      "get section number", not "Get PKGSN", or maybe I wish
>      the method was named getPkgsn(). I'm a little conflicted here,
>      because PKGSN is definitely an acronym used in the DRDA specs,
>      but DRDAConnThread.java seems to generally refer to this field
>      as "secnumber". I just think there's a little bit of consistency
>      cleanup we could do here.
>
> 4) Similar to the previous comment, I think that the field in
> DRDAStatement.java should be named 'pkgcnstkn', not 'pkgcnstknStr'.
> I think the 'Str' on the end is a bit awkward.
>
> 5) Similarly, in DRDAConThread, having 'pkgcnstknStr' results in the
> awkard construction 'writePKGNAMCSN(pkgcnstknStr.toString())', and
> I thought to myself: "why are we calling 'toString' on a String'?
> This might be less awkward if the variable was named 'pkgcnstkn' instead.

I agree with all your comments in 3-5. Ideally, I would have wanted
Pkgnamcsn.getPkgcnstknStr() to be called getPkgcnstkn() *and* return a
byte array instead of a string, since it's not logically a string. I
didn't do that because DRDAStatement relied on the ability to use the
consistency token as a hash key. Thinking more about it, it seems like
a good idea to create a new class called ConsistencyToken. This way we
can avoid converting byte array -> string -> byte array, and we can
still use it as a hash key.

> 6) I see that we always initialize the DRDAStrings to 0-sized
> buffers in the DRDAConnThread initializer, then re-allocate them
> to the "right" size later. Is it possible to predict the "right" size when
> initially allocating them? For example, does it make sense to use
> values like CodePoint.RDBNAM_LEN rather than 0 here?

Since this only happens during initialization of DRDAConnThread
objects (which are reused) this would not have any significant
effect. Also, it doesn't seem like the size of the fields matches
RDBNAM_LEN et al. very often.

> 7) I think that the use of the 'changedStrings' variable in
> DRDAConnThread.parsePKGNAMCSN could use with a more substantial
> comment, as this is one of the more subtle parts of your change,
> particularly since it ends up percolating down into DDMReader.readString
> and DRDAString.setBytes. Perhaps you could say something along the
> lines of the following:
>
> 	// It is common for a subsequent DRDA message to reference
> 	// the same database, collection, package, consistency
> 	// token, and section number. In this case,
> 	// efficiency can be achieved by recognizing that we're
> 	// using the same values that we did before, and so we
> 	// don't have to allocate and initialize a new Pkgnamcsn
> 	// object, but can just re-use the last one.

Good idea. We could also get rid of the changedStrings variable
completely. Since DRDAString caches the string value, we can just
check the identity of the strings to see if they have
changed. Something like this:

   if ((prevPkgnamcsn == null) ||
       (prevPkgnamcsn.getRdbcolid() != rdbcolid.toString()) ||
       ....) {
     prevPkgnamcsn = new Pkgnamcsn(...);
   }

This way, DDMReader.readString() and DRDAString.setBytes() can still
be void instead of boolean.

> 8) Lastly, I must admit that I'm surprised that the caching and
> re-use of the Pkgnamcsn object via the 'prevPkgnamcsn' field in
> DRDAConnThread is actually worth it. The Pkgnamcsn object seems
> very lightweight to me, and in order to detect that we can cache
> and re-use it, we have to have all the overhead of calling
> getTrimmedSize() and equalTo() on each of the Pkgnamcsn components,
> as well as the awkwardness of having a boolean return value in
> DDMReader.readString and DRDAString.setBytes, both of which end up
> surprising the reader since they would "naturally" be 'void'
> methods, not 'boolean' methods.

Although Pkgnamcsn is lightweight, the cost of checking is small
compared to the cost of allocation and garbage collection. I don't
have the actual numbers, but I did see a significant performance
increase by this change. If the Pkgnamcsn objects could not be reused
very often, the overhead would probably be too big to be justified,
but the objects seem to be reused almost all the time. 

> Obviously I'm having an intuitive reaction here, but it "feels"
> like the code path that was added to support the detection of
> cases where we have the same Pkgnamcsn value is larger than the
> code path that we save by allocating a new Pgknamcsn object on
> every call to DRDAConnThread.parsePKGNAMCSN(). I can readily
> see how all the other performance optimizations in your patch were
> wins; I'm just surprised that this particular optimization was a win.

Thanks for all your comments, I found them very helpful! I will go
through the patch again and try to change it and simplify it as you
have suggested.

-- 
Knut Anders