Posted to derby-dev@db.apache.org by Dy...@Sun.COM on 2006/11/23 11:13:55 UTC

ArrayInputStream and performance

I have noticed that two methods, setPosition(int) and setLimit(int)
seem to use more CPU than what I find "reasonable" given what they're
supposed to do. Together they use 4.5% of user CPU and 1.5% of system
CPU. 

Looking closer at what each method does, it seems like a fair amount
of the CPU is spent on argument/consistency checking.

For setPosition(int) it seems simple to replace the checking with
an ASSERT, but in setLimit(int) a failed check will cause 'start', 'end'
and 'position' to be set to 0 before an exception is thrown:

        start = position;
        end   = position + length;

        if (end <= pageData.length)
        {
            return;
        }
        else
        {
            start = end = position = 0;
            throw new EOFException();
        }

Does anybody know if there is code that catches this exception and
that relies on these variables being set to 0?

With both checks replaced by ASSERTs I was able to run all the
junit tests without seeing any problems...
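
Roughly what I have in mind, as a sketch (assuming the usual
SanityManager.DEBUG/ASSERT pattern used elsewhere in the code; the
exact bounds that should be checked may need adjusting):

    public void setPosition(int newPosition) throws IOException {
        if (SanityManager.DEBUG) {
            // only evaluated in sane builds; insane builds skip the check
            SanityManager.ASSERT(
                newPosition >= 0 && newPosition <= pageData.length,
                "position set outside page data");
        }
        position = newPosition;
    }

    public void setLimit(int length) throws IOException {
        start = position;
        end   = position + length;
        if (SanityManager.DEBUG) {
            SanityManager.ASSERT(end <= pageData.length,
                                 "limit set past end of page data");
        }
    }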

-- 
dt


Re: ArrayInputStream and performance

Posted by Dy...@Sun.COM.
Mike Matrigali <mi...@sbcglobal.net> writes:

> I am not sure I understand what is being proposed here.  Is the idea
> that in a delivered system the checks will not be done, 

Correct. See the proposed patch at
https://issues.apache.org/jira/browse/DERBY-2118


> or does
> replacing it by ASSERTS mean that the check is done differently.

No.

> I know store uses a lot of this code, and there is some inline
> code in this exact area for performance reasons.  At one level of
> the store engine it "knows" that what it is acting on is a byte
> array and at least back when the work was done it was a big performance
> improvement - but at other levels of access it is only an
> ArrayInputStream.  Part of this also comes from allowing alternate
> module implementations.

I think that for the case at hand you could have inlined the call to
setPosition (it is not accessed through an interface).
In fact I tried just that, but the effect did not seem
dramatic. Certainly not enough to justify exposing a private data
member... 

-- 
dt


Re: ArrayInputStream and performance

Posted by Mike Matrigali <mi...@sbcglobal.net>.
I am not sure I understand what is being proposed here.  Is the idea
that in a delivered system the checks will not be done, or does
replacing it by ASSERTS mean that the check is done differently.

I know store uses a lot of this code, and there is some inline
code in this exact area for performance reasons.  At one level of
the store engine it "knows" that what it is acting on is a byte
array and at least back when the work was done it was a big performance
improvement - but at other levels of access it is only an 
ArrayInputStream.  Part of this also comes from allowing alternate
module implementations.

Knut Anders Hatlen wrote:
> Dyre.Tjeldvoll@Sun.COM writes:
> 
> 
>>Knut Anders Hatlen <Kn...@Sun.COM> writes:
>>
>>
>>>I can't answer your question, but I will add that I find much of the
>>>code that uses ArrayInputStream very confusing. ArrayInputStream is
>>>supposed to wrap a byte array to provide encapsulation and easy access
>>>to the data through the InputStream interface. However, many (most?)
>>>of the callers inline the accesses to the data (presumably for
>>>performance reasons), so we end up with lots of methods looking like
>>>this:
>>>
>>>  public X readSomething(ArrayInputStream ais, byte[] data, int offset...) {
>>>    // lots of direct manipulation of the byte array
>>>    // ...
>>>    // ...
>>>    // ...
>>>    // finally, make sure that the state of the stream is brought to a
>>>    // consistent state:
>>>    ais.setPosition(offset + numberOfManipulatedBytes);
>>>  }
>>>
>>>Both the byte array and the offset are part of the ArrayInputStream
>>>class, so having all three of them in the parameter list feels a bit
>>>ugly. And if we need to manipulate the internal state directly that
>>>often, perhaps an InputStream is not the best data structure in the
>>>first place?
>>
>>I tend to agree. I just finished running derbyall with the checks
>>replaced by ASSERTs and I saw no failures. So unless I hear strong
>>objections I'll create a Jira for this change.
> 
> 
> FYI, I just ran the DERBY-1961 test clients and traced them with a
> DTrace script that printed how often each method was called. For the
> join client, ArrayInputStream.setPosition() was the most frequently
> called method (43837.7 calls/tx). For the single-record select client,
> it was third (58.4 calls/tx), only beaten by Object.<init>() and
> DDMWriter.ensureLength(). I think this means that setPosition() is the
> engine method that is most frequently called, at least in read-mostly
> transactions.  ArrayInputStream.setLimit() also appeared near the top
> of the list. See http://wiki.apache.org/db-derby/Derby1961MethodCalls
> for the details.
> 


Re: ArrayInputStream and performance

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Dyre.Tjeldvoll@Sun.COM writes:

> Knut Anders Hatlen <Kn...@Sun.COM> writes:
>
>> I can't answer your question, but I will add that I find much of the
>> code that uses ArrayInputStream very confusing. ArrayInputStream is
>> supposed to wrap a byte array to provide encapsulation and easy access
>> to the data through the InputStream interface. However, many (most?)
>> of the callers inline the accesses to the data (presumably for
>> performance reasons), so we end up with lots of methods looking like
>> this:
>>
>>   public X readSomething(ArrayInputStream ais, byte[] data, int offset...) {
>>     // lots of direct manipulation of the byte array
>>     // ...
>>     // ...
>>     // ...
>>     // finally, make sure that the state of the stream is brought to a
>>     // consistent state:
>>     ais.setPosition(offset + numberOfManipulatedBytes);
>>   }
>>
>> Both the byte array and the offset are part of the ArrayInputStream
>> class, so having all three of them in the parameter list feels a bit
>> ugly. And if we need to manipulate the internal state directly that
>> often, perhaps an InputStream is not the best data structure in the
>> first place?
>
> I tend to agree. I just finished running derbyall with the checks
> replaced by ASSERTs and I saw no failures. So unless I hear strong
> objections I'll create a Jira for this change.

FYI, I just ran the DERBY-1961 test clients and traced them with a
DTrace script that printed how often each method was called. For the
join client, ArrayInputStream.setPosition() was the most frequently
called method (43837.7 calls/tx). For the single-record select client,
it was third (58.4 calls/tx), only beaten by Object.<init>() and
DDMWriter.ensureLength(). I think this means that setPosition() is the
engine method that is most frequently called, at least in read-mostly
transactions.  ArrayInputStream.setLimit() also appeared near the top
of the list. See http://wiki.apache.org/db-derby/Derby1961MethodCalls
for the details.

-- 
Knut Anders

Re: ArrayInputStream and performance

Posted by Dy...@Sun.COM.
Knut Anders Hatlen <Kn...@Sun.COM> writes:

> I can't answer your question, but I will add that I find much of the
> code that uses ArrayInputStream very confusing. ArrayInputStream is
> supposed to wrap a byte array to provide encapsulation and easy access
> to the data through the InputStream interface. However, many (most?)
> of the callers inline the accesses to the data (presumably for
> performance reasons), so we end up with lots of methods looking like
> this:
>
>   public X readSomething(ArrayInputStream ais, byte[] data, int offset...) {
>     // lots of direct manipulation of the byte array
>     // ...
>     // ...
>     // ...
>     // finally, make sure that the state of the stream is brought to a
>     // consistent state:
>     ais.setPosition(offset + numberOfManipulatedBytes);
>   }
>
> Both the byte array and the offset are part of the ArrayInputStream
> class, so having all three of them in the parameter list feels a bit
> ugly. And if we need to manipulate the internal state directly that
> often, perhaps an InputStream is not the best data structure in the
> first place?

I tend to agree. I just finished running derbyall with the checks
replaced by ASSERTs and I saw no failures. So unless I hear strong
objections I'll create a Jira for this change.

-- 
dt


Re: ArrayInputStream and performance

Posted by Dy...@Sun.COM.
Mike Matrigali <mi...@sbcglobal.net> writes:

> It would be fine to use an unchecked and/or an ASSERT based check for
>  readFieldLengthAndSetStreamPosition.  The "store" module owns this
> access and is not counting on limit checks to catch anything here.

So an unchecked setPosition would be OK here? 

What about StoredPage.restoreRecordFromSlot(...)?

As far as I can tell, it is reading from an ArrayInputStream that is a
protected member of StoredPage. The argument to setPosition in this
case is the local variable offset_to_row_data, which gets computed (and
checked extensively with ASSERTs) at the beginning of the method.

-- 
dt


Re: ArrayInputStream and performance

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Mike Matrigali <mi...@sbcglobal.net> writes:

> A discussion on the list would be great.  Can anyone post a complete
> test run with exact description of test, and flat and hierarchical
> results performance monitor reports.  It is interesting if info has
> both number of calls and time.

I'll see if I can extract some hierarchical data, at least for
the number of calls.

> I was starting to look at:
> http://wiki.apache.org/db-derby/Derby1961MethodCalls
>     I find it really useful to start looking top down at number of
>     operations per test rather than bottom up.  So for instance,
>     some things jumped out:
> o derby.iapi.services.io.ArrayInputStream.setPosition     58.4240
>     At first I was expecting this to be something like 1 per row + 1
> per column.  I assume it got big through btree search - hierarchical
> data would show.  It might be interesting to know btree overhead
> vs. heap overhead.  maybe btree compare can be optimized?
>
> o  derby.iapi.store.raw.ContainerLock.isCompatible  14.7802
>     This one looks weird, I don't even see it in the
>     cleanup_flat_profile.txt posted with DERBY-2118.  I sort of
>     expected 1 call for the btree and one call for the heap.  Maybe
>     this is a multi-user run with a long list of table locks, if so
> maybe this can be optimized by somehow grouping the locks so that it
> is not necessary to compare to each one?

I think this number is twice as high as it should have been since
ContainerLock.isCompatible(ContainerLock) calls
ContainerLock.isCompatible(int) and my script only compares class name
and method name, not the parameter list. You are correct that the
numbers come from a multi-user run. This particular run had 10
concurrent clients, whereas I believe the numbers in DERBY-2118 came
from a single-user run (in which I wouldn't expect isCompatible() to
be called at all).

Grouping the locks sounds like an interesting idea, but I guess it
would require significant changes in the lock manager and the Lockable
objects. As it is now, all compatibility checking happens in the
Lockable objects, which are implemented outside the lock manager. To
implement the grouping efficiently, we would probably need to give the
lock manager some knowledge about the compatibility matrix and let
more of the compatibility checking happen inside the lock manager, as
sketched below.
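
A purely hypothetical sketch of the kind of thing I mean (none of
these names or this matrix exist in the lock manager today, and
Derby's real lock modes are richer than this):

    // Hypothetical: the lock manager owns a small compatibility matrix.
    static final int SHARED = 0, EXCLUSIVE = 1;

    static final boolean[][] COMPATIBLE = {
        //              SHARED  EXCLUSIVE
        /* SHARED    */ { true,  false },
        /* EXCLUSIVE */ { false, false },
    };

    // With such a matrix, a group of granted locks could be summarized
    // (e.g. by the strongest mode held) and a new request checked with
    // one lookup instead of one isCompatible() call per granted lock.
    static boolean compatible(int strongestGrantedMode, int requestedMode) {
        return COMPATIBLE[strongestGrantedMode][requestedMode];
    }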

>   The
>     description on the wiki doesn't say much about the environment
>     the test was run under (i.e. # processors, processor speed,
> write-sync state of disks, how many disks, was log/data on same disk?)

Yes, the page doesn't have much information about the
configuration. I'll update the main wiki page for the DERBY-1961
investigation (Derby1961ResourceUsage2) with some more details. The
tests I have run have used this configuration:

  - Solaris 10
  - Sun Java SE 6.0 (build 102-104)
  - Derby 10.2.1.6
  - Dual Opterons (2 x 2.4 GHz)
  - data and log on separate disks
  - SCSI disks with write cache enabled (bad, I know, but I wanted the
    CPU to be fully utilized for the update tests without needing
    100-150 clients)
  - Derby running in a client/server configuration
  - derby.storage.pageCacheSize=12500
  - single-record select tests had 10 concurrent clients (number chosen
    because this load has max throughput at about 10 concurrent
    clients)
  - single-record update tests had 20 concurrent clients (more or less
    randomly chosen, but with the goal of full CPU utilization)
  - join tests had 4 concurrent clients (also more or less randomly
    chosen)

Kristian and Dyre have also posted results on that page, and I believe
they have used similar configurations.

> Also I do encourage those interested in looking at changing the network
> server code to look at these performance results.  This code is new to
> Derby and I am not aware of much previous work in the network server
> performance area so there is likely some low hanging fruit
> there.

Indeed! There was an effort to improve network server performance
about a year ago, and a number of low-hanging fruits were found and
fixed for 10.2. But I'm sure there is still more that could be
improved relatively easily.

-- 
Knut Anders

Re: ArrayInputStream and performance

Posted by Dy...@Sun.COM.
Mike Matrigali <mi...@sbcglobal.net> writes:

> A discussion on the list would be great.  Can anyone post a complete
> test run with exact description of test, 

The source code for the test client can be found at 
http://issues.apache.org/jira/browse/DERBY-1961

> and flat and hierarchical
> results performance monitor reports.

The top of some flat profiles can be found at the bottom of the Wiki
page. I can provide hierarchical dumps as well, but such dumps from my
profiler tend to get rather large (12-15 MB of text), and it isn't
straightforward to identify the relevant parts.

>It is interesting if info has
> both number of calls and time.

The profiler I have been using unfortunately cannot give you the call
count. The call counts on the Wiki were obtained using DTrace, I
think.

[snip]

>   The
>     description on the wiki doesn't say much about the environment
>     the test was run under (i.e. # processors, processor speed,
> write-sync state of disks, how many disks, was log/data on same disk?)

Thanks for the feedback. I've updated the Wiki with this info.
 
-- 
dt


Re: ArrayInputStream and performance

Posted by Mike Matrigali <mi...@sbcglobal.net>.

Knut Anders Hatlen wrote:
> Mike Matrigali <mi...@sbcglobal.net> writes:
> 
> 

> 
> I have attached new files to the wiki page which show the frequency of
> each caller-callee pair. Seems like your assumption was correct, that
> most of the calls to setPosition() came from searching the B-tree.
> 
> All of the calls to setPosition() came (indirectly) from
> BasePage.fetchFromSlot(). Of the 20.7 calls to fetchFromSlot(),
> 
>   16 came from ControlRow.CompareIndexRowFromPageToKey
>   1.7 came from BranchControlRow.getChildPageAtSlot
>   1 came from B2IRowLocking3.lockRowOnPage
>   1 came from BTreeForwardScan.fetchRows
>   1 came from GenericConglomerateController.fetch
> 
> All but the last method are B-tree related, right?

yes.
> 
> There are three levels in the B-tree in this test.
> 


Re: ArrayInputStream and performance

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Mike Matrigali <mi...@sbcglobal.net> writes:

> A discussion on the list would be great.  Can anyone post a complete
> test run with exact description of test, and flat and hierarchical
> results performance monitor reports.  It is interesting if info has
> both number of calls and time.
>
> I was starting to look at:
> http://wiki.apache.org/db-derby/Derby1961MethodCalls
>     I find it really useful to start looking top down at number of
>     operations per test rather than bottom up.  So for instance,
>     some things jumped out:
> o derby.iapi.services.io.ArrayInputStream.setPosition     58.4240
>     At first I was expecting this to be something like 1 per row + 1
> per column.  I assume it got big through btree search - hierarchical
> data would show.  It might be interesting to know btree overhead
> vs. heap overhead.  maybe btree compare can be optimized?

I have attached new files to the wiki page which show the frequency of
each caller-callee pair. Seems like your assumption was correct, that
most of the calls to setPosition() came from searching the B-tree.

All of the calls to setPosition() came (indirectly) from
BasePage.fetchFromSlot(). Of the 20.7 calls to fetchFromSlot(),

  16 came from ControlRow.CompareIndexRowFromPageToKey
  1.7 came from BranchControlRow.getChildPageAtSlot
  1 came from B2IRowLocking3.lockRowOnPage
  1 came from BTreeForwardScan.fetchRows
  1 came from GenericConglomerateController.fetch

All but the last method are B-tree related, right?

There are three levels in the B-tree in this test.

-- 
Knut Anders

Re: ArrayInputStream and performance

Posted by Mike Matrigali <mi...@sbcglobal.net>.
A discussion on the list would be great.  Can anyone post a complete
test run with exact description of test, and flat and hierarchical
results performance monitor reports.  It is interesting if info has
both number of calls and time.

I was starting to look at:
http://wiki.apache.org/db-derby/Derby1961MethodCalls
     I find it really useful to start looking top down at number of
     operations per test rather than bottom up.  So for instance,
     some things jumped out:
o derby.iapi.services.io.ArrayInputStream.setPosition     58.4240
     At first I was expecting this to be something like 1 per row + 1 
per column.  I assume it got big through btree search - hierarchical 
data would show.  It might be interesting to know btree overhead vs. 
heap overhead.  maybe btree compare can be optimized?

o  derby.iapi.store.raw.ContainerLock.isCompatible  14.7802
     This one looks weird, I don't even see it in the
     cleanup_flat_profile.txt posted with DERBY-2118.  I sort of
     expected 1 call for the btree and one call for the heap.  Maybe
     this is a multi-user run with a long list of table locks, if so
maybe this can be optimized by somehow grouping the locks so that it
is not necessary to compare to each one?

   The
     description on the wiki doesn't say much about the environment
     the test was run under (i.e. # processors, processor speed,
write-sync state of disks, how many disks, was log/data on same disk?)

Also I do encourage those interested in looking at changing the network
server code to look at these performance results.  This code is new to
Derby and I am not aware of much previous work in the network server
performance area so there is likely some low hanging fruit there. 
Prior to open source, most tuning concentrated on embedded, single-user
performance: ramped-up JIT, cached data, compiled statement
performance.

Dyre.Tjeldvoll@Sun.COM wrote:

> 
> So I would like to encourage you and Mike and anyone else to look at
> Derby's performance. Do your own comparison with other databases using
> the vm you prefer. Let us know what the results were. If performance
> isn't a problem let me, and everyone else that seems to think so, know
> what we're doing wrong. And if it turns out that performance could be
> better, do your own profiling. Propose some intelligent changes based
> on what you see. Maybe I or someone else can implement it based on
> your guidelines?
> 


Re: ArrayInputStream and performance

Posted by Dy...@Sun.COM.
Daniel John Debrunner <dj...@apache.org> writes:

> I'm worried by this approach of removing checking of the limit or the
> position, it's much like saying we don't need bounds checking on
> arrays because I know my code is correct.
>
> The current code provides some protection from a software bug,
> corrupted page or hacked page. Removing those checks may lead to hard
> to detect bugs where a position and/or limit is calculated incorrectly
> and subsequently leads to corrupted data or random exceptions being
> thrown.
>
> My feeling is that the integrity of the store and the code is better
> served by keeping these checks.

Thank you for clarifying your position on this. 

> I also think we need more performance numbers to justify such a
> change, a single set of runs from a single vm does not justify it. I
> will run numbers on linux with a number of vm's when I get the chance.
>
> Also often in these cases it is better to try and optimize at a higher
> level, rather than try to optimize at the lowest level (especially
> when removing such checks). In this case see if the number of calls to
> setLimit() or setPosition() could be reduced rather than
> micro-optimizing these methods by changing their core functionality.
>
> As an example the setLimit() call around the readExternalFromArray
> method. Maybe this responsibility could be pushed into the data type
> itself, and for some builtin types we trust their read mechanism to
> read the correct amount of data. E.g. reading an INTEGER will always
> read four bytes, so no need to set a limit around it. The limit is
> pushed there to support types that do not know their length on disk
> (e.g. some character types, some binary types, user defined types) and
> was to support arbitrary user types when the engine cannot trust or
> require that the de-serialization will read the complete stream and
> only its data.

Thank you for that suggestion. 

I'll freely admit that my understanding of this (and other) part(s) of
the code is very limited, and that you and others with an intimate
knowledge of the code probably can get much more relevant information
out of a profiler dump than I can.

So I would like to encourage you and Mike and anyone else to look at
Derby's performance. Do your own comparison with other databases using
the vm you prefer. Let us know what the results were. If performance
isn't a problem let me, and everyone else that seems to think so, know
what we're doing wrong. And if it turns out that performance could be
better, do your own profiling. Propose some intelligent changes based
on what you see. Maybe I or someone else can implement it based on
your guidelines?

-- 
dt

Re: ArrayInputStream and performance

Posted by Daniel John Debrunner <dj...@apache.org>.
Dyre.Tjeldvoll@Sun.COM wrote:
> Mike Matrigali <mi...@sbcglobal.net> writes:
> 
> 
>> It would be fine to use an unchecked and/or an ASSERT based check for
>>  readFieldLengthAndSetStreamPosition.  The "store" module owns this
>> access and is not counting on limit checks to catch anything here.
> 
> Another question:
> 
> All observed calls to setLimit (single tuple select load) come from the same
> method: StoredPage.readRecordFromArray(...) 
> (setLimit is also called from readRecordFromStream() but this does not
> seem to happen with this type of load)
> 
> And the argument to setLimit() is always the local variable fieldDataLength
> which is the return value from
> StoredFieldHeader.readLengthAndSetStreamPosition().
> 
> So if readLengthAndSetStreamPosition() can update the position without
> checking, presumably the return value from this method can be trusted
> as well? Is it then necessary to check this value again in setLimit(),
> or could we have used an unchecked version of setLimit here?

I'm worried by this approach of removing checking of the limit or the 
position, it's much like saying we don't need bounds checking on arrays 
because I know my code is correct.

The current code provides some protection from a software bug, corrupted 
page or hacked page. Removing those checks may lead to hard to detect 
bugs where a position and/or limit is calculated incorrectly and 
subsequently leads to corrupted data or random exceptions being thrown.

My feeling is that the integrity of the store and the code is better 
served by keeping these checks.

I also think we need more performance numbers to justify such a change, 
a single set of runs from a single vm does not justify it. I will run 
numbers on linux with a number of vm's when I get the chance.

Also often in these cases it is better to try and optimize at a higher 
level, rather than try to optimize at the lowest level (especially when 
removing such checks). In this case see if the number of calls to 
setLimit() or setPosition() could be reduced rather than 
micro-optimizing these methods by changing their core functionality.

As an example the setLimit() call around the readExternalFromArray 
method. Maybe this responsibility could be pushed into the data type 
itself, and for some builtin types we trust their read mechanism to read 
the correct amount of data. E.g. reading an INTEGER will always read four 
bytes, so no need to set a limit around it. The limit is pushed there to 
support types that do not know their length on disk (e.g. some character 
types, some binary types, user defined types) and was to support 
arbitrary user types when the engine cannot trust or require that the 
de-serialization will read the complete stream and only its data.
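
To make the idea concrete, a rough sketch (hypothetical helper shapes;
the real store code paths and the exact ArrayInputStream method names
may differ):

    // Fixed-width builtin type: trusted to consume exactly its known
    // on-disk length, so no limit needs to be set around the read.
    int readIntField(ArrayInputStream ais) throws IOException {
        return ais.readInt();              // always exactly four bytes
    }

    // Type whose length is only known from the field header: the limit
    // guard stays, because the de-serialization may read any amount.
    void readVariableField(ArrayInputStream ais, int fieldDataLength)
            throws IOException {
        ais.setLimit(fieldDataLength);     // cap reads to this field
        // ... readExternalFromArray(ais) or equivalent on the type ...
        ais.clearLimit();
    }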

Dan.


Re: ArrayInputStream and performance

Posted by Dy...@Sun.COM.
Mike Matrigali <mi...@sbcglobal.net> writes:


> It would be fine to use an unchecked and/or an ASSERT based check for
>  readFieldLengthAndSetStreamPosition.  The "store" module owns this
> access and is not counting on limit checks to catch anything here.

Another question:

All observed calls to setLimit (single tuple select load) come from the same
method: StoredPage.readRecordFromArray(...) 
(setLimit is also called from readRecordFromStream() but this does not
seem to happen with this type of load)

And the argument to setLimit() is always the local variable fieldDataLength
which is the return value from
StoredFieldHeader.readLengthAndSetStreamPosition().

So if readLengthAndSetStreamPosition() can update the position without
checking, presumably the return value from this method can be trusted
as well? Is it then necessary to check this value again in setLimit(),
or could we have used an unchecked version of setLimit here?
  
-- 
dt

Re: ArrayInputStream and performance

Posted by Mike Matrigali <mi...@sbcglobal.net>.

Knut Anders Hatlen wrote:
> Daniel John Debrunner <dj...@apache.org> writes:
> 
> 
>>Knut Anders Hatlen wrote:
>>
>>
>>>I can't answer your question, but I will add that I find much of the
>>>code that uses ArrayInputStream very confusing. ArrayInputStream is
>>>supposed to wrap a byte array to provide encapsulation and easy access
>>>to the data through the InputStream interface. However, many (most?)
>>>of the callers inline the accesses to the data (presumably for
>>>performance reasons), so we end up with lots of methods looking like
>>>this:
>>>
>>>  public X readSomething(ArrayInputStream ais, byte[] data, int offset...) {
>>>    // lots of direct manipulation of the byte array
>>>    // ...
>>>    // ...
>>>    // ...
>>>    // finally, make sure that the state of the stream is brought to a
>>>    // consistent state:
>>>    ais.setPosition(offset + numberOfManipulatedBytes);
>>>  }
>>
>>
>>I could only find one method that looked something like the above:
>>
>>StoredFieldHeader.readFieldLengthAndSetStreamPosition
>>
>>Could you provide a list of the others so I can see what the issue is?
> 
> 
> You are quite right; most of the methods that inline the accesses
> don't call setPosition(). I think I must have confused the callers of
> ArrayInputStream methods with the internal implementation of
> ArrayInputStream. They use a pattern of first copying the instance
> variables into local variables, then do the work on the local
> variables, and finally write back to the instance variables. While I
> find that a little confusing because I would expect the run-time
> compiler to be able to perform that kind of optimization itself,
> that's a different issue from what is discussed in this thread.
I put in a lot of the store inline optimizations, regretting the 
"ugliness" of it at the time - but it was all driven by profiler results. 
When looking at it you should realize it was tuned to the state of JVMs 
of probably 5 years ago. Some may not be needed any more, but at the 
time these kinds of changes helped the cached scan performance of the 
store by close to 10 times on some hardware/JVM combinations.
> 
> The mentioned method (readFieldLengthAndSetStreamPosition) seems to be
> the one that causes most calls to ArrayInputStream. All of the calls
> are to setPosition(). If the boundary checking in setPosition() has an
> unreasonably high resource consumption, it is always an option to
> check the usages of readFieldLengthAndSetStreamPosition, and if it can
> be proven safe to skip the boundary checking, that method could use an
> unchecked variant of setPosition().

It would be fine to use an unchecked and/or an ASSERT based check for
  readFieldLengthAndSetStreamPosition.  The "store" module owns this 
access and is not counting on limit checks to catch anything here.
> 


Re: ArrayInputStream and performance

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Daniel John Debrunner <dj...@apache.org> writes:

> Knut Anders Hatlen wrote:
>> Daniel John Debrunner <dj...@apache.org> writes:
>>
>>> Knut Anders Hatlen wrote:
>>>> They use a pattern of first copying the instance
>>>> variables into local variables, then do the work on the local
>>>> variables, and finally write back to the instance variables. While I
>>>> find that a little confusing because I would expect the run-time
>>>> compiler to be able to perform that kind of optimization itself,
>>>> that's a different issue from what is discussed in this thread.
>>> Can the run-time optimizer/compiler perform such an optimization when
>>> the instance variable is not final? I would have thought not.
>>
>> Yes, I believe it can. Java's memory model basically allows the
>> compiler to assume that the variables are accessed by a single thread
>> as long as it doesn't hit a memory barrier (like a volatile variable
>> or a synchronized block).
>
> How effective can this be when the accesses to the instance field are
> intermixed with method calls of other objects, especially when those
> methods are interface calls?

I don't know the details about how the optimizations are performed in
those cases, but it shouldn't be a problem for the methods in
ArrayInputStream though, since they barely have method calls. A
typical method looks like this:

    public final char readChar() throws IOException {
        int    pos  = position;
        byte[] data = pageData;

        if (pos >= (end - 1))
            throw new EOFException(); // end of file

        int c = ((data[pos++] & 0xff) << 8) | (data[pos++] & 0xff);

        position = pos;

        return (char) c;
    }

And I would assume that this equivalent method would run just as fast:

    public final char readChar() throws IOException {

        if (position >= end - 1)
            throw new EOFException();

        return (char) (((pageData[position++] & 0xff) << 8) |
                       (pageData[position++] & 0xff));
    }

Although I doubt that it would run any faster. But perhaps the jar
file would be a couple of bytes smaller! :)

-- 
Knut Anders

Re: ArrayInputStream and performance

Posted by Daniel John Debrunner <dj...@apache.org>.
Knut Anders Hatlen wrote:
> Daniel John Debrunner <dj...@apache.org> writes:
> 
>> Knut Anders Hatlen wrote:
>>> They use a pattern of first copying the instance
>>> variables into local variables, then do the work on the local
>>> variables, and finally write back to the instance variables. While I
>>> find that a little confusing because I would expect the run-time
>>> compiler to be able to perform that kind of optimization itself,
>>> that's a different issue from what is discussed in this thread.
>> Can the run-time optimizer/compiler perform such an optimization when
>> the instance variable is not final? I would have thought not.
> 
> Yes, I believe it can. Java's memory model basically allows the
> compiler to assume that the variables are accessed by a single thread
> as long as it doesn't hit a memory barrier (like a volatile variable
> or a synchronized block).

How effective can this be when the accesses to the instance field are 
intermixed with method calls of other objects, especially when those 
methods are interface calls?

Dan.



Re: ArrayInputStream and performance

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Daniel John Debrunner <dj...@apache.org> writes:

> Knut Anders Hatlen wrote:
>> They use a pattern of first copying the instance
>> variables into local variables, then do the work on the local
>> variables, and finally write back to the instance variables. While I
>> find that a little confusing because I would expect the run-time
>> compiler to be able to perform that kind of optimization itself,
>> that's a different issue from what is discussed in this thread.
>
> Can the run-time optimizer/compiler perform such an optimization when
> the instance variable is not final? I would have thought not.

Yes, I believe it can. Java's memory model basically allows the
compiler to assume that the variables are accessed by a single thread
as long as it doesn't hit a memory barrier (like a volatile variable
or a synchronized block).

The standard example I have seen is that the JVM is allowed to
rewrite this code

  while (running) {
    sleep();
  }

to this

  if (running) {
    while (true) {
      sleep();
    }
  }

as long as 'running' is not a volatile variable. I'm sure I've read
somewhere that the server VM actually will do this optimization, but I
can't find a reference to it right now.

-- 
Knut Anders

Re: ArrayInputStream and performance

Posted by Daniel John Debrunner <dj...@apache.org>.
Knut Anders Hatlen wrote:
> They use a pattern of first copying the instance
> variables into local variables, then do the work on the local
> variables, and finally write back to the instance variables. While I
> find that a little confusing because I would expect the run-time
> compiler to be able to perform that kind of optimization itself,
> that's a different issue from what is discussed in this thread.

Can the run-time optimizer/compiler perform such an optimization when 
the instance variable is not final? I would have thought not.
You also have to remember that most of the store code was written in the 
early days of Java, before JITs existed, so such optimizations would 
not occur.

Dan.



Re: ArrayInputStream and performance

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Daniel John Debrunner <dj...@apache.org> writes:

> Knut Anders Hatlen wrote:
>> Daniel John Debrunner <dj...@apache.org> writes:
>>
>>> Knut Anders Hatlen wrote:
>>>
>>>> I can't answer your question, but I will add that I find much of the
>>>> code that uses ArrayInputStream very confusing. ArrayInputStream is
>>>> supposed to wrap a byte array to provide encapsulation and easy access
>>>> to the data through the InputStream interface. However, many (most?)
>>>> of the callers inline the accesses to the data (presumably for
>>>> performance reasons), so we end up with lots of methods looking like
>>>> this:
>>>>
>>>>   public X readSomething(ArrayInputStream ais, byte[] data, int offset...) {
>>>>     // lots of direct manipulation of the byte array
>>>>     // ...
>>>>     // ...
>>>>     // ...
>>>>     // finally, make sure that the state of the stream is brought to a
>>>>     // consistent state:
>>>>     ais.setPosition(offset + numberOfManipulatedBytes);
>>>>   }
>>>
>>> I could only find one method that looked something like the above:
>>>
>>> StoredFieldHeader.readFieldLengthAndSetStreamPosition
>>>
>>> Could you provide a list of the others so I can see what the issue is?
>>
>> You are quite right; most of the methods that inline the accesses
>> don't call setPosition(). 
>
> I'm still confused here: what do you mean by "inline the accesses"?
> Can you point to an example of what you mean?

I was referring to StoredPage and StoredRecordHeader which sometimes
access the page data as a byte array (what I called "inlined"), and
sometimes through ArrayInputStream. As the byte array is really a part
of StoredPage's internal state, I guess calling it "inlined" is not
very accurate.

> I was assuming that you meant there were many cases where code
> accessed the byte array directly and using ArrayInputStream, but I
> don't see many. ArrayInputStream.getData() is only called in three
> places, in FlushedScan and Scan. Maybe I misread what you are trying
> to say?

I'm sorry that I confused you. When I wrote that first comment, I had
been jumping back and forth between store and ArrayInputStream and
probably mixed the two since they contain some similar-looking code. I
thought I saw a pattern which I tried to describe, but I guess that
pattern only existed in my head (or at least it wasn't as common as I
initially felt it was).

The good news is that I find it much clearer and less confusing than I
did two days ago. Thanks to you all for helping me get a clearer
picture of this part of the code!

And again, sorry for the confusion I have created for you guys.

-- 
Knut Anders - who will read the code more closely before calling it
confusing the next time :)

Re: ArrayInputStream and performance

Posted by Daniel John Debrunner <dj...@apache.org>.
Knut Anders Hatlen wrote:
> Daniel John Debrunner <dj...@apache.org> writes:
> 
>> Knut Anders Hatlen wrote:
>>
>>> I can't answer your question, but I will add that I find much of the
>>> code that uses ArrayInputStream very confusing. ArrayInputStream is
>>> supposed to wrap a byte array to provide encapsulation and easy access
>>> to the data through the InputStream interface. However, many (most?)
>>> of the callers inline the accesses to the data (presumably for
>>> performance reasons), so we end up with lots of methods looking like
>>> this:
>>>
>>>   public X readSomething(ArrayInputStream ais, byte[] data, int offset...) {
>>>     // lots of direct manipulation of the byte array
>>>     // ...
>>>     // ...
>>>     // ...
>>>     // finally, make sure that the state of the stream is brought to a
>>>     // consistent state:
>>>     ais.setPosition(offset + numberOfManipulatedBytes);
>>>   }
>>
>> I could only find one method that looked something like the above:
>>
>> StoredFieldHeader.readFieldLengthAndSetStreamPosition
>>
>> Could you provide a list of the others so I can see what the issue is?
> 
> You are quite right; most of the methods that inline the accesses
> don't call setPosition(). 

I'm still confused here: what do you mean by "inline the accesses"?
Can you point to an example of what you mean?

I was assuming that you meant there were many cases where code accessed 
the byte array directly and using ArrayInputStream, but I don't see 
many. ArrayInputStream.getData() is only called in three places, in 
FlushedScan and Scan. Maybe I misread what you are trying to say?


Dan.


Re: ArrayInputStream and performance

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Daniel John Debrunner <dj...@apache.org> writes:

> Knut Anders Hatlen wrote:
>
>> I can't answer your question, but I will add that I find much of the
>> code that uses ArrayInputStream very confusing. ArrayInputStream is
>> supposed to wrap a byte array to provide encapsulation and easy access
>> to the data through the InputStream interface. However, many (most?)
>> of the callers inline the accesses to the data (presumably for
>> performance reasons), so we end up with lots of methods looking like
>> this:
>>
>>   public X readSomething(ArrayInputStream ais, byte[] data, int offset...) {
>>     // lots of direct manipulation of the byte array
>>     // ...
>>     // ...
>>     // ...
>>     // finally, make sure that the state of the stream is brought to a
>>     // consistent state:
>>     ais.setPosition(offset + numberOfManipulatedBytes);
>>   }
>
>
> I could only find one method that looked something like the above:
>
> StoredFieldHeader.readFieldLengthAndSetStreamPosition
>
> Could you provide a list of the others so I can see what the issue is?

You are quite right; most of the methods that inline the accesses
don't call setPosition(). I think I must have confused the callers of
ArrayInputStream methods with the internal implementation of
ArrayInputStream. They use a pattern of first copying the instance
variables into local variables, then do the work on the local
variables, and finally write back to the instance variables. While I
find that a little confusing because I would expect the run-time
compiler to be able to perform that kind of optimization itself,
that's a different issue from what is discussed in this thread.

The mentioned method (readFieldLengthAndSetStreamPosition) seems to be
the one that causes most calls to ArrayInputStream. All of the calls
are to setPosition(). If the boundary checking in setPosition() has an
unreasonably high resource consumption, it is always an option to
check the usages of readFieldLengthAndSetStreamPosition, and if it can
be proven safe to skip the boundary checking, that method could use an
unchecked variant of setPosition().
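Something like this, purely as an illustration (no such method exists
today):

    // Hypothetical package-private variant: the caller has already
    // validated the offset, so the bounds check done by setPosition()
    // is skipped.
    final void setPositionUnchecked(int newPosition) {
        position = newPosition;
    }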

-- 
Knut Anders

Re: ArrayInputStream and performance

Posted by Daniel John Debrunner <dj...@apache.org>.
Knut Anders Hatlen wrote:

> I can't answer your question, but I will add that I find much of the
> code that uses ArrayInputStream very confusing. ArrayInputStream is
> supposed to wrap a byte array to provide encapsulation and easy access
> to the data through the InputStream interface. However, many (most?)
> of the callers inline the accesses to the data (presumably for
> performance reasons), so we end up with lots of methods looking like
> this:
> 
>   public X readSomething(ArrayInputStream ais, byte[] data, int offset...) {
>     // lots of direct manipulation of the byte array
>     // ...
>     // ...
>     // ...
>     // finally, make sure that the state of the stream is brought to a
>     // consistent state:
>     ais.setPosition(offset + numberOfManipulatedBytes);
>   }


I could only find one method that looked something like the above:

StoredFieldHeader.readFieldLengthAndSetStreamPosition

Could you provide a list of the others so I can see what the issue is?

Thanks,
Dan.


> 
> Both the byte array and the offset are part of the ArrayInputStream
> class, so having all three of them in the parameter list feels a bit
> ugly. And if we need to manipulate the internal state directly that
> often, perhaps an InputStream is not the best data structure in the
> first place?
> 



Re: ArrayInputStream and performance

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Dyre.Tjeldvoll@Sun.COM writes:

> I have noticed that two methods, setPosition(int) and setLimit(int)
> seem to use more CPU than what I find "reasonable" given what they're
> supposed to do. Together they use 4.5% of user CPU and 1.5% of system
> CPU. 
>
> Looking closer at what each method does, it seems like a fair amount
> of the CPU is spent on argument/consistency checking.
>
> For setPosition(int) it seems simple to replace the checking with
> an ASSERT, but in setLimit(int) a failed check will cause 'start', 'end'
> and 'position' to be set to 0 before an exception is thrown:
>
>         start = position;
>         end   = position + length;
>
>         if (end <= pageData.length)
>         {
>             return;
>         }
>         else
>         {
>             start = end = position = 0;
>             throw new EOFException();
>         }
>
> Does anybody know if there is code that catches this exception and
> that relies on these variables being set to 0?

I can't answer your question, but I will add that I find much of the
code that uses ArrayInputStream very confusing. ArrayInputStream is
supposed to wrap a byte array to provide encapsulation and easy access
to the data through the InputStream interface. However, many (most?)
of the callers inline the accesses to the data (presumably for
performance reasons), so we end up with lots of methods looking like
this:

  public X readSomething(ArrayInputStream ais, byte[] data, int offset...) {
    // lots of direct manipulation of the byte array
    // ...
    // ...
    // ...
    // finally, make sure that the state of the stream is brought to a
    // consistent state:
    ais.setPosition(offset + numberOfManipulatedBytes);
  }

Both the byte array and the offset are part of the ArrayInputStream
class, so having all three of them in the parameter list feels a bit
ugly. And if we need to manipulate the internal state directly that
often, perhaps an InputStream is not the best data structure in the
first place?

-- 
Knut Anders