Posted to derby-dev@db.apache.org by Jack Klebanoff <kl...@sbcglobal.net> on 2005/03/02 01:13:03 UTC
Re: [PATCH] BackingStoreHashtable
Mike Matrigali wrote:
> Any idea if many more queries will now spill to disk? I am worried that
> many more will, and now queries will run slower. Seems like there
> should be a better connection between the optimizer estimate to use
> in memory hash and what actually is done in memory. As I understand it
> now the Optimizer costs hash only as in memory; if its estimates say
> it won't fit into memory it chooses a different plan. I assume the
> other plan is more efficient than a hash plan which may have to go to
> disk for every probe (even if pages are cached going to latched pages
> on every probe is a lot more expensive than in memory hash).
>
> My guess is no one did the work to set max_inmemory_rowcnt, because the
> backing store stuff wasn't implemented yet.
>
> I have not read the code yet. Is the 1% of all memory, or 1% of free
> memory?
It is 1% of Runtime.getRuntime().totalMemory().
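The 1% heuristic can be sketched as follows. This is a minimal illustration, not Derby's actual code; the class name and per-row size constant are assumptions, and only the "1% of totalMemory()" rule comes from the behavior described above.

```java
// Sketch of the spill heuristic described above. Names and the per-row
// size constant are illustrative placeholders.
public class SpillHeuristic {
    // Assumed average in-memory size of one row, including Hashtable
    // entry overhead. The real code would estimate this from the row.
    static final long ESTIMATED_BYTES_PER_ROW = 100;

    /** In-memory row budget: 1% of the JVM's current total memory. */
    static long maxInMemoryRows() {
        long byteBudget = Runtime.getRuntime().totalMemory() / 100;
        return byteBudget / ESTIMATED_BYTES_PER_ROW;
    }

    /** True once the table has outgrown its budget and should spill. */
    static boolean shouldSpill(long inMemoryRowCount) {
        return inMemoryRowCount > maxInMemoryRows();
    }
}
```

Note that because totalMemory() reflects the heap's current size rather than its maximum, the budget computed this way moves as the heap grows, which is exactly the weakness discussed later in the thread.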
Changing the costing is difficult. The FromBaseTable.estimateCost method
uses the JoinStrategy.multiplyBaseCostByOuterRows method to compute
the cost of the join as either the cost of constructing the inner table
or the number of rows in the outer table times the cost of constructing
the inner table. HashJoinStrategy.multiplyBaseCostByOuterRows returns
false, so the cost of a hash join is estimated as the cost of
constructing the hash table, and independent of the number of rows in
the outer table. This is a reasonable approximation if the hash table is
in memory, the outer table is not enormous, and the hash function does a
good job.
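In other words, the boolean effectively selects between two formulas. A hypothetical rendering (the method name multiplyBaseCostByOuterRows comes from the text above; the wrapper class is illustrative):

```java
// Illustrative reading of how multiplyBaseCostByOuterRows drives the
// join cost estimate. This is a sketch, not Derby's FromBaseTable code.
public class JoinCostSketch {
    static double joinCost(boolean multiplyBaseCostByOuterRows,
                           double innerConstructionCost,
                           double outerRowCount) {
        // true  (e.g. nested loop): the inner table is rescanned once
        //        per outer row, so its cost is multiplied.
        // false (hash join): the hash table is built once, so the cost
        //        is independent of the outer row count.
        return multiplyBaseCostByOuterRows
                ? outerRowCount * innerConstructionCost
                : innerConstructionCost;
    }
}
```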
If some of the hash table is on disk then neither join cost estimate
method is very good. The cost of a hash join is then approximately
hashtableBuildCost + outerTableRowCount*hashProbeCost, where
hashProbeCost is the expected cost of one probe into the hash table.
This will be a function of the fraction of inner table rows on disk, and
the number of rows on disk.
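That formula could look like the following sketch, where the expected per-probe cost is a blend of the in-memory and on-disk probe costs, weighted by the fraction of inner rows that spilled. All names and parameters here are illustrative; only the formula hashtableBuildCost + outerTableRowCount*hashProbeCost comes from the text.

```java
// Hypothetical spill-aware hash join cost model implementing the
// formula above: buildCost + outerRowCount * probeCost.
public class SpillAwareHashJoinCost {
    static double estimate(double hashtableBuildCost,
                           double outerRowCount,
                           double inMemoryProbeCost,
                           double diskProbeCost,
                           double fractionOnDisk) {
        // Expected cost of one probe: most probes hit the in-memory
        // portion; the rest pay the (much larger) disk probe cost.
        double hashProbeCost = (1.0 - fractionOnDisk) * inMemoryProbeCost
                             + fractionOnDisk * diskProbeCost;
        return hashtableBuildCost + outerRowCount * hashProbeCost;
    }
}
```

With fractionOnDisk at zero this reduces to the current in-memory estimate, so the two models agree exactly when nothing spills.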
The JoinStrategy interface has a method, estimateCost, that looks like
it could be made to do the trick, but for some reason
FromBaseTable.estimateCost does not use it.
The estimate of the inner table row count is used to estimate the size
of the hash table. If it is large we do not use a hash join at all. If
we try to include the disk access cost in our hash join cost estimate we
have to use the inner table row count estimate to estimate the fraction
of rows on disk. But if the inner table passes the maximum size test
this fraction is zero if the row count estimate is correct, or very
small if we assume some statistical variation. It doesn't make sense to
include the disk hash table probe cost in the join cost estimate unless
we also ditch the maximum size restriction on hash joins. This might be
a good idea, but I would like to fix BackingStoreHashtable so that
neither hash joins, nor DISTINCT, nor scroll insensitive cursors blow up
before tackling join costing.
I think that the right thing to do is for HashJoinStrategy to pass the
correct maximum in-memory row count to BackingStoreHashtable so that the
two are following the same playbook with respect to spilling. I will
work on this.
Jack
Re: [PATCH] BackingStoreHashtable
Posted by Jack Klebanoff <kl...@sbcglobal.net>.
Mike Matrigali wrote:
> Thanks for the reply, I'll work with you to get this committed. I will
> wait on the change you are working on. I think that is the best short
> term solution, as you point out there is more work later on to improve
> the work you have done. I would appreciate it if at least one other
> person with experience on the language side would take a look at this also.
>
> It has been a while since I looked at jvm memory stuff, but it used to be
> a problem that totalMemory() would return the memory that the jvm
> currently has, not the amount of memory that it is allowed to have. So
> if you called it after just starting it might return a very small
> number, say 1 meg, even if the jvm was started and told to grow to a max
> of 100 meg. Worse was that the behavior was not consistent across
> JVM/OS combinations.
>
> This memory issue is a real problem as there are a number of things
> that derby could do faster if it knew it could do the whole thing in
> memory, but once you run out of memory it is hard to recover without
> failing the current operation (and quite possibly other derby threads
> and in a server environment other non derby threads).
>
> At one point sun was proposing some jvm interfaces so one could tell if
> you were getting "close" to running out of memory - so that applications
> could take action before errors happened. If such a thing existed then
> something like BackingStoreHashTable could grow in memory more
> aggressively and then if it noticed the impending problem it could spill
> everything to disk, and free up its current usage.
>
I have modified my patch so that the optimizer and BackingStoreHashtable
use the same decision about when a hash table will spill to disk. The
optimizer calls the JoinStrategy.maxCapacity method to find the maximum
number of rows that the JoinStrategy can handle in a given number of
bytes. It rejects the strategy if the estimated row count is larger.
(Currently the optimizer limits each join to 1M of memory). The
HashJoinStrategy.maxCapacity method divides the maximum byte count by
the sum of the size of one row plus the size of a Hashtable entry. The
NestedLoopJoinStrategy.maxCapacity method always returns
Integer.MAX_VALUE. The HashJoinStrategy.getScanArgs method passes the
maximum capacity to the ResultSetFactory.getHashScanResultSet method,
so that the actual BackingStoreHashtable will spill to disk when the
optimizer thought that it would. This means that hash joins will not
spill to disk unless the inner table has more rows than the optimizer
estimated.
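A sketch of that maxCapacity contract, assuming simplified signatures. Only the division in the hash case and the unbounded nested-loop case come from the description above; the interface shape and the entry-overhead constant are assumed placeholders.

```java
// Sketch of the maxCapacity logic described above; names abbreviated.
interface JoinStrategySketch {
    int maxCapacity(int maxMemoryBytes, int perRowByteSize);
}

class HashJoinStrategySketch implements JoinStrategySketch {
    // Assumed fixed overhead of one java.util.Hashtable entry.
    static final int HASHTABLE_ENTRY_SIZE = 28;

    public int maxCapacity(int maxMemoryBytes, int perRowByteSize) {
        // Rows that fit = byte budget / (row size + entry overhead).
        return maxMemoryBytes / (perRowByteSize + HASHTABLE_ENTRY_SIZE);
    }
}

class NestedLoopJoinStrategySketch implements JoinStrategySketch {
    public int maxCapacity(int maxMemoryBytes, int perRowByteSize) {
        // A nested loop join holds only one inner row at a time, so it
        // imposes no row-count limit.
        return Integer.MAX_VALUE;
    }
}
```

Passing this same capacity down to the runtime BackingStoreHashtable is what keeps the optimizer's spill decision and the executor's actual spill behavior in agreement.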
I also changed the DiskHashtable implementation to pass its
keepAfterCommit parameter on to the
TransactionController.openConglomerate method. Previously DiskHashtable
only used keepAfterCommit to construct the temporaryFlag argument of
TransactionController.createConglomerate and always passed "false" as
the hold argument of TransactionController.openConglomerate.
Since I made changes to the optimizer and hash join code generator I
hope that a Derby language expert can review at least that part of my
updated patch.
I have not changed the way that BackingStoreHashtable decides when to
spill when its max_inmemory_rowcnt parameter is negative. (Only hash
joins pass a non-negative max_inmemory_rowcnt.) As Mike pointed out,
spilling when the in-memory hash table grows larger than 1% of
Runtime.totalMemory() is not completely satisfactory. The JVM may be
able to get more memory, and totalMemory() is likely to be small soon
after the JVM starts up. However, I do not know of anything that is
better. If totalMemory() grows, subsequent BackingStoreHashtables will
be able to use more memory. Since BackingStoreHashtables are
temporary, this does not seem so bad to me.
Regards
Jack Klebanoff
Re: [PATCH] BackingStoreHashtable
Posted by Mike Matrigali <mi...@sbcglobal.net>.
Thanks for the reply, I'll work with you to get this committed. I will
wait on the change you are working on. I think that is the best short
term solution, as you point out there is more work later on to improve
the work you have done. I would appreciate it if at least one other
person with experience on the language side would take a look at this also.
It has been a while since I looked at jvm memory stuff, but it used to be
a problem that totalMemory() would return the memory that the jvm
currently has, not the amount of memory that it is allowed to have. So
if you called it after just starting it might return a very small
number, say 1 meg, even if the jvm was started and told to grow to a max
of 100 meg. Worse was that the behavior was not consistent across
JVM/OS combinations.
This memory issue is a real problem as there are a number of things
that derby could do faster if it knew it could do the whole thing in
memory, but once you run out of memory it is hard to recover without
failing the current operation (and quite possibly other derby threads
and in a server environment other non derby threads).
At one point sun was proposing some jvm interfaces so one could tell if
you were getting "close" to running out of memory - so that applications
could take action before errors happened. If such a thing existed then
something like BackingStoreHashTable could grow in memory more
aggressively and then if it noticed the impending problem it could spill
everything to disk, and free up its current usage.
Jack Klebanoff wrote:
> Mike Matrigali wrote:
>
>> Any idea if many more queries will now spill to disk? I am worried that
>> many more will, and now queries will run slower. Seems like there
>> should be a better connection between the optimizer estimate to use
>> in memory hash and what actually is done in memory. As I understand it
>> now the Optimizer costs hash only as in memory; if its estimates say
>> it won't fit into memory it chooses a different plan. I assume the
>> other plan is more efficient than a hash plan which may have to go to
>> disk for every probe (even if pages are cached going to latched pages
>> on every probe is a lot more expensive than in memory hash).
>>
>> My guess is no one did the work to set max_inmemory_rowcnt, because the
>> backing store stuff wasn't implemented yet.
>>
>> I have not read the code yet. Is the 1% of all memory, or 1% of free
>> memory?
>
>
> It is 1% of Runtime.getRuntime().totalMemory().
>
> Changing the costing is difficult. The FromBaseTable.estimateCost method
> uses the JoinStrategy.multiplyBaseCostByOuterRows method to compute
> the cost of the join as either the cost of constructing the inner table
> or the number of rows in the outer table times the cost of constructing
> the inner table. HashJoinStrategy.multiplyBaseCostByOuterRows returns
> false, so the cost of a hash join is estimated as the cost of
> constructing the hash table, and independent of the number of rows in
> the outer table. This is a reasonable approximation if the hash table is
> in memory, the outer table is not enormous, and the hash function does a
> good job.
>
> If some of the hash table is on disk then neither join cost estimate
> method is very good. The cost of a hash join is then approximately
> hashtableBuildCost + outerTableRowCount*hashProbeCost, where
> hashProbeCost is the expected cost of one probe into the hash table.
> This will be a function of the fraction of inner table rows on disk, and
> the number of rows on disk.
>
> The JoinStrategy interface has a method, estimateCost, that looks like
> it could be made to do the trick, but for some reason
> FromBaseTable.estimateCost does not use it.
>
> The estimate of the inner table row count is used to estimate the size
> of the hash table. If it is large we do not use a hash join at all. If
> we try to include the disk access cost in our hash join cost estimate we
> have to use the inner table row count estimate to estimate the fraction
> of rows on disk. But if the inner table passes the maximum size test
> this fraction is zero if the row count estimate is correct, or very
> small if we assume some statistical variation. It doesn't make sense to
> include the disk hash table probe cost in the join cost estimate unless
> we also ditch the maximum size restriction on hash joins. This might be
> a good idea, but I would like to fix BackingStoreHashtable so that
> neither hash joins, nor DISTINCT, nor scroll insensitive cursors blow up
> before tackling join costing.
>
> I think that the right thing to do is for HashJoinStrategy to pass the
> correct maximum in-memory row count to BackingStoreHashtable so that the
> two are following the same playbook with respect to spilling. I will
> work on this.
>
> Jack
>
>