Posted to derby-dev@db.apache.org by "Mike Matrigali (JIRA)" <de...@db.apache.org> on 2006/10/04 01:12:21 UTC

[jira] Updated: (DERBY-1908) Investigate: What's the "unit" for optimizer cost estimates?

     [ http://issues.apache.org/jira/browse/DERBY-1908?page=all ]

Mike Matrigali updated DERBY-1908:
----------------------------------


Here is the "units" view from the storage layer, which I believe should be the basis for all the optimizer costs.  

The actual interface does not specify a unit.  That was originally a deliberate decision, to allow for a number of different
implementations.  The only guarantee was that, across all calls, one cost could be compared to another and
give reasonable results.  Having said that, the actual implementation of the costs returned by store has always
been based on ms. of elapsed time for a set of basic operations.  These basic operations were run once, and
a set of constants was defined from the timings.  The last time this was done was quite a while ago, probably on a 400 MHz machine.
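
To make that concrete, here is a minimal sketch of that kind of cost model: per-operation constants
measured in ms. on a reference machine, combined into a scan cost.  The class, names, and values below
are hypothetical illustrations, not Derby's actual constants:

    // Hypothetical per-operation costs in milliseconds, as measured once on a
    // reference machine (e.g. the ~400 MHz box mentioned above).
    public final class StoreCostSketch {
        private static final double MS_PER_PAGE_FETCH = 1.5;   // read one page from disk
        private static final double MS_PER_ROW_FETCH  = 0.02;  // fetch one row from a cached page

        /** Estimated cost, in ms., of scanning rowCount rows spread over pageCount pages. */
        public static double scanCost(long pageCount, long rowCount) {
            return pageCount * MS_PER_PAGE_FETCH + rowCount * MS_PER_ROW_FETCH;
        }
    }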

The "hidden" unit of ms. was broken when the optimizer added timeout - which is basically a decision to stop
optimizing once the estimated cost is less than the elapsed time of the compile.  At this point something outside
the interface assumed the unit was ms.
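
For illustration, the timeout decision amounts to something like the following (a sketch under the
assumption that estimates are in ms.; the class is hypothetical, not the actual OptimizerImpl code):

    public final class OptimizerTimeoutSketch {
        private final long startMillis = System.currentTimeMillis();

        /** Stop optimizing once compile time exceeds the best plan's estimated run time. */
        public boolean shouldTimeOut(double bestCostEstimate) {
            long elapsedMillis = System.currentTimeMillis() - startMillis;
            // This comparison is only meaningful if bestCostEstimate is also in
            // milliseconds -- the point where the "hidden" unit leaks out of store.
            return bestCostEstimate < elapsedMillis;
        }
    }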

I think a good direction would be to change the interfaces to explicitly support costs as true elapsed time;
fix at least the defaults to be based on a modern machine; fix any optimizer code that may not currently be
treating the cost unit correctly (like multiplying a cost by a cost, which yields a meaningless ms.-squared value);
and maybe look at dynamically sizing the costs based on measurements of the current machine.
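
The dynamic sizing could be as simple as timing a fixed workload at startup and scaling the
reference-machine constants by the ratio.  A sketch of the idea, with a hypothetical benchmark
loop and reference value (not Derby code):

    public final class CostCalibrationSketch {
        /** ms. the reference machine took for the calibration loop (hypothetical value). */
        private static final double REFERENCE_BENCH_MS = 100.0;

        /** Time a small CPU-bound loop and return a scale factor for the cost constants. */
        public static double machineScaleFactor() {
            long start = System.nanoTime();
            long sink = 0;
            for (int i = 0; i < 10000000; i++) {
                sink += i ^ (sink >>> 3);  // cheap work standing in for a "basic operation"
            }
            double elapsedMs = (System.nanoTime() - start) / 1000000.0;
            if (sink == 42) {
                System.out.println();      // keep the JIT from eliminating the loop
            }
            return elapsedMs / REFERENCE_BENCH_MS;  // < 1.0 on machines faster than the reference
        }
    }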

I will look around for the old unit tests that produced the original costs.  You can see the constants used in
java/engine/org/apache/derby/iapi/store/access/StoreCostController.java.

> Investigate: What's the "unit" for optimizer cost estimates?
> ------------------------------------------------------------
>
>                 Key: DERBY-1908
>                 URL: http://issues.apache.org/jira/browse/DERBY-1908
>             Project: Derby
>          Issue Type: Task
>          Components: SQL, Performance
>            Reporter: A B
>
> Derby optimizer decisions are necessarily based on cost estimates.  But what are the "units" for these cost estimates?  There is logic in OptimizerImpl.getNextPermutation() that treats cost estimates as if their unit is milliseconds--but is that really the case?
> The answer to that question may in fact be "Yes, the units are milliseconds"--and maybe the unexpected cost estimates that are sometimes seen are really caused by something else (e.g. DERBY-1905).  But if that's the case, it would be great to look at the optimizer costing code (see esp. FromBaseTable.estimateCost()) to verify that all of the "magic" of costing really makes sense given that the underlying unit is supposed to be milliseconds.
> Also, if the stats/cost estimate calculations are truly meant to be in terms of milliseconds, I can't help but wonder on what machine/criteria the determination of milliseconds is based.  Is it time to update the stats for "modern" machines, or perhaps (shooting for the sky) to dynamically adjust the millisecond stats based on the machine that's running Derby and use the adjusted values somehow?  I have no answers to these questions, but I think it would be great if someone out there was inclined to discuss/investigate these kinds of questions a bit more...
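
For the unit check the description asks about, the rule of thumb is simple dimensional analysis:
scaling a per-row cost (ms. per row) by a row count keeps the result in ms., while multiplying two
costs yields ms. squared, which is meaningless as an elapsed time.  A sketch with hypothetical names,
not the actual FromBaseTable code:

    public final class CostUnitSketch {
        /** OK: (ms per row) * rows = ms. */
        static double probeCost(double perRowProbeMs, double outerRowCount) {
            return perRowProbeMs * outerRowCount;
        }

        /** Suspect: ms * ms = ms^2 -- the kind of expression to hunt for in the costing code. */
        static double suspiciousCost(double costMs1, double costMs2) {
            return costMs1 * costMs2;
        }
    }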
