Posted to issues@calcite.apache.org by "Siddharth Teotia (JIRA)" <ji...@apache.org> on 2019/02/04 22:34:00 UTC

[jira] [Created] (CALCITE-2828) Handle cost propagation properly in Volcano Planner

Siddharth Teotia created CALCITE-2828:
-----------------------------------------

             Summary: Handle cost propagation properly in Volcano Planner
                 Key: CALCITE-2828
                 URL: https://issues.apache.org/jira/browse/CALCITE-2828
             Project: Calcite
          Issue Type: Bug
            Reporter: Siddharth Teotia
            Assignee: Julian Hyde


When getCost(rel) is called, a node's nonCumulativeCost() is computed. When CachingRelMetadataProvider is used, metadata (rowCount, cost, etc.) is cached for future use. To make sure that we do not use stale metadata, each RelOptPlanner provides getRelMetadataTimestamp(rel), which is used to invalidate the cache: if a cached entry's timestamp != getRelMetadataTimestamp(rel), the entry is not used.
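
The timestamp check described above can be sketched as follows. This is a minimal, simplified model and not Calcite's actual CachingRelMetadataProvider code: the class TimestampedCache, its string keys, and its get/put methods are hypothetical names introduced only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of a timestamp-validated metadata cache. */
final class TimestampedCache {
  /** A cached value together with the planner timestamp it was computed at. */
  private static final class Entry {
    final long timestamp;
    final double value;

    Entry(long timestamp, double value) {
      this.timestamp = timestamp;
      this.value = value;
    }
  }

  private final Map<String, Entry> entries = new HashMap<>();

  /**
   * Returns the cached value, or null if there is no entry or the entry's
   * timestamp differs from the timestamp the planner reports now.
   */
  Double get(String key, long currentTimestamp) {
    Entry e = entries.get(key);
    if (e == null || e.timestamp != currentTimestamp) {
      return null; // missing or considered stale: caller must recompute
    }
    return e.value;
  }

  /** Stores a freshly computed value, stamped with the current timestamp. */
  void put(String key, long currentTimestamp, double value) {
    entries.put(key, new Entry(currentTimestamp, value));
  }
}
```

In this sketch an entry stored at timestamp 1 is returned only to a lookup that also passes timestamp 1; a lookup with any other timestamp is treated as a miss.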

 

The problem in this case was that VolcanoPlanner uses the timestamp of the rel's current RelSubset as getRelMetadataTimestamp(rel). Since a rel can belong to multiple RelSubsets, this results in inconsistent cache hits and misses. For example, suppose a rel belongs to RelSubset#1 and RelSubset#2, with relMetadataTimestamp of 1 and 2, respectively. If the rel happens to update its cost in the context of RelSubset#1 first, the cache is updated with timestamp 1, so when the same rel tries to look up its metadata in the context of RelSubset#2, the lookup fails. This results in inefficient use of the cache. The main problem occurs when we get incorrect cache hits (e.g. a previous iteration of a metadata query on RelSubset#2 populated the cache with timestamp 2, but later, in the context of RelSubset#1, we think there is a valid cache entry and use the stale metadata).
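
Both failure modes can be reproduced in a small standalone model. The sketch below is not Calcite code: SubsetTimestampDemo and its cost method are hypothetical, and the cache is keyed only by the rel while the timestamp comes from whichever subset the query runs in. The stale-hit part additionally assumes that RelSubset#1's timestamp later advances to 2, which is one way the incorrect hit could arise.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a per-rel cache validated by a per-subset timestamp. */
final class SubsetTimestampDemo {
  /** Cached cost together with the subset timestamp it was stored under. */
  private static final class Entry {
    final long timestamp;
    final long cost;

    Entry(long timestamp, long cost) {
      this.timestamp = timestamp;
      this.cost = cost;
    }
  }

  private final Map<String, Entry> cache = new HashMap<>();

  /** Number of times the cost actually had to be recomputed. */
  int computations = 0;

  /**
   * Returns the cost of a rel. subsetTimestamp is the timestamp of the subset
   * in whose context the query runs; trueCost is what a fresh computation
   * would return right now.
   */
  long cost(String rel, long subsetTimestamp, long trueCost) {
    Entry e = cache.get(rel);
    if (e != null && e.timestamp == subsetTimestamp) {
      return e.cost; // cache hit, possibly stale
    }
    computations++; // cache miss: recompute and overwrite the entry
    cache.put(rel, new Entry(subsetTimestamp, trueCost));
    return trueCost;
  }
}
```

With this model, calling cost("rel", 1, 100) and then cost("rel", 2, 100) recomputes twice even though nothing changed, which is the inefficient case. If the true cost then drops to 80 and the query runs in a context whose timestamp is now 2, cost("rel", 2, 80) still returns 100, which is the incorrect cache hit.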



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)