Posted to issues@calcite.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/02/20 01:28:00 UTC

[jira] [Updated] (CALCITE-2828) Handle cost propagation properly in Volcano Planner

     [ https://issues.apache.org/jira/browse/CALCITE-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated CALCITE-2828:
------------------------------------
    Labels: pull-request-available  (was: )

> Handle cost propagation properly in Volcano Planner
> ---------------------------------------------------
>
>                 Key: CALCITE-2828
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2828
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Siddharth Teotia
>            Assignee: Julian Hyde
>            Priority: Major
>              Labels: pull-request-available
>
> When getCost(rel) is called, a node's nonCumulativeCost() is computed. When CachingRelMetadataProvider is used, metadata (rowCount, cost, etc.) is cached for future use. To make sure that stale metadata is not used, each RelOptPlanner provides getRelMetadataTimestamp(rel), which is used to invalidate the cache: if a cached entry has timestamp != getRelMetadataTimestamp(rel), it is not used.
>  
> The problem in this case is that VolcanoPlanner uses the timestamp of the rel's current RelSubset as getRelMetadataTimestamp(). Since a rel can belong to multiple RelSubsets, this results in inconsistent cache hits and misses. For example, suppose a rel belongs to RelSubset#1 and RelSubset#2, with relMetadataTimestamp of 1 and 2, respectively. If the rel happens to update its cost with RelSubset#1 first, the cache is updated with timestamp 1, so when the same rel in RelSubset#2's context tries to look up its metadata, the lookup fails. This results in inefficient use of the cache. The main problem occurs when we get incorrect cache hits (e.g. a previous iteration of a metadata query on RelSubset#2 populated the cache with timestamp 2, but later, in the context of RelSubset#1, we think there is a valid cache entry and use the stale metadata).
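
The two failure modes described above can be illustrated with a small standalone sketch. This is not Calcite code: the class TimestampCacheSketch, its getCost(rel, timestamp, freshCost) method, and the numbers used are all hypothetical and only model a cache whose entries are validated by comparing timestamps. The stale-hit step assumes that RelSubset#1's timestamp later advances to 2 after its metadata changed, which is one possible way to reach the situation described in the issue.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a timestamp-validated metadata cache (not Calcite's API). */
final class TimestampCacheSketch {
  /** A cached cost together with the timestamp under which it was computed. */
  static final class Entry {
    final long timestamp;
    final double cost;

    Entry(long timestamp, double cost) {
      this.timestamp = timestamp;
      this.cost = cost;
    }
  }

  private final Map<String, Entry> cache = new HashMap<>();
  int hits;
  int misses;

  /**
   * Returns the cached cost if the cached timestamp equals the supplied one;
   * otherwise stores and returns freshCost, which stands in for recomputation.
   */
  double getCost(String rel, long timestamp, double freshCost) {
    Entry e = cache.get(rel);
    if (e != null && e.timestamp == timestamp) {
      hits++;
      return e.cost;
    }
    misses++;
    cache.put(rel, new Entry(timestamp, freshCost));
    return freshCost;
  }

  public static void main(String[] args) {
    TimestampCacheSketch c = new TimestampCacheSketch();

    // Same rel, unchanged cost, queried through two subsets with timestamps 1 and 2.
    c.getCost("rel", 1L, 10.0); // miss: cache is empty
    c.getCost("rel", 2L, 10.0); // miss: timestamp differs although nothing changed

    // Assumption: the first subset's timestamp has now advanced to 2 and the
    // true cost is 5.0. The entry stored under timestamp 2 looks valid.
    double cost = c.getCost("rel", 2L, 5.0);

    System.out.println("returned cost = " + cost);
    System.out.println("hits = " + c.hits + ", misses = " + c.misses);
  }
}
```

In this model the second lookup is a needless miss, and the third lookup returns the old value 10.0 rather than 5.0 because the timestamps happen to coincide. A timestamp that depends only on the rel itself, rather than on whichever subset it is currently viewed through, would avoid both effects in this sketch.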



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)