Posted to issues@calcite.apache.org by "Jesús Camacho Rodríguez (JIRA)" <ji...@apache.org> on 2014/11/26 13:03:12 UTC

[jira] [Comment Edited] (CALCITE-481) Add "Spool" operator, to allow re-use of relational expressions

    [ https://issues.apache.org/jira/browse/CALCITE-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226075#comment-14226075 ] 

Jesús Camacho Rodríguez edited comment on CALCITE-481 at 11/26/14 12:02 PM:
----------------------------------------------------------------------------

Thanks for opening this issue [~julianhyde]. I think a _spool_ operator would be an important addition to Calcite.

A couple of pointers on how the _spool_ operator can be used to accelerate query execution: [here|http://www.dbis.informatik.hu-berlin.de/fileadmin/lectures/SS2008/Seminar_MatViews/p533-zhou.pdf] and [here|http://research.microsoft.com/en-us/um/people/jrzhou/pub/scope-vldbj.pdf]. I also found [this blog post|http://sqlblog.com/blogs/rob_farley/archive/2013/06/11/spooling-in-sql-execution-plans.aspx] discussing how MS SQL Server uses the _spool_ operator in its execution plans.

I think those links give a neat idea of how the _spool_ operator could be implemented, both logically and physically, to speed up query execution.

One aspect we could discuss is whether we need two versions of the operator at the logical level, as they have (_eager_ and _lazy_), or a single one. IMO, eager vs. lazy is a physical aspect, so a single logical operator would probably be enough. What do you think?
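The sketch below illustrates the eager/lazy point. It is a minimal, hypothetical Java example, not Calcite code: `SpoolSketch`, `Mode`, `reader()` and `bufferedRows()` are illustrative names. There is one logical spool with one input. The mode only changes *when* rows are pulled from that input. An eager spool drains the whole input before it hands out a reader. A lazy spool pulls a row only when some reader has caught up with the buffer. In both modes the input is evaluated at most once, and every reader sees the same rows. The example assumes single-threaded use.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch: one logical spool; eager vs. lazy is only a physical choice. */
final class SpoolSketch<T> {
  enum Mode { EAGER, LAZY }

  private final Iterator<T> input;                  // the single input that populates the spool
  private final Mode mode;
  private final List<T> buffer = new ArrayList<>(); // rows read from the input so far

  SpoolSketch(Iterator<T> input, Mode mode) {
    this.input = input;
    this.mode = mode;
  }

  /** Pulls one more row from the input into the buffer; returns false once the input is exhausted. */
  private boolean fill() {
    if (!input.hasNext()) {
      return false;
    }
    buffer.add(input.next());
    return true;
  }

  /** Returns a new reader (consumer); every reader replays the same buffered rows. */
  Iterator<T> reader() {
    if (mode == Mode.EAGER) {
      // Eager: drain the entire input into the buffer before handing out a reader.
      while (fill()) { /* keep draining */ }
    }
    return new Iterator<T>() {
      private int pos = 0;

      @Override public boolean hasNext() {
        // If this reader has caught up with the buffer, try to pull one more row.
        // This only has an effect in lazy mode; an eager spool's input is already exhausted.
        return pos < buffer.size() || fill();
      }

      @Override public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return buffer.get(pos++);
      }
    };
  }

  /** Number of input rows evaluated so far (each is evaluated at most once). */
  int bufferedRows() {
    return buffer.size();
  }
}
```

With `Mode.LAZY`, creating two readers reads nothing. Calling `next()` on the first reader pulls exactly one row. The second reader then replays the rows without re-reading the input. In this simple version neither mode discards rows, so memory grows with the input. A real implementation might spill to a temporary table instead, as the issue description suggests.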


> Add "Spool" operator, to allow re-use of relational expressions
> ---------------------------------------------------------------
>
>                 Key: CALCITE-481
>                 URL: https://issues.apache.org/jira/browse/CALCITE-481
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>
> If a sub-tree occurs more than once in a query, an efficient plan would probably evaluate it once and have both readers read the same data. We propose a "Spool" relational expression for this purpose.
> Spool would have one input, the expression that populates it.
> In the VolcanoPlanner, any RelNode can already have multiple consumers (each of which sees the same row type and the same data), but an optimal plan does not typically include multiple uses of the same node. As a result, most implementors (e.g. EnumerableRelImplementor) would simply not notice and would generate the same code twice. Having an explicit Spool would alert the implementor to re-use the result.
> We do not prescribe a mechanism for implementing Spool as a physical operator. A job that populates a temporary table is one possible mechanism.
> As part of this case, we should implement Spool in Enumerable convention, and use it to evaluate some test queries.
> The other reason to implement Spool is costing. The cost of a Spool with N consumers is typically something like A + B * N. A, the fixed cost, is significantly larger than B, the replay cost.
> Volcano's dynamic programming model does not make it easy to account for re-use. There are approaches in academia based on integer linear programming; see e.g. http://www.slideshare.net/INRIA-OAK/plreuse 
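The A + B * N cost model can be made concrete with a small calculation. The Java sketch below is hypothetical and does not use Calcite's cost API (e.g. `RelOptCost`). It adds one assumption that is not in the description: without a spool, each of the N consumers re-evaluates the sub-tree at cost C, for a total of C * N. Spooling is strictly cheaper when A + B * N < C * N, that is, when N > A / (C - B). This requires B < C. The sketch only compares the two plans for a given sub-tree. It does not address the harder problem noted above, which is accounting for re-use inside Volcano's dynamic programming.

```java
/** Hypothetical cost comparison for the A + B * N spool model; not Calcite's RelOptCost API. */
final class SpoolCostSketch {
  private SpoolCostSketch() {}

  /** Spool cost: fixed cost a (evaluate the input once and store it) plus replay cost b per consumer. */
  static double spoolCost(double a, double b, int consumers) {
    return a + b * consumers;
  }

  /** Assumed alternative: re-evaluate the sub-tree (cost c) separately for every consumer. */
  static double recomputeCost(double c, int consumers) {
    return c * consumers;
  }

  /**
   * Smallest number of consumers for which spooling is strictly cheaper than recomputing,
   * or -1 if replaying (b) is not cheaper than recomputing (c), in which case it never is.
   */
  static int breakEvenConsumers(double a, double b, double c) {
    if (b >= c) {
      return -1;
    }
    // a + b*n < c*n  <=>  n > a / (c - b); take the next integer above the quotient.
    return (int) Math.floor(a / (c - b)) + 1;
  }
}
```

For example, take A = 100, B = 5 and C = 30. `breakEvenConsumers(100, 5, 30)` returns 5. At four consumers both plans cost 120. At five, the spool costs 125 against 150 for recomputing.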



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)