You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Julian Hyde <jh...@apache.org> on 2015/02/22 10:47:46 UTC

Review calcite-445

Vladimir,

Can you please review
https://github.com/julianhyde/incubator-calcite/tree/calcite-445. I have
changed the rules that transform scans of FilterableTable and
ProjectableFilterableTable into something that can be executed, shifting
more of the complex runtime logic (e.g. negotiating which filters and
projects to push down) into BindableTableScan.

This change lays the foundations for streaming queries, such as being able
to query both the current contents of a table and future deltas to it (i.e.
its stream). You can see that work on
https://github.com/julianhyde/incubator-calcite/tree/chi. I couldn’t manage
to get RelOptTable.getExpression(Class) to generate a different expression
for the two forms of the same table, and switching from EnumerableTableScan
to BindableTableScan made life much easier.

I also added “create” methods for quite a few more RelNode sub-classes. I
think this will become the recommended way of creating each kind of RelNode.

Julian

Re: Review calcite-445

Posted by Julian Hyde <ju...@gmail.com>.
Vladimir,

Per your earlier suggestion, I have broken up the create2 method. See the latest commit in the calcite-445 branch.

> On Feb 22, 2015, at 10:08 AM, Vladimir Sitnikov <si...@gmail.com> wrote:
> 
>> Longer term I would like EnumerableTableScan to work only on top of a QueryableTable
> Why's that?
> If there is no expression, you can just EnumerableRelImplementor.stash
> the "any_table" and that is it. No expression is required.
> Just-in-time stashing to rescue.
> 
> However, I do agree it makes sense to dodge
> RelOptTableImpl.getExpression: I agree there are cases when expression
> is known only at parse-time (e.g. as a result of
> org.apache.calcite.adapter.enumerable.EnumerableRelImplementor#stash)

Thanks for that concession. There will be cases for all 3: Enumerable with stash, using Bindable convention and no code generation, and full code generation. I just want Calcite users to be able to choose.

>> You can't understand how broken it was unless you run testStream and try to get it working
> 
> Our typical strategy was "test is a detailed design", however for
> streamable it is not yet the case.
> 
> I do not like how you sneak BINDABLE rules in while "ENABLE_BINDABLE=false".
> Can you clarify what do you mean by stream tables? How it should work?

I don't have time to explain all of it. The work isn't finished yet. Watch the chi branch, issue CALCITE-602, the dev list and the samza dev list.

> I cannot understand how it is supposed to work. All the rules just
> eliminate LogicalDelta.

For now, yes.

> I believe I can kill all the bindable rules and manage testStream to
> fly, however I do not want to get my hands dirty on the thing that
> should work in unknown to me fashion.

Good -- it would be great if you could make these cases work with Interpreter disabled. Don't kill the bindable rules, just disable them.

> Is the sole purpose of StreamableTable that it returns a table on demand?
> How it is different from regular table/table macro/table function?

Streaming is a significant extension to relational algebra. Analogous to adding differentiation and integration to the algebra of polynomial functions. You can't emulate the delta and chi operators in terms of existing ones. Read http://ilpubs.stanford.edu:8090/758/1/2003-67.pdf for more of the theory.

> 
>> ImmutableIntList#identity create an AbstractList<Integer> (
> I do not follow you. ImmutableIntList#identity should create just
> ImmutableIntList. What's verbose and less efficient with that?

It can't do it in terms of #range because range does not return an ImmutableIntList.

>> This is emulating the behavior of ReflectiveTable and FieldSelector. I put it in to make existing tests work.
> Why emulate if you can reuse?
> Who guarantees that those implementations would not diverge in the future?

I don't believe reuse is possible, otherwise I would have done it.

Julian


Re: Review calcite-445

Posted by Vladimir Sitnikov <si...@gmail.com>.
>Longer term I would like EnumerableTableScan to work only on top of a QueryableTable
Why's that?
If there is no expression, you can just EnumerableRelImplementor.stash
the "any_table" and that is it. No expression is required.
Just-in-time stashing to rescue.

However, I do agree it makes sense to dodge
RelOptTableImpl.getExpression: I agree there are cases when expression
is known only at parse-time (e.g. as a result of
org.apache.calcite.adapter.enumerable.EnumerableRelImplementor#stash)

> You can't understand how broken it was unless you run testStream and try to get it working

Our typical strategy was "test is a detailed design", however for
streamable it is not yet the case.

I do not like how you sneak BINDABLE rules in while "ENABLE_BINDABLE=false".
Can you clarify what do you mean by stream tables? How it should work?

I cannot understand how it is supposed to work. All the rules just
eliminate LogicalDelta.
I believe I can kill all the bindable rules and manage testStream to
fly, however I do not want to get my hands dirty on the thing that
should work in unknown to me fashion.

Is the sole purpose of StreamableTable that it returns a table on demand?
How it is different from regular table/table macro/table function?

> ImmutableIntList#identity create an AbstractList<Integer> (
I do not follow you. ImmutableIntList#identity should create just
ImmutableIntList. What's verbose and less efficient with that?

> Each "scan" creates an Enumerable, not an Enumerator. Enumerable does not acquire resources
My bad. Enumerable does not provide a way to release resources indeed.

>This is emulating the behavior of ReflectiveTable and FieldSelector. I put it in to make existing tests work.
Why emulate if you can reuse?
Who guarantees that those implementations would not diverge in the future?


Vladimir

Re: Review calcite-445

Posted by Julian Hyde <ju...@hydromatic.net>.
> On Feb 22, 2015, at 4:11 AM, Vladimir Sitnikov <si...@gmail.com> wrote:
> 
> Hi,
> 
>> This change lays the foundations for streaming queries, such as being able
>> to query both the current contents of a table and future deltas to it
> 
> I am sorry, I cannot yet understand how BINDABLE projects/filters are
> the foundation for chi.
> 
>> to generate a different expression
>> for the two forms of the same table,
> 
> Can you pin-point the part of the code?
> As far as I can find, you always call StreamableTable.stream() before
> usage, so I cannot understand why do you need several expressions.

You can see my various previous attempts in the chi.2 branch. You can't understand how broken it was unless you run testStream and try to get it working. At one point I went as far as to add a "boolean stream" field to TableScan, which would have been awful.

The crux is this: EnumerableTableScan has to generate an Expression, and the expression is generated by RelOptTable.getExpression(Class). Therefore when I called Table.stream() I was forced to create a new RelOptTableImpl with the a new expression that somehow represented the fact that I was asking for the stream rather than the base table.

Things only got better when I allowed RelOptTable.getExpression to return null. For a long time Jacques has been pointing out that regular tables should not need to have linq4j expressions.

Longer term I would like EnumerableTableScan to work only on top of a QueryableTable. The expression will come from the QueryableTable and RelOptTable.getExpression(Class) will become obsolete.

>> more of the complex runtime logic (e.g. negotiating which filters and
>> projects to push down) into BindableTableScan.
> 
> You mean TableScanNode, don't you? (its create method is a bit too long)
> 
> It is interesting that TableScanNode can rescan _multiple_ times in
> case projects are too aggressive.
> It tuns out to be:
> 1) Functional issue: enumerable1 is not closed in case of rescan. This
> might lead to resource leak.

Yeah, I'm actually quite proud of those multiple attempts to find filters. It allows the ProjectableFilterableTable to be flaky (changing its mind about which filters it can execute) but will always terminate, and projects as few columns as possible.

Each "scan" creates an Enumerable, not an Enumerator. Enumerable does not acquire resources and hence does not have a close() method.

> 2) Documentation/design flaw. It probably would be nice if PFT could
> return more projections than asked.

If PFT were able to negotiate projections it would be a different interface, and more complex for users to implement.

> In general change in calcite-445 looks good, however as you put in
> javadoc of CalcSplitRule: "Not enabled by default, as it works against
> the usual flow". It is bad cost does not accomodate the number of
> pushed filters, etc.
> 
> Comments regarding the code:
> c1) TableScanNode seems to pretend it knows how to map Class to field
> list. I'm not sure if it is in line with table.rowType.
> https://github.com/julianhyde/incubator-calcite/blob/bd0b606178252c90805aff49851f4c9d5eea1a0a/core/src/main/java/org/apache/calcite/interpreter/TableScanNode.java#L180

This is emulating the behavior of ReflectiveTable and FieldSelector. I put it in to make existing tests work.

> c2) ImmutableIntList#identity might probably be better expressed in
> terms of ImmutableIntList#range (that is avoid creation of int[])

I thought about having ImmutableIntList#identity create an AbstractList<Integer> (as range does). But for code (such as TableScanNode.create2) that requires an ImmutableIntList you'd have to write "ImmutableIntList.copyOf(ImmutableIntList.identity(x))". That is verbose, less efficient, and ends up creating the int[] in the end.

Julian

Re: Review calcite-445

Posted by Vladimir Sitnikov <si...@gmail.com>.
Hi,

> This change lays the foundations for streaming queries, such as being able
> to query both the current contents of a table and future deltas to it

I am sorry, I cannot yet understand how BINDABLE projects/filters are
the foundation for chi.

>  to generate a different expression
> for the two forms of the same table,

Can you pin-point the part of the code?
As far as I can find, you always call StreamableTable.stream() before
usage, so I cannot understand why do you need several expressions.

> more of the complex runtime logic (e.g. negotiating which filters and
> projects to push down) into BindableTableScan.

You mean TableScanNode, don't you? (its create method is a bit too long)

It is interesting that TableScanNode can rescan _multiple_ times in
case projects are too aggressive.
It tuns out to be:
1) Functional issue: enumerable1 is not closed in case of rescan. This
might lead to resource leak.
2) Documentation/design flaw. It probably would be nice if PFT could
return more projections than asked.

In general change in calcite-445 looks good, however as you put in
javadoc of CalcSplitRule: "Not enabled by default, as it works against
the usual flow". It is bad cost does not accomodate the number of
pushed filters, etc.

Comments regarding the code:
c1) TableScanNode seems to pretend it knows how to map Class to field
list. I'm not sure if it is in line with table.rowType.
https://github.com/julianhyde/incubator-calcite/blob/bd0b606178252c90805aff49851f4c9d5eea1a0a/core/src/main/java/org/apache/calcite/interpreter/TableScanNode.java#L180
c2) ImmutableIntList#identity might probably be better expressed in
terms of ImmutableIntList#range (that is avoid creation of int[])

Vladimir