Posted to dev@vxquery.apache.org by Eldon Carman <ec...@ucr.edu> on 2013/12/05 01:23:03 UTC

Suggested Rewrite Rules

In previous e-mails we have suggested a few new rewrite rules and I want to
get feedback on them.

---------
The first rewrite rule would merge two unnest child operations into a
single unnest operator.

UNNEST( $v2 : child($v1, "step2") )
  UNNEST($v1 : child($v0, "step1") )
where $v1 is not used in the rest of the plan.

Options for solution:

1. UNNEST( $v2 : child(child($v0, "step1"), "step2") )
or
2. UNNEST( $v2 : child($v0, "step1/step2") )

First, is this optimization best placed in the rewrite rules space?
(consider the compiler, etc.)
Second, which of the solutions should we consider for implementation? Or do
you have another approach in mind?
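
To make the two options concrete, here is a minimal sketch of the merge on a toy plan model. The classes and method names (UnnestChild, mergeOption1, mergeOption2) are made up for illustration and are not the Algebricks operator or expression classes; the sketch only shows when the rule applies and what each option would produce.

```java
import java.util.List;

// Toy model of the plan fragment. Everything here is hypothetical and only
// mirrors the notation used in this e-mail.
final class MergeUnnestChildSketch {

    // UNNEST( var : child(input, "step") ), where input is either a variable
    // name such as "$v0" or an already nested child(...) expression.
    static final class UnnestChild {
        final String var;
        final String input;
        final String step;

        UnnestChild(String var, String input, String step) {
            this.var = var;
            this.input = input;
            this.step = step;
        }

        String expression() {
            return "child(" + input + ", \"" + step + "\")";
        }

        @Override
        public String toString() {
            return "UNNEST( " + var + " : " + expression() + " )";
        }
    }

    // The rule applies only if the upper UNNEST reads the lower one's
    // variable and that variable is not used in the rest of the plan.
    static boolean applies(UnnestChild upper, UnnestChild lower, List<String> varsUsedElsewhere) {
        return upper.input.equals(lower.var) && !varsUsedElsewhere.contains(lower.var);
    }

    // Option 1: nest the lower child expression inside the upper one.
    // Returns null when the rule does not apply.
    static UnnestChild mergeOption1(UnnestChild upper, UnnestChild lower, List<String> varsUsedElsewhere) {
        if (!applies(upper, lower, varsUsedElsewhere)) {
            return null;
        }
        return new UnnestChild(upper.var, lower.expression(), upper.step);
    }

    // Option 2: keep one child call and concatenate the steps into a path.
    // Returns null when the rule does not apply.
    static UnnestChild mergeOption2(UnnestChild upper, UnnestChild lower, List<String> varsUsedElsewhere) {
        if (!applies(upper, lower, varsUsedElsewhere)) {
            return null;
        }
        return new UnnestChild(upper.var, lower.input, lower.step + "/" + upper.step);
    }
}
```

In both options the guard is the same; only the shape of the resulting expression differs.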

---------
The second rewrite rule would merge unnest child into a data scan operation.

UNNEST( $v1 : child($v0, "step1") )
  DATASCAN( collection( $source ), $v0 )
where $v0 is not used in the rest of the plan.

Options for solution:

DATASCAN( child(collection( $source ), "step1"), $v1 )

where the OperatorDescriptor for DATASCAN would understand the child of
collection.

-----------
The third rule searches for subplans that consume a single item input.

SUBPLAN {
  AGGREGATE($v2 : sequence(%expression($v1)))
    UNNEST($v1 : iterate($v0))
      NESTED_TUPLE_SOURCE
}

If $v0 is a single item, not a sequence, then rewrite to:

ASSIGN($v2 : %expression($v0))

First, does this rule look correct?
Second, is it worth putting this rule in place?

Re: Suggested Rewrite Rules

Posted by Eldon Carman <ec...@ucr.edu>.
On Thu, Dec 5, 2013 at 11:13 AM, Vinayak Borkar <vi...@gmail.com> wrote:

> On 12/4/13, 4:23 PM, Eldon Carman wrote:
>
>> In previous e-mails we have suggested a few new rewrite rules and I want
>> to get feedback on them.
>>
>> ---------
>> The first rewrite rule would merge two unnest child operations into a
>> single unnest operator.
>>
>> UNNEST( $v2 : child($v1, "step2") )
>>    UNNEST($v1 : child($v0, "step1") )
>> where $v1 is not used in the rest of the plan.
>>
>> Options for solution:
>>
>> 1. UNNEST( $v2 : child(child($v0, "step1"), "step2") )
>> or
>> 2. UNNEST( $v2 : child($v0, "step1/step2") )
>>
>> First, is this optimization best placed in the rewrite rules space?
>> (consider the compiler, etc.)
>> Second, which of the solutions should we consider for implementation? Or
>> do you know something else?
>>
>
> The rewriter is the best place to perform this transformation IMO. During
> translation you may not be able to "realize" this optimization in every
> case where it applies. You have a much better chance of benefiting from
> this rule in the rewriter.
>
> In Algebricks, the UNNEST operator expects an Unnesting Function. On the
> other hand, the input to an Unnesting function is a Scalar Function.
> Unnesting functions implement an iterator API for the UNNEST to consume
> every item without the need to first materialize the whole sequence. In (1)
> the outer child will be invoked as an iterator, but the inner child will be
> invoked as a scalar function leading it to materialize all step1 items.
>
> On the other hand, (2) allows the child function to internally construct
> nested iterators that can concurrently iterate over step1 and for each
> step1 item, iterate over all step2 items.
>
>
I have committed a rule for option one, since it can use the child
functions as they are defined in our operator definition. It may be worth
looking at this in more detail for further optimization. My initial test did
not show any significant change in the query time with this new rewrite rule.



> ---------
>> The second rewrite rule would merge unnest child into a data scan
>> operation.
>>
>> UNNEST( $v1 : child($v0, "step1") )
>>    DATASCAN( collection( $source ), $v0 )
>> where $v0 is not used in the rest of the plan.
>>
>> Options for solution:
>>
>> DATASCAN( child(collection( $source ), "step1"), $v1 )
>>
>> where the OperatorDescriptor for DATASCAN would understand the child of
>> collection.
>>
>
> DataScan accepts a Source object as its argument, so you cannot pass a
> function object to it (at least that's how it appears from your string
> above). You will need to hold on to some representation of the path
> being pushed into the scan in the DataSource object implemented in
> VXQuery. When it comes time to create the runtime, you can have that passed
> to the getScannerRuntime(...) call in IMetadataProvider as the implObject
> argument.
>
> The rewrite rule will in effect "push" the path needed into the VXQuery
> source object.


I see what you're saying about the DataSource object. We will need to pass
more information to the DATASCAN: the folder and the child paths that need
to be applied. I was not sure how to represent this as input to the DATASCAN
operator.


>
>> -----------
>> The third rule searches for subplans that consume a single item input.
>>
>> SUBPLAN {
>>    AGGREGATE($v2 : sequence(%expression($v1)))
>>      UNNEST($v1 : iterate($v0))
>>        NESTED_TUPLE_SOURCE
>> }
>>
>> If $v0 is a single item, not a sequence, then rewrite to:
>>
>> ASSIGN($v2 : %expression($v0))
>>
>> First, does this rule look correct?
>> Second, is it worth putting this rule in place?
>>
>
> How are you going to determine that $v0 is a single item? Which cases will
> that help with?
>
>>
>>
Two cases stand out. First, $v0 is passed through a treat expression that
ensures $v0 has only one item. Second, an UNNEST operator generates $v0.
In both cases $v0 is known to hold only a single item.
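
As a rough illustration of that check, the sketch below models the two cases as an enum describing what produces $v0 and only rewrites the subplan when one of them holds. The names (Producer, isKnownSingleItem, rewriteSubplan) are invented for this example, and the expression is passed as a plain format template; a real rule would inspect the operators in the plan instead.

```java
// Toy model of the third rule; hypothetical names, not VXQuery code.
final class SingleItemSubplanSketch {

    // What produces $v0 in the plan below the SUBPLAN.
    enum Producer { TREAT_AS_SINGLE_ITEM, UNNEST, OTHER }

    // The two cases named above are the only ones treated as single item.
    static boolean isKnownSingleItem(Producer producer) {
        return producer == Producer.TREAT_AS_SINGLE_ITEM || producer == Producer.UNNEST;
    }

    // Rewrites
    //   SUBPLAN { AGGREGATE(out : sequence(expr(v1))) UNNEST(v1 : iterate(in)) ... }
    // into
    //   ASSIGN(out : expr(in))
    // when `in` is known to be a single item. The template uses %s for the
    // variable. Returns null when the rule does not apply.
    static String rewriteSubplan(String outVar, String inVar, String expressionTemplate, Producer producerOfInVar) {
        if (!isKnownSingleItem(producerOfInVar)) {
            return null;
        }
        return "ASSIGN(" + outVar + " : " + String.format(expressionTemplate, inVar) + ")";
    }
}
```

For example, rewriteSubplan("$v2", "$v0", "data(%s)", Producer.UNNEST) yields the string ASSIGN($v2 : data($v0)), while Producer.OTHER leaves the subplan alone.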

Re: Suggested Rewrite Rules

Posted by Vinayak Borkar <vi...@gmail.com>.
On 12/4/13, 4:23 PM, Eldon Carman wrote:
> In previous e-mails we have suggested a few new rewrite rules and I want to
> get feedback on them.
>
> ---------
> The first rewrite rule would merge two unnest child operations into a
> single unnest operator.
>
> UNNEST( $v2 : child($v1, "step2") )
>    UNNEST($v1 : child($v0, "step1") )
> where $v1 is not used in the rest of the plan.
>
> Options for solution:
>
> 1. UNNEST( $v2 : child(child($v0, "step1"), "step2") )
> or
> 2. UNNEST( $v2 : child($v0, "step1/step2") )
>
> First, is this optimization best placed in the rewrite rules space?
> (consider the compiler, etc.)
> Second, which of the solutions should we consider for implementation? Or do
> you know something else?

The rewriter is the best place to perform this transformation IMO. 
During translation you may not be able to "realize" this optimization in 
every case where it applies. You have a much better chance of benefiting 
from this rule in the rewriter.

In Algebricks, the UNNEST operator expects an Unnesting Function. On the 
other hand, the input to an Unnesting function is a Scalar Function. 
Unnesting functions implement an iterator API for the UNNEST to consume 
every item without the need to first materialize the whole sequence. In 
(1) the outer child will be invoked as an iterator, but the inner child 
will be invoked as a scalar function leading it to materialize all step1 
items.

On the other hand, (2) allows the child function to internally construct 
nested iterators that can concurrently iterate over step1 and for each 
step1 item, iterate over all step2 items.
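
The difference can be sketched in plain Java on a toy tree. None of this is the Algebricks unnesting-function API; Node, childIterator, childMaterialized, option1 and option2 are hypothetical names used only to contrast a materialized inner step with nested lazy iterators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Toy XML-like tree; not the VXQuery node representation.
final class NestedChildIteratorSketch {

    static final class Node {
        final String name;
        final String label;
        final List<Node> children;

        Node(String name, String label, Node... children) {
            this.name = name;
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // Lazy child step: yields matching children one at a time.
    static Iterator<Node> childIterator(final Node parent, final String step) {
        final Iterator<Node> all = parent.children.iterator();
        return new Iterator<Node>() {
            private Node pending = advance();

            private Node advance() {
                while (all.hasNext()) {
                    Node candidate = all.next();
                    if (candidate.name.equals(step)) {
                        return candidate;
                    }
                }
                return null;
            }

            @Override
            public boolean hasNext() {
                return pending != null;
            }

            @Override
            public Node next() {
                if (pending == null) {
                    throw new NoSuchElementException();
                }
                Node result = pending;
                pending = advance();
                return result;
            }
        };
    }

    // Scalar-style child step: builds the whole result before returning.
    static List<Node> childMaterialized(Node parent, String step) {
        List<Node> result = new ArrayList<Node>();
        Iterator<Node> matches = childIterator(parent, step);
        while (matches.hasNext()) {
            result.add(matches.next());
        }
        return result;
    }

    // For each outer node, lazily iterate over its matching children.
    static Iterator<Node> flatten(final Iterator<Node> outer, final String step) {
        return new Iterator<Node>() {
            private Iterator<Node> inner = Collections.<Node>emptyIterator();

            @Override
            public boolean hasNext() {
                while (!inner.hasNext() && outer.hasNext()) {
                    inner = childIterator(outer.next(), step);
                }
                return inner.hasNext();
            }

            @Override
            public Node next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return inner.next();
            }
        };
    }

    // Like (1): all step1 items are materialized before step2 is iterated.
    static Iterator<Node> option1(Node root, String step1, String step2) {
        return flatten(childMaterialized(root, step1).iterator(), step2);
    }

    // Like (2): step1 and step2 are iterated concurrently, nothing buffered.
    static Iterator<Node> option2(Node root, String step1, String step2) {
        return flatten(childIterator(root, step1), step2);
    }
}
```

Both variants return the same items in the same order; the only difference in this sketch is that option1 holds the full list of step1 nodes in memory before the first step2 item is produced.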



>
> ---------
> The second rewrite rule would merge unnest child into a data scan operation.
>
> UNNEST( $v1 : child($v0, "step1") )
>    DATASCAN( collection( $source ), $v0 )
> where $v0 is not used in the rest of the plan.
>
> Options for solution:
>
> DATASCAN( child(collection( $source ), "step1"), $v1 )
>
> where the OperatorDescriptor for DATASCAN would understand the child of
> collection.

DataScan accepts a Source object as its argument, so you cannot pass a
function object to it (at least that's how it appears from your string
above). You will need to hold on to some representation of the path being
pushed into the scan in the DataSource object implemented in VXQuery. When
it comes time to create the runtime, you can have that passed to the
getScannerRuntime(...) call in IMetadataProvider as the implObject
argument.

The rewrite rule will in effect "push" the path needed into the VXQuery 
source object.
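
A minimal sketch of what "pushing" the path could look like, assuming a source object that records the child steps. CollectionSource, DataScan and pushUnnestChild are hypothetical stand-ins written for this example; they do not implement or call any Algebricks interface, and how the recorded path reaches the scanner runtime is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of the second rule; hypothetical names only.
final class PushPathIntoScanSketch {

    // Stand-in for the VXQuery data source: the collection plus the child
    // steps that rewrite rules have pushed into the scan so far.
    static final class CollectionSource {
        private final String collection;
        private final List<String> childPath = new ArrayList<String>();

        CollectionSource(String collection) {
            this.collection = collection;
        }

        void pushChildStep(String step) {
            childPath.add(step);
        }

        String getCollection() {
            return collection;
        }

        List<String> getChildPath() {
            return Collections.unmodifiableList(childPath);
        }
    }

    // Stand-in for DATASCAN( source, outputVar ).
    static final class DataScan {
        final CollectionSource source;
        String outputVar;

        DataScan(CollectionSource source, String outputVar) {
            this.source = source;
            this.outputVar = outputVar;
        }
    }

    // Folds UNNEST( unnestVar : child(unnestInput, step) ) into the scan
    // below it. Applies only if the UNNEST reads the scan's variable and
    // that variable is not used in the rest of the plan. Returns true when
    // the UNNEST can be removed from the plan.
    static boolean pushUnnestChild(DataScan scan, String unnestVar, String unnestInput,
            String step, boolean scanVarUsedElsewhere) {
        if (!scan.outputVar.equals(unnestInput) || scanVarUsedElsewhere) {
            return false;
        }
        scan.source.pushChildStep(step);
        scan.outputVar = unnestVar;
        return true;
    }
}
```

Keeping the steps as a list on the source means a later rule application can push a further step onto the same scan without changing the operator itself.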


>
> -----------
> The third rule searches for subplans that consume a single item input.
>
> SUBPLAN {
>    AGGREGATE($v2 : sequence(%expression($v1)))
>      UNNEST($v1 : iterate($v0))
>        NESTED_TUPLE_SOURCE
> }
>
> If $v0 is a single item, not a sequence, then rewrite to:
>
> ASSIGN($v2 : %expression($v0))
>
> First, does this rule look correct?
> Second, is it worth putting this rule in place?

How are you going to determine that $v0 is a single item? Which cases 
will that help with?


>