You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Raajay <ra...@gmail.com> on 2015/08/08 08:42:14 UTC

Calcite returning multiple operator trees

Hello,

I am just starting to explore Calcite, especially related to its
application to Hive. As I understand it, Calcite takes as input the
following:

1. An SQL operator tree
2. A set of rules (pattern based) that define alterations to the three
3. A cost model for basic operations.


Given these, calcite explores the space of all possible alterations (based
on the rules) and returns the one with least cost. Is there an interface
for Calcite to return multiple Operator trees, say for example, all trees
with cost below a threshold ?

If yes, pointers to it would be great.

Thanks
Raajay

Re: Calcite returning multiple operator trees

Posted by Julian Hyde <jh...@gmail.com>.
Yes, I guess you could keep the tree generated at each step in HepPlanner. Try it, and let us know what you come up with.

> On Aug 11, 2015, at 11:36 AM, Raajay <ra...@gmail.com> wrote:
> 
> Ah! Okay, good to know :) I was thinking it perhaps would be possible by
> changing the "GarbageCollection" in HepPlanner.
> 
> Thanks again,
> Raajay
> 
> On Tue, Aug 11, 2015 at 1:24 PM, Julian Hyde <jh...@apache.org> wrote:
> 
>> You are correct. The strategy to obtain alternate plans from
>> HepPlanner is - don't use HepPlanner.  :)
>> 
>> On Tue, Aug 11, 2015 at 9:52 AM, Raajay <ra...@gmail.com> wrote:
>>> Okay got it. Thanks a lot!
>>> 
>>> As a follow up, what would be the strategy to obtain multiple operator
>>> trees while using a HepPlanner ? I am interested in HepPlanner because it
>>> is used in Hive Extensively.
>>> 
>>> My understanding from reading the code for HepPlanner is that obtaining
>>> alternate plans with higher cost might not be possible. Every node in the
>>> graph contains only one RelNode (not notion of "best") and thus any
>>> traversal will give only one plan.
>>> 
>>> Thanks,
>>> Raajay
>>> 
>>> On Mon, Aug 10, 2015 at 4:29 PM, Julian Hyde <jh...@apache.org> wrote:
>>> 
>>>> There isn’t an interface as such. But you could modify the code to
>> achieve
>>>> that behavior.
>>>> 
>>>> All of the various plans are held in the graph of the VolcanoPlanner
>>>> consisting of RelSet, RelSubset and RelNode objects.
>>>> RelSubset.CheapestPlanReplacer walks over that graph finding the
>> cheapest.
>>>> It uses the RelSubset.best field but you could create a similar class
>> that
>>>> return all plans within K of the best cost.
>>>> 
>>>> A few words of caution. Each path through the graph is another plan, and
>>>> there is a huge number of combinations, and many of the combinations
>> will
>>>> be very similar to each other. You should keep the threshold small, and
>>>> filter out plans that are not very different.
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>>> On Aug 7, 2015, at 11:42 PM, Raajay <ra...@gmail.com> wrote:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> I am just starting to explore Calcite, especially related to its
>>>>> application to Hive. As I understand it, Calcite takes as input the
>>>>> following:
>>>>> 
>>>>> 1. An SQL operator tree
>>>>> 2. A set of rules (pattern based) that define alterations to the three
>>>>> 3. A cost model for basic operations.
>>>>> 
>>>>> 
>>>>> Given these, calcite explores the space of all possible alterations
>>>> (based
>>>>> on the rules) and returns the one with least cost. Is there an
>> interface
>>>>> for Calcite to return multiple Operator trees, say for example, all
>> trees
>>>>> with cost below a threshold ?
>>>>> 
>>>>> If yes, pointers to it would be great.
>>>>> 
>>>>> Thanks
>>>>> Raajay
>>>> 
>>>> 
>> 


Re: Calcite returning multiple operator trees

Posted by Raajay <ra...@gmail.com>.
Ah! Okay, good to know :) I was thinking it perhaps would be possible by
changing the "GarbageCollection" in HepPlanner.

Thanks again,
Raajay

On Tue, Aug 11, 2015 at 1:24 PM, Julian Hyde <jh...@apache.org> wrote:

> You are correct. The strategy to obtain alternate plans from
> HepPlanner is - don't use HepPlanner.  :)
>
> On Tue, Aug 11, 2015 at 9:52 AM, Raajay <ra...@gmail.com> wrote:
> > Okay got it. Thanks a lot!
> >
> > As a follow up, what would be the strategy to obtain multiple operator
> > trees while using a HepPlanner ? I am interested in HepPlanner because it
> > is used in Hive Extensively.
> >
> > My understanding from reading the code for HepPlanner is that obtaining
> > alternate plans with higher cost might not be possible. Every node in the
> > graph contains only one RelNode (not notion of "best") and thus any
> > traversal will give only one plan.
> >
> > Thanks,
> > Raajay
> >
> > On Mon, Aug 10, 2015 at 4:29 PM, Julian Hyde <jh...@apache.org> wrote:
> >
> >> There isn’t an interface as such. But you could modify the code to
> achieve
> >> that behavior.
> >>
> >> All of the various plans are held in the graph of the VolcanoPlanner
> >> consisting of RelSet, RelSubset and RelNode objects.
> >> RelSubset.CheapestPlanReplacer walks over that graph finding the
> cheapest.
> >> It uses the RelSubset.best field but you could create a similar class
> that
> >> return all plans within K of the best cost.
> >>
> >> A few words of caution. Each path through the graph is another plan, and
> >> there is a huge number of combinations, and many of the combinations
> will
> >> be very similar to each other. You should keep the threshold small, and
> >> filter out plans that are not very different.
> >>
> >> Julian
> >>
> >>
> >> > On Aug 7, 2015, at 11:42 PM, Raajay <ra...@gmail.com> wrote:
> >> >
> >> > Hello,
> >> >
> >> > I am just starting to explore Calcite, especially related to its
> >> > application to Hive. As I understand it, Calcite takes as input the
> >> > following:
> >> >
> >> > 1. An SQL operator tree
> >> > 2. A set of rules (pattern based) that define alterations to the three
> >> > 3. A cost model for basic operations.
> >> >
> >> >
> >> > Given these, calcite explores the space of all possible alterations
> >> (based
> >> > on the rules) and returns the one with least cost. Is there an
> interface
> >> > for Calcite to return multiple Operator trees, say for example, all
> trees
> >> > with cost below a threshold ?
> >> >
> >> > If yes, pointers to it would be great.
> >> >
> >> > Thanks
> >> > Raajay
> >>
> >>
>

Re: Calcite returning multiple operator trees

Posted by Julian Hyde <jh...@apache.org>.
You are correct. The strategy to obtain alternate plans from
HepPlanner is - don't use HepPlanner.  :)

On Tue, Aug 11, 2015 at 9:52 AM, Raajay <ra...@gmail.com> wrote:
> Okay got it. Thanks a lot!
>
> As a follow up, what would be the strategy to obtain multiple operator
> trees while using a HepPlanner ? I am interested in HepPlanner because it
> is used in Hive Extensively.
>
> My understanding from reading the code for HepPlanner is that obtaining
> alternate plans with higher cost might not be possible. Every node in the
> graph contains only one RelNode (not notion of "best") and thus any
> traversal will give only one plan.
>
> Thanks,
> Raajay
>
> On Mon, Aug 10, 2015 at 4:29 PM, Julian Hyde <jh...@apache.org> wrote:
>
>> There isn’t an interface as such. But you could modify the code to achieve
>> that behavior.
>>
>> All of the various plans are held in the graph of the VolcanoPlanner
>> consisting of RelSet, RelSubset and RelNode objects.
>> RelSubset.CheapestPlanReplacer walks over that graph finding the cheapest.
>> It uses the RelSubset.best field but you could create a similar class that
>> return all plans within K of the best cost.
>>
>> A few words of caution. Each path through the graph is another plan, and
>> there is a huge number of combinations, and many of the combinations will
>> be very similar to each other. You should keep the threshold small, and
>> filter out plans that are not very different.
>>
>> Julian
>>
>>
>> > On Aug 7, 2015, at 11:42 PM, Raajay <ra...@gmail.com> wrote:
>> >
>> > Hello,
>> >
>> > I am just starting to explore Calcite, especially related to its
>> > application to Hive. As I understand it, Calcite takes as input the
>> > following:
>> >
>> > 1. An SQL operator tree
>> > 2. A set of rules (pattern based) that define alterations to the three
>> > 3. A cost model for basic operations.
>> >
>> >
>> > Given these, calcite explores the space of all possible alterations
>> (based
>> > on the rules) and returns the one with least cost. Is there an interface
>> > for Calcite to return multiple Operator trees, say for example, all trees
>> > with cost below a threshold ?
>> >
>> > If yes, pointers to it would be great.
>> >
>> > Thanks
>> > Raajay
>>
>>

Re: Calcite returning multiple operator trees

Posted by Raajay <ra...@gmail.com>.
Okay got it. Thanks a lot!

As a follow up, what would be the strategy to obtain multiple operator
trees while using a HepPlanner ? I am interested in HepPlanner because it
is used in Hive Extensively.

My understanding from reading the code for HepPlanner is that obtaining
alternate plans with higher cost might not be possible. Every node in the
graph contains only one RelNode (not notion of "best") and thus any
traversal will give only one plan.

Thanks,
Raajay

On Mon, Aug 10, 2015 at 4:29 PM, Julian Hyde <jh...@apache.org> wrote:

> There isn’t an interface as such. But you could modify the code to achieve
> that behavior.
>
> All of the various plans are held in the graph of the VolcanoPlanner
> consisting of RelSet, RelSubset and RelNode objects.
> RelSubset.CheapestPlanReplacer walks over that graph finding the cheapest.
> It uses the RelSubset.best field but you could create a similar class that
> return all plans within K of the best cost.
>
> A few words of caution. Each path through the graph is another plan, and
> there is a huge number of combinations, and many of the combinations will
> be very similar to each other. You should keep the threshold small, and
> filter out plans that are not very different.
>
> Julian
>
>
> > On Aug 7, 2015, at 11:42 PM, Raajay <ra...@gmail.com> wrote:
> >
> > Hello,
> >
> > I am just starting to explore Calcite, especially related to its
> > application to Hive. As I understand it, Calcite takes as input the
> > following:
> >
> > 1. An SQL operator tree
> > 2. A set of rules (pattern based) that define alterations to the three
> > 3. A cost model for basic operations.
> >
> >
> > Given these, calcite explores the space of all possible alterations
> (based
> > on the rules) and returns the one with least cost. Is there an interface
> > for Calcite to return multiple Operator trees, say for example, all trees
> > with cost below a threshold ?
> >
> > If yes, pointers to it would be great.
> >
> > Thanks
> > Raajay
>
>

Re: Calcite returning multiple operator trees

Posted by Julian Hyde <jh...@apache.org>.
There isn’t an interface as such. But you could modify the code to achieve that behavior.

All of the various plans are held in the graph of the VolcanoPlanner consisting of RelSet, RelSubset and RelNode objects.  RelSubset.CheapestPlanReplacer walks over that graph finding the cheapest. It uses the RelSubset.best field but you could create a similar class that  return all plans within K of the best cost.

A few words of caution. Each path through the graph is another plan, and there is a huge number of combinations, and many of the combinations will be very similar to each other. You should keep the threshold small, and filter out plans that are not very different.

Julian


> On Aug 7, 2015, at 11:42 PM, Raajay <ra...@gmail.com> wrote:
> 
> Hello,
> 
> I am just starting to explore Calcite, especially related to its
> application to Hive. As I understand it, Calcite takes as input the
> following:
> 
> 1. An SQL operator tree
> 2. A set of rules (pattern based) that define alterations to the three
> 3. A cost model for basic operations.
> 
> 
> Given these, calcite explores the space of all possible alterations (based
> on the rules) and returns the one with least cost. Is there an interface
> for Calcite to return multiple Operator trees, say for example, all trees
> with cost below a threshold ?
> 
> If yes, pointers to it would be great.
> 
> Thanks
> Raajay