You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@asterixdb.apache.org by mingda li <li...@gmail.com> on 2016/10/31 18:16:32 UTC

About the Multiple Join Optimization on AsterixDB

Dear all,

Hi, I am working on multiple join on Hyracks level. I am not sure if I do
the multiple join on AsterixDB, whether it will optimize the query by
changing the join order or just execute according to how we write the
query. I think this may not be done in Algebricks level based on rule but
not sure.

Bests,
Mingda

Re: About the Multiple Join Optimization on AsterixDB

Posted by mingda li <li...@gmail.com>.
Yeah, actually, I have tested the multiple join on TPCDS dataset (1g) on
UCLA‘s AsterixDB cluster (16 nodes).
Two different join orders as following:

SELECT COUNT(*) FROM (SELECT * FROM catalog_sales cs1 JOIN catalog_returns
cr1 ON (cs1.cs_order_number = cr1.cr_order_number AND cs1.cs_item_sk =
cr1.cr_item_sk))  JOIN inventory i1 ON i1.inv_item_sk = cs1.cs_item_sk;

SELECT COUNT(*) FROM (SELECT * FROM catalog_sales cs1 JOIN inventory i1 ON
cs1.cs_item_sk = i1.inv_item_sk) JOIN catalog_returns cr1 ON
(cs1.cs_order_number = cr1.cr_order_number AND cs1.cs_item_sk =
cr1.cr_item_sk);


*Good join order just needs 2.613  sec while the bad one needs 35.123 sec. *


*I am also building bloom filter and sampler in Hyracks level to collect
stats to try optimize the query on hyracks.*


*Bests,*

*Mingda*

On Mon, Oct 31, 2016 at 1:22 PM, Ildar Absalyamov <
ildar.absalyamov@gmail.com> wrote:

> As Yingyi pointed out we don't reorder joins because the framework for
> stats and cardinalities is not there yet.
> However what we can do in an meantime is to provide an interface for the
> statistical information, needed for join reordering, independent of the way
> the stats were collected (either sampling-based or LSM-based) and work out
> details of the cost model.
>
> 2016-10-31 11:30 GMT-07:00 mingda li <li...@gmail.com>:
>
> > Hi Yingyi,
> >
> > I see. Thanks for your reply:-)
> >
> > Bests,
> > Mingda
> >
> >
> > On Mon, Oct 31, 2016 at 11:23 AM, Yingyi Bu <bu...@gmail.com> wrote:
> >
> > > Mingda,
> > >
> > >      I'm not sure how much re-ordering can be done at the Hyracks
> level,
> > > i.e., the runtime level.
> > >      In the optimizer (the asterixdb/algebricks level), we don't have
> > > re-ordering for  joins, because:
> > >      --- the cost model has not been added yet.  I'm not sure about the
> > > timeline for this. @Ildar?
> > >      --- respecting user-specified join orders is important for certain
> > > cases, for example, to get stable/predictable performance (zero
> surprise)
> > > for applications.
> > >
> > >      In the runtime, we have a role-reversal optimization in hybrid
> hash
> > > join, which is a safe optimization that is not based on estimations.
> You
> > > can look at OptimizedHybridHashJoin.
> > >
> > > Best,
> > > Yingyi
> > >
> > >
> > > On Mon, Oct 31, 2016 at 11:16 AM, mingda li <li...@gmail.com>
> > > wrote:
> > >
> > > > Dear all,
> > > >
> > > > Hi, I am working on multiple join on Hyracks level. I am not sure if
> I
> > do
> > > > the multiple join on AsterixDB, whether it will optimize the query by
> > > > changing the join order or just execute according to how we write the
> > > > query. I think this may not be done in Algebricks level based on rule
> > but
> > > > not sure.
> > > >
> > > > Bests,
> > > > Mingda
> > > >
> > >
> >
>
>
>
> --
> Best regards,
> Ildar
>

Re: About the Multiple Join Optimization on AsterixDB

Posted by Ildar Absalyamov <il...@gmail.com>.
As Yingyi pointed out we don't reorder joins because the framework for
stats and cardinalities is not there yet.
However what we can do in an meantime is to provide an interface for the
statistical information, needed for join reordering, independent of the way
the stats were collected (either sampling-based or LSM-based) and work out
details of the cost model.

2016-10-31 11:30 GMT-07:00 mingda li <li...@gmail.com>:

> Hi Yingyi,
>
> I see. Thanks for your reply:-)
>
> Bests,
> Mingda
>
>
> On Mon, Oct 31, 2016 at 11:23 AM, Yingyi Bu <bu...@gmail.com> wrote:
>
> > Mingda,
> >
> >      I'm not sure how much re-ordering can be done at the Hyracks level,
> > i.e., the runtime level.
> >      In the optimizer (the asterixdb/algebricks level), we don't have
> > re-ordering for  joins, because:
> >      --- the cost model has not been added yet.  I'm not sure about the
> > timeline for this. @Ildar?
> >      --- respecting user-specified join orders is important for certain
> > cases, for example, to get stable/predictable performance (zero surprise)
> > for applications.
> >
> >      In the runtime, we have a role-reversal optimization in hybrid hash
> > join, which is a safe optimization that is not based on estimations.  You
> > can look at OptimizedHybridHashJoin.
> >
> > Best,
> > Yingyi
> >
> >
> > On Mon, Oct 31, 2016 at 11:16 AM, mingda li <li...@gmail.com>
> > wrote:
> >
> > > Dear all,
> > >
> > > Hi, I am working on multiple join on Hyracks level. I am not sure if I
> do
> > > the multiple join on AsterixDB, whether it will optimize the query by
> > > changing the join order or just execute according to how we write the
> > > query. I think this may not be done in Algebricks level based on rule
> but
> > > not sure.
> > >
> > > Bests,
> > > Mingda
> > >
> >
>



-- 
Best regards,
Ildar

Re: About the Multiple Join Optimization on AsterixDB

Posted by mingda li <li...@gmail.com>.
Hi Yingyi,

I see. Thanks for your reply:-)

Bests,
Mingda


On Mon, Oct 31, 2016 at 11:23 AM, Yingyi Bu <bu...@gmail.com> wrote:

> Mingda,
>
>      I'm not sure how much re-ordering can be done at the Hyracks level,
> i.e., the runtime level.
>      In the optimizer (the asterixdb/algebricks level), we don't have
> re-ordering for  joins, because:
>      --- the cost model has not been added yet.  I'm not sure about the
> timeline for this. @Ildar?
>      --- respecting user-specified join orders is important for certain
> cases, for example, to get stable/predictable performance (zero surprise)
> for applications.
>
>      In the runtime, we have a role-reversal optimization in hybrid hash
> join, which is a safe optimization that is not based on estimations.  You
> can look at OptimizedHybridHashJoin.
>
> Best,
> Yingyi
>
>
> On Mon, Oct 31, 2016 at 11:16 AM, mingda li <li...@gmail.com>
> wrote:
>
> > Dear all,
> >
> > Hi, I am working on multiple join on Hyracks level. I am not sure if I do
> > the multiple join on AsterixDB, whether it will optimize the query by
> > changing the join order or just execute according to how we write the
> > query. I think this may not be done in Algebricks level based on rule but
> > not sure.
> >
> > Bests,
> > Mingda
> >
>

Re: About the Multiple Join Optimization on AsterixDB

Posted by Yingyi Bu <bu...@gmail.com>.
Mingda,

     I'm not sure how much re-ordering can be done at the Hyracks level,
i.e., the runtime level.
     In the optimizer (the asterixdb/algebricks level), we don't have
re-ordering for  joins, because:
     --- the cost model has not been added yet.  I'm not sure about the
timeline for this. @Ildar?
     --- respecting user-specified join orders is important for certain
cases, for example, to get stable/predictable performance (zero surprise)
for applications.

     In the runtime, we have a role-reversal optimization in hybrid hash
join, which is a safe optimization that is not based on estimations.  You
can look at OptimizedHybridHashJoin.

Best,
Yingyi


On Mon, Oct 31, 2016 at 11:16 AM, mingda li <li...@gmail.com> wrote:

> Dear all,
>
> Hi, I am working on multiple join on Hyracks level. I am not sure if I do
> the multiple join on AsterixDB, whether it will optimize the query by
> changing the join order or just execute according to how we write the
> query. I think this may not be done in Algebricks level based on rule but
> not sure.
>
> Bests,
> Mingda
>