Posted to dev@drill.apache.org by Jacques Nadeau <ja...@apache.org> on 2015/01/19 06:33:25 UTC

Approach for local ordering in planning

In planning we currently state collation as total ordering. In some cases
it would be useful to create a concept of local ordering. For example,
partition by x then sort by y. Does anyone have any thoughts on how we
should define this in terms of traits/physical properties? The syntax would
realistically only apply to CTAS or as a description of existing files, so I
don't think we need to enhance the language beyond those locations.

J
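
To make the distinction in the message above concrete, here is a minimal
sketch in plain Java. It is not Drill or Calcite code; the Row type, the
class name, and every method name are assumptions made up for this
illustration. It partitions rows by x, sorts each partition by y, and then
checks that the result is ordered within each partition (local ordering) but
not across the whole data set (total ordering).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from Drill or Calcite.
class LocalOrderingSketch {
    /** A row with a partition key x and a sort key y. */
    static final class Row {
        final int x;
        final int y;
        Row(int x, int y) { this.x = x; this.y = y; }
    }

    /** Groups rows by x, then sorts each group by y ("partition by x then sort by y"). */
    static Map<Integer, List<Row>> partitionThenSort(List<Row> rows) {
        Map<Integer, List<Row>> parts = new TreeMap<>();
        for (Row r : rows) {
            parts.computeIfAbsent(r.x, k -> new ArrayList<>()).add(r);
        }
        for (List<Row> part : parts.values()) {
            part.sort(Comparator.comparingInt((Row r) -> r.y));
        }
        return parts;
    }

    /** True if y never decreases from one row to the next. */
    static boolean isSortedByY(List<Row> rows) {
        for (int i = 1; i < rows.size(); i++) {
            if (rows.get(i - 1).y > rows.get(i).y) return false;
        }
        return true;
    }

    /** Local ordering: every partition is sorted by y on its own. */
    static boolean isLocallyOrdered(Map<Integer, List<Row>> parts) {
        for (List<Row> part : parts.values()) {
            if (!isSortedByY(part)) return false;
        }
        return true;
    }

    /** Total ordering: the partitions read back to back are still sorted by y. */
    static boolean isTotallyOrdered(Map<Integer, List<Row>> parts) {
        List<Row> all = new ArrayList<>();
        for (List<Row> part : parts.values()) {
            all.addAll(part);
        }
        return isSortedByY(all);
    }

    public static void main(String[] args) {
        List<Row> input = List.of(
            new Row(1, 30), new Row(2, 10), new Row(1, 20), new Row(2, 40));
        Map<Integer, List<Row>> parts = partitionThenSort(input);
        System.out.println("locally ordered: " + isLocallyOrdered(parts)); // true
        System.out.println("totally ordered: " + isTotallyOrdered(parts)); // false
    }
}
```

A collation stated as total ordering would be wrong for this data, which is
why a separate notion of local ordering is being asked about.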

Re: Approach for local ordering in planning

Posted by Julian Hyde <ju...@gmail.com>.
You can definitely define new traits (RelTraitDef instances). I
believe you can also add instances of these user-defined traits to
your RelNodes. If not, you should be able to.

Given that, the framework is flexible enough to allow people to choose
whether they want to combine ordering and partitioning or keep them
separate. You could have an ordering trait, a partitioning trait, and
an ordering+partitioning trait, and a particular run of the planner
could have any subset of these enabled.
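
A rough sketch of the three options, in plain Java rather than against the
real planner API. The Trait interface, the Kind enum, and all class names
here are hypothetical stand-ins (in the planner each kind would be its own
RelTraitDef), and the "satisfies" rules are deliberately simplified.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model only; this does not use or reproduce the planner's classes.
class TraitSketch {
    enum Kind { ORDERING, PARTITIONING, ORDERING_AND_PARTITIONING }

    /** Minimal stand-in for a planner trait: a kind plus a "satisfies" test. */
    interface Trait {
        Kind kind();
        boolean satisfies(Trait required);
    }

    static final class Ordering implements Trait {
        final List<String> keys;
        Ordering(String... keys) { this.keys = List.of(keys); }
        public Kind kind() { return Kind.ORDERING; }
        /** Sorted on (x, y) also satisfies a requirement of sorted on (x). */
        public boolean satisfies(Trait required) {
            if (!(required instanceof Ordering)) return false;
            List<String> wanted = ((Ordering) required).keys;
            return wanted.size() <= keys.size()
                && keys.subList(0, wanted.size()).equals(wanted);
        }
    }

    static final class Partitioning implements Trait {
        final Set<String> keys;
        Partitioning(String... keys) { this.keys = new HashSet<>(List.of(keys)); }
        public Kind kind() { return Kind.PARTITIONING; }
        /** Simplification: only an identical key set satisfies the requirement. */
        public boolean satisfies(Trait required) {
            return required instanceof Partitioning
                && keys.equals(((Partitioning) required).keys);
        }
    }

    /** The combined option: partitioning plus an ordering local to each partition. */
    static final class OrderedPartitioning implements Trait {
        final Partitioning partitioning;
        final Ordering localOrdering;
        OrderedPartitioning(Partitioning partitioning, Ordering localOrdering) {
            this.partitioning = partitioning;
            this.localOrdering = localOrdering;
        }
        public Kind kind() { return Kind.ORDERING_AND_PARTITIONING; }
        public boolean satisfies(Trait required) {
            if (!(required instanceof OrderedPartitioning)) return false;
            OrderedPartitioning r = (OrderedPartitioning) required;
            return partitioning.satisfies(r.partitioning)
                && localOrdering.satisfies(r.localOrdering);
        }
    }

    /** Keeps only the traits whose kind is enabled for this planner run. */
    static List<Trait> enabledTraits(List<Trait> all, Set<Kind> enabled) {
        List<Trait> kept = new ArrayList<>();
        for (Trait t : all) {
            if (enabled.contains(t.kind())) kept.add(t);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Trait> all = List.of(
            new Ordering("y"),
            new Partitioning("x"),
            new OrderedPartitioning(new Partitioning("x"), new Ordering("y")));
        // A run that keeps ordering and partitioning as separate traits.
        Set<Kind> separate = EnumSet.of(Kind.ORDERING, Kind.PARTITIONING);
        System.out.println(enabledTraits(all, separate).size()); // 2
    }
}
```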


On Tue, Jan 20, 2015 at 8:48 AM, Aman Sinha <as...@maprtech.com> wrote:
> I believe keeping ordering and partitioning as separate traits gives more
> flexibility.  Combining them might preclude certain types of plans.  For
> instance, in many systems the assumption is any type of distribution
> destroys sortedness of the data, so a re-sort is needed after distribution
> (i.e just doing a merge is not enough, although Drill does actually
> preserve sortedness, so it does a merge).   Without knowing what the
> combined trait would look like, I have a feeling that it will be
> constraining for certain plans.
>
> Separately, I think the optimizer should allow for adding new traits..for
> instance compression.  Input streams may be hash/roundrobin distributed
> and/or ordered and/or compressed.
>
> Aman

Re: Approach for local ordering in planning

Posted by Aman Sinha <as...@maprtech.com>.
I believe keeping ordering and partitioning as separate traits gives more
flexibility. Combining them might preclude certain types of plans. For
instance, many systems assume that any type of distribution destroys the
sortedness of the data, so a re-sort is needed after distribution (i.e.,
just doing a merge is not enough). Drill does actually preserve sortedness,
so it does a merge. Without knowing what the combined trait would look
like, I have a feeling that it will be constraining for certain plans.

Separately, I think the optimizer should allow for adding new traits, for
instance compression. Input streams may be hash/round-robin distributed
and/or ordered and/or compressed.

Aman
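
The merge versus re-sort point can be illustrated with a small plain-Java
sketch. This is not Drill's implementation; the class and method names are
made up for the example. It assumes each incoming stream is already sorted
and compares naive concatenation, which loses the ordering, with a k-way
merge, which keeps it without a full re-sort.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not taken from Drill's operators.
class MergeSketch {
    /** Appends the streams one after another, ignoring their sortedness. */
    static List<Integer> concatenate(List<List<Integer>> streams) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> stream : streams) {
            out.addAll(stream);
        }
        return out;
    }

    /** K-way merge: repeatedly emits the smallest head among the sorted streams. */
    static List<Integer> merge(List<List<Integer>> streams) {
        // Each queue entry is {value, stream index, position within that stream}.
        PriorityQueue<int[]> heads =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int s = 0; s < streams.size(); s++) {
            if (!streams.get(s).isEmpty()) {
                heads.add(new int[] {streams.get(s).get(0), s, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] head = heads.poll();
            out.add(head[0]);
            List<Integer> stream = streams.get(head[1]);
            int next = head[2] + 1;
            if (next < stream.size()) {
                heads.add(new int[] {stream.get(next), head[1], next});
            }
        }
        return out;
    }

    /** True if the values never decrease. */
    static boolean isSorted(List<Integer> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i - 1) > values.get(i)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Integer>> streams =
            List.of(List.of(1, 4, 9), List.of(2, 3, 8), List.of(5, 6, 7));
        System.out.println(isSorted(concatenate(streams))); // false
        System.out.println(merge(streams)); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

A planner that treats distribution as destroying sortedness would have to
plan a full sort after the exchange, while one that can rely on a merging
exchange only needs the merge step.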

On Mon, Jan 19, 2015 at 12:55 PM, Julian Hyde <ju...@gmail.com> wrote:

> We have discussed before whether ordering and partitioning should be
> distinct traits or the same trait. I was (still am) ambivalent about it.
> I’ve been having some discussions with the Hive team, and it looks as if
> they will make ordering & partitioning the same trait.
>
> Julian

Re: Approach for local ordering in planning

Posted by Julian Hyde <ju...@gmail.com>.
We have discussed before whether ordering and partitioning should be distinct traits or the same trait. I was (still am) ambivalent about it. I’ve been having some discussions with the Hive team, and it looks as if they will make ordering & partitioning the same trait.

Julian

On Jan 19, 2015, at 9:51 AM, Jinfeng Ni <ji...@gmail.com> wrote:

> For the case of "partition by x sort by y", I think planner currently keeps
> the partition / sort in separate trait;  "partition by x" as a distribution
> trait, "sort by y" as a collation.  Distribution trait has higher priority
> than the sort collation. Drill's physical operators will have both those
> traits, when doing planning work.

Re: Approach for local ordering in planning

Posted by Jinfeng Ni <ji...@gmail.com>.
For the case of "partition by x sort by y", I think the planner currently
keeps the partition and the sort in separate traits: "partition by x" as a
distribution trait, "sort by y" as a collation. The distribution trait has
higher priority than the sort collation. Drill's physical operators will
have both of those traits during planning.


On Sun, Jan 18, 2015 at 9:33 PM, Jacques Nadeau <ja...@apache.org> wrote:

> In planning we currently state collation as total ordering. In some cases
> it would be useful to create a concept of local ordering. For example,
> partition by x then sort by y.  Does anyone have any thoughts on how we
> should define this in terms of traits/physical properties? The syntax would
> realistically only apply to ctas or as a description of existing files so I
> think we shouldn't need to enhance the language beyond those locations.
>
> J
>