Posted to dev@drill.apache.org by Muhammad Gelbana <m....@gmail.com> on 2017/04/03 17:51:20 UTC

Re: Is it possible to delegate data joins and filtering to the datasource ?

I've succeeded, theoretically, in what I wanted to do, but only because I sent
the selected columns to my datasource manually. Would someone please tell
me how I can identify the selected columns in the join? I searched a lot
without success.

*---------------------*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Sat, Apr 1, 2017 at 1:43 AM, Muhammad Gelbana <m....@gmail.com>
wrote:

> So I intend to use this constructor for the new *RelNode*: *org.apache.drill.exec.planner.logical.DrillScanRel.DrillScanRel(RelOptCluster,
> RelTraitSet, RelOptTable, GroupScan, RelDataType, List<SchemaPath>)*
>
> How can I provide its parameters?
>
>    1. *RelOptCluster*: Can I pass *DrillJoinRel.getCluster()*?
>
>    2. *RelTraitSet*: Can I pass *DrillJoinRel.getTraitSet()*?
>
>    3. *RelOptTable*: I assume I can use this factory method (*org.apache.calcite.prepare.RelOptTableImpl.create(RelOptSchema,
>    RelDataType, Table, Path)*). Any hints on how I can provide these
>    parameters too? Should I just go ahead and manually create a new instance
>    of each parameter?
>
>    4. *GroupScan*: I understand I have to create a new implementation
>    class for this one, so no questions here so far.
>
>    5. *RelDataType*: This one is confusing. I understand that for
>    *DrillJoinRel.transformTo(newRel)* to work, I have to provide a
>    *newRel* instance that has a *RelDataType* instance with the same
>    number of fields and compatible types (this is mandated by *org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelNode,
>    RelNode, Object)*). Why couldn't I provide a *RelDataType* with
>    a different set of fields? How can I resolve this?
>
>    6. *List<SchemaPath>*: I assume I can call this method and pass my
>    column names to it, one by one (i.e.
>    *org.apache.drill.common.expression.SchemaPath.getCompoundPath(String...)*).
>
> Thanks.
>
> *---------------------*
> *Muhammad Gelbana*
> http://www.linkedin.com/in/mgelbana
>
> On Fri, Mar 31, 2017 at 1:59 PM, weijie tong <to...@gmail.com>
> wrote:
>
>> Your code seems right; what remains is to implement the `call.transformTo()`
>> call. I may not be able to describe the remaining details precisely and, as
>> @Paul Rogers mentioned, the plugin details are a little trivial.
>>
>> 1. Use drillScanRel.getGroupScan().
>> 2. You need to extend AbstractGroupScan and let it hold some
>> information about your storage. Call this GroupScan AGroupScan; it
>> corresponds to one joined scan RelNode. Then you can define another
>> GroupScan called BGroupScan, which extends AGroupScan. The BGroupScan acts
>> as an aggregate container that holds the two joined AGroupScans.
>> 3. The new DrillScanRel has the same RowType as the JoinRel. The
>> requirements for, and examples of, transforming between two different
>> RelNodes can be found in other code. This DrillScanRel's GroupScan is the
>> BGroupScan, and this new DrillScanRel is the one you pass to
>> `call.transformTo(xxxx)`.
>>
>> Maybe the picture below will help you understand my idea.
>>
>> Suppose the initial RelNode tree is:
>>
>>                      ---Scan (AGroupScan)
>> Project ----Join --|
>>                      ---Scan (AGroupScan)
>>           |
>>          \|/
>>
>> After applying this rule, the final tree is:
>>
>> Project-----Scan ( BGroupScan ( List(AGroupScan, AGroupScan) ) )
>>
>> On Thu, Mar 30, 2017 at 10:01 PM, Muhammad Gelbana <m....@gmail.com>
>> wrote:
>>
>> > *This is my rule class*
>> >
>> > public class CartesianProductJoinRule extends RelOptRule {
>> >
>> >     public static final CartesianProductJoinRule INSTANCE = new CartesianProductJoinRule(DrillJoinRel.class);
>> >
>> >     public CartesianProductJoinRule(Class<DrillJoinRel> clazz) {
>> >         super(operand(clazz, operand(RelNode.class, any()), operand(RelNode.class, any())),
>> >                 "CartesianProductJoin");
>> >     }
>> >
>> >     @Override
>> >     public boolean matches(RelOptRuleCall call) {
>> >         DrillJoinRel drillJoin = call.rel(0);
>> >         return drillJoin.getJoinType() == JoinRelType.INNER && drillJoin.getCondition().isAlwaysTrue();
>> >     }
>> >
>> >     @Override
>> >     public void onMatch(RelOptRuleCall call) {
>> >         DrillJoinRel join = call.rel(0);
>> >         RelNode firstRel = call.rel(1);
>> >         RelNode secondRel = call.rel(2);
>> >         HepRelVertex right = (HepRelVertex) join.getRight();
>> >         HepRelVertex left = (HepRelVertex) join.getLeft();
>> >
>> >         List<RelDataTypeField> firstFields = firstRel.getRowType().getFieldList();
>> >         List<RelDataTypeField> secondFields = secondRel.getRowType().getFieldList();
>> >
>> >         RelNode firstTable = ((HepRelVertex) firstRel.getInput(0)).getCurrentRel();
>> >         RelNode secondTable = ((HepRelVertex) secondRel.getInput(0)).getCurrentRel();
>> >
>> >         //call.transformTo(???);
>> >     }
>> > }
>> >
>> > *To register the rule*, I overrode the *getOptimizerRules* method in my
>> > storage plugin class:
>> >
>> > public Set<? extends RelOptRule> getOptimizerRules(OptimizerRulesContext optimizerContext, PlannerPhase phase) {
>> >     switch (phase) {
>> >     case LOGICAL_PRUNE_AND_JOIN:
>> >     case LOGICAL_PRUNE:
>> >     case LOGICAL:
>> >         return getLogicalOptimizerRules(optimizerContext);
>> >     case PHYSICAL:
>> >         return getPhysicalOptimizerRules(optimizerContext);
>> >     case PARTITION_PRUNING:
>> >     case JOIN_PLANNING:
>> >         return ImmutableSet.of(CartesianProductJoinRule.INSTANCE);
>> >     default:
>> >         return ImmutableSet.of();
>> >     }
>> > }
>> >
>> > The rule is firing as expected but I'm lost when it comes to the
>> > conversion. Earlier, you said "the new equivalent ScanRel is to have the
>> > joined ScanRel nodes's GroupScans", so:
>> >
>> >    1. How can I obtain the left and right tables' group scans?
>> >    2. What exactly do you mean by joining them? Is there a utility method
>> >    to do so? Or should I manually create a new single group scan and add
>> >    the information I need there? Looking into other *GroupScan*
>> >    implementations, I found that they have references to some runtime
>> >    objects such as the storage plugin and the storage plugin configuration.
>> >    At this stage, I don't know how to obtain those!
>> >    3. Precisely, what kind of object should I use to represent a *RelNode*
>> >    that represents the whole join? I understand that I need to use an
>> >    object that implements the *RelNode* interface. Then I should add the
>> >    created *GroupScan* to that *RelNode* instance and call
>> >    *call.transformTo(newRelNode)*, correct?
>> >
>> >
>> > *---------------------*
>> > *Muhammad Gelbana*
>> > http://www.linkedin.com/in/mgelbana
>> >
>> > On Thu, Mar 30, 2017 at 2:46 AM, weijie tong <to...@gmail.com>
>> > wrote:
>> >
>> > > I mean that the rule you write could be placed in PlannerPhase.JOIN_PLANNING,
>> > > which uses the HepPlanner. This phase works on the logical RelNodes.
>> > > Hope this helps.
>> > > On Thu, Mar 30, 2017 at 12:07 AM, Muhammad Gelbana <m....@gmail.com> wrote:
>> > >
>> > > > Thanks a lot Weijie, I believe I'm very close now. I hope you don't mind
>> > > > a few more questions:
>> > > >
>> > > >    1. Is the new rule you are mentioning a physical rule? So should I
>> > > >    implement the Prel interface?
>> > > >    2. By "traversing the join to find the ScanRel":
>> > > >       - This sounds like I have to "search" for something. Shouldn't I just
>> > > >       work on transforming the left (i.e. DrillJoinRel's getLeft() method)
>> > > >       and right (i.e. DrillJoinRel's getRight() method) join objects?
>> > > >       - The "left" and "right" elements of the DrillJoinRel object are of
>> > > >       type RelSubset, not *ScanRel*, and I can't find a type called
>> > > >       *ScanRel*. I suppose you meant *ScanPrel*, especially because it
>> > > >       implements the *Prel* interface, which provides the
>> > > >       *getPhysicalOperator* method.
>> > > >    3. What if multiple physical or logical rules match a single node?
>> > > >    What decides which rule is applied and which is rejected? Is it the
>> > > >    *AbstractRelNode.computeSelfCost(RelOptPlanner)* method? What if
>> > > >    more than one rule produces the same cost?
>> > > >
>> > > > I'll go ahead and see what I can do for now; hopefully you can offer
>> > > > more guidance. THANKS A LOT.
>> > > >
>> > > > *---------------------*
>> > > > *Muhammad Gelbana*
>> > > > http://www.linkedin.com/in/mgelbana
>> > > >
>> > > > On Wed, Mar 29, 2017 at 4:23 AM, weijie tong <tongweijie178@gmail.com> wrote:
>> > > >
>> > > > > To avoid misunderstanding: the new equivalent ScanRel should hold the
>> > > > > GroupScans of the joined ScanRel nodes, because the GroupScans indirectly
>> > > > > hold the underlying storage information.
>> > > > >
>> > > > > On Wed, Mar 29, 2017 at 10:15 AM, weijie tong <tongweijie178@gmail.com> wrote:
>> > > > >
>> > > > > >
>> > > > > > My suggestion is to define a rule that matches the DrillJoinRel RelNode.
>> > > > > > In its onMatch method, traverse the join's children to find the ScanRel
>> > > > > > nodes, then define a new ScanRel that includes the ScanRel nodes found in
>> > > > > > the previous step, and transform the JoinRel into this equivalent new
>> > > > > > ScanRel. In the end, the plan tree will contain the ScanRel instead of the
>> > > > > > JoinRel. You can register your join planning rule in
>> > > > > > PlannerPhase.JOIN_PLANNING.
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
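
As a rough, self-contained illustration of the container idea discussed in this thread (one scan that holds the two joined scans), the plain-Java sketch below uses hypothetical stand-in classes, `TableScanInfo` and `JoinedScanInfo`, in place of real `AbstractGroupScan` subclasses. None of these names or methods come from Drill or Calcite, and the sketch does not model inheritance between the two. It shows why the replacement scan can expose the same field order as the join (left fields followed by right fields) and how such a container could describe a cross join for the datasource to run. The generated query text is only an assumption about what a datasource might accept.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a per-table group scan ("AGroupScan" in the thread). */
class TableScanInfo {
    final String table;
    final List<String> columns;

    TableScanInfo(String table, List<String> columns) {
        this.table = table;
        this.columns = columns;
    }
}

/** Hypothetical stand-in for the container scan ("BGroupScan" in the thread). */
final class JoinedScanInfo {
    private final List<TableScanInfo> inputs = new ArrayList<>();

    JoinedScanInfo(TableScanInfo left, TableScanInfo right) {
        inputs.add(left);
        inputs.add(right);
    }

    /** Field order of the join: all left columns, then all right columns. */
    List<String> rowFields() {
        List<String> fields = new ArrayList<>();
        for (TableScanInfo input : inputs) {
            for (String column : input.columns) {
                fields.add(input.table + "." + column);
            }
        }
        return fields;
    }

    /** Builds a cross-join query that the datasource could execute itself. */
    String toQuery() {
        List<String> tables = new ArrayList<>();
        for (TableScanInfo input : inputs) {
            tables.add(input.table);
        }
        return "SELECT " + String.join(", ", rowFields())
                + " FROM " + String.join(", ", tables);
    }
}
```

For two stand-in tables `t1 (a, b)` and `t2 (x)`, `toQuery()` yields `SELECT t1.a, t1.b, t2.x FROM t1, t2`.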

Re: Is it possible to delegate data joins and filtering to the datasource ?

Posted by Muhammad Gelbana <m....@gmail.com>.
I have done it. Thanks a lot Weijie and all of you for your time.

*---------------------*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Thu, Apr 6, 2017 at 3:15 PM, weijie tong <to...@gmail.com> wrote:

> Some tips:
>
> 1. You need to know how the RexInputRef indexes of the JoinRel relate to
> those of its inputs. For example:
>
> join (1, 2, 3, 4, 5)
>
> left input (1, 2, 3)    right input (1, 2)
>
> 1, 2, 3 ===> left input (1, 2, 3)
>
> 4, 5 ===> right input (1, 2)
>
> 2. Capture this index mapping when you iterate over the JoinRel in your
> rule (CartesianProductJoinRule), and store the mapping data in your
> BGroupScan (following the naming convention of my last example).
> The mapping structure may be: destination index ---> (source ScanRel : source index).
> For the example data in tip 1, the structure will be:
> 1 ==> (left scan1 : 1)
> 2 ==> (left scan1 : 2)
> 3 ==> (left scan1 : 3)
> 4 ==> (right scan2 : 1)
> 5 ==> (right scan2 : 2)
>
> 3. Define another rule (matching the Project RelNode) that depends on the
> index mapping data from the previous step. In this rule, take the indexes
> of the final output project, look up their mapped indexes in the mapping
> structure, and from those find the final output column names and their
> tables.
>
>
>
>

Re: Is it possible to delegate data joins and filtering to the datasource ?

Posted by weijie tong <to...@gmail.com>.
Some tips:

1. You need to know how the RexInputRef indexes of the JoinRel relate to
those of its inputs. For example:

join (1, 2, 3, 4, 5)

left input (1, 2, 3)    right input (1, 2)

1, 2, 3 ===> left input (1, 2, 3)

4, 5 ===> right input (1, 2)

2. Capture this index mapping when you iterate over the JoinRel in your
rule (CartesianProductJoinRule), and store the mapping data in your
BGroupScan (following the naming convention of my last example).
The mapping structure may be: destination index ---> (source ScanRel : source index).
For the example data in tip 1, the structure will be:
1 ==> (left scan1 : 1)
2 ==> (left scan1 : 2)
3 ==> (left scan1 : 3)
4 ==> (right scan2 : 1)
5 ==> (right scan2 : 2)

3. Define another rule (matching the Project RelNode) that depends on the
index mapping data from the previous step. In this rule, take the indexes
of the final output project, look up their mapped indexes in the mapping
structure, and from those find the final output column names and their
tables.
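
The tips above can be sketched in plain Java, purely as an illustration. The sketch assumes 0-based field indexes (as RexInputRef uses) rather than the 1-based numbering in the example, and every name in it (`SourceField`, `JoinIndexMapping`, `build`, `resolve`) is hypothetical and not part of Drill or Calcite. It maps each join output index to the input it came from and the index within that input, then resolves the indexes used by a projection to column names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in: one join output field traced back to its source scan. */
final class SourceField {
    final String scan;   // which input the field comes from: "left" or "right"
    final int index;     // 0-based index of the field inside that input
    final String name;   // column name inside that input

    SourceField(String scan, int index, String name) {
        this.scan = scan;
        this.index = index;
        this.name = name;
    }

    @Override
    public String toString() {
        return scan + "." + name;
    }
}

/** Hypothetical holder for the "destination index -> (source scan : source index)" map. */
final class JoinIndexMapping {
    private final List<SourceField> byJoinIndex = new ArrayList<>();

    /** Left fields take join indexes 0..n-1; right fields follow at n..n+m-1. */
    static JoinIndexMapping build(List<String> leftColumns, List<String> rightColumns) {
        JoinIndexMapping mapping = new JoinIndexMapping();
        for (int i = 0; i < leftColumns.size(); i++) {
            mapping.byJoinIndex.add(new SourceField("left", i, leftColumns.get(i)));
        }
        for (int i = 0; i < rightColumns.size(); i++) {
            mapping.byJoinIndex.add(new SourceField("right", i, rightColumns.get(i)));
        }
        return mapping;
    }

    /** Resolves the join indexes referenced by a projection to their source fields. */
    List<SourceField> resolve(List<Integer> projectedJoinIndexes) {
        List<SourceField> result = new ArrayList<>();
        for (int joinIndex : projectedJoinIndexes) {
            result.add(byJoinIndex.get(joinIndex));
        }
        return result;
    }
}
```

With left columns (a, b, c) and right columns (x, y), a projection over join indexes 1 and 3 resolves to `left.b` and `right.x`.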



