Posted to dev@drill.apache.org by Muhammad Gelbana <m....@gmail.com> on 2017/03/22 15:20:23 UTC

Is it possible to delegate data joins and filtering to the datasource ?

I'm trying to use Drill with a proprietary datasource that is very fast in
applying data joins (i.e. SQL joins) and query filters (i.e. SQL where
conditions).

To connect to that datasource, I first have to write a storage plugin, but
I'm not sure whether my main goal is achievable.

My main goal is to configure Drill to let the datasource perform joins and
filters and only return the resulting data. Drill can then perform further
processing based on the original SQL query it received.

Is this possible by developing a storage plugin ? Where exactly should I be
looking ?

I've been going through this wiki
<https://github.com/paul-rogers/drill/wiki> and I don't think I understood
every concept. So if there is another source of information about storage
plugin development, please point it out.

*---------------------*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

Re: Is it possible to delegate data joins and filtering to the datasource ?

Posted by Muhammad Gelbana <m....@gmail.com>.
I have done it. Thanks a lot Weijie and all of you for your time.

*---------------------*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Thu, Apr 6, 2017 at 3:15 PM, weijie tong <to...@gmail.com> wrote:

> Some tips:
> 1. You need to know the RexInputRef index relationship between the
> JoinRel and its inputs. For example:
>
> join (1, 2, 3, 4, 5)
>
> left input (1, 2, 3), right input (1, 2)
>
> 1, 2, 3 ===> left input (1, 2, 3)
>
> 4, 5 ===> right input (1, 2)
>
> 2. You capture this index mapping when you iterate over the JoinRel
> node in your defined rule (CartesianProductJoinRule), and store the
> mapping in your defined BGroupScan (following the naming convention of
> my last example).
> The mapping structure may be: destination index ===> (source ScanRel :
> source index).
> For the example data in tip 1, the structure will be:
> 1 ===> (left scan1 : 1)
> 2 ===> (left scan1 : 2)
> 3 ===> (left scan1 : 3)
> 4 ===> (right scan2 : 1)
> 5 ===> (right scan2 : 2)
>
> 3. You define another rule (matching the Project RelNode) which depends
> on the index mapping from the last step. In that rule you take each of
> the final output project's indexes, look up its mapped index in the
> mapping structure, and so find the final output column names and the
> related tables.
>
>
>
>
> On Tue, Apr 4, 2017 at 1:51 AM, Muhammad Gelbana <m....@gmail.com>
> wrote:
>
> > I've succeeded, theoretically, in what I wanted to do, but only because
> > I had to send the selected columns manually to my datasource. Would
> > someone please tell me how I can identify the selected columns in the
> > join ? I searched a lot without success.
> >
> > *---------------------*
> > *Muhammad Gelbana*
> > http://www.linkedin.com/in/mgelbana
> >
> > On Sat, Apr 1, 2017 at 1:43 AM, Muhammad Gelbana <m....@gmail.com>
> > wrote:
> >
> > > So I intend to use this constructor for the new *RelNode*:
> > > *org.apache.drill.exec.planner.logical.DrillScanRel.DrillScanRel(RelOptCluster,
> > > RelTraitSet, RelOptTable, GroupScan, RelDataType, List<SchemaPath>)*
> > >
> > > How can I provide its parameters ?
> > >
> > >    1. *RelOptCluster*: Can I pass *DrillJoinRel.getCluster()* ?
> > >
> > >    2. *RelTraitSet*: Can I pass *DrillJoinRel.getTraitSet()* ?
> > >
> > >    3. *RelOptTable*: I assume I can use the factory method
> > >    *org.apache.calcite.prepare.RelOptTableImpl.create(RelOptSchema,
> > >    RelDataType, Table, Path)*. Any hints on how I can provide these
> > >    parameters too ? Should I just go ahead and manually create a new
> > >    instance of each parameter ?
> > >
> > >    4. *GroupScan*: I understand I have to create a new implementation
> > >    class for this one, so no questions here so far.
> > >
> > >    5. *RelDataType*: This one is confusing. I understand that for
> > >    *DrillJoinRel.transformTo(newRel)* to work, I have to provide a
> > >    *newRel* instance that has a *RelDataType* instance with the same
> > >    number of fields and compatible types (i.e. this is mandated by
> > >    *org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelNode,
> > >    RelNode, Object)*). Why can't I provide a *RelDataType* with
> > >    a different set of fields ? How can I resolve this ?
> > >
> > >    6. *List<SchemaPath>*: I assume I can call
> > >    *org.apache.drill.common.expression.SchemaPath.getCompoundPath(String...)*
> > >    and pass my column names to it, one by one.
> > >
> > > Thanks.
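For what it's worth, the six pieces above could be wired together roughly like this inside the rule's onMatch. This is an unverified sketch against the constructor named above; `table`, `groupScan`, `columns`, and the column name are placeholders that still have to be built as questions 3, 4 and 6 discuss.

```java
// Sketch only: reuses the join's cluster, traits and row type
// (questions 1, 2 and 5). Placeholders are marked inline.
RelOptTable table = /* question 3: e.g. RelOptTableImpl.create(...) */ null;
GroupScan groupScan = /* question 4: a custom GroupScan implementation */ null;
List<SchemaPath> columns = java.util.Arrays.asList(
        SchemaPath.getCompoundPath("someColumn"));  // question 6 (name invented)

DrillScanRel newRel = new DrillScanRel(
        join.getCluster(),    // question 1
        join.getTraitSet(),   // question 2
        table,
        groupScan,
        join.getRowType(),    // question 5: must match the join's row type
        columns);
call.transformTo(newRel);
```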
> > >
> > > *---------------------*
> > > *Muhammad Gelbana*
> > > http://www.linkedin.com/in/mgelbana
> > >
> > > On Fri, Mar 31, 2017 at 1:59 PM, weijie tong <to...@gmail.com>
> > > wrote:
> > >
> > >> Your code seems right; what's left is to implement the
> > >> 'call.transformTo()' call. As for the remaining details, I may not
> > >> be able to express them precisely; as @Paul Rogers mentioned, the
> > >> plugin work involves a lot of small details.
> > >>
> > >> 1. drillScanRel.getGroupScan gives you each scan's GroupScan.
> > >> 2. You need to extend AbstractGroupScan and let it hold some
> > >> information about your storage. Call this GroupScan AGroupScan; it
> > >> corresponds to one of the joined scan RelNodes. Then you can define
> > >> another GroupScan, BGroupScan, which extends AGroupScan. The
> > >> BGroupScan acts as an aggregate container holding the two joined
> > >> AGroupScans.
> > >> 3. The new DrillScanRel has the same RowType as the JoinRel. The
> > >> requirements and examples of transforming between two different
> > >> RelNodes can be found in other code. This DrillScanRel's GroupScan
> > >> is the BGroupScan, and this new DrillScanRel is the one passed to
> > >> `call.transformTo(...)`.
> > >>
> > >> Maybe the picture below will help you understand my idea.
> > >>
> > >> Suppose the initial RelNode tree is:
> > >>
> > >>                            ---Scan (AGroupScan)
> > >>     Project ----Join --|
> > >>                            ---Scan (AGroupScan)
> > >>
> > >> After applying this rule, the final tree is:
> > >>
> > >>     Project-----Scan ( BGroupScan ( List(AGroupScan, AGroupScan) ) )
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
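A bare skeleton of the two GroupScan classes described in step 2 above. This is a sketch only: a real GroupScan must also implement serialization, getNewWithChildren, and the other AbstractGroupScan methods (all omitted here, along with any required superclass constructors), and the field names are assumptions.

```java
// AGroupScan: one scan of the proprietary datasource.
public class AGroupScan extends AbstractGroupScan {
    // holds storage details (plugin config, table name, ...) for one scan
}

// BGroupScan: the aggregate container the join is rewritten into.
// It extends AGroupScan and holds the two joined AGroupScans.
public class BGroupScan extends AGroupScan {
    private final List<AGroupScan> children;

    public BGroupScan(AGroupScan left, AGroupScan right) {
        this.children = Arrays.asList(left, right);
    }
}
```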
> > >> On Thu, Mar 30, 2017 at 10:01 PM, Muhammad Gelbana <
> m.gelbana@gmail.com
> > >
> > >> wrote:
> > >>
> > >> > *This is my rule class*
> > >> >
> > >> > public class CartesianProductJoinRule extends RelOptRule {
> > >> >
> > >> >     public static final CartesianProductJoinRule INSTANCE =
> > >> >             new CartesianProductJoinRule(DrillJoinRel.class);
> > >> >
> > >> >     public CartesianProductJoinRule(Class<DrillJoinRel> clazz) {
> > >> >         super(operand(clazz, operand(RelNode.class, any()),
> > >> >                 operand(RelNode.class, any())),
> > >> >                 "CartesianProductJoin");
> > >> >     }
> > >> >
> > >> >     @Override
> > >> >     public boolean matches(RelOptRuleCall call) {
> > >> >         DrillJoinRel drillJoin = call.rel(0);
> > >> >         return drillJoin.getJoinType() == JoinRelType.INNER
> > >> >                 && drillJoin.getCondition().isAlwaysTrue();
> > >> >     }
> > >> >
> > >> >     @Override
> > >> >     public void onMatch(RelOptRuleCall call) {
> > >> >         DrillJoinRel join = call.rel(0);
> > >> >         RelNode firstRel = call.rel(1);
> > >> >         RelNode secondRel = call.rel(2);
> > >> >         HepRelVertex right = (HepRelVertex) join.getRight();
> > >> >         HepRelVertex left = (HepRelVertex) join.getLeft();
> > >> >
> > >> >         List<RelDataTypeField> firstFields =
> > >> >                 firstRel.getRowType().getFieldList();
> > >> >         List<RelDataTypeField> secondFields =
> > >> >                 secondRel.getRowType().getFieldList();
> > >> >
> > >> >         RelNode firstTable =
> > >> >                 ((HepRelVertex) firstRel.getInput(0)).getCurrentRel();
> > >> >         RelNode secondTable =
> > >> >                 ((HepRelVertex) secondRel.getInput(0)).getCurrentRel();
> > >> >
> > >> >         //call.transformTo(???);
> > >> >     }
> > >> > }
> > >> >
> > >> > *To register the rule*, I overrode the *getOptimizerRules* method
> > >> > in my storage plugin class:
> > >> >
> > >> > public Set<? extends RelOptRule> getOptimizerRules(
> > >> >         OptimizerRulesContext optimizerContext, PlannerPhase phase) {
> > >> >     switch (phase) {
> > >> >     case LOGICAL_PRUNE_AND_JOIN:
> > >> >     case LOGICAL_PRUNE:
> > >> >     case LOGICAL:
> > >> >         return getLogicalOptimizerRules(optimizerContext);
> > >> >     case PHYSICAL:
> > >> >         return getPhysicalOptimizerRules(optimizerContext);
> > >> >     case PARTITION_PRUNING:
> > >> >     case JOIN_PLANNING:
> > >> > *        return ImmutableSet.of(CartesianProductJoinRule.INSTANCE);*
> > >> >     default:
> > >> >         return ImmutableSet.of();
> > >> >     }
> > >> > }
> > >> >
> > >> > The rule is firing as expected but I'm lost when it comes to the
> > >> > conversion. Earlier, you said "the new equivalent ScanRel is to
> > >> > have the joined ScanRel nodes's GroupScans", so
> > >> >
> > >> >    1. How can I obtain the left and right tables' group scans ?
> > >> >    2. What exactly do you mean by joining them ? Is there a utility
> > >> >    method to do so ? Or should I manually create a new single group
> > >> >    scan and add the information I need there ? Looking into other
> > >> >    *GroupScan* implementations, I found that they have references
> > >> >    to some runtime objects such as the storage plugin and the
> > >> >    storage plugin configuration. At this stage, I don't know how to
> > >> >    obtain those !
> > >> >    3. Precisely, what kind of object should I use to represent the
> > >> >    whole join ? I understand that I need to use an object that
> > >> >    implements the *RelNode* interface. Then I should add the
> > >> >    created *GroupScan* to that *RelNode* instance and call
> > >> >    *call.transformTo(newRelNode)*, correct ?
> > >> >
> > >> >
> > >> > *---------------------*
> > >> > *Muhammad Gelbana*
> > >> > http://www.linkedin.com/in/mgelbana
> > >> >
> > >> > On Thu, Mar 30, 2017 at 2:46 AM, weijie tong <
> tongweijie178@gmail.com
> > >
> > >> > wrote:
> > >> >
> > >> > > I mean the rule you write could be placed in
> > >> > > PlannerPhase.JOIN_PLANNING, which uses the HepPlanner. This phase
> > >> > > solves the logical RelNodes. Hope this helps you.
> > >> > > Muhammad Gelbana <m....@gmail.com> wrote on Thursday, March 30,
> > >> > > 2017, at 12:07 AM:
> > >> > >
> > >> > > > Thanks a lot Weijie, I believe I'm very close now. I hope you
> > >> > > > don't mind a few more questions please:
> > >> > > >
> > >> > > >    1. Is the new rule you are mentioning a physical rule ? So
> > >> > > >    should I implement the Prel interface ?
> > >> > > >    2. By "traversing the join to find the ScanRel":
> > >> > > >       - This sounds like I have to "search" for something.
> > >> > > >       Shouldn't I just work on transforming the left (i.e.
> > >> > > >       DrillJoinRel's getLeft() method) and right (i.e.
> > >> > > >       DrillJoinRel's getRight() method) join objects ?
> > >> > > >       - The "left" and "right" elements of the DrillJoinRel
> > >> > > >       object are of type RelSubset, not *ScanRel*, and I can't
> > >> > > >       find a type called *ScanRel*. I suppose you meant
> > >> > > >       *ScanPrel*, especially because it implements the *Prel*
> > >> > > >       interface that provides the *getPhysicalOperator* method.
> > >> > > >    3. What if multiple physical or logical rules match for a
> > >> > > >    single node ? What decides which rule will be applied and
> > >> > > >    which will be rejected ? Is it the
> > >> > > >    *AbstractRelNode.computeSelfCost(RelOptPlanner)* method ?
> > >> > > >    What if more than one rule produces the same cost ?
> > >> > > >
> > >> > > > I'll go ahead and see what I can do for now, hoping you may
> > >> > > > offer more guidance. THANKS A LOT.
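On question 3 above: in Calcite's cost-based planner, competing equivalent plans are compared through each node's cumulative cost, which builds on computeSelfCost. A rough sketch of overriding it follows; the numbers are placeholders, and note that the HepPlanner used in JOIN_PLANNING fires rules in registration order rather than by cost.

```java
// Sketch: inside a RelNode subclass. Older Calcite versions use this
// signature; newer ones also take a RelMetadataQuery argument.
@Override
public RelOptCost computeSelfCost(RelOptPlanner planner) {
    double rows = 1000;   // placeholder row-count estimate for this operator
    double cpu  = rows;   // placeholder CPU cost
    double io   = 0;      // placeholder I/O cost
    return planner.getCostFactory().makeCost(rows, cpu, io);
}
```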
> > >> > > >
> > >> > > > *---------------------*
> > >> > > > *Muhammad Gelbana*
> > >> > > > http://www.linkedin.com/in/mgelbana
> > >> > > >
> > >> > > > On Wed, Mar 29, 2017 at 4:23 AM, weijie tong <
> > >> tongweijie178@gmail.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > To avoid misunderstanding: the new equivalent ScanRel is to
> > >> > > > > have the joined ScanRel nodes' GroupScans, as the GroupScans
> > >> > > > > indirectly hold the underlying storage information.
> > >> > > > >
> > >> > > > > On Wed, Mar 29, 2017 at 10:15 AM, weijie tong <
> > >> > tongweijie178@gmail.com
> > >> > > >
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > >
> > >> > > > > > My suggestion is that you define a rule which matches the
> > >> > > > > > DrillJoinRel RelNode. Then, in the onMatch method, you
> > >> > > > > > traverse the join's children to find the ScanRel nodes.
> > >> > > > > > You define a new ScanRel which includes the ScanRel nodes
> > >> > > > > > you found in the last step, then transform the JoinRel to
> > >> > > > > > this equivalent new ScanRel.
> > >> > > > > > Finally, the plan tree will not have the JoinRel, only the
> > >> > > > > > ScanRel. You can put your join plan rule in
> > >> > > > > > PlannerPhase.JOIN_PLANNING.
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>


>> > node,
>> > > >    what decides which rule will be applied and which will be
>> rejected ?
>> > > Is
>> > > > it
>> > > >    the *AbstractRelNode.computeSelfCost(RelOptPlanner)* method ?
>> What
>> > if
>> > > >    more than one rule produces the same cost ?
>> > > >
>> > > > I'll go ahead and see what I can do for now before hopefully you may
>> > > offer
>> > > > more guidance. THANKS A LOT.
>> > > >
>> > > > *---------------------*
>> > > > *Muhammad Gelbana*
>> > > > http://www.linkedin.com/in/mgelbana
>> > > >
>> > > > On Wed, Mar 29, 2017 at 4:23 AM, weijie tong <
>> tongweijie178@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > to avoid misunderstanding , the new equivalent ScanRel is to have
>> the
>> > > > > joined ScanRel nodes's GroupScans, as the GroupScans indirectly
>> hold
>> > > the
>> > > > > underlying storage information.
>> > > > >
>> > > > > On Wed, Mar 29, 2017 at 10:15 AM, weijie tong <
>> > tongweijie178@gmail.com
>> > > >
>> > > > > wrote:
>> > > > >
>> > > > > >
>> > > > > > my suggestion is you define a rule which matches the
>> DrillJoinRel
>> > > > RelNode
>> > > > > > , then at the onMatch method ,you traverse the join children to
>> > find
>> > > > the
>> > > > > > ScanRel nodes . You define a new ScanRel which include the
>> ScanRel
>> > > > nodes
>> > > > > > you find last step. Then transform the JoinRel to this
>> equivalent
>> > new
>> > > > > > ScanRel.
>> > > > > > Finally , the plan tree will not have the JoinRel but the
>> ScanRel.
>> > > >  You
>> > > > > > can let your join plan rule  in the PlannerPhase.JOIN_PLANNING.
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: Is it possible to delegate data joins and filtering to the datasource ?

Posted by Muhammad Gelbana <m....@gmail.com>.
So I intend to use this constructor for the new *RelNode*:
*org.apache.drill.exec.planner.logical.DrillScanRel.DrillScanRel(RelOptCluster,
RelTraitSet, RelOptTable, GroupScan, RelDataType, List<SchemaPath>)*

How can I provide its parameters ?

   1. *RelOptCluster*: Can I pass *DrillJoinRel.getCluster()* ?

   2. *RelTraitSet*: Can I pass *DrillJoinRel.getTraitSet()* ?

   3. *RelOptTable*: I assume I can use this factory method
(*org.apache.calcite.prepare.RelOptTableImpl.create(RelOptSchema,
   RelDataType, Table, Path)*). Any hints of how I can provide these
   parameters too ? Should I just go ahead and manually create a new instance
   of each parameter ?

   4. *GroupScan*: I understand I have to create a new implementation class
   for this one, so no questions here so far.

   5. *RelDataType*: This one is confusing, because I understand that for
   *DrillJoinRel.transformTo(newRel)* to work, I have to provide a *newRel*
   instance that has a *RelDataType* instance with the same number of
   fields and compatible types (i.e. this is mandated by
*org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelNode,
   RelNode, Object)*). Why couldn't I provide a *RelDataType* with
   a different set of fields ? How can I resolve this ?

   6. *List<SchemaPath>*: I assume I can call this method and pass my
   columns names to it, one by one. (i.e.
   *org.apache.drill.common.expression.SchemaPath.getCompoundPath(String...)*
   )

Thanks.

*---------------------*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Fri, Mar 31, 2017 at 1:59 PM, weijie tong <to...@gmail.com>
wrote:

> your code seems right , just to implement the 'call.transformTo()' ,but the
> left detail , maybe I think I can't express the left things so precisely,
> just as @Paul Rogers mentioned the plugin detail is a little trivial.
>
> 1.  drillScanRel.getGroupScan  .
> 2. you need to extend the AbstractGroupScan ,and let it holds some
> information about your storage . This defined GroupScan just call it
> AGroupScan corresponds to a joint scan RelNode. Then you can define another
> GroupScan called BGroupScan which extends AGroupScan, The BGroupScan acts
> as a aggregate container which holds the two joint AGroupScan.
> 3 . The new DrillScanRel has the same RowType as the JoinRel. The
> requirement and exmple of transforming between two different RelNodes can
> be found from other codes. This DrillScanRel's GroupScan is the BGroupScan.
> This new DrillScanRel is the one applys to the code
>  `call.transformTo(xxxx)`.
>
> maybe the picture below may help you  understand my idea:
>
>
>          ---Scan (AGroupScan)
> suppose the initial RelNode tree is : Project ----Join --|
>
>   |       ---Scan (AGroupScan)
>
>   |
>
>  \|/
> after applied this rule ,the final tree is: Project-----Scan ( BGroupScan (
> List(AGroupScan ,AGroupScan) ) )
>
>
>
>
>
>
>
> On Thu, Mar 30, 2017 at 10:01 PM, Muhammad Gelbana <m....@gmail.com>
> wrote:
>
> > *This is my rule class*
> >
> > public class CartesianProductJoinRule extends RelOptRule {
> >
> >     public static final CartesianProductJoinRule INSTANCE = new
> > CartesianProductJoinRule(DrillJoinRel.class);
> >
> >     public CartesianProductJoinRule(Class<DrillJoinRel> clazz) {
> >         super(operand(clazz, operand(RelNode.class, any()),
> > operand(RelNode.class, any())),
> >                 "CartesianProductJoin");
> >     }
> >
> >     @Override
> >     public boolean matches(RelOptRuleCall call) {
> >         DrillJoinRel drillJoin = call.rel(0);
> >         return drillJoin.getJoinType() == JoinRelType.INNER &&
> > drillJoin.getCondition().isAlwaysTrue();
> >     }
> >
> >     @Override
> >     public void onMatch(RelOptRuleCall call) {
> >         DrillJoinRel join = call.rel(0);
> >         RelNode firstRel = call.rel(1);
> >         RelNode secondRel = call.rel(2);
> >         HepRelVertex right = (HepRelVertex) join.getRight();
> >         HepRelVertex left = (HepRelVertex) join.getLeft();
> >
> >         List<RelDataTypeField> firstFields = firstRel.getRowType().
> > getFieldList();
> >         List<RelDataTypeField> secondFields = secondRel.getRowType().
> > getFieldList();
> >
> >         RelNode firstTable = ((HepRelVertex)firstRel.
> > getInput(0)).getCurrentRel();
> >         RelNode secondTable = ((HepRelVertex)secondRel.
> > getInput(0)).getCurrentRel();
> >
> >         //call.transformTo(???);
> >     }
> > }
> >
> > *To register the rule*, I overrode the *getOptimizerRules* method in my
> > storage plugin class
> >
> > public Set<? extends RelOptRule> getOptimizerRules(OptimizerRulesContext
> > optimizerContext, PlannerPhase phase) {
> >     switch (phase) {
> >     case LOGICAL_PRUNE_AND_JOIN:
> >     case LOGICAL_PRUNE:
> >     case LOGICAL:
> >         return getLogicalOptimizerRules(optimizerContext);
> >     case PHYSICAL:
> >         return getPhysicalOptimizerRules(optimizerContext);
> >     case PARTITION_PRUNING:
> >     case JOIN_PLANNING:
> > *        return ImmutableSet.of(CartesianProductJoinRule.INSTANCE);*
> >     default:
> >         return ImmutableSet.of();
> >     }
> >
> > }
> >
> > The rule is firing as expected but I'm lost when it comes to the
> > conversion. Earlier, you said "the new equivalent ScanRel is to have the
> > joined
> > ScanRel nodes's GroupScans", so
> >
> >    1. How can I obtain the left and right tables group scans ?
> >    2. What exactly do you mean by joining them ? Is there a utility
> method
> >    to do so ? Or should I manually create a new single group scan and add
> > the
> >    information I need there ? Looking into other *GroupScan*
> >    implementations, I found that they have references to some runtime
> > objects
> >    such as the storage plugin and the storage plugin configuration. At
> this
> >    stage, I don't know how to obtain those !
> >    3. Precisely, what kind of object should I use to represent a
> *RelNode*
> >    that represents the whole join ? I understand that I need to use an
> > object
> >    that has implements the *RelNode* interface. Then I should add the
> >    created *GroupScan* to that *RelNode* instance and call
> >    *call.transformTo(newRelNode)*, correct ?
> >
> >
> > *---------------------*
> > *Muhammad Gelbana*
> > http://www.linkedin.com/in/mgelbana
> >
> > On Thu, Mar 30, 2017 at 2:46 AM, weijie tong <to...@gmail.com>
> > wrote:
> >
> > > I mean the rule you write could be placed in the
> > PlannerPhase.JOIN_PlANNING
> > > which uses the HepPlanner. This phase is to solve the logical relnode .
> > > Hope to help you.
> > > Muhammad Gelbana <m....@gmail.com>于2017年3月30日 周四上午12:07写道:
> > >
> > > > ​Thanks a lot Weijie, I believe I'm very close now. I hope you don't
> > mind
> > > > few more questions please:
> > > >
> > > >
> > > >    1. The new rule you are mentioning is a physical rule ? So I
> should
> > > >    implement the Prel interface ?
> > > >    2. By "traversing the join to find the ScanRel"
> > > >       - This sounds like I have to "search" for something. Shouldn't
> I
> > > just
> > > >       work on transforming the left (i.e. DrillJoinRel's getLeft()
> > > method)
> > > > and
> > > >       right (i.e. DrillJoinRel's getLeft() method) join objects ?
> > > >       - The "left" and "right" elements of the DrillJoinRel object
> are
> > of
> > > >       type RelSubset, not *ScanRel* and I can't find a type called
> > > > *ScanRel*.
> > > >       I suppose you meant *ScanPrel*, specially because it implements
> > the
> > > >       *Prel* interface that provides the *getPhysicalOperator*
> method.
> > > >    3. What if multiple physical or logical rules match for a single
> > node,
> > > >    what decides which rule will be applied and which will be
> rejected ?
> > > Is
> > > > it
> > > >    the *AbstractRelNode.computeSelfCost(RelOptPlanner)* method ?
> What
> > if
> > > >    more than one rule produces the same cost ?
> > > >
> > > > I'll go ahead and see what I can do for now before hopefully you may
> > > offer
> > > > more guidance. THANKS A LOT.
> > > >
> > > > *---------------------*
> > > > *Muhammad Gelbana*
> > > > http://www.linkedin.com/in/mgelbana
> > > >
> > > > On Wed, Mar 29, 2017 at 4:23 AM, weijie tong <
> tongweijie178@gmail.com>
> > > > wrote:
> > > >
> > > > > to avoid misunderstanding , the new equivalent ScanRel is to have
> the
> > > > > joined ScanRel nodes's GroupScans, as the GroupScans indirectly
> hold
> > > the
> > > > > underlying storage information.
> > > > >
> > > > > On Wed, Mar 29, 2017 at 10:15 AM, weijie tong <
> > tongweijie178@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > >
> > > > > > my suggestion is you define a rule which matches the DrillJoinRel
> > > > RelNode
> > > > > > , then at the onMatch method ,you traverse the join children to
> > find
> > > > the
> > > > > > ScanRel nodes . You define a new ScanRel which include the
> ScanRel
> > > > nodes
> > > > > > you find last step. Then transform the JoinRel to this equivalent
> > new
> > > > > > ScanRel.
> > > > > > Finally , the plan tree will not have the JoinRel but the
> ScanRel.
> > > >  You
> > > > > > can let your join plan rule  in the PlannerPhase.JOIN_PLANNING.
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Is it possible to delegate data joins and filtering to the datasource ?

Posted by weijie tong <to...@gmail.com>.
Your code seems right; you just need to implement the 'call.transformTo()'
part. As for the remaining details, I may not be able to express everything
precisely; as @Paul Rogers mentioned, the plugin details are a little involved.

1. drillScanRel.getGroupScan().
2. You need to extend AbstractGroupScan and let it hold some information
about your storage. Call this defined GroupScan AGroupScan; it corresponds
to one of the joined scan RelNodes. Then you can define another
GroupScan called BGroupScan which extends AGroupScan. The BGroupScan acts
as an aggregate container which holds the two joined AGroupScans.
3. The new DrillScanRel has the same RowType as the JoinRel. The
requirements and examples of transforming between two different RelNodes can
be found in other code. This DrillScanRel's GroupScan is the BGroupScan.
This new DrillScanRel is the one you apply in the code
 `call.transformTo(xxxx)`.

maybe the picture below will help you understand my idea:

suppose the initial RelNode tree is:

                         --- Scan (AGroupScan)
    Project ---- Join ---|
                         --- Scan (AGroupScan)

after applying this rule, the final tree is:

    Project ---- Scan ( BGroupScan ( List(AGroupScan, AGroupScan) ) )
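To make the container idea a bit more concrete, here is a tiny,
Drill-independent sketch. The class and method names are only illustrative;
a real plugin GroupScan must extend Drill's AbstractGroupScan and implement
its abstract methods, which I omit here:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for a plugin GroupScan that wraps one table scan.
// In a real plugin this would extend
// org.apache.drill.exec.physical.base.AbstractGroupScan.
class AGroupScan {
    private final String tableName;

    AGroupScan(String tableName) {
        this.tableName = tableName;
    }

    String getTableName() {
        return tableName;
    }
}

// BGroupScan is just an aggregate container: it holds the AGroupScans of
// the joined tables, so a single new ScanRel can push the whole join down
// to the datasource at execution time.
class BGroupScan extends AGroupScan {
    private final List<AGroupScan> childScans;

    BGroupScan(List<AGroupScan> childScans) {
        super("join(" + childScans.size() + " tables)");
        this.childScans = new ArrayList<>(childScans);
    }

    List<AGroupScan> getChildScans() {
        return childScans;
    }
}
```

In your rule's onMatch you would collect each scan's GroupScan, wrap them in
the BGroupScan, and build the new DrillScanRel around that container.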







On Thu, Mar 30, 2017 at 10:01 PM, Muhammad Gelbana <m....@gmail.com>
wrote:

> *This is my rule class*
>
> public class CartesianProductJoinRule extends RelOptRule {
>
>     public static final CartesianProductJoinRule INSTANCE = new
> CartesianProductJoinRule(DrillJoinRel.class);
>
>     public CartesianProductJoinRule(Class<DrillJoinRel> clazz) {
>         super(operand(clazz, operand(RelNode.class, any()),
> operand(RelNode.class, any())),
>                 "CartesianProductJoin");
>     }
>
>     @Override
>     public boolean matches(RelOptRuleCall call) {
>         DrillJoinRel drillJoin = call.rel(0);
>         return drillJoin.getJoinType() == JoinRelType.INNER &&
> drillJoin.getCondition().isAlwaysTrue();
>     }
>
>     @Override
>     public void onMatch(RelOptRuleCall call) {
>         DrillJoinRel join = call.rel(0);
>         RelNode firstRel = call.rel(1);
>         RelNode secondRel = call.rel(2);
>         HepRelVertex right = (HepRelVertex) join.getRight();
>         HepRelVertex left = (HepRelVertex) join.getLeft();
>
>         List<RelDataTypeField> firstFields = firstRel.getRowType().
> getFieldList();
>         List<RelDataTypeField> secondFields = secondRel.getRowType().
> getFieldList();
>
>         RelNode firstTable = ((HepRelVertex)firstRel.
> getInput(0)).getCurrentRel();
>         RelNode secondTable = ((HepRelVertex)secondRel.
> getInput(0)).getCurrentRel();
>
>         //call.transformTo(???);
>     }
> }
>
> *To register the rule*, I overrode the *getOptimizerRules* method in my
> storage plugin class
>
> public Set<? extends RelOptRule> getOptimizerRules(OptimizerRulesContext
> optimizerContext, PlannerPhase phase) {
>     switch (phase) {
>     case LOGICAL_PRUNE_AND_JOIN:
>     case LOGICAL_PRUNE:
>     case LOGICAL:
>         return getLogicalOptimizerRules(optimizerContext);
>     case PHYSICAL:
>         return getPhysicalOptimizerRules(optimizerContext);
>     case PARTITION_PRUNING:
>     case JOIN_PLANNING:
> *        return ImmutableSet.of(CartesianProductJoinRule.INSTANCE);*
>     default:
>         return ImmutableSet.of();
>     }
>
> }
>
> The rule is firing as expected but I'm lost when it comes to the
> conversion. Earlier, you said "the new equivalent ScanRel is to have the
> joined
> ScanRel nodes's GroupScans", so
>
>    1. How can I obtain the left and right tables group scans ?
>    2. What exactly do you mean by joining them ? Is there a utility method
>    to do so ? Or should I manually create a new single group scan and add
> the
>    information I need there ? Looking into other *GroupScan*
>    implementations, I found that they have references to some runtime
> objects
>    such as the storage plugin and the storage plugin configuration. At this
>    stage, I don't know how to obtain those !
>    3. Precisely, what kind of object should I use to represent a *RelNode*
>    that represents the whole join ? I understand that I need to use an
> object
>    that has implements the *RelNode* interface. Then I should add the
>    created *GroupScan* to that *RelNode* instance and call
>    *call.transformTo(newRelNode)*, correct ?
>
>
> *---------------------*
> *Muhammad Gelbana*
> http://www.linkedin.com/in/mgelbana
>
> On Thu, Mar 30, 2017 at 2:46 AM, weijie tong <to...@gmail.com>
> wrote:
>
> > I mean the rule you write could be placed in the
> PlannerPhase.JOIN_PlANNING
> > which uses the HepPlanner. This phase is to solve the logical relnode .
> > Hope to help you.
> > Muhammad Gelbana <m....@gmail.com>于2017年3月30日 周四上午12:07写道:
> >
> > > ​Thanks a lot Weijie, I believe I'm very close now. I hope you don't
> mind
> > > few more questions please:
> > >
> > >
> > >    1. The new rule you are mentioning is a physical rule ? So I should
> > >    implement the Prel interface ?
> > >    2. By "traversing the join to find the ScanRel"
> > >       - This sounds like I have to "search" for something. Shouldn't I
> > just
> > >       work on transforming the left (i.e. DrillJoinRel's getLeft()
> > method)
> > > and
> > >       right (i.e. DrillJoinRel's getLeft() method) join objects ?
> > >       - The "left" and "right" elements of the DrillJoinRel object are
> of
> > >       type RelSubset, not *ScanRel* and I can't find a type called
> > > *ScanRel*.
> > >       I suppose you meant *ScanPrel*, specially because it implements
> the
> > >       *Prel* interface that provides the *getPhysicalOperator* method.
> > >    3. What if multiple physical or logical rules match for a single
> node,
> > >    what decides which rule will be applied and which will be rejected ?
> > Is
> > > it
> > >    the *AbstractRelNode.computeSelfCost(RelOptPlanner)* method ? What
> if
> > >    more than one rule produces the same cost ?
> > >
> > > I'll go ahead and see what I can do for now before hopefully you may
> > offer
> > > more guidance. THANKS A LOT.
> > >
> > > *---------------------*
> > > *Muhammad Gelbana*
> > > http://www.linkedin.com/in/mgelbana
> > >
> > > On Wed, Mar 29, 2017 at 4:23 AM, weijie tong <to...@gmail.com>
> > > wrote:
> > >
> > > > to avoid misunderstanding , the new equivalent ScanRel is to have the
> > > > joined ScanRel nodes's GroupScans, as the GroupScans indirectly hold
> > the
> > > > underlying storage information.
> > > >
> > > > On Wed, Mar 29, 2017 at 10:15 AM, weijie tong <
> tongweijie178@gmail.com
> > >
> > > > wrote:
> > > >
> > > > >
> > > > > my suggestion is you define a rule which matches the DrillJoinRel
> > > RelNode
> > > > > , then at the onMatch method ,you traverse the join children to
> find
> > > the
> > > > > ScanRel nodes . You define a new ScanRel which include the ScanRel
> > > nodes
> > > > > you find last step. Then transform the JoinRel to this equivalent
> new
> > > > > ScanRel.
> > > > > Finally , the plan tree will not have the JoinRel but the ScanRel.
> > >  You
> > > > > can let your join plan rule  in the PlannerPhase.JOIN_PLANNING.
> > > > >
> > > >
> > >
> >
>

Re: Is it possible to delegate data joins and filtering to the datasource ?

Posted by Muhammad Gelbana <m....@gmail.com>.
*This is my rule class*

public class CartesianProductJoinRule extends RelOptRule {

    public static final CartesianProductJoinRule INSTANCE = new
CartesianProductJoinRule(DrillJoinRel.class);

    public CartesianProductJoinRule(Class<DrillJoinRel> clazz) {
        super(operand(clazz, operand(RelNode.class, any()),
operand(RelNode.class, any())),
                "CartesianProductJoin");
    }

    @Override
    public boolean matches(RelOptRuleCall call) {
        DrillJoinRel drillJoin = call.rel(0);
        return drillJoin.getJoinType() == JoinRelType.INNER &&
drillJoin.getCondition().isAlwaysTrue();
    }

    @Override
    public void onMatch(RelOptRuleCall call) {
        DrillJoinRel join = call.rel(0);
        RelNode firstRel = call.rel(1);
        RelNode secondRel = call.rel(2);
        HepRelVertex right = (HepRelVertex) join.getRight();
        HepRelVertex left = (HepRelVertex) join.getLeft();

        List<RelDataTypeField> firstFields = firstRel.getRowType().
getFieldList();
        List<RelDataTypeField> secondFields = secondRel.getRowType().
getFieldList();

        RelNode firstTable = ((HepRelVertex)firstRel.
getInput(0)).getCurrentRel();
        RelNode secondTable = ((HepRelVertex)secondRel.
getInput(0)).getCurrentRel();

        //call.transformTo(???);
    }
}

*To register the rule*, I overrode the *getOptimizerRules* method in my
storage plugin class

public Set<? extends RelOptRule> getOptimizerRules(OptimizerRulesContext
optimizerContext, PlannerPhase phase) {
    switch (phase) {
    case LOGICAL_PRUNE_AND_JOIN:
    case LOGICAL_PRUNE:
    case LOGICAL:
        return getLogicalOptimizerRules(optimizerContext);
    case PHYSICAL:
        return getPhysicalOptimizerRules(optimizerContext);
    case PARTITION_PRUNING:
    case JOIN_PLANNING:
*        return ImmutableSet.of(CartesianProductJoinRule.INSTANCE);*
    default:
        return ImmutableSet.of();
    }

}

The rule is firing as expected but I'm lost when it comes to the
conversion. Earlier, you said "the new equivalent ScanRel is to have the joined
ScanRel nodes's GroupScans", so

   1. How can I obtain the left and right tables group scans ?
   2. What exactly do you mean by joining them ? Is there a utility method
   to do so ? Or should I manually create a new single group scan and add the
   information I need there ? Looking into other *GroupScan*
   implementations, I found that they have references to some runtime objects
   such as the storage plugin and the storage plugin configuration. At this
   stage, I don't know how to obtain those !
   3. Precisely, what kind of object should I use to represent a *RelNode*
   that represents the whole join ? I understand that I need to use an object
   that implements the *RelNode* interface. Then I should add the
   created *GroupScan* to that *RelNode* instance and call
   *call.transformTo(newRelNode)*, correct ?


*---------------------*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Thu, Mar 30, 2017 at 2:46 AM, weijie tong <to...@gmail.com>
wrote:

> I mean the rule you write could be placed in the PlannerPhase.JOIN_PlANNING
> which uses the HepPlanner. This phase is to solve the logical relnode .
> Hope to help you.
> Muhammad Gelbana <m....@gmail.com>于2017年3月30日 周四上午12:07写道:
>
> > ​Thanks a lot Weijie, I believe I'm very close now. I hope you don't mind
> > few more questions please:
> >
> >
> >    1. The new rule you are mentioning is a physical rule ? So I should
> >    implement the Prel interface ?
> >    2. By "traversing the join to find the ScanRel"
> >       - This sounds like I have to "search" for something. Shouldn't I
> just
> >       work on transforming the left (i.e. DrillJoinRel's getLeft()
> method)
> > and
> >       right (i.e. DrillJoinRel's getLeft() method) join objects ?
> >       - The "left" and "right" elements of the DrillJoinRel object are of
> >       type RelSubset, not *ScanRel* and I can't find a type called
> > *ScanRel*.
> >       I suppose you meant *ScanPrel*, specially because it implements the
> >       *Prel* interface that provides the *getPhysicalOperator* method.
> >    3. What if multiple physical or logical rules match for a single node,
> >    what decides which rule will be applied and which will be rejected ?
> Is
> > it
> >    the *AbstractRelNode.computeSelfCost(RelOptPlanner)* method ? What if
> >    more than one rule produces the same cost ?
> >
> > I'll go ahead and see what I can do for now before hopefully you may
> offer
> > more guidance. THANKS A LOT.
> >
> > *---------------------*
> > *Muhammad Gelbana*
> > http://www.linkedin.com/in/mgelbana
> >
> > On Wed, Mar 29, 2017 at 4:23 AM, weijie tong <to...@gmail.com>
> > wrote:
> >
> > > to avoid misunderstanding , the new equivalent ScanRel is to have the
> > > joined ScanRel nodes's GroupScans, as the GroupScans indirectly hold
> the
> > > underlying storage information.
> > >
> > > On Wed, Mar 29, 2017 at 10:15 AM, weijie tong <tongweijie178@gmail.com
> >
> > > wrote:
> > >
> > > >
> > > > my suggestion is you define a rule which matches the DrillJoinRel
> > RelNode
> > > > , then at the onMatch method ,you traverse the join children to find
> > the
> > > > ScanRel nodes . You define a new ScanRel which include the ScanRel
> > nodes
> > > > you find last step. Then transform the JoinRel to this equivalent new
> > > > ScanRel.
> > > > Finally , the plan tree will not have the JoinRel but the ScanRel.
> >  You
> > > > can let your join plan rule  in the PlannerPhase.JOIN_PLANNING.
> > > >
> > >
> >
>

Re: Is it possible to delegate data joins and filtering to the datasource ?

Posted by weijie tong <to...@gmail.com>.
I mean the rule you write could be placed in PlannerPhase.JOIN_PLANNING,
which uses the HepPlanner. This phase deals with the logical RelNodes.
Hope this helps.
Muhammad Gelbana <m....@gmail.com>于2017年3月30日 周四上午12:07写道:

> ​Thanks a lot Weijie, I believe I'm very close now. I hope you don't mind
> few more questions please:
>
>
>    1. The new rule you are mentioning is a physical rule ? So I should
>    implement the Prel interface ?
>    2. By "traversing the join to find the ScanRel"
>       - This sounds like I have to "search" for something. Shouldn't I just
>       work on transforming the left (i.e. DrillJoinRel's getLeft() method)
> and
>       right (i.e. DrillJoinRel's getLeft() method) join objects ?
>       - The "left" and "right" elements of the DrillJoinRel object are of
>       type RelSubset, not *ScanRel* and I can't find a type called
> *ScanRel*.
>       I suppose you meant *ScanPrel*, specially because it implements the
>       *Prel* interface that provides the *getPhysicalOperator* method.
>    3. What if multiple physical or logical rules match for a single node,
>    what decides which rule will be applied and which will be rejected ? Is
> it
>    the *AbstractRelNode.computeSelfCost(RelOptPlanner)* method ? What if
>    more than one rule produces the same cost ?
>
> I'll go ahead and see what I can do for now before hopefully you may offer
> more guidance. THANKS A LOT.
>
> *---------------------*
> *Muhammad Gelbana*
> http://www.linkedin.com/in/mgelbana
>
> On Wed, Mar 29, 2017 at 4:23 AM, weijie tong <to...@gmail.com>
> wrote:
>
> > to avoid misunderstanding , the new equivalent ScanRel is to have the
> > joined ScanRel nodes's GroupScans, as the GroupScans indirectly hold the
> > underlying storage information.
> >
> > On Wed, Mar 29, 2017 at 10:15 AM, weijie tong <to...@gmail.com>
> > wrote:
> >
> > >
> > > my suggestion is you define a rule which matches the DrillJoinRel
> RelNode
> > > , then at the onMatch method ,you traverse the join children to find
> the
> > > ScanRel nodes . You define a new ScanRel which include the ScanRel
> nodes
> > > you find last step. Then transform the JoinRel to this equivalent new
> > > ScanRel.
> > > Finally , the plan tree will not have the JoinRel but the ScanRel.
>  You
> > > can let your join plan rule  in the PlannerPhase.JOIN_PLANNING.
> > >
> >
>

Re: Is it possible to delegate data joins and filtering to the datasource ?

Posted by Muhammad Gelbana <m....@gmail.com>.
​Thanks a lot Weijie, I believe I'm very close now. I hope you don't mind
few more questions please:


   1. The new rule you are mentioning is a physical rule ? So I should
   implement the Prel interface ?
   2. By "traversing the join to find the ScanRel"
      - This sounds like I have to "search" for something. Shouldn't I just
      work on transforming the left (i.e. DrillJoinRel's getLeft() method) and
      right (i.e. DrillJoinRel's getRight() method) join objects ?
      - The "left" and "right" elements of the DrillJoinRel object are of
      type RelSubset, not *ScanRel* and I can't find a type called *ScanRel*.
      I suppose you meant *ScanPrel*, especially because it implements the
      *Prel* interface that provides the *getPhysicalOperator* method.
   3. What if multiple physical or logical rules match for a single node,
   what decides which rule will be applied and which will be rejected ? Is it
   the *AbstractRelNode.computeSelfCost(RelOptPlanner)* method ? What if
   more than one rule produces the same cost ?

I'll go ahead and see what I can do for now before hopefully you may offer
more guidance. THANKS A LOT.

*---------------------*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Wed, Mar 29, 2017 at 4:23 AM, weijie tong <to...@gmail.com>
wrote:

> to avoid misunderstanding , the new equivalent ScanRel is to have the
> joined ScanRel nodes's GroupScans, as the GroupScans indirectly hold the
> underlying storage information.
>
> On Wed, Mar 29, 2017 at 10:15 AM, weijie tong <to...@gmail.com>
> wrote:
>
> >
> > my suggestion is you define a rule which matches the DrillJoinRel RelNode
> > , then at the onMatch method ,you traverse the join children to find the
> > ScanRel nodes . You define a new ScanRel which include the ScanRel nodes
> > you find last step. Then transform the JoinRel to this equivalent new
> > ScanRel.
> > Finally , the plan tree will not have the JoinRel but the ScanRel.   You
> > can let your join plan rule  in the PlannerPhase.JOIN_PLANNING.
> >
>

Re: Is it possible to delegate data joins and filtering to the datasource ?

Posted by weijie tong <to...@gmail.com>.
To avoid misunderstanding: the new equivalent ScanRel is to hold the
joined ScanRel nodes' GroupScans, as the GroupScans indirectly hold the
underlying storage information.

On Wed, Mar 29, 2017 at 10:15 AM, weijie tong <to...@gmail.com>
wrote:

>
> my suggestion is you define a rule which matches the DrillJoinRel RelNode
> , then at the onMatch method ,you traverse the join children to find the
> ScanRel nodes . You define a new ScanRel which include the ScanRel nodes
> you find last step. Then transform the JoinRel to this equivalent new
> ScanRel.
> Finally , the plan tree will not have the JoinRel but the ScanRel.   You
> can let your join plan rule  in the PlannerPhase.JOIN_PLANNING.
>

Re: Is it possible to delegate data joins and filtering to the datasource ?

Posted by weijie tong <to...@gmail.com>.
My suggestion is that you define a rule which matches the DrillJoinRel
RelNode; then, in the onMatch method, you traverse the join children to find
the ScanRel nodes. You define a new ScanRel which includes the ScanRel nodes
you found in the last step. Then transform the JoinRel into this equivalent
new ScanRel.
Finally, the plan tree will no longer have the JoinRel, only the ScanRel.
You can put your join planning rule in PlannerPhase.JOIN_PLANNING.

Re: Is it possible to delegate data joins and filtering to the datasource ?

Posted by Muhammad Gelbana <m....@gmail.com>.
I'm focusing on JOINs now, especially a query such as this: *SELECT * FROM
TABLE1, TABLE2*. Drill plans to transform this into 2 separate full-scan
queries and then performs the cartesian product join on its own. I'm trying
to make Drill send the query as it is, in a single scan (group scan?).

@weijie

I've found that if I opt out of the JDBC plugin's JdbcDrelConverterRule rule (i.e.
JdbcStoragePlugin.DrillJdbcConvention.DrillJdbcConvention), an exception is
thrown because Drill refuses to plan cartesian product joins. Are you
saying that I need to keep that rule and let Drill plan it as 2 different
group scans, and that I should then change this plan to merge these 2 group
scans into one?

Is there a way to make Drill accept planning cartesian product joins?

*---------------------*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana


Re: Is it possible to delegate data joins and filtering to the datasource ?

Posted by Muhammad Gelbana <m....@gmail.com>.
Priceless information! Thank you all.

I managed to debug Drill in Eclipse hoping to get a better understanding
but I can't get my head around some stuff:

   - What is the purpose of these classes/interfaces:
      - ConverterRule
      - DrillRel
      - Prel
      - JdbcStoragePlugin.JdbcPrule
      - JdbcIntermediatePrel
   - What do the words *Prel* and *Prule* stand for? *Prel*iminary and
   *P*reliminary *Rule*?
   - What is a calling convention? (i.e. mentioned in *ConverterRule*'s
   documentation)

Is there a way to configure the costing model for the JDBC plugin without
having to customize it through code? After all, my ultimate goal is to
push down filters and joins.

I'll continue debugging/browsing the code and come back with more
questions, or hopefully an achievement!

Thanks again, your help is very much appreciated.

*---------------------*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana


Re: Is it possible to delegate data joins and filtering to the datasource ?

Posted by weijie tong <to...@gmail.com>.
I am working on pushing down joins to the Druid storage plugin. In my
experience, you first need to write a rule that determines whether the joins
can be pushed down, based on your storage plugin's metadata; if so, you
transform the join node into a scan node that carries the query-relevant
information. The key point is to apply this rule in the HepPlanner.
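One concrete piece of bookkeeping this transformation needs: Calcite numbers
a join's output columns as the left input's fields followed by the right
input's, so the combined scan node must map each join-output ordinal back to
a (side, ordinal) pair on its inputs. A minimal self-contained sketch of that
mapping (a hypothetical helper for illustration, not a Drill class):

```java
// Maps a join-output column ordinal back to its source input.
// Calcite convention: outputs 0..leftFieldCount-1 come from the left input;
// the rest come from the right input, shifted down by leftFieldCount.
public class JoinOrdinalMapper {
    /** Returns {side, ordinal}: side 0 = left input, side 1 = right input. */
    public static int[] toInputRef(int joinOrdinal, int leftFieldCount) {
        if (joinOrdinal < leftFieldCount) {
            return new int[] {0, joinOrdinal};
        }
        return new int[] {1, joinOrdinal - leftFieldCount};
    }
}
```

For example, with a 3-column left input, join-output column 4 resolves to
column 1 of the right input.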

Re: Is it possible to delegate data joins and filtering to the datasource ?

Posted by Zelaine Fong <zf...@mapr.com>.
The JDBC storage plugin does attempt to do pushdowns of joins.  However, the Drill optimizer will evaluate different query plans.  In doing so, it may choose an alternative plan that does not do a full pushdown if it believes that’s a less costly plan than a full pushdown.  There are a number of open bugs with the JDBC storage plugin, including DRILL-4696.  For that particular issue, I believe that when it was investigated, it was determined that the costing model for the JDBC storage plugin needed more work.  Hence Drill wasn’t picking the more optimal full pushdown plan.
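The cost comparison described here is driven largely by the statistics the
group scan reports; a plugin can make the pushed-down plan look cheaper by
returning lower row estimates once a pushdown has happened. A pseudocode
sketch of the idea (illustrative only; the fields and selectivity handling
are assumptions, not the actual JDBC plugin code):

```java
// Pseudocode sketch -- illustrative only.
// After a filter pushdown, report a smaller estimated row count so the
// optimizer costs the full-pushdown plan as the cheaper alternative.
@Override
public ScanStats getScanStats() {
  double rows = filterPushedDown ? estimatedRows * filterSelectivity : estimatedRows;
  return new ScanStats(ScanStats.GroupScanProperty.NO_EXACT_ROW_COUNT,
      (long) rows, /* cpuCost */ 1, /* diskCost */ (float) rows);
}
```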

-- Zelaine


Re: Is it possible to delegate data joins and filtering to the datasource ?

Posted by Paul Rogers <pr...@mapr.com>.
Hi Muhammad,

It seems that the goal for filters should be possible; I’m not familiar enough with the code to know if joins are currently supported, or if this is where you’d have to make some contributions to Drill.

The storage plugin is called at various places in the planning process, and can insert planning rules. We have plugins that push down filters, so this seems possible. For example, check Parquet and JDBC for hints. See my answer to a previous question for hints on how to get started with storage plugins.

Joins may be a bit more complex. You’d have to insert planner rules; such code *may* be available, or may require extensions to Drill. Drill should certainly do this, so if the code is not there, we’d welcome your contribution.

You’d have to create a rule that creates a new scan operator that includes the information you wish to push down. For example, if you push a filter, the scan definition (AKA group scan and scan entry) would need to hold the information needed to implement the push-down. Again, you can probably find examples of filters; you’d have to be creative to push joins.
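For the filter case, the scan definition described above might look roughly
like this (pseudocode sketch; MyGroupScan, its fields, and cloneWithFilter are
illustrative names, not an existing Drill class):

```java
// Pseudocode sketch -- illustrative only, not actual Drill plugin code.
public class MyGroupScan extends AbstractGroupScan {
  private final List<SchemaPath> columns;   // projected columns
  private final LogicalExpression filter;   // filter pushed down by a planner rule

  // The filter-pushdown rule builds a copy of the scan carrying the filter...
  public MyGroupScan cloneWithFilter(LogicalExpression filter) { /* ... */ }

  // ...and at execution time the plugin's RecordReader translates that filter
  // into the datasource's native query, so only matching rows reach Drill.
}
```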

Assembling the pieces: your plugin would add planner rules that determine when joins can be pushed. Those rules would cause your plugin to create a semantic node (group scan) that holds the required information. The planner then converts group scan nodes to specific plans passed to the execution engine. On the execution side, your plugin provides a “Record Reader” for your format, and that reader does the actual work to push the filter or join down to your data source.

Your best bet is to mine existing plugins for ideas, and then experiment. Start simply and gradually add functionality. And, ask questions back on this list.


Thanks,

- Paul
