Posted to dev@pig.apache.org by Ashutosh Chauhan <as...@gmail.com> on 2009/11/06 00:28:26 UTC

How to clone a logical plan ?

Hi,

For our cost-based optimizer, we need to generate alternative query plans
for a given query plan and evaluate them based on their estimated cost. To
do that, I want to clone a logical plan. I thought LogicalPlanCloner was
meant for that, but it doesn't seem to work. I added this simple test case
to TestLogicalPlanBuilder.java:

    public void testLogicalPlanCloneHelper() throws CloneNotSupportedException {
        LogicalPlan lp = buildPlan("C = join ( load 'A') by $0, (load 'B') by $0;");
        LogicalPlanCloner cloner = new LogicalPlanCloner(lp);
        cloner.getClonedPlan();
    }

and this fails with the following stacktrace:

java.lang.NullPointerException
        at org.apache.pig.impl.logicalLayer.LOVisitor.visit(LOVisitor.java:171)
        at org.apache.pig.impl.logicalLayer.PlanSetter.visit(PlanSetter.java:63)
        at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:213)
        at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:45)
        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
        at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
        at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
        at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.getClonedPlan(LogicalPlanCloneHelper.java:73)
        at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:46)
        at org.apache.pig.test.TestLogicalPlanBuilder.testLogicalPlanCloneHelper(TestLogicalPlanBuilder.java:2110)

I am debugging this, but wanted to ask if I have hit a bug here or if I am
doing something wrong?

Thanks,
Ashutosh

Re: How to clone a logical plan ?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Richard,
The Load/Store redesign proposal has an interface that defines how
stats get represented; a loader that implements ResourceLoader will
pass statistics up into Pig, which will then take care of doing
whatever it needs to do with them. The specifics of how the stats get
loaded in by the loader are up to the implementation of the loader --
they can be read in from a metadata service, sampled on the fly,
stored in a metadata file, etc.

For simplicity, we are working with serialized JSON representations of
ResourceStatistics right now.

-Dmitriy
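
The division of labour described above (the optimizer asks for statistics, the loader decides where they come from) can be sketched roughly as follows. This is only an illustration under assumptions: StatsSource, TableStats, StoredStats and SampledStats are hypothetical names, not the interface from the Load/Store redesign proposal, and the toy TableStats stands in for the ResourceStatistics objects mentioned in the message.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the statistics a loader passes up to the optimizer.
final class TableStats {
    final long rowCount;
    final long approxSize; // characters stand in for bytes in this toy
    TableStats(long rowCount, long approxSize) {
        this.rowCount = rowCount;
        this.approxSize = approxSize;
    }
}

// Hypothetical contract: callers only ask for stats; how they are
// obtained is entirely up to the implementation.
interface StatsSource {
    Optional<TableStats> getStatistics(String location);
}

// Option 1: stats were computed earlier and stored next to the data
// (a metadata file or metadata service would play this role).
final class StoredStats implements StatsSource {
    private final Map<String, TableStats> byLocation;
    StoredStats(Map<String, TableStats> byLocation) {
        this.byLocation = new HashMap<>(byLocation);
    }
    @Override
    public Optional<TableStats> getStatistics(String location) {
        return Optional.ofNullable(byLocation.get(location));
    }
}

// Option 2: sample the first few records on the fly and extrapolate.
// The record count is known here only because the toy input is an in-memory list.
final class SampledStats implements StatsSource {
    private final List<String> records;
    private final int sampleSize;
    SampledStats(List<String> records, int sampleSize) {
        this.records = records;
        this.sampleSize = sampleSize;
    }
    @Override
    public Optional<TableStats> getStatistics(String location) {
        if (records.isEmpty() || sampleSize <= 0) {
            return Optional.empty(); // nothing to sample, so no estimate
        }
        int n = Math.min(sampleSize, records.size());
        long sampledChars = 0;
        for (int i = 0; i < n; i++) {
            sampledChars += records.get(i).length();
        }
        long avgRecordSize = sampledChars / n;
        return Optional.of(new TableStats(records.size(), avgRecordSize * records.size()));
    }
}
```

Returning an Optional lets a loader for, say, a plain-text log file report that it has no statistics, in which case an optimizer would presumably fall back to its defaults.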

2009/11/6 RichardGUO Fei <gl...@hotmail.com>:
>
> Hi Dmitriy,
>
> Thanks for sharing. I look forward to seeing your work. I have implemented a storage layer and want to connect Pig to it.
> To let the optimizer fully benefit from the histogram and side information of my storage, I am thinking of
> implementing a cost-based optimizer.
>
> How do you plan to pass in the statistics? Say the input file is a plain-text log file: do you require users to
> collect the statistics themselves, or do you plan to limit this to certain types of storage?
>
> Thanks,
> Richard

RE: How to clone a logical plan ?

Posted by RichardGUO Fei <gl...@hotmail.com>.
Hi Dmitriy,

Thanks for sharing. I look forward to seeing your work. I have implemented a storage layer and want to connect Pig to it.
To let the optimizer fully benefit from the histogram and side information of my storage, I am thinking of
implementing a cost-based optimizer.

How do you plan to pass in the statistics? Say the input file is a plain-text log file: do you require users to
collect the statistics themselves, or do you plan to limit this to certain types of storage?

Thanks,
Richard

> Date: Thu, 5 Nov 2009 22:54:47 -0500
> Subject: Re: How to clone a logical plan ?
> From: dvryaboy@gmail.com
> To: pig-dev@hadoop.apache.org
> 
> At a high level, we are implementing the framework for propagating
> statistics between Pig operators, and using said statistics to make
> moderately intelligent decisions about Join types that should be used
> (unless they are specified by the user).  We do this in a fairly
> brute-force manner, by generating all alternative plans (that part is
> not working so hot right now, see subject) and costing them, choosing
> the global minimum (there is some pruning happening, but not as much
> as something like System R).  As far as relation order inside a given
> Join, we set that deterministically after choosing the join, as Pig
> has specific preferences for where the largest relation should go for
> a given join type.  Once we have join type selection working, other
> optimizations can be added -- the tricky part is making sure the
> costing functions can't produce drastically wrong results.
> 
> All the work is happening at the logical layer, between the rule-based
> optimizer and LogToPhysTranslator.
> 
> -D

Re: How to clone a logical plan ?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
At a high level, we are implementing the framework for propagating
statistics between Pig operators, and using said statistics to make
moderately intelligent decisions about Join types that should be used
(unless they are specified by the user).  We do this in a fairly
brute-force manner, by generating all alternative plans (that part is
not working so hot right now, see subject) and costing them, choosing
the global minimum (there is some pruning happening, but not as much
as something like System R).  As far as relation order inside a given
Join, we set that deterministically after choosing the join, as Pig
has specific preferences for where the largest relation should go for
a given join type.  Once we have join type selection working, other
optimizations can be added -- the tricky part is making sure the
costing functions can't produce drastically wrong results.

All the work is happening at the logical layer, between the rule-based
optimizer and LogToPhysTranslator.

-D
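
The brute-force approach described above (enumerate alternative plans, cost each one, keep the global minimum) can be sketched as follows. Everything here is an assumption made for illustration: the JoinSpec and JoinPlanner classes, the three join-type labels, the cost formulas and the REPLICATION_LIMIT threshold are invented and are not taken from the optimizer under discussion.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative join-type labels only.
enum JoinType { HASH, REPLICATED, MERGE }

// One join in the plan, with estimated input sizes in rows (hypothetical).
final class JoinSpec {
    final long leftRows;
    final long rightRows;
    final boolean inputsSorted;
    JoinSpec(long leftRows, long rightRows, boolean inputsSorted) {
        this.leftRows = leftRows;
        this.rightRows = rightRows;
        this.inputsSorted = inputsSorted;
    }
}

final class JoinPlanner {
    // Assumed threshold: largest input (in rows) that can be replicated in memory.
    static final long REPLICATION_LIMIT = 1_000;

    // Toy cost model; returns infinity when a join type is not applicable.
    static double cost(JoinSpec j, JoinType t) {
        long smaller = Math.min(j.leftRows, j.rightRows);
        long total = j.leftRows + j.rightRows;
        switch (t) {
            case REPLICATED:
                return smaller <= REPLICATION_LIMIT ? total : Double.POSITIVE_INFINITY;
            case MERGE:
                return j.inputsSorted ? 1.5 * total : Double.POSITIVE_INFINITY;
            default: // HASH: always applicable, but pays for shuffling both inputs
                return 3.0 * total;
        }
    }

    // Enumerate every assignment of join types to joins and keep the cheapest one.
    static List<JoinType> cheapest(List<JoinSpec> joins) {
        JoinType[] types = JoinType.values();
        long combinations = (long) Math.pow(types.length, joins.size());
        List<JoinType> best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (long code = 0; code < combinations; code++) {
            List<JoinType> candidate = new ArrayList<>();
            double total = 0;
            long rest = code; // decode 'code' as one digit per join, base types.length
            for (JoinSpec j : joins) {
                JoinType t = types[(int) (rest % types.length)];
                rest /= types.length;
                candidate.add(t);
                total += cost(j, t);
            }
            if (total < bestCost) {
                bestCost = total;
                best = candidate;
            }
        }
        return best;
    }
}
```

In this toy the per-join costs are independent, so picking the minimum for each join separately would give the same answer; the full enumeration is kept only to mirror the approach in the message, and it grows exponentially with the number of joins, which is where pruning would come in.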


2009/11/5 RichardGUO Fei <gl...@hotmail.com>:
>
> Hi,
>
> I am also working on a cost-based optimizer, so I am interested in knowing some of the specs that you are after.
>
> Thanks,
> Richard

RE: How to clone a logical plan ?

Posted by RichardGUO Fei <gl...@hotmail.com>.
Hi,

I am also working on a cost-based optimizer, so I am interested in knowing some of the specs that you are after.

Thanks,
Richard

Re: How to clone a logical plan ?

Posted by Ashutosh Chauhan <as...@gmail.com>.
Thanks, Santhosh, for the quick response and explanation. It saved me a few
hours of debugging :)

Ashutosh

On Thu, Nov 5, 2009 at 19:21, Santhosh Srinivasan <sm...@yahoo-inc.com> wrote:

> If my memory serves me correctly, the logical plan cloning was
> implemented (by me) for cloning inner plans for foreach. As such, the
> top level plan cloning was never tested and some items are marked as
> TODO (see visit methods for LOLoad, LOStore and LOStream).
>
> If you want to use it as you mention in your test cases, then you need
> to add code for cloning the LOLoad, LOStore, LOStream and LOJoin.
>
> Santhosh

RE: How to clone a logical plan ?

Posted by Santhosh Srinivasan <sm...@yahoo-inc.com>.
If my memory serves me correctly, the logical plan cloning was
implemented (by me) for cloning inner plans for foreach. As such, the
top level plan cloning was never tested and some items are marked as
TODO (see visit methods for LOLoad, LOStore and LOStream).

If you want to use it as you mention in your test cases, then you need
to add code for cloning the LOLoad, LOStore, LOStream and LOJoin.

Santhosh
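
A cloner built from per-operator cases has to cover every operator type that can appear in the plan it is given. The toy sketch below shows that pattern. The classes (Op, LoadOp, JoinOp, StreamOp, ToyPlanCloner) are hypothetical and are not Pig's LOLoad, LOJoin or LogicalPlanCloneHelper; the sketch also chooses to throw an explicit exception for an unhandled operator, which is a design assumption rather than a description of how Pig behaves.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy operator hierarchy (hypothetical; not Pig's logical operators).
abstract class Op {
    final List<Op> inputs = new ArrayList<>();
}

final class LoadOp extends Op {
    final String path;
    LoadOp(String path) { this.path = path; }
}

final class JoinOp extends Op {
    final int keyColumn;
    JoinOp(int keyColumn, Op left, Op right) {
        this.keyColumn = keyColumn;
        inputs.add(left);
        inputs.add(right);
    }
}

// Deliberately left without a clone case in ToyPlanCloner.
final class StreamOp extends Op {
    StreamOp(Op input) { inputs.add(input); }
}

final class ToyPlanCloner {
    // Original operator -> its clone, keyed by identity so shared inputs are cloned once.
    private final Map<Op, Op> clones = new IdentityHashMap<>();

    // Clone the operator graph rooted at 'op', cloning inputs before the operator itself.
    Op cloneOp(Op op) {
        Op existing = clones.get(op);
        if (existing != null) {
            return existing;
        }
        Op copy;
        if (op instanceof LoadOp) {
            copy = new LoadOp(((LoadOp) op).path);
        } else if (op instanceof JoinOp) {
            JoinOp join = (JoinOp) op;
            copy = new JoinOp(join.keyColumn,
                    cloneOp(join.inputs.get(0)),
                    cloneOp(join.inputs.get(1)));
        } else {
            // Fail with a clear message instead of leaving a hole that
            // surfaces later as a NullPointerException.
            throw new UnsupportedOperationException(
                    "no clone case for " + op.getClass().getSimpleName());
        }
        clones.put(op, copy);
        return copy;
    }
}
```

Cloning a JoinOp over two LoadOps yields a new JoinOp whose inputs are new LoadOp instances with the same paths, while passing a StreamOp fails immediately and names the missing case.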


-----Original Message-----
From: Santhosh Srinivasan [mailto:sms@yahoo-inc.com] 
Sent: Thursday, November 05, 2009 4:04 PM
To: pig-dev@hadoop.apache.org
Subject: RE: How to clone a logical plan ?

You have hit a bug. I think LOJoin has to be added to
LogicalPlanCloneHelper.java. Can you file a jira?

Thanks,
Santhosh

RE: How to clone a logical plan ?

Posted by Santhosh Srinivasan <sm...@yahoo-inc.com>.
You have hit a bug. I think LOJoin has to be added to
LogicalPlanCloneHelper.java. Can you file a jira?

Thanks,
Santhosh
