Posted to dev@pig.apache.org by Swati Jain <sw...@aggiemail.usu.edu> on 2010/06/28 04:07:47 UTC

Bug in new logical optimizer framework?

Folks,

Posting on the dev list since this concerns the new logical plan optimization
framework, which is not enabled yet. I was interested in playing around with
the new optimization framework and trying to add some simple rules to it.

I have attached two simple programs which do not work when the new logical
optimization framework is enabled (they work when it is disabled). My
changes to enable the new optimizer are pretty straightforward, and the diff
against branch-0.7 is attached (I just set the appropriate property to true).
Both scripts are very simple, and both raise an exception (in local mode of
execution): "java.io.IOException: Type mismatch in key from
map: expected org.apache.pig.impl.io.NullableIntWritable, recieved
org.apache.pig.impl.io.NullableBytesWritable" if there is at least one row to
be output. The error goes away if I replace "DUMP" with "EXPLAIN"
(presumably because the bug manifests during plan execution). It would be
great if someone could shed some light on this issue or give pointers on
workarounds or ways to fix it. I have not filed a JIRA for the above;
please let me know if I should.
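
For readers unfamiliar with this error, the sketch below shows the general kind
of check that produces such a message: the job declares one class for its map
output keys, and a key of any other runtime class is rejected. This is a
simplified stand-in, not Hadoop's or Pig's actual code. The class names IntKey
and BytesKey are hypothetical placeholders for the two writable types named in
the exception, and the message text simply mirrors the wording (and spelling)
of the log line quoted above.

```java
import java.io.IOException;

public class KeyTypeCheckSketch {
    // Hypothetical stand-ins for the int and bytearray key wrappers named in the error.
    static class IntKey {}
    static class BytesKey {}

    // Rejects any key whose runtime class differs from the declared map output key class.
    static void collect(Class<?> declaredKeyClass, Object key) throws IOException {
        if (key.getClass() != declaredKeyClass) {
            // Wording copied from the log line in this thread, including its spelling.
            throw new IOException("Type mismatch in key from map: expected "
                    + declaredKeyClass.getName() + ", recieved " + key.getClass().getName());
        }
    }

    public static void main(String[] args) {
        try {
            collect(IntKey.class, new IntKey());   // runtime class matches the declared class
            collect(IntKey.class, new BytesKey()); // runtime class differs: rejected
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

One reading consistent with the message is that the plan declared an int key
while the map emitted a key that was never cast from bytearray, but that is an
assumption; the thread does not identify the root cause.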

Also, it would be great to get some guidance on the state of the new
optimizer wrt testing (I do understand it is not GA ready since it is
disabled by default) and whether it is too early to start playing around
with adding new rules.

Thanks
Swati

Re: Bug in new logical optimizer framework?

Posted by Daniel Dai <ji...@yahoo-inc.com>.
The optimization code is in the package org.apache.pig.experimental. The 
code is not used by default unless you set system property 
pig.usenewlogicalplan to true. A good starting point might be 
"org.apache.pig.experimental.logical.optimizer.LogicalPlanOptimizer".

Daniel
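
A minimal sketch of how such a flag can be checked, assuming it is read with
the same "true".equalsIgnoreCase(...) pattern that the patch in this thread
shows for other properties. Only the property name pig.usenewlogicalplan comes
from the thread; the class NewLogicalPlanFlag and the method isEnabled are
hypothetical, and how Pig itself reads the flag is not shown here.

```java
import java.util.Properties;

public class NewLogicalPlanFlag {
    // Property name taken from this thread.
    static final String KEY = "pig.usenewlogicalplan";

    // Treats the flag as enabled only when its value is "true" (case-insensitive);
    // a missing property falls back to "false".
    static boolean isEnabled(Properties props) {
        return "true".equalsIgnoreCase(props.getProperty(KEY, "false"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Copy JVM system properties, e.g. one set with -Dpig.usenewlogicalplan=true.
        props.putAll(System.getProperties());
        System.out.println(KEY + " enabled: " + isEnabled(props));
    }
}
```

Run with java -Dpig.usenewlogicalplan=true NewLogicalPlanFlag to see the flag
reported as enabled; without the -D option it is reported as disabled.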

Renato Marroquín Mogrovejo wrote:
> Thanks Daniel and Dmitriy for your answers, now I have a much clearer idea of
> what type of optimization work is being done on PIG. And just another quick
> question, do you guys know where in the code those optimizations are?
> I just want to give it a closer look (:
>
> Renato M.


Re: Bug in new logical optimizer framework?

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.
Thanks Daniel and Dmitriy for your answers, now I have a much clearer idea of
what type of optimization work is being done on PIG. And just another quick
question, do you guys know where in the code those optimizations are?
I just want to give it a closer look (:

Renato M.

2010/7/1 Dmitriy Ryaboy <dv...@gmail.com>

> Renato,
> I just want to make sure folks know -- Pig already has a number of such
> optimizations. Daniel's work is aimed at making it (much) easier to write
> such rules and to add a couple new ones. But some of the classic
> optimizations like projection and filter push-down already exist in the
> released versions of Pig.
>
> D

Re: Bug in new logical optimizer framework?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Renato,
I just want to make sure folks know -- Pig already has a number of such
optimizations. Daniel's work is aimed at making it (much) easier to write
such rules and to add a couple new ones. But some of the classic
optimizations like projection and filter push-down already exist in the
released versions of Pig.

D

On Thu, Jul 1, 2010 at 5:32 PM, Daniel Dai <ji...@yahoo-inc.com> wrote:

> Yes, they are classic logical optimization plus some Pig only optimization.
> All these are rule based. https://issues.apache.org/jira/browse/PIG-1319 is an
> umbrella Jira to track all new optimization rules.
>
> Daniel

Re: Bug in new logical optimizer framework?

Posted by Daniel Dai <ji...@yahoo-inc.com>.
Yes, they are classic logical optimizations plus some Pig-only
optimizations. All of these are rule based.
https://issues.apache.org/jira/browse/PIG-1319 is an umbrella JIRA to
track all new optimization rules.

Daniel

Renato Marroquín Mogrovejo wrote:
> Hi,
>
> I am also interested in this logical plan optimization framework
> functionality. You mentioned that rules are being developed, could you
> explain a little bit more about them? Are they like the classic logical
> optimizations (early projection, early filtering, among others)?
> Thanks in advance.
>
> Renato M.


Re: Bug in new logical optimizer framework?

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.
Hi,

I am also interested in this logical plan optimization framework
functionality. You mentioned that rules are being developed; could you
explain a little bit more about them? Are they like the classic logical
optimizations (early projection, early filtering, among others)?
Thanks in advance.

Renato M.

2010/6/28 Alan Gates <ga...@yahoo-inc.com>

> On Jun 28, 2010, at 12:36 AM, Swati Jain wrote:
>
>> Thanks for the prompt reply. As you mentioned optimization is in its
>> developing stage, does it mean optimization framework is not complete or
>> only rules are in developing stage? In addition to that, I would really
>> appreciate if you could give a rough idea when the patch will be available
>> and what functionality will it contain?
>
> At this point we believe the framework is complete and rules are being
> developed.  But the framework has never been used in user testing situations
> (alpha or beta testing) so there will be a whole round of bugs to fix once
> that testing is done.
>
> The current plan is to switch to this code as the actual optimizer with
> 0.8, which we hope to release late this year (no promises).
>
> Alan.

Re: Bug in new logical optimizer framework?

Posted by Alan Gates <ga...@yahoo-inc.com>.
On Jun 28, 2010, at 12:36 AM, Swati Jain wrote:

> Thanks for the prompt reply. As you mentioned optimization is in its
> developing stage, does it mean optimization framework is not complete or
> only rules are in developing stage? In addition to that, I would really
> appreciate if you could give a rough idea when the patch will be available
> and what functionality will it contain?

At this point we believe the framework is complete and rules are being
developed. But the framework has never been used in user testing
situations (alpha or beta testing) so there will be a whole round of
bugs to fix once that testing is done.

The current plan is to switch to this code as the actual optimizer
with 0.8, which we hope to release late this year (no promises).

Alan.

Re: Bug in new logical optimizer framework?

Posted by Swati Jain <sw...@aggiemail.usu.edu>.
Thanks for the prompt reply. As you mentioned, optimization is in its
development stage; does that mean the optimization framework is not complete, or
are only the rules still in development? In addition, I would really
appreciate it if you could give a rough idea of when the patch will be available
and what functionality it will contain.

Actually, I had attached seven files to my previous mail to reproduce the
bug, including the error log, but since you couldn't find them I am inlining all
the attachments:

*My patch:* (to enable the optimization)

Index: src/org/apache/pig/PigServer.java
===================================================================
--- src/org/apache/pig/PigServer.java    (revision 951297)
+++ src/org/apache/pig/PigServer.java    (working copy)
@@ -179,6 +179,11 @@
 
         aggregateWarning = "true".equalsIgnoreCase(pigContext.getProperties().getProperty("aggregate.warning"));
         isMultiQuery = "true".equalsIgnoreCase(pigContext.getProperties().getProperty("opt.multiquery","true"));
+        getPigContext().getProperties().setProperty("pig.usenewlogicalplan", "true");
+        log.info(
+                "---------> pig.usenewlogicalplan set to " +
+                getPigContext().getProperties().getProperty("pig.usenewlogicalplan", "false") +
+                " in PigServer" );
 
         if (connect) {
             pigContext.connect();

*Script 1:*
A = load '/home/pig/exfile1' USING PigStorage(' ') as (x:int,y:int);
B = Group A by x;
dump B;

*Script 2:*
A = load '/home/pig/exfile1' USING PigStorage(',') as (a1:int,a2:int);
B = load '/home/pig/exfile1' USING PigStorage(',') as (b1:int,b2:int);
C = JOIN A by a1, B by b1;
dump C;

*exfile1:*
1,5
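
Note that the two scripts load the same file with different delimiters: Script 1
uses a space and Script 2 uses a comma. The sketch below only illustrates how
many fields the single row of exfile1 yields under each delimiter. It uses
plain string splitting as a stand-in and is not PigStorage's implementation;
the class DelimiterSketch and the method fields are hypothetical names.

```java
import java.util.regex.Pattern;

public class DelimiterSketch {
    // Splits a line on a single-character delimiter, keeping empty trailing fields.
    static String[] fields(String line, char delimiter) {
        return line.split(Pattern.quote(String.valueOf(delimiter)), -1);
    }

    public static void main(String[] args) {
        String row = "1,5"; // the single row of exfile1 from this thread
        System.out.println("split on ' ': " + fields(row, ' ').length + " field(s)");
        System.out.println("split on ',': " + fields(row, ',').length + " field(s)");
    }
}
```

With a space delimiter the row stays a single field, while with a comma it
splits into two, which is worth keeping in mind when comparing the output of
the two scripts.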

Please let me know if you have any further questions.

Thanks,
Swati


On Sun, Jun 27, 2010 at 9:32 PM, Daniel Dai <da...@gmail.com> wrote:

> Swati,
> The new logical plan is halfway done, so it is not surprising to see exceptions
> at the current stage. We are actively developing it and will deliver a patch
> shortly. Meanwhile, please attach the problematic scripts (I didn't see them
> in your mail) so we can make sure those exceptions are addressed.
>
> Thanks,
> Daniel
>

Re: Bug in new logical optimizer framework?

Posted by Daniel Dai <da...@gmail.com>.
Swati,
The new logical plan is halfway done, so it is not surprising to see exceptions at the current stage. We are actively developing it and will deliver a patch shortly. Meanwhile, please attach the problematic scripts (I didn't see them in your mail) so we can make sure those exceptions are addressed.

Thanks,
Daniel


From: Swati Jain 
Sent: Sunday, June 27, 2010 7:07 PM
To: pig-dev@hadoop.apache.org 
Subject: Bug in new logical optimizer framework?


Folks,

Posting on the dev list since this concerns the new logical plan optimization framework, which is not enabled yet. I was interested in playing around with the new optimization framework and trying to add some simple rules to it.

I have attached two simple programs which do not work when the new logical optimization framework is enabled (they work when it is disabled). My changes to enable the new optimizer are pretty straightforward, and the diff against branch-0.7 is attached (I just set the appropriate property to true). Both scripts are very simple, and both raise an exception (in local mode of execution): "java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableIntWritable, recieved org.apache.pig.impl.io.NullableBytesWritable" if there is at least one row to be output. The error goes away if I replace "DUMP" with "EXPLAIN" (presumably because the bug manifests during plan execution). It would be great if someone could shed some light on this issue or give pointers on workarounds or ways to fix it. I have not filed a JIRA for the above; please let me know if I should.

Also, it would be great to get some guidance on the state of the new optimizer wrt testing (I do understand it is not GA ready since it is disabled by default) and whether it is too early to start playing around with adding new rules.

Thanks
Swati