Posted to dev@pig.apache.org by Dmitriy Ryaboy <dv...@gmail.com> on 2012/03/02 22:48:15 UTC

Re: Is this desirable: relation.projection as sugar for foreach relation generate projection

But that's already not the case. The syntax "a = distinct (foreach b
generate  $1, $2);" is completely legal.

D

On Fri, Feb 24, 2012 at 2:52 PM, Daniel Dai <da...@hortonworks.com> wrote:
> One of my concerns is that it could complicate GUI mapping for the Pig
> script in the future. I feel it might be clearer if one statement does
> only one thing.
>
> Daniel
>
> On Thu, Feb 23, 2012 at 2:23 PM, Jonathan Coveney <jc...@gmail.com> wrote:
>> Adam, thanks for the comments. Below is the patch (it's short enough to
>> just paste inline):
>>
>> Your comments are welcome, and I'd be curious what others think as well.
>> The blurring of the line between bags and relations is what I'm worried
>> about, but at the same time, one of the things people confuse the most is
>> that distinction.
>>
>>
>> Index: test/org/apache/pig/test/TestEvalPipeline.java
>> ===================================================================
>> --- test/org/apache/pig/test/TestEvalPipeline.java    (revision 1244760)
>> +++ test/org/apache/pig/test/TestEvalPipeline.java    (working copy)
>> @@ -383,7 +383,7 @@
>>         pigServer.registerQuery("A = LOAD '"
>>                 + Util.generateURI(tmpFile.toString(), pigContext) + "';");
>>         if (eliminateDuplicates){
>> -            pigServer.registerQuery("B = DISTINCT (FOREACH A GENERATE $0) PARALLEL 10;");
>> +            pigServer.registerQuery("B = DISTINCT A.$0 PARALLEL 10;");
>>         }else{
>>             if(!useUDF) {
>>                 pigServer.registerQuery("B = ORDER A BY $0 PARALLEL 10;");
>> Index: test/org/apache/pig/test/TestEvalPipelineLocal.java
>> ===================================================================
>> --- test/org/apache/pig/test/TestEvalPipelineLocal.java    (revision 1244760)
>> +++ test/org/apache/pig/test/TestEvalPipelineLocal.java    (working copy)
>> @@ -400,7 +400,7 @@
>>                 + Util.generateURI(tmpFile.toString(), pigServer
>>                         .getPigContext()) + "';");
>>         if (eliminateDuplicates){
>> -            pigServer.registerQuery("B = DISTINCT (FOREACH A GENERATE $0) PARALLEL 10;");
>> +            pigServer.registerQuery("B = DISTINCT A.$0 PARALLEL 10;");
>>         }else{
>>             if(!useUDF) {
>>                 pigServer.registerQuery("B = ORDER A BY $0 PARALLEL 10;");
>> Index: src/org/apache/pig/parser/AstPrinter.g
>> ===================================================================
>> Index: src/org/apache/pig/parser/QueryParser.g
>> ===================================================================
>> --- src/org/apache/pig/parser/QueryParser.g    (revision 1244760)
>> +++ src/org/apache/pig/parser/QueryParser.g    (working copy)
>> @@ -506,7 +506,10 @@
>>           | LEFT_PAREN! col_ref ( ASC | DESC )? RIGHT_PAREN!
>>  ;
>>
>> -distinct_clause : DISTINCT^ rel partition_clause?
>> +distinct_clause : DISTINCT rel PERIOD ( col_alias_or_index | ( LEFT_PAREN col_alias_or_index ( COMMA col_alias_or_index )* RIGHT_PAREN ) ) partition_clause?
>> +               -> ^( DISTINCT ^( FOREACH rel ^( FOREACH_PLAN_SIMPLE ^( GENERATE col_alias_or_index+ ) ) ) partition_clause? )
>> +                | DISTINCT rel partition_clause?
>> +               -> ^( DISTINCT rel partition_clause? )
>>  ;
>>
>>  partition_clause : PARTITION^ BY! func_name
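
For readers following the QueryParser.g hunk above: the new alternative rewrites "DISTINCT rel.col" (or "DISTINCT rel.(col, col, ...)") into the same tree that "DISTINCT (FOREACH rel GENERATE col, ...)" produces. The patch does this at the AST level with an ANTLR rewrite rule. The Java sketch below only imitates that rewrite on the statement text with a regular expression, to show what the sugar expands to. The class name DistinctSugar and the method desugar are hypothetical and are not part of Pig, and the pattern covers only simple aliases and $N positional references.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Pig. It mimics, on plain text, the
// expansion that the patched distinct_clause rule performs on the AST.
final class DistinctSugar {
    // A column is either a positional reference ($0, $1, ...) or a simple alias.
    private static final String COL = "(?:\\$\\d+|[A-Za-z_]\\w*)";

    // Matches: DISTINCT <rel>.<col>   or   DISTINCT <rel>.(<col>, <col>, ...)
    // group 1 = relation, group 2 = single column, group 3 = column list.
    private static final Pattern SUGAR = Pattern.compile(
            "(?i)\\bDISTINCT\\s+([A-Za-z_]\\w*)\\."
            + "(?:(" + COL + ")"
            + "|\\(\\s*(" + COL + "(?:\\s*,\\s*" + COL + ")*)\\s*\\))");

    // Expands the proposed sugar into the nested FOREACH form.
    // Statements that do not use the sugar are returned unchanged.
    // The DISTINCT keyword is always emitted in upper case.
    static String desugar(String statement) {
        Matcher m = SUGAR.matcher(statement);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String cols = m.group(2) != null ? m.group(2) : m.group(3);
            String expanded =
                    "DISTINCT (FOREACH " + m.group(1) + " GENERATE " + cols + ")";
            // quoteReplacement keeps "$0" from being read as a group reference.
            m.appendReplacement(out, Matcher.quoteReplacement(expanded));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Under these assumptions, DistinctSugar.desugar("B = DISTINCT A.$0 PARALLEL 10;") gives back the statement that the patched tests replaced, "B = DISTINCT (FOREACH A GENERATE $0) PARALLEL 10;", while a plain "B = DISTINCT A PARALLEL 10;" passes through untouched, matching the second alternative of the rule.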

Re: Is this desirable: relation.projection as sugar for foreach relation generate projection

Posted by Daniel Dai <da...@hortonworks.com>.
Rather, I should say that one operator should do only one thing.

Daniel
