Posted to dev@pig.apache.org by Dmitriy Ryaboy <dv...@gmail.com> on 2012/03/02 22:48:15 UTC
Re: Is this desirable: relation.projection as sugar for foreach relation generate projection
But that's already not the case. The syntax "a = distinct (foreach b
generate $1, $2);" is completely legal.
D
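To make this point concrete, here is what the already-legal nested form computes. This is a minimal Java sketch under an assumed, hypothetical in-memory model: a relation is a list of tuples, and a tuple is a list of fields. It is not how Pig executes the plan, and ProjectThenDistinct and its methods are made-up names. The projection runs first, so DISTINCT only compares the projected fields. Rows that differ only in $0 therefore collapse into one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ProjectThenDistinct {
    // Hypothetical model of "FOREACH rel GENERATE $i, $j, ...":
    // keep only the listed positional columns of every tuple.
    static List<List<Object>> foreachGenerate(List<List<Object>> rel, int... cols) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> tuple : rel) {
            List<Object> projected = new ArrayList<>();
            for (int c : cols) {
                projected.add(tuple.get(c));
            }
            out.add(projected);
        }
        return out;
    }

    // Hypothetical model of "DISTINCT rel": drop duplicate tuples.
    // LinkedHashSet keeps first-seen order only to make the output deterministic;
    // Pig itself makes no ordering guarantee for DISTINCT.
    static List<List<Object>> distinct(List<List<Object>> rel) {
        Set<List<Object>> seen = new LinkedHashSet<>(rel);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<List<Object>> b = Arrays.asList(
            Arrays.<Object>asList("x", 1, "p"),
            Arrays.<Object>asList("y", 1, "p"),
            Arrays.<Object>asList("z", 2, "q"));
        // a = distinct (foreach b generate $1, $2);
        List<List<Object>> a = distinct(foreachGenerate(b, 1, 2));
        System.out.println(a); // [[1, p], [2, q]]
    }
}
```

The patch would rewrite "DISTINCT b.($1, $2)" into this same nested form, so it would compute the same result.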
On Fri, Feb 24, 2012 at 2:52 PM, Daniel Dai <da...@hortonworks.com> wrote:
> One of my concerns is that it could complicate GUI mapping for Pig
> scripts in the future. I feel it would be clearer if one statement only
> did one thing.
>
> Daniel
>
> On Thu, Feb 23, 2012 at 2:23 PM, Jonathan Coveney <jc...@gmail.com> wrote:
>> Adam, thanks for the comments. Below is the cat of the patch (it's short
>> enough to just paste inline):
>>
>> Your comments are welcome, and I'd be curious what others think as well.
>> The blurring of the line between bags and relations is what I'm worried
>> about, but at the same time, one of the things people confuse the most is
>> that distinction.
>>
>>
>> Index: test/org/apache/pig/test/TestEvalPipeline.java
>> ===================================================================
>> --- test/org/apache/pig/test/TestEvalPipeline.java (revision 1244760)
>> +++ test/org/apache/pig/test/TestEvalPipeline.java (working copy)
>> @@ -383,7 +383,7 @@
>> pigServer.registerQuery("A = LOAD '"
>> + Util.generateURI(tmpFile.toString(), pigContext) + "';");
>> if (eliminateDuplicates){
>> - pigServer.registerQuery("B = DISTINCT (FOREACH A GENERATE $0) PARALLEL 10;");
>> + pigServer.registerQuery("B = DISTINCT A.$0 PARALLEL 10;");
>> }else{
>> if(!useUDF) {
>> pigServer.registerQuery("B = ORDER A BY $0 PARALLEL 10;");
>> Index: test/org/apache/pig/test/TestEvalPipelineLocal.java
>> ===================================================================
>> --- test/org/apache/pig/test/TestEvalPipelineLocal.java (revision 1244760)
>> +++ test/org/apache/pig/test/TestEvalPipelineLocal.java (working copy)
>> @@ -400,7 +400,7 @@
>> + Util.generateURI(tmpFile.toString(), pigServer.getPigContext()) + "';");
>> if (eliminateDuplicates){
>> - pigServer.registerQuery("B = DISTINCT (FOREACH A GENERATE $0) PARALLEL 10;");
>> + pigServer.registerQuery("B = DISTINCT A.$0 PARALLEL 10;");
>> }else{
>> if(!useUDF) {
>> pigServer.registerQuery("B = ORDER A BY $0 PARALLEL 10;");
>> Index: src/org/apache/pig/parser/AstPrinter.g
>> ===================================================================
>> Index: src/org/apache/pig/parser/QueryParser.g
>> ===================================================================
>> --- src/org/apache/pig/parser/QueryParser.g (revision 1244760)
>> +++ src/org/apache/pig/parser/QueryParser.g (working copy)
>> @@ -506,7 +506,10 @@
>> | LEFT_PAREN! col_ref ( ASC | DESC )? RIGHT_PAREN!
>> ;
>>
>> -distinct_clause : DISTINCT^ rel partition_clause?
>> +distinct_clause : DISTINCT rel PERIOD ( col_alias_or_index | ( LEFT_PAREN col_alias_or_index ( COMMA col_alias_or_index )* RIGHT_PAREN ) ) partition_clause?
>> + -> ^( DISTINCT ^( FOREACH rel ^( FOREACH_PLAN_SIMPLE ^( GENERATE col_alias_or_index+ ) ) ) partition_clause? )
>> + | DISTINCT rel partition_clause?
>> + -> ^( DISTINCT rel partition_clause? )
>> ;
>>
>> partition_clause : PARTITION^ BY! func_name
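The new distinct_clause alternative in QueryParser.g turns "DISTINCT rel.col" or "DISTINCT rel.(c1, c2, ...)" into a DISTINCT whose input is a FOREACH ... GENERATE over those columns, and it leaves plain "DISTINCT rel" alone. As a rough illustration only, here is a regex-based Java stand-in that performs the same rewrite on statement text. The real patch does this as an ANTLR tree rewrite on the AST, and DistinctSugar and desugar are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DistinctSugar {
    // Hypothetical text-level stand-in for the patch's ANTLR rewrite:
    //   DISTINCT rel.col           -> DISTINCT (FOREACH rel GENERATE col)
    //   DISTINCT rel.(c1, c2, ...) -> DISTINCT (FOREACH rel GENERATE c1, c2, ...)
    // Columns may be aliases or positional refs like $0. Anything after the
    // projection (e.g. "PARALLEL 10;") is passed through unchanged.
    private static final Pattern SUGAR = Pattern.compile(
        "(?i)^DISTINCT\\s+(\\w+)\\.(?:(\\$?\\w+)|\\(\\s*(\\$?\\w+(?:\\s*,\\s*\\$?\\w+)*)\\s*\\))(.*)$");

    static String desugar(String stmt) {
        Matcher m = SUGAR.matcher(stmt.trim());
        if (!m.matches()) {
            // No projection after the relation: plain "DISTINCT rel ..." keeps its meaning.
            return stmt;
        }
        String rel = m.group(1);
        // Group 2 is the single-column form; group 3 is the parenthesized list.
        String cols = m.group(2) != null
            ? m.group(2)
            : m.group(3).replaceAll("\\s*,\\s*", ", ");
        return "DISTINCT (FOREACH " + rel + " GENERATE " + cols + ")" + m.group(4);
    }

    public static void main(String[] args) {
        System.out.println(desugar("DISTINCT A.$0 PARALLEL 10;"));
        // DISTINCT (FOREACH A GENERATE $0) PARALLEL 10;
        System.out.println(desugar("DISTINCT b.($1, $2);"));
        // DISTINCT (FOREACH b GENERATE $1, $2);
        System.out.println(desugar("DISTINCT A PARALLEL 10;"));
        // DISTINCT A PARALLEL 10;
    }
}
```

The test changes in the patch rely on exactly this equivalence: "B = DISTINCT (FOREACH A GENERATE $0) PARALLEL 10;" becomes "B = DISTINCT A.$0 PARALLEL 10;".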
Re: Is this desirable: relation.projection as sugar for foreach relation generate projection
Posted by Daniel Dai <da...@hortonworks.com>.
I should have said "one operator should only do one thing" instead.
Daniel
On Fri, Mar 2, 2012 at 1:48 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:
> But that's already not the case. The syntax "a = distinct (foreach b
> generate $1, $2);" is completely legal.
>
> D