Posted to user@pig.apache.org by Russell Jurney <ru...@gmail.com> on 2012/07/20 04:34:58 UTC

Can't JOIN self?

I have a problem where I can't join a relation to itself on a different
field.

describe pairs
pairs: {from: chararray,to: chararray,message_id: chararray,in_reply_to: chararray}

pairs2 = pairs;

with_reply = join pairs by in_reply_to, pairs2 by message_id;


I get this error:

2012-07-19 19:31:16,927 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<line 20, column 6> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection with nothing to reference!
2012-07-19 19:31:16,928 [main] ERROR org.apache.pig.tools.grunt.Grunt - Failed to parse: Pig script failed to parse:
<line 20, column 6> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection with nothing to reference!
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:182)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1565)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1538)
at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:490)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by:
<line 20, column 6> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection with nothing to reference!
at org.apache.pig.parser.LogicalPlanBuilder.buildJoinOp(LogicalPlanBuilder.java:363)
at org.apache.pig.parser.LogicalPlanGenerator.join_clause(LogicalPlanGenerator.java:11354)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1489)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:789)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:507)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:382)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:175)
... 15 more


What am I to do?
-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: Can't JOIN self?

Posted by Russell Jurney <ru...@gmail.com>.
I'm realizing that I need to do this constantly; otherwise I can't build
much of anything. I think I used to do this, so maybe Pig used to let it slide.

Re: Can't JOIN self?

Posted by Russell Jurney <ru...@gmail.com>.
Thanks, that was my thinking. If I make an alias and self-JOIN to it, it
should work. Self-joins this way are really powerful.

Re: Can't JOIN self?

Posted by Sean Timm <Ti...@aol.com>.
It seems the self-join should work in Pig 0.10 when using an alias, but
alas it doesn't. See Jira PIG-2630:
https://issues.apache.org/jira/browse/PIG-2630

-Sean


Re: Can't JOIN self?

Posted by Alan Gates <ga...@hortonworks.com>.
It isn't a bug that you need to declare the join twice in your script.  That is necessary for clarity and semantic correctness.  That is, if we allowed:

A = load 'bla';
B = join A by user, A by user;

then you'd have two user fields in B with no way to disambiguate.  What's a bug (or missed optimization opportunity) is that we actually read and shuffle the data twice.  We could optimize here and only read and shuffle one copy, then do the join in the reduce.

Alan.
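
For reference, a minimal sketch of the load-twice form that others in this thread report as working, using the schema from the original post. The path 'pairs.tsv', the default loader, and the output field names are assumptions made up for illustration; only the relation names and the join keys come from the thread. It also shows how the two copies of a field can be told apart after the join with Pig's :: disambiguation operator.

```pig
-- 'pairs.tsv' is a hypothetical path; substitute the real input and loader.
pairs  = LOAD 'pairs.tsv' AS (from:chararray, to:chararray, message_id:chararray, in_reply_to:chararray);
pairs2 = LOAD 'pairs.tsv' AS (from:chararray, to:chararray, message_id:chararray, in_reply_to:chararray);

-- Each side of the join is now a separately declared relation.
with_reply = JOIN pairs BY in_reply_to, pairs2 BY message_id;

-- Both sides carry the same field names, so prefix each field with its
-- relation to pick the copy you mean. The AS names are illustrative only.
replies = FOREACH with_reply GENERATE
    pairs::message_id   AS reply_id,
    pairs::in_reply_to  AS reply_parent_id,
    pairs2::message_id  AS original_id;
```

As discussed in this thread, this form reads and shuffles the input twice, so it trades some I/O for an unambiguous schema.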


Re: Can't JOIN self?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
It's kind of a waste of I/O and mappers. If not a bug, it's an optimization opportunity.

On Jul 19, 2012, at 10:34 PM, Bill Graham <bi...@gmail.com> wrote:

> No, it isn't a bug as I see it. You need to load the two relations
> separately because a join is across two separate data sources.

Re: Can't JOIN self?

Posted by Bill Graham <bi...@gmail.com>.
No, it isn't a bug as I see it. You need to load the two relations
separately because a join is across two separate data sources.
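
A minimal sketch of the workaround described in this thread, in Pig Latin. The input path and the PigStorage loader are assumptions for illustration only (the thread never shows how pairs was loaded); the field names come from the describe output in the original post.

```pig
-- Hypothetical path and loader: substitute whatever produced `pairs` originally.
pairs  = LOAD '/tmp/pairs.tsv' USING PigStorage('\t')
         AS (from:chararray, to:chararray, message_id:chararray, in_reply_to:chararray);

-- Load the same data a second time under a different alias,
-- instead of aliasing with `pairs2 = pairs;`.
pairs2 = LOAD '/tmp/pairs.tsv' USING PigStorage('\t')
         AS (from:chararray, to:chararray, message_id:chararray, in_reply_to:chararray);

-- Self join: match each reply to the message it replies to.
with_reply = JOIN pairs BY in_reply_to, pairs2 BY message_id;
```

After the join, both sides carry the same field names, so they are referenced with the alias prefix, for example pairs::message_id and pairs2::message_id.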


On Thu, Jul 19, 2012 at 10:10 PM, Russell Jurney <ru...@gmail.com> wrote:

> So it is a bug? Because Pig will not let me self JOIN. I have to LOAD the
> data twice.



-- 
*Note that I'm no longer using my Yahoo! email address. Please email me at
billgraham@gmail.com going forward.*

Re: Can't JOIN self?

Posted by Russell Jurney <ru...@gmail.com>.
So it is a bug? Because Pig will not let me self JOIN. I have to LOAD the
data twice.

On Thu, Jul 19, 2012 at 9:49 PM, Bill Graham <bi...@gmail.com> wrote:

> No, to Pig a self join is just like a regular join across two different
> relations. It just happens to be to the same input data.



-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: Can't JOIN self?

Posted by Bill Graham <bi...@gmail.com>.
No, to Pig a self join is just like a regular join across two different
relations. It just happens to be to the same input data.

On Thu, Jul 19, 2012 at 8:39 PM, Russell Jurney <ru...@gmail.com> wrote:

> Is this a bug?



-- 
*Note that I'm no longer using my Yahoo! email address. Please email me at
billgraham@gmail.com going forward.*

Re: Can't JOIN self?

Posted by Russell Jurney <ru...@gmail.com>.
Is this a bug?

On Thu, Jul 19, 2012 at 8:00 PM, Robert Yerex <
robert.yerex@civitaslearning.com> wrote:

> The only way to get it to work is to load a second copy.



-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: Can't JOIN self?

Posted by Robert Yerex <ro...@civitaslearning.com>.
The only way to get it to work is to load a second copy.

On Thu, Jul 19, 2012 at 7:46 PM, Russell Jurney <ru...@gmail.com> wrote:

> Note: this works if I LOAD a new, 2nd relation and do the join.



-- 
Robert Yerex
Data Scientist
Civitas Learning
www.civitaslearning.com

Re: Can't JOIN self?

Posted by Russell Jurney <ru...@gmail.com>.
Note: this works if I LOAD a new, 2nd relation and do the join.

On Thu, Jul 19, 2012 at 7:34 PM, Russell Jurney <ru...@gmail.com> wrote:

> I have a problem where I can't join a relation to itself on a different
> field.
>
> describe pairs
> pairs: {from: chararray,to: chararray,message_id: chararray,in_reply_to:
> chararray}
>
> pairs2 = pairs;
>
> with_reply = join pairs by in_reply_to, pairs2 by message_id;
>
>
> I get this error:
>
> 2012-07-19 19:31:16,927 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1200: Pig script failed to parse:
> <line 20, column 6> pig script failed to validate:
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection
> with nothing to reference!
> 2012-07-19 19:31:16,928 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> Failed to parse: Pig script failed to parse:
> <line 20, column 6> pig script failed to validate:
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection
> with nothing to reference!
> at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:182)
> at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1565)
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1538)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
> at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
> at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
> at org.apache.pig.Main.run(Main.java:490)
> at org.apache.pig.Main.main(Main.java:111)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by:
> <line 20, column 6> pig script failed to validate:
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection
> with nothing to reference!
> at org.apache.pig.parser.LogicalPlanBuilder.buildJoinOp(LogicalPlanBuilder.java:363)
> at org.apache.pig.parser.LogicalPlanGenerator.join_clause(LogicalPlanGenerator.java:11354)
> at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1489)
> at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:789)
> at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:507)
> at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:382)
> at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:175)
> ... 15 more
>
>
> What am I to do?
> --
> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>



-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com