Posted to user@pig.apache.org by wi...@thomsonreuters.com on 2011/04/05 22:10:08 UTC
Internal error 2999 - misuse of CONCAT? misuse of GROUP?
I am a new Pig user and have run into “Internal error 2999”:
2011-04-05 15:59:57,445 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null
Details at logfile: /proj/CitationSystem/backend/hadoop/testbed-hold/pig_1302033581143.log
That shows:
Pig Stack Trace
---------------
ERROR 2999: Unexpected internal error. null
java.lang.NullPointerException
at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.getLoadFuncSpec(TypeCheckingVisitor.java:3116)
at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:1793)
at org.apache.pig.impl.logicalLayer.LOCast.visit(LOCast.java:67)
at org.apache.pig.impl.logicalLayer.LOCast.visit(LOCast.java:32)
at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:70)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.checkInnerPlan(TypeCheckingVisitor.java:2869)
at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2430)
at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:378)
at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:45)
at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:70)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:102)
at org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40)
at org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30)
at org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:89)
at org.apache.pig.impl.logicalLayer.UnionOnSchemaSetter.visit(UnionOnSchemaSetter.java:70)
at org.apache.pig.impl.logicalLayer.LOUnion.visit(LOUnion.java:177)
at org.apache.pig.impl.logicalLayer.LOUnion.visit(LOUnion.java:38)
at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:70)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at org.apache.pig.PigServer.compileLp(PigServer.java:1317)
at org.apache.pig.PigServer.compileLp(PigServer.java:1306)
at org.apache.pig.PigServer.compileLp(PigServer.java:1241)
at org.apache.pig.PigServer.compileLp(PigServer.java:1221)
at org.apache.pig.PigServer.execute(PigServer.java:1178)
at org.apache.pig.PigServer.access$100(PigServer.java:128)
at org.apache.pig.PigServer$Graph.execute(PigServer.java:1517)
at org.apache.pig.PigServer.executeBatchEx(PigServer.java:362)
at org.apache.pig.PigServer.executeBatch(PigServer.java:329)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:169)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
at org.apache.pig.Main.run(Main.java:510)
at org.apache.pig.Main.main(Main.java:107)
Most likely I am doing something wrong, so any advice would be appreciated. Here is my setup. I have a Pig script like this:
[… statements define SrcFuid and NewCitationRel …]
TCRaw = join SrcFuid by citingdocid, NewCitationRel by citeddocid;
describe TCRaw;
dump TCRaw;
TCGroupedByFuid = group TCRaw by CONCAT((chararray)SrcFuid.citingdocid,
SrcFuid.col,
(chararray)SrcFuid.seq);
store TCGroupedByFuid into 'foo';
The log shows the output of the describe and dump commands (I’ve formatted for readability):
TCRaw: {SrcFuid::citingdocid: int,
SrcFuid::col: bytearray,
SrcFuid::seq: int,
NewCitationRel::citeddocid: int,
NewCitationRel::citingdocid: int,
NewCitationRel::col: bytearray,
NewCitationRel::seq: int,
NewCitationRel::year: int,
NewCitationRel::eds: bytearray}
(14159274,BCI,6,14159274,14159163,BCI,5,1999,BCI.BCI)
(14159274,BCI,6,14159274,14159163,WOS,11,1999,WOS.SCI)
(14159274,WOS,16,14159274,14159163,BCI,5,1999,BCI.BCI)
(14159274,WOS,16,14159274,14159163,WOS,11,1999,WOS.SCI)
What I was hoping for was something like
(‘14159274BCI6’,
{(14159274,BCI,6,14159274,14159163,BCI,5,1999,BCI.BCI),
(14159274,BCI,6,14159274,14159163,WOS,11,1999,WOS.SCI)})
(‘14159274WOS16’,
{(14159274,WOS,16,14159274,14159163,BCI,5,1999,BCI.BCI),
(14159274,WOS,16,14159274,14159163,WOS,11,1999,WOS.SCI)})
If anyone could give me a hint about what to do to get that, I’d appreciate it very much. Thanks!
Will
William F Dowling
Sr Technical Specialist, Software Engineering
Thomson Reuters
0 +1 215 823 3853
Re: Internal error 2999 - misuse of CONCAT? misuse of GROUP?
Posted by Xiaomeng Wan <sh...@gmail.com>.
TCGroupedByFuid = group TCRaw by CONCAT((chararray)SrcFuid.citingdocid,
SrcFuid.col,
(chararray)SrcFuid.seq);
should be
TCGroupedByFuid = group TCRaw by CONCAT((chararray)SrcFuid::citingdocid,CONCAT(
SrcFuid::col,
(chararray)SrcFuid::seq));
Shawn
On Wed, Apr 6, 2011 at 12:17 PM, <wi...@thomsonreuters.com> wrote:
> Hi Shawn,
>
> Thanks for your response. I have tried supplying just two arguments to CONCAT(), but I get the same
>
> ERROR 2999: Unexpected internal error. null
>
> java.lang.NullPointerException
>
> that I originally reported. It's a good tip to use only two arguments, but I think something else is (also) going on. Thanks!
>
> Will
>
RE: Internal error 2999 - misuse of CONCAT? misuse of GROUP?
Posted by wi...@thomsonreuters.com.
-----Original Message-----
From: Xiaomeng Wan [mailto:shawnwan@gmail.com]
Sent: Tuesday, April 05, 2011 6:54 PM
To: user@pig.apache.org
Subject: Re: Internal error 2999 - misuse of CONCAT? misuse of GROUP?
concat only takes two fields at a time. use concat(field1,
concat(field2, field3))
Shawn
--------------------------
Hi Shawn,
Thanks for your response. I have tried supplying just two arguments to CONCAT(), but I get the same
ERROR 2999: Unexpected internal error. null
java.lang.NullPointerException
that I originally reported. It's a good tip to use only two arguments, but I think something else is (also) going on. Thanks!
Will
Re: Internal error 2999 - misuse of CONCAT? misuse of GROUP?
Posted by Xiaomeng Wan <sh...@gmail.com>.
concat only takes two fields at a time. use concat(field1,
concat(field2, field3))
Shawn
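
[Editor's sketch] The nesting Shawn describes can be modelled in a few lines of Python (used here only because it runs anywhere; it is not Pig Latin). The helper names concat2 and concat_all are hypothetical, and concat2 merely mimics a CONCAT that accepts exactly two arguments; it is not Pig's implementation.

```python
from functools import reduce


def concat2(left, right):
    """Hypothetical stand-in for a CONCAT that accepts exactly two strings."""
    return left + right


def concat_all(*fields):
    """Join any number of fields using only two-argument concat2 calls."""
    # Convert each field to a string first, much like the (chararray)
    # casts in the Pig script, then fold pairwise from the left.
    return reduce(concat2, (str(field) for field in fields))


# Written out by hand, as in concat(field1, concat(field2, field3)).
nested = concat2(str(14159274), concat2("BCI", str(6)))
print(nested)

# The same idea for an arbitrary number of fields.
print(concat_all(14159274, "WOS", 16))
```

The fold nests from the left while the hand-written call nests from the right; for string concatenation both give the same result.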
On Tue, Apr 5, 2011 at 3:59 PM, Thejas M Nair <te...@yahoo-inc.com> wrote:
>
>
> Do you need the group-key to be concatenated ? If not, you can just group on all the three columns -
>
> TCGroupedByFuid = group TCRaw by (SrcFuid.citingdocid,
> SrcFuid.col,
> SrcFuid.seq);
>
> If you want the group key to be concatenated for some reason, you can see if generating a concatenated string helps.
> Which version of pig are you using ? From the stack trace, it looks like the version is an old(er) one. This issue might have been fixed in a newer version of pig.
>
> -Thejas
RE: Internal error 2999 - misuse of CONCAT? misuse of GROUP?
Posted by wi...@thomsonreuters.com.
Hi Thejas,
Thanks again for your help. When I omit the SrcFuid "qualifier" and use the form you suggest, I get this error (which was actually the reason I tried SrcFuid.<field> to start with):
Pig Stack Trace
---------------
ERROR 1025: Found more than one match: SrcFuid::citingdocid, NewCitationRel::citingdocid
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Found more than one match: SrcFuid::citingdocid, NewCitationRel::citingdocid
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1607)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1551)
at org.apache.pig.PigServer.registerQuery(PigServer.java:523)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:868)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:388)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
at org.apache.pig.Main.run(Main.java:510)
at org.apache.pig.Main.main(Main.java:107)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Found more than one match: SrcFuid::citingdocid, NewCitationRel::citingdocid
at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:7418)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:7226)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:5297)
But the good news is that I combined this suggestion with Shawn's and found that this works:
TCGroupedByFuid = group TCRaw by (SrcFuid::citingdocid, SrcFuid::col, SrcFuid::seq);
Thanks Thejas and Shawn!
Will
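
[Editor's sketch] The ERROR 1025 above is consistent with a name lookup that accepts an unqualified field name only when it matches exactly one field of the joined schema. The Python model below is an assumption about that behaviour, not Pig's parser; the function name resolve is hypothetical, and the schema is copied from the describe output earlier in the thread.

```python
# Field names of TCRaw as printed by `describe TCRaw` in the original post.
TCRAW_SCHEMA = [
    "SrcFuid::citingdocid", "SrcFuid::col", "SrcFuid::seq",
    "NewCitationRel::citeddocid", "NewCitationRel::citingdocid",
    "NewCitationRel::col", "NewCitationRel::seq",
    "NewCitationRel::year", "NewCitationRel::eds",
]


def resolve(name, schema=TCRAW_SCHEMA):
    """Return the position of `name`, given a qualified or unqualified name."""
    matches = [
        index for index, field in enumerate(schema)
        if field == name or field.split("::")[-1] == name
    ]
    if not matches:
        raise LookupError(f"Invalid field: {name}")
    if len(matches) > 1:
        found = ", ".join(schema[index] for index in matches)
        raise LookupError(f"Found more than one match: {found}")
    return matches[0]


print(resolve("SrcFuid::citingdocid"))  # qualified name: one match
print(resolve("year"))                  # unqualified but unique: one match
try:
    resolve("citingdocid")              # unqualified and ambiguous
except LookupError as err:
    print(err)
```

In this model the `::` prefix is what removes the ambiguity, which matches the statement that finally worked.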
William F Dowling
Sr Technical Specialist, Software Engineering
Thomson Reuters
0 +1 215 823 3853
Re: Internal error 2999 - misuse of CONCAT? misuse of GROUP?
Posted by Thejas M Nair <te...@yahoo-inc.com>.
This feature/syntax seems to be causing confusion in many cases, so I have proposed deprecating this syntax in the next release.
See https://issues.apache.org/jira/browse/PIG-1967 .
-Thejas
On 4/6/11 12:30 PM, "Thejas M Nair" <te...@yahoo-inc.com> wrote:
In the relation TCRaw, there is no column called SrcFuid.
As a result, you end up using this feature -
http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#Casting+Relations+to+Scalars .
Change your statement to -
TCGroupedByFuid = group TCRaw by (citingdocid,
col,
seq);
Thanks,
Thejas
Re: Internal error 2999 - misuse of CONCAT? misuse of GROUP?
Posted by Thejas M Nair <te...@yahoo-inc.com>.
In the relation TCRaw, there is no column called SrcFuid.
As a result, you end up using this feature -
http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#Casting+Relations+to+Scalars .
Change your statement to -
TCGroupedByFuid = group TCRaw by (citingdocid,
col,
seq);
Thanks,
Thejas
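
[Editor's sketch] A rough Python model of why SrcFuid.citingdocid fails inside the GROUP: with dot notation the whole relation SrcFuid is treated as a scalar, which only makes sense when it holds a single row. The function as_scalar is hypothetical, returning None for an empty relation is an assumption, and the two SrcFuid rows are the ones quoted in the "Scalar has more than one row" error message; the rest of that relation is not shown in the thread.

```python
# Two rows of SrcFuid, taken from the error message quoted in the thread.
SRC_FUID = [(14159274, "BCI", 6), (45937168, "BCI", 17)]


def as_scalar(relation, index):
    """Model of `relation.field` used as a scalar: at most one row allowed."""
    if len(relation) > 1:
        raise ValueError(
            "Scalar has more than one row in the output. "
            f"1st : {relation[0]}, 2nd :{relation[1]}"
        )
    # Assumption: an empty relation yields a null (None) value.
    return relation[0][index] if relation else None


single = [(14159274, "BCI", 6)]
print(as_scalar(single, 0))  # a one-row relation works as a scalar

try:
    as_scalar(SRC_FUID, 0)   # a multi-row relation does not
except ValueError as err:
    print(err)
```

This is why the second tuple in the error comes from SrcFuid rather than from TCRaw: the expression never refers to TCRaw's columns at all.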
On 4/6/11 11:09 AM, "william.dowling@thomsonreuters.com"
<wi...@thomsonreuters.com> wrote:
>
>
>> Do you need the group-key to be concatenated ? If not, you can just group on
>> all the three columns -
>
>> TCGroupedByFuid = group TCRaw by (SrcFuid.citingdocid,
>> SrcFuid.col,
>> SrcFuid.seq);
>
> Hi Thejas,
>
> I had tried that originally before introducing CONCAT(), but I got this error
> message:
>
> ERROR 0: Scalar has more than one row in the output.
> 1st : (14159274,BCI,6), 2nd :(45937168,BCI,17)
>
> I don't understand that, since TCRaw is
>
> (14159274,BCI,6,14159274,14159163,BCI,5,1999,BCI.BCI)
> (14159274,BCI,6,14159274,14159163,WOS,11,1999,WOS.SCI)
> (14159274,WOS,16,14159274,14159163,BCI,5,1999,BCI.BCI)
> (14159274,WOS,16,14159274,14159163,WOS,11,1999,WOS.SCI)
>
> and the 2nd tuple is not a (projection of any) member of TCRaw (though it is a
> member of SrcFuid). So I think my understanding of GROUP is incorrect.
>
> Thanks for your help!
>
> Will
>
>
RE: Internal error 2999 - misuse of CONCAT? misuse of GROUP?
Posted by wi...@thomsonreuters.com.
> Do you need the group-key to be concatenated ? If not, you can just group on all the three columns -
> TCGroupedByFuid = group TCRaw by (SrcFuid.citingdocid,
> SrcFuid.col,
> SrcFuid.seq);
Hi Thejas,
I had tried that originally before introducing CONCAT(), but I got this error message:
ERROR 0: Scalar has more than one row in the output.
1st : (14159274,BCI,6), 2nd :(45937168,BCI,17)
I don't understand that, since TCRaw is
(14159274,BCI,6,14159274,14159163,BCI,5,1999,BCI.BCI)
(14159274,BCI,6,14159274,14159163,WOS,11,1999,WOS.SCI)
(14159274,WOS,16,14159274,14159163,BCI,5,1999,BCI.BCI)
(14159274,WOS,16,14159274,14159163,WOS,11,1999,WOS.SCI)
and the 2nd tuple is not a (projection of any) member of TCRaw (though it is a member of SrcFuid). So I think my understanding of GROUP is incorrect.
Thanks for your help!
Will
Re: Internal error 2999 - misuse of CONCAT? misuse of GROUP?
Posted by Thejas M Nair <te...@yahoo-inc.com>.
Do you need the group key to be concatenated? If not, you can just group on all three columns:
TCGroupedByFuid = group TCRaw by (SrcFuid.citingdocid,
SrcFuid.col,
SrcFuid.seq);
If you want the group key to be concatenated for some reason, you can see if generating a concatenated string helps.
Which version of Pig are you using? From the stack trace, it looks like an old(er) one. This issue might have been fixed in a newer version of Pig.
-Thejas
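
[Editor's sketch] To compare grouping on three columns with grouping on one concatenated key, here is a small Python model that uses the four TCRaw rows from the dump in the original post. The helper group_by is hypothetical and only imitates the shape of a GROUP result (key plus bag of rows); it says nothing about how Pig executes the statement.

```python
from collections import defaultdict

# The four TCRaw rows shown by `dump TCRaw` in the original post.
TCRAW = [
    (14159274, "BCI", 6, 14159274, 14159163, "BCI", 5, 1999, "BCI.BCI"),
    (14159274, "BCI", 6, 14159274, 14159163, "WOS", 11, 1999, "WOS.SCI"),
    (14159274, "WOS", 16, 14159274, 14159163, "BCI", 5, 1999, "BCI.BCI"),
    (14159274, "WOS", 16, 14159274, 14159163, "WOS", 11, 1999, "WOS.SCI"),
]


def group_by(rows, key):
    """Collect rows into bags keyed by key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return dict(groups)


# Group on the first three columns as a composite (tuple) key.
by_tuple = group_by(TCRAW, lambda row: (row[0], row[1], row[2]))

# Group on a single concatenated string key instead.
by_string = group_by(TCRAW, lambda row: f"{row[0]}{row[1]}{row[2]}")

for key, bag in by_tuple.items():
    print(key, len(bag))
for key, bag in by_string.items():
    print(key, len(bag))
```

Both keys split these rows into the same two bags of two rows each. In general a composite key is the safer choice, because concatenating without a delimiter can make different column values collide.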
On 4/5/11 1:10 PM, "william.dowling@thomsonreuters.com" <wi...@thomsonreuters.com> wrote:
dump TCRaw;
TCGroupedByFuid = group TCRaw by CONCAT((chararray)SrcFuid.citingdocid,
SrcFuid.col,
(chararray)SrcFuid.seq);
store TCGroupedByFuid into 'foo';