Posted to user@pig.apache.org by wi...@thomsonreuters.com on 2011/04/05 22:10:08 UTC

Internal error 2999 - misuse of CONCAT? misuse of GROUP?

I am a new Pig user and have run into “Internal error 2999”:

2011-04-05 15:59:57,445 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null
Details at logfile: /proj/CitationSystem/backend/hadoop/testbed-hold/pig_1302033581143.log

The log file shows:

Pig Stack Trace
---------------
ERROR 2999: Unexpected internal error. null

java.lang.NullPointerException
        at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.getLoadFuncSpec(TypeCheckingVisitor.java:3116)
        at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:1793)
        at org.apache.pig.impl.logicalLayer.LOCast.visit(LOCast.java:67)
        at org.apache.pig.impl.logicalLayer.LOCast.visit(LOCast.java:32)
        at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:70)
        at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
        at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.checkInnerPlan(TypeCheckingVisitor.java:2869)
        at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2430)
        at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:378)
        at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:45)
        at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:70)
        at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
        at org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:102)
        at org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40)
        at org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30)
        at org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:89)
        at org.apache.pig.impl.logicalLayer.UnionOnSchemaSetter.visit(UnionOnSchemaSetter.java:70)
        at org.apache.pig.impl.logicalLayer.LOUnion.visit(LOUnion.java:177)
        at org.apache.pig.impl.logicalLayer.LOUnion.visit(LOUnion.java:38)
        at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:70)
        at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
        at org.apache.pig.PigServer.compileLp(PigServer.java:1317)
        at org.apache.pig.PigServer.compileLp(PigServer.java:1306)
        at org.apache.pig.PigServer.compileLp(PigServer.java:1241)
        at org.apache.pig.PigServer.compileLp(PigServer.java:1221)
        at org.apache.pig.PigServer.execute(PigServer.java:1178)
        at org.apache.pig.PigServer.access$100(PigServer.java:128)
        at org.apache.pig.PigServer$Graph.execute(PigServer.java:1517)
        at org.apache.pig.PigServer.executeBatchEx(PigServer.java:362)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:329)
        at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:169)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
        at org.apache.pig.Main.run(Main.java:510)
        at org.apache.pig.Main.main(Main.java:107)

Most likely I am doing something wrong, so any advice would be appreciated. Here is my setup: I have a Pig script like this:

[… statements define SrcFuid and NewCitationRel …]
 TCRaw = join SrcFuid by citingdocid,  NewCitationRel by citeddocid;
 describe TCRaw;
 dump TCRaw;
 TCGroupedByFuid = group TCRaw by CONCAT((chararray)SrcFuid.citingdocid,
                                          SrcFuid.col,
                                         (chararray)SrcFuid.seq); 
 store TCGroupedByFuid into 'foo';


The log shows the output of the describe and dump commands (I’ve formatted for readability):

TCRaw: {SrcFuid::citingdocid: int,
        SrcFuid::col: bytearray,
        SrcFuid::seq: int,
        NewCitationRel::citeddocid: int,
        NewCitationRel::citingdocid: int,
        NewCitationRel::col: bytearray,
        NewCitationRel::seq: int,
        NewCitationRel::year: int,
        NewCitationRel::eds: bytearray}

(14159274,BCI,6,14159274,14159163,BCI,5,1999,BCI.BCI)
(14159274,BCI,6,14159274,14159163,WOS,11,1999,WOS.SCI)
(14159274,WOS,16,14159274,14159163,BCI,5,1999,BCI.BCI)
(14159274,WOS,16,14159274,14159163,WOS,11,1999,WOS.SCI)

What I was hoping for was something like:

(‘14159274BCI6’,
  {(14159274,BCI,6,14159274,14159163,BCI,5,1999,BCI.BCI),
   (14159274,BCI,6,14159274,14159163,WOS,11,1999,WOS.SCI)})
(‘14159274WOS16’,
  {(14159274,WOS,16,14159274,14159163,BCI,5,1999,BCI.BCI),
   (14159274,WOS,16,14159274,14159163,WOS,11,1999,WOS.SCI)})

If anyone could give me a hint about what to do to get that, I’d appreciate it very much. Thanks!

Will

William F Dowling
Sr Technical Specialist, Software Engineering
Thomson Reuters
0 +1 215 823 3853



Re: Internal error 2999 - misuse of CONCAT? misuse of GROUP?

Posted by Xiaomeng Wan <sh...@gmail.com>.
TCGroupedByFuid = group TCRaw by CONCAT((chararray)SrcFuid.citingdocid,
                                         SrcFuid.col,
                                        (chararray)SrcFuid.seq);

should be

TCGroupedByFuid = group TCRaw by CONCAT((chararray)SrcFuid::citingdocid,CONCAT(
                                         SrcFuid::col,
                                        (chararray)SrcFuid::seq));

Shawn
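Shawn's correction can be written out as a complete sketch. It assumes CONCAT accepts exactly two arguments, as described in this thread, so three fields need two nested calls. Because `col` is a bytearray in the describe output, this version also casts it to chararray so that both arguments of each CONCAT call have the same type. The LOAD paths below are hypothetical; the schemas mirror the describe output.

```pig
-- Hypothetical inputs: file paths are assumptions; schemas mirror the
-- describe output in the thread (default tab-delimited PigStorage).
SrcFuid        = LOAD 'srcfuid.tsv' AS (citingdocid:int, col:bytearray, seq:int);
NewCitationRel = LOAD 'newcitationrel.tsv'
                 AS (citeddocid:int, citingdocid:int, col:bytearray,
                     seq:int, rel_year:int, eds:bytearray);
TCRaw = JOIN SrcFuid BY citingdocid, NewCitationRel BY citeddocid;

-- Fields of TCRaw are addressed with '::'. CONCAT takes two arguments, so the
-- three key fields are combined with two nested calls; every operand is cast
-- to chararray so each CONCAT call sees two arguments of the same type.
TCGroupedByFuid = GROUP TCRaw BY
    CONCAT((chararray)SrcFuid::citingdocid,
           CONCAT((chararray)SrcFuid::col, (chararray)SrcFuid::seq));

STORE TCGroupedByFuid INTO 'foo';
```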

On Wed, Apr 6, 2011 at 12:17 PM,  <wi...@thomsonreuters.com> wrote:
> Thanks for your response.  I have tried supplying just two arguments to CONCAT(), but I get the same
>
> ERROR 2999: Unexpected internal error. null

RE: Internal error 2999 - misuse of CONCAT? misuse of GROUP?

Posted by wi...@thomsonreuters.com.
-----Original Message-----
From: Xiaomeng Wan [mailto:shawnwan@gmail.com] 
Sent: Tuesday, April 05, 2011 6:54 PM
To: user@pig.apache.org
Subject: Re: Internal error 2999 - misuse of CONCAT? misuse of GROUP?

CONCAT only takes two fields at a time. Use CONCAT(field1,
CONCAT(field2, field3)).

Shawn
--------------------------


Hi Shawn,

Thanks for your response.  I have tried supplying just two arguments to CONCAT(), but I get the same

ERROR 2999: Unexpected internal error. null

java.lang.NullPointerException

that I originally reported.  It's a good tip to use only two arguments, but I think something else is (also) going on.  Thanks!

Will

Re: Internal error 2999 - misuse of CONCAT? misuse of GROUP?

Posted by Xiaomeng Wan <sh...@gmail.com>.
CONCAT only takes two fields at a time. Use CONCAT(field1,
CONCAT(field2, field3)).

Shawn

On Tue, Apr 5, 2011 at 3:59 PM, Thejas M Nair <te...@yahoo-inc.com> wrote:
>
> On 4/5/11 1:10 PM, "william.dowling@thomsonreuters.com" <wi...@thomsonreuters.com> wrote:
>
>
>  dump TCRaw;
>  TCGroupedByFuid = group TCRaw by CONCAT((chararray)SrcFuid.citingdocid,
>                                          SrcFuid.col,
>                                         (chararray)SrcFuid.seq);
>  store TCGroupedByFuid into 'foo';
>
>

RE: Internal error 2999 - misuse of CONCAT? misuse of GROUP?

Posted by wi...@thomsonreuters.com.
Hi Thejas,

Thanks again for your help. When I omit the SrcFuid "qualifier" and use the form you suggest, I get this error (that was actually the reason I tried SrcFuid.<field> to start with):

Pig Stack Trace
---------------
ERROR 1025: Found more than one match: SrcFuid::citingdocid, NewCitationRel::citingdocid

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Found more than one match: SrcFuid::citingdocid, NewCitationRel::citingdocid
	at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1607)
	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1551)
	at org.apache.pig.PigServer.registerQuery(PigServer.java:523)
	at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:868)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:388)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
	at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
	at org.apache.pig.Main.run(Main.java:510)
	at org.apache.pig.Main.main(Main.java:107)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Found more than one match: SrcFuid::citingdocid, NewCitationRel::citingdocid
	at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:7418)
	at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:7226)
	at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:5297)
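The ERROR 1025 above arises because, after the JOIN, `citingdocid`, `col` and `seq` exist on both sides, so the bare names are ambiguous. Besides writing the `::` prefix every time, one option is to project TCRaw once under unique names. A sketch, where the new column names (`src_citingdocid`, `rel_col`, and so on) and the LOAD paths are hypothetical:

```pig
-- Hypothetical inputs: file paths are assumptions; schemas mirror the
-- describe output in the thread.
SrcFuid        = LOAD 'srcfuid.tsv' AS (citingdocid:int, col:bytearray, seq:int);
NewCitationRel = LOAD 'newcitationrel.tsv'
                 AS (citeddocid:int, citingdocid:int, col:bytearray,
                     seq:int, rel_year:int, eds:bytearray);
TCRaw = JOIN SrcFuid BY citingdocid, NewCitationRel BY citeddocid;

-- Rename every column to a unique name (names are hypothetical).
-- NewCitationRel::citeddocid is dropped: it equals SrcFuid::citingdocid after the join.
TCNamed = FOREACH TCRaw GENERATE
    SrcFuid::citingdocid        AS src_citingdocid,
    SrcFuid::col                AS src_col,
    SrcFuid::seq                AS src_seq,
    NewCitationRel::citingdocid AS rel_citingdocid,
    NewCitationRel::col         AS rel_col,
    NewCitationRel::seq         AS rel_seq,
    NewCitationRel::rel_year    AS rel_year,
    NewCitationRel::eds         AS rel_eds;

-- Later statements can use the short names without a prefix.
TCGroupedByFuid = GROUP TCNamed BY (src_citingdocid, src_col, src_seq);

STORE TCGroupedByFuid INTO 'foo';
```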


But the good news is that I combined this suggestion with Shawn's and found that this works:

TCGroupedByFuid = group TCRaw by (SrcFuid::citingdocid, SrcFuid::col, SrcFuid::seq);
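With this multi-column form, the group key of each output row is itself a tuple, and the bag of matching rows is named after the grouped relation (TCRaw). A sketch of unpacking the key and counting rows per key; the alias TCCounts, the names in the AS clause, and the LOAD paths are hypothetical:

```pig
-- Hypothetical inputs: file paths are assumptions; schemas mirror the
-- describe output in the thread.
SrcFuid        = LOAD 'srcfuid.tsv' AS (citingdocid:int, col:bytearray, seq:int);
NewCitationRel = LOAD 'newcitationrel.tsv'
                 AS (citeddocid:int, citingdocid:int, col:bytearray,
                     seq:int, rel_year:int, eds:bytearray);
TCRaw = JOIN SrcFuid BY citingdocid, NewCitationRel BY citeddocid;

-- The working statement from the message above: the key is a 3-field tuple.
TCGroupedByFuid = GROUP TCRaw BY (SrcFuid::citingdocid, SrcFuid::col, SrcFuid::seq);

-- Each row of TCGroupedByFuid is (group, TCRaw): 'group' holds the key tuple
-- and 'TCRaw' is the bag of joined rows that share that key.
TCCounts = FOREACH TCGroupedByFuid GENERATE
    FLATTEN(group) AS (citingdocid, col, seq),  -- unpack the key tuple into columns
    COUNT(TCRaw)   AS n_rows;                   -- number of joined rows per key

DUMP TCCounts;
```

For the four TCRaw rows dumped earlier in the thread, this should give (14159274,BCI,6,2) and (14159274,WOS,16,2), in no guaranteed order.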

Thanks Thejas and Shawn!

Will




-----Original Message-----
From: Thejas M Nair [mailto:tejas@yahoo-inc.com] 
Sent: Wednesday, April 06, 2011 3:31 PM
To: user@pig.apache.org; Dowling, William (Hlthcr&Science)
Subject: Re: Internal error 2999 - misuse of CONCAT? misuse of GROUP?

In the relation TCRaw, there is no column called SrcFuid.
As a result, you end up using this feature -
http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#Casting+Relations+to+Scalars


Change your statement to -
 TCGroupedByFuid = group TCRaw by (citingdocid,
                                          col,
                                         seq);

Thanks,
Thejas

On 4/6/11 11:09 AM, "william.dowling@thomsonreuters.com"
<wi...@thomsonreuters.com> wrote:

> ERROR 0: Scalar has more than one row in the output.
>  1st : (14159274,BCI,6), 2nd :(45937168,BCI,17)



Re: Internal error 2999 - misuse of CONCAT? misuse of GROUP?

Posted by Thejas M Nair <te...@yahoo-inc.com>.
This feature/syntax seems to be causing confusion in many cases, so I have proposed deprecating this syntax in the next release.
See https://issues.apache.org/jira/browse/PIG-1967.

-Thejas








Re: Internal error 2999 - misuse of CONCAT? misuse of GROUP?

Posted by Thejas M Nair <te...@yahoo-inc.com>.
In the relation TCRaw, there is no column called SrcFuid.
As a result, you end up using this feature -
http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#Casting+Relations+to+Scalars


Change your statement to -
 TCGroupedByFuid = group TCRaw by (citingdocid,
                                          col,
                                         seq);
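The relation-to-scalar feature linked above lets a statement refer to a field of a single-row relation as if it were a constant. A minimal sketch of its intended use, with hypothetical relation names and input path:

```pig
-- Hypothetical input: path and schema are assumptions for illustration.
A = LOAD 'data.tsv' AS (a:int, b:int);

-- Reduce A to exactly one row holding the total row count.
B = GROUP A ALL;
C = FOREACH B GENERATE COUNT(A) AS total;

-- C has one row, so C.total can be used like a constant in another statement.
D = FOREACH A GENERATE a, (double)a / C.total AS frac;

DUMP D;

-- By the same rule, SrcFuid.citingdocid inside a statement over TCRaw refers to
-- the SrcFuid relation as a scalar. SrcFuid has many rows, which is what
-- produces "Scalar has more than one row in the output" at run time.
```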

Thanks,
Thejas

On 4/6/11 11:09 AM, "william.dowling@thomsonreuters.com"
<wi...@thomsonreuters.com> wrote:

> ERROR 0: Scalar has more than one row in the output.
>  1st : (14159274,BCI,6), 2nd :(45937168,BCI,17)



RE: Internal error 2999 - misuse of CONCAT? misuse of GROUP?

Posted by wi...@thomsonreuters.com.
> Do you need the group-key to be concatenated ? If not, you can just group on all the three columns -
>
> TCGroupedByFuid = group TCRaw by (SrcFuid.citingdocid,
>                                   SrcFuid.col,
>                                   SrcFuid.seq);

Hi Thejas,

I had tried that originally before introducing CONCAT(), but I got this error message:

ERROR 0: Scalar has more than one row in the output.
 1st : (14159274,BCI,6), 2nd :(45937168,BCI,17)

I don't understand that, since TCRaw is

(14159274,BCI,6,14159274,14159163,BCI,5,1999,BCI.BCI)
(14159274,BCI,6,14159274,14159163,WOS,11,1999,WOS.SCI)
(14159274,WOS,16,14159274,14159163,BCI,5,1999,BCI.BCI)
(14159274,WOS,16,14159274,14159163,WOS,11,1999,WOS.SCI)

and the 2nd tuple is not a (projection of any) member of TCRaw (though it is a member of SrcFuid). So I think my understanding of GROUP is incorrect.

Thanks for your help!

Will


Re: Internal error 2999 - misuse of CONCAT? misuse of GROUP?

Posted by Thejas M Nair <te...@yahoo-inc.com>.

Do you need the group key to be concatenated? If not, you can just group on all three columns:

 TCGroupedByFuid = group TCRaw by (SrcFuid.citingdocid,
                                         SrcFuid.col,
                                        SrcFuid.seq);

If you want the group key to be concatenated for some reason, you could try generating the concatenated string first (in a FOREACH ... GENERATE) and then grouping on that field.
Which version of Pig are you using? From the stack trace, it looks like an older version; this issue might have been fixed in a newer release of Pig.

-Thejas
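The suggestion to generate the concatenated string first can be sketched as a FOREACH that adds a key column, followed by a GROUP on that column. The LOAD paths and the key name `fuid` are hypothetical; CONCAT is nested because, as noted elsewhere in this thread, it takes two arguments.

```pig
-- Hypothetical inputs: file paths are assumptions; schemas mirror the
-- describe output in the thread.
SrcFuid        = LOAD 'srcfuid.tsv' AS (citingdocid:int, col:bytearray, seq:int);
NewCitationRel = LOAD 'newcitationrel.tsv'
                 AS (citeddocid:int, citingdocid:int, col:bytearray,
                     seq:int, rel_year:int, eds:bytearray);
TCRaw = JOIN SrcFuid BY citingdocid, NewCitationRel BY citeddocid;

-- Step 1: compute the key once per row and keep all other columns (*).
-- Every operand is cast to chararray so each CONCAT call gets matching types.
TCKeyed = FOREACH TCRaw GENERATE
    CONCAT((chararray)SrcFuid::citingdocid,
           CONCAT((chararray)SrcFuid::col, (chararray)SrcFuid::seq)) AS fuid,
    *;

-- Step 2: group on the single chararray key.
TCGroupedByFuid = GROUP TCKeyed BY fuid;

STORE TCGroupedByFuid INTO 'foo';
```

For the four TCRaw rows dumped earlier, this should produce the keys 14159274BCI6 and 14159274WOS16, each with a bag of two rows; each bagged tuple also carries `fuid` as its first field. Concatenating without a separator can map different field combinations to the same string (for example, col 'BCI' with seq 16 and col 'BCI1' with seq 6 both end in "BCI16"), so adding a delimiter such as '|' through one more nested CONCAT may be safer.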


On 4/5/11 1:10 PM, "william.dowling@thomsonreuters.com" <wi...@thomsonreuters.com> wrote:


>  dump TCRaw;
>  TCGroupedByFuid = group TCRaw by CONCAT((chararray)SrcFuid.citingdocid,
>                                          SrcFuid.col,
>                                         (chararray)SrcFuid.seq);
>  store TCGroupedByFuid into 'foo';