Posted to user@pig.apache.org by XIAMING CHEN <ch...@gmail.com> on 2014/03/24 06:18:38 UTC

Pig0.12 gets confused about schema after a nested FOREACH

I found that Pig gets confused about the schema after a complicated but correct nested FOREACH operation.

My script is attached unmodified, and running it produces the error messages below:

Picked up _JAVA_OPTIONS: -Xmx1G
2014-03-24 13:05:18,662 [main] INFO  org.apache.pig.Main - Apache Pig version 0.12.0 (r1529718) compiled Oct 07 2013, 12:20:14
2014-03-24 13:05:18,663 [main] INFO  org.apache.pig.Main - Logging error messages to: /mnt/tera/workspace/OmnilabMisc/sjtuwifi/activities/pig_1395637518659.log
2014-03-24 13:05:18,897 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/chenxm/.pigbootup not found
2014-03-24 13:05:18,990 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
activities: {group: chararray,brief: {(activityID: chararray,reqHost: chararray,rspPylByt: long,pylByt: long,reqTime: double,reqDur: double,rspTime: double,rspDur: double)}}
2014-03-24 13:05:19,766 [main] WARN  org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 5 time(s).
features: {activityID: chararray,service: chararray,volume: long,size: long,ADur: double,MWTime: double,MEdur: double,VMR: double,CI: double,PABw: double}
2014-03-24 13:05:19,904 [main] WARN  org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 11 time(s).
2014-03-24 13:05:19,904 [main] WARN  org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_LONG 2 time(s).
filtered: {activityID: chararray,service: chararray,volume: long,size: long,ADur: double,MWTime: double,MEdur: double,VMR: double,CI: double,PABw: double}
2014-03-24 13:05:20,049 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: 
<file /home/chenxm/tera/workspace/OmnilabMisc/sjtuwifi/activities/features_perf.pig, line 47, column 142> Out of bound access. Trying to access non-existent column: 8. Schema activityID:chararray,reqHost:chararray,rspPylByt:long,pylByt:long,reqTime:double,reqDur:double,rspTime:double,rspDur:double has 8 column(s).
Details at logfile: ************/pig_1395637518659.log
[Finished in 1.7s with exit code 6]

In the output, the schema of the 'filtered' projection is correct, but in the following FOREACH [line 47], Pig treats 'filtered' as having a different schema, identical to that of 'brief' [line 16].
I do not know why Pig is confused about this. Is this a bug, or am I using it incorrectly?
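The mismatch can be modeled in a few lines. This is a minimal Python sketch, assuming only that Pig's positional references are zero-based ($0 is the first field). The two schemas are copied from the log above; the `project` helper is hypothetical and is not Pig code. Index 8 resolves to CI in the 10-column 'filtered' schema but is out of range for the 8-column 'brief' schema, which is consistent with the reference on line 47 being checked against the wrong schema.

```python
# Hypothetical model, NOT Pig's implementation: resolve a zero-based
# positional reference (like $8) against a schema copied from the log.

BRIEF = [
    ("activityID", "chararray"), ("reqHost", "chararray"),
    ("rspPylByt", "long"), ("pylByt", "long"),
    ("reqTime", "double"), ("reqDur", "double"),
    ("rspTime", "double"), ("rspDur", "double"),
]

FILTERED = [
    ("activityID", "chararray"), ("service", "chararray"),
    ("volume", "long"), ("size", "long"),
    ("ADur", "double"), ("MWTime", "double"), ("MEdur", "double"),
    ("VMR", "double"), ("CI", "double"), ("PABw", "double"),
]


def project(schema, index):
    """Return the (name, type) field at a zero-based position, or raise IndexError."""
    if not 0 <= index < len(schema):
        raise IndexError(f"column {index} is out of range for a {len(schema)}-column schema")
    return schema[index]


if __name__ == "__main__":
    # Valid against the 10-column 'filtered' schema: index 8 is CI.
    print(project(FILTERED, 8))
    # Invalid against the 8-column 'brief' schema: valid indices are 0..7.
    try:
        project(BRIEF, 8)
    except IndexError as err:
        print(err)
```

The same index is fine for one relation and fatal for the other, so the error message alone points at which schema Pig used, not at a mistake in the position itself.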

Best,

Jamin
chenxm35@gmail.com

Re: Pig0.12 gets confused about schema after a nested FOREACH

Posted by Cheolsoo Park <pi...@gmail.com>.
Hi Jamin,

>> Out of bound access. Trying to access non-existent column: 8. Schema
>> activityID:chararray,reqHost:chararray,rspPylByt:long,pylByt:long,reqTime:double,reqDur:double,rspTime:double,rspDur:double
>> has 8 column(s).

Have you tried disabling the ColumnMapKeyPrune optimization? You can do so by
adding "-t ColumnMapKeyPrune" to the pig command line.

Also, there have been a few bug fixes to ColumnMapKeyPrune since the
0.12 release, so please try
branch-0.12 <https://github.com/apache/pig/tree/branch-0.12> in the
Pig repo.

Thanks,
Cheolsoo
