Posted to user@pig.apache.org by Yang <te...@gmail.com> on 2013/03/11 08:11:05 UTC

null pointer error with a simple pig program

the following code gave null pointer exception

---------------------------------------------------------------------------------------

rbl_raw = load 's3://mybucket/rbl-logs/{2013/03/06,2013/03/05}' AS
(line:chararray);

rbl = FOREACH rbl_raw GENERATE FLATTEN(loadrbl(line)) AS (x:chararray,
y:chararray);

seo_rbl = FILTER rbl BY x IS NOT NULL AND y == 'seo_google';

rbl1 = GROUP seo_rbl BY x;

STORE rbl1 INTO '/user/hadoop/blah';

-------------------------------------------------------------------------------
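For reference, the brace glob in the LOAD statement should expand to two concrete directories before any file listing happens. Here is an illustrative Python sketch of that single-level brace expansion (a hypothetical helper, not Hadoop's actual FileSystem.globStatus code):

```python
import re

def expand_braces(pattern):
    """Expand one {a,b,...} alternation at a time, Hadoop-glob style.

    Illustrative sketch only; Hadoop's real glob code also handles
    wildcards, escaping, and nested braces.
    """
    m = re.search(r'\{([^{}]*)\}', pattern)
    if m is None:
        return [pattern]  # no braces left: the pattern is a literal path
    head, tail = pattern[:m.start()], pattern[m.end():]
    results = []
    for alt in m.group(1).split(','):
        results.extend(expand_braces(head + alt + tail))
    return results

# The glob from the script above expands to the two dated directories:
paths = expand_braces('s3://mybucket/rbl-logs/{2013/03/06,2013/03/05}')
```

So the LOAD itself sees two ordinary directory paths; the NPE happens later, when the glob is re-evaluated against S3 while sizing the input.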




Pig Stack Trace
---------------
ERROR 2017: Internal error creating job configuration.

org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException:
ERROR 2017: Internal error creating job configuration.
        at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:750)
        at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:267)
        at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:151)
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1313)
        at
org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1298)
        at org.apache.pig.PigServer.execute(PigServer.java:1288)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:360)
        at
org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
        at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
        at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
        at org.apache.pig.Main.run(Main.java:568)
        at org.apache.pig.Main.main(Main.java:114)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:994)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:967)
        at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getTotalInputFileSize(JobControlCompiler.java:798)
        at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:773)
        at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:611)
        ... 17 more
================================================================================



Version of Pig is 0.9.2:

hadoop@ip-10-147-131-60:/mnt/run$ pig -version
Apache Pig version 0.9.2-amzn (rexported)

The weird thing is that if I take out the GROUP BY, it works fine; if I take out the glob in the initial LOAD statement and just load one dir, it works fine; and if I load both dirs with the glob, run the loadrbl() UDF, store that result into an intermediate dir, then load the intermediate result and continue the original computation all the way to the GROUP BY, it works fine too.

So why does the GROUP BY have a problem with the glob above, when they are far apart and the intermediate steps all work fine?


thanks
Yang

Re: null pointer error with a simple pig program

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
(To explain why it happens with the GROUP BY: Pig tries to estimate the number of reducers you need by looking at the size of the input data. If you remove the GROUP BY, there is no reduce step, so no estimation is needed. If you store, then load and group, presumably whatever null-returning bug in S3 this tickles isn't being triggered.)
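The estimation Dmitriy describes is, roughly, total input bytes divided by a per-reducer byte budget, with a cap on the reducer count (Pig's defaults are on the order of 1 GB per reducer and a maximum of 999 reducers; treat the exact values here as assumptions, since they can vary by version and configuration). A hedged Python sketch of that arithmetic, and of why a null from globStatus breaks it:

```python
import math

# Assumed defaults, modeled on pig.exec.reducers.bytes.per.reducer and
# pig.exec.reducers.max; illustrative values, not canonical ones.
BYTES_PER_REDUCER = 1_000_000_000
MAX_REDUCERS = 999

def estimate_reducers(file_sizes):
    """Sketch of Pig's reducer estimation for a job with a reduce phase.

    file_sizes stands in for the sizes that globStatus would report for
    the input paths; if the glob lookup yields None (as in the NPE in
    the stack trace), there is nothing to sum and the estimate fails
    before the job configuration is ever built.
    """
    if file_sizes is None:
        raise ValueError("glob returned no status -- cannot size the input")
    total = sum(file_sizes)
    return max(1, min(MAX_REDUCERS, math.ceil(total / BYTES_PER_REDUCER)))
```

A map-only plan (no GROUP BY) never runs this estimation at all, which is consistent with the failure disappearing as soon as the GROUP BY is removed.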



Re: null pointer error with a simple pig program

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Sounds like a bug in the S3 implementation of FileSystem? Does this happen
with pig 0.10 or 0.11?


