Posted to user@pig.apache.org by Geoffrey Gallaway <ge...@geoffeg.org> on 2011/01/10 05:47:21 UTC

Pig error: Unable to create input splits

Hello, I'm looking for some clues to help me fix an annoying error I'm
getting using Pig.

I need to parse a large JSON file, so I grabbed kimsterv's JSON loader
(https://gist.github.com/601331), compiled it, and successfully tested it on
my laptop via -x local. However, when I try to run it on the edge node of
our dev Hadoop instance, I am unable to get it to work, even if I run it in
-x local. I get
"org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to
create input splits for test.json". I looked through the mailing list for
this message, only to find a mention of it being related to LZO compression
issues. I'm not using any file compression, and the error still occurs when
running in -x local on the edge node of the dev cluster. Are there
environment variables I'm missing? Maybe permissions issues I'm unaware of?
Suggestions and theories welcome!

Hadoop version: Hadoop 0.20.2+737
Pig version: 0.7.0+16 (compiled against the pig 0.7.0 jar)

Command line:
  java -cp '/usr/lib/pig/*:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:libs/*:.'
org.apache.pig.Main -v -x local json.pig

Pig script:
  REGISTER /home/geoffeg/pig-functions/jsontester.jar;
  -- file:// should specify the local FS, remove file:// to specify HDFS
  A = LOAD 'file://home/geoffeg/test.json' using
org.geoffeg.hadoop.pig.loader.PigJsonLoader() as ( json: map[] );
  B = foreach A generate json#'_keyword';
  DUMP B;

Full error/log:
2011-01-09 22:33:29,692 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting
to hadoop file system at: file:///
2011-01-09 22:33:30,345 [main] INFO
 org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned
for A
2011-01-09 22:33:30,345 [main] INFO
 org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - Map key required
for A: $0->[_keyword]
2011-01-09 22:33:30,455 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name:
Store(file:/tmp/temp1814319995/tmp1141533149:org.apache.pig.builtin.BinStorage)
- 1-36 Operator Key: 1-36)
2011-01-09 22:33:30,482 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
2011-01-09 22:33:30,482 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2011-01-09 22:33:30,517 [main] INFO
 org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with
processName=JobTracker, sessionId=
2011-01-09 22:33:30,522 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-01-09 22:33:32,520 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2011-01-09 22:33:32,552 [main] INFO
 org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized
2011-01-09 22:33:32,552 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2011-01-09 22:33:32,562 [Thread-2] WARN  org.apache.hadoop.mapred.JobClient
- Use GenericOptionsParser for parsing the arguments. Applications should
implement Tool for the same.
2011-01-09 22:33:32,692 [Thread-2] INFO  org.apache.hadoop.mapred.JobClient
- Cleaning up the staging area
file:/tmp/hadoop-geoffeg/mapred/staging/geoffeg395595954/.staging/job_local_0001
2011-01-09 22:33:33,054 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2011-01-09 22:33:33,054 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2011-01-09 22:33:33,054 [main] ERROR
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map reduce job(s) failed!
2011-01-09 22:33:33,064 [main] ERROR
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed to produce result in: "file:/tmp/temp1814319995/tmp1141533149"
2011-01-09 22:33:33,064 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Records written : Unable to determine number of records written
2011-01-09 22:33:33,065 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Bytes written : Unable to determine number of bytes written
2011-01-09 22:33:33,065 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Spillable Memory Manager spill count : 0
2011-01-09 22:33:33,065 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Proactive spill count : 0
2011-01-09 22:33:33,065 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!
2011-01-09 22:33:33,133 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 2997: Unable to recreate exception from backend error:
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to
create input splits for: file://home/geoffeg/test.json
2011-01-09 22:33:33,134 [main] ERROR org.apache.pig.tools.grunt.Grunt -
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to
open iterator for alias B
at org.apache.pig.PigServer.openIterator(PigServer.java:607)
 at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:545)
at
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
 at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:163)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:139)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
at org.apache.pig.Main.main(Main.java:414)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997:
Unable to recreate exception from backend error:
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to
create input splits for: file://home/geoffeg/test.json
 at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:169)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:270)
 at
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1007)
 at org.apache.pig.PigServer.store(PigServer.java:697)
at org.apache.pig.PigServer.openIterator(PigServer.java:590)
 ... 6 more

-- 
Sent from my email client.

Re: Pig error: Unable to create input splits

Posted by Geoffrey Gallaway <ge...@geoffeg.org>.
Thanks to Joe and Daniel, I was able to fix this issue.

It was a combination of an ambiguous file path (which Joe's message helped
me confirm) and an error in my Java code that wasn't throwing an exception,
so it was failing silently.

Thanks,
Geoff
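The "failing silently" part above is a common pitfall in custom loaders. A minimal, hypothetical sketch (not the actual PigJsonLoader code) of the pattern: a parse helper that catches all exceptions and returns null, so bad input simply disappears instead of surfacing an error.

```java
import java.util.HashMap;
import java.util.Map;

public class SilentParseDemo {
    // Hypothetical stand-in for a JSON-parsing helper inside a custom
    // LoadFunc. The catch-all swallows errors, which is what makes the
    // failure silent: callers just see null and no record is emitted.
    static Map<String, String> parseLine(String line) {
        try {
            if (line == null || !line.startsWith("{")) {
                throw new IllegalArgumentException("not a JSON object: " + line);
            }
            Map<String, String> m = new HashMap<>();
            m.put("_keyword", "example");
            return m;
        } catch (Exception e) {
            // No log, no rethrow -- the error vanishes here.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseLine("{\"_keyword\": \"x\"}")); // a map
        System.out.println(parseLine("garbage"));               // null, silently
    }
}
```

Logging (or at least counting) skipped records in the catch block makes this kind of bug far easier to spot.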

On Wed, Jan 12, 2011 at 7:43 AM, Joe Crobak <jo...@gmail.com> wrote:

> A = LOAD 'file://home/geoffeg/test.json' will try to load using a relative
> path.  Pig will understand file:/home/geoffeg/test.json or
> file:///home/geoffeg/test.json to load the absolute path.  Same goes for a
> file in hdfs://
>
> HTH,
> Joe



-- 
Sent from my email client.

Re: Pig error: Unable to create input splits

Posted by Joe Crobak <jo...@gmail.com>.
A = LOAD 'file://home/geoffeg/test.json' will try to load using a relative
path, because in a file:// URI the first component after the two slashes
("home") is parsed as the host, not as part of the path.  Pig will
understand file:/home/geoffeg/test.json or file:///home/geoffeg/test.json
to load the absolute path.  Same goes for a file in hdfs://

HTH,
Joe
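The URI semantics behind Joe's advice can be checked directly with java.net.URI (the same rules Hadoop's filesystem URIs follow): with two slashes, "home" becomes the authority and drops out of the path; with one or three slashes, the full absolute path is preserved.

```java
import java.net.URI;

public class UriDemo {
    public static void main(String[] args) {
        // Two slashes: "home" is parsed as the authority (host),
        // and the path loses its first component.
        URI bad = URI.create("file://home/geoffeg/test.json");
        System.out.println(bad.getAuthority()); // home
        System.out.println(bad.getPath());      // /geoffeg/test.json

        // Three slashes: empty authority, full absolute path kept.
        URI good = URI.create("file:///home/geoffeg/test.json");
        System.out.println(good.getAuthority()); // null
        System.out.println(good.getPath());      // /home/geoffeg/test.json
    }
}
```

So Pig never saw /home/geoffeg/test.json at all, which is why it could not create input splits for it.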


Re: Pig error: Unable to create input splits

Posted by Daniel Dai <ji...@yahoo-inc.com>.
I tried the JSON loader you mentioned on 0.7, and it seems to work fine for
me. I didn't get the error message you mentioned. Are you still seeing those errors?

Daniel
