You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@pig.apache.org by Ruth Garcia <ru...@yahoo-inc.com> on 2010/10/25 17:22:37 UTC

BUG: PIG HDF, HADOOP MAPREDUCE java.io.IOException: ..... does not exist

Hello,
I am having an error that is driving me crazy. Any help will be appreciated.

First, I have configured hadoop and hdfs according to this tutorial (I 
did not created an account hadoop, used mine instead) 
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29
and I do not have any problem, I could run the wordcount. In other 
words, I could run the following command without problems: $ bin/hadoop 
jar hadoop-0.20.2-examples.jar wordcount gutenberg gutenberg-output

I have also followed the Pig tutorial here 
http://pig.apache.org/docs/r0.7.0/setup.html#Sample+Code   and 
http://pig.apache.org/docs/r0.7.0/tutorial.html .

For both cases, I can run local pig scripts  without problems.
Nevertheless, with using HADOOP and PIG to run Mapreduce jobs, I have 
the same error...it can not detect the file that is being loaded... I 
have put that file into the hdfs directory (the same used in the 
wordcount directory), I have plases the file to be load everywhere and I 
still have the error that the file to be loaded "does not exist."  For 
some reason, when I am using PIG it seems to me that it tries to detect 
files from a unknown directory (for me). Could someone please help me 
with this issue??
The error that I receive for the example 
http://pig.apache.org/docs/r0.7.0/setup.html#Sample+Code   when using : 
$ java -cp pig.jar:.:$HADOOPDIR idmapreduce is:
/
$ java -cp pig.jar:.:$HADOOPDIR idmapreduce
10/10/25 17:10:01 INFO executionengine.HExecutionEngine: Connecting to 
hadoop file system at: file:///
10/10/25 17:10:01 INFO jvm.JvmMetrics: Initializing JVM Metrics with 
processName=JobTracker, sessionId=
10/10/25 17:10:03 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics 
with processName=JobTracker, sessionId= - already initialized
10/10/25 17:10:03 WARN mapred.JobClient: Use GenericOptionsParser for 
parsing the arguments. Applications should implement Tool for the same.
**10/10/25 17:10:08 INFO mapReduceLayer.MapReduceLauncher: 0% complete
10/10/25 17:10:08 ERROR mapReduceLayer.MapReduceLauncher: Map reduce job 
failed*
10/10/25 17:10:08 *ERROR mapReduceLayer.MapReduceLauncher: 
java.io.IOException: passwd does not exist**
    at 
org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:115)
    at 
org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
    at 
org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
    at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
    at 
org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    at 
org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
    at java.lang.Thread.run(Thread.java:619)


/When doing the  same with  
http://pig.apache.org/docs/r0.7.0/tutorial.html   using :
$ java -cp $HOME/pigtmp/pig.jar:$HADOOP_CONF_DIR  org.apache.pig.Main 
$HOME/pigtmp/script1-hadooptest.pig

2010-10-25 17:14:32,651 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - 
Connecting to hadoop file system at: file:///
2010-10-25 17:14:32,815 [main] INFO  
org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with 
processName=JobTracker, sessionId=
2010-10-25 17:14:34,312 [main] INFO  
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics 
with processName=JobTracker, sessionId= - already initialized
2010-10-25 17:14:34,314 [Thread-4] WARN  
org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for 
parsing the arguments. Applications should implement Tool for the same.
**2010-10-25 17:14:39,312 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 0% complete
2010-10-25 17:14:39,313 [main] ERROR 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Map reduce job failed
2010-10-25 17:14:39,313 [main] ERROR 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- java.io.IOException: excite.log.bz2 does not exist**
    at 
org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:115)
    at 
org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
    at 
org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
    at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
    at 
org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    at 
org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
    at java.lang.Thread.run(Thread.java:619)



Please, can someone help me??

Ruth


Re: BUG: PIG HDF, HADOOP MAPREDUCE java.io.IOException: ..... does not exist

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.
Have you modified the following parameters ?

PIG_PATH=$HOME/bin/pig-0.XX.X
PIG_CLASSPATH=$PIG_PATH/pig-0.X.0-core.jar:$HOME/bin/hadoop-0.XX.X/conf \
PIG_HADOOP_VERSION=0.XX.X \

You can change those parameters in the bin/pig script.


Renato M.

2010/10/25 Jeff Zhang <zj...@gmail.com>

> You are still connecting to local file system rather than hdfs
> according the following log:
>
>     2010-10-25 17:14:32,651 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> Connecting to hadoop file system at: file:///
>
> Have you put the core-site.xml on the classpath ?
>
>
> On Mon, Oct 25, 2010 at 8:22 AM, Ruth Garcia <ru...@yahoo-inc.com>
> wrote:
> > Hello,
> > I am having an error that is driving me crazy. Any help will be
> appreciated.
> >
> > First, I have configured hadoop and hdfs according to this tutorial (I
> did
> > not created an account hadoop, used mine instead)
> >
> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29
> > and I do not have any problem, I could run the wordcount. In other words,
> I
> > could run the following command without problems: $ bin/hadoop jar
> > hadoop-0.20.2-examples.jar wordcount gutenberg gutenberg-output
> >
> > I have also followed the Pig tutorial here
> > http://pig.apache.org/docs/r0.7.0/setup.html#Sample+Code   and
> > http://pig.apache.org/docs/r0.7.0/tutorial.html .
> >
> > For both cases, I can run local pig scripts  without problems.
> > Nevertheless, with using HADOOP and PIG to run Mapreduce jobs, I have the
> > same error...it can not detect the file that is being loaded... I have
> put
> > that file into the hdfs directory (the same used in the wordcount
> > directory), I have plases the file to be load everywhere and I still have
> > the error that the file to be loaded "does not exist."  For some reason,
> > when I am using PIG it seems to me that it tries to detect files from a
> > unknown directory (for me). Could someone please help me with this
> issue??
> > The error that I receive for the example
> > http://pig.apache.org/docs/r0.7.0/setup.html#Sample+Code   when using :
> $
> > java -cp pig.jar:.:$HADOOPDIR idmapreduce is:
> > /
> > $ java -cp pig.jar:.:$HADOOPDIR idmapreduce
> > 10/10/25 17:10:01 INFO executionengine.HExecutionEngine: Connecting to
> > hadoop file system at: file:///
> > 10/10/25 17:10:01 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> > processName=JobTracker, sessionId=
> > 10/10/25 17:10:03 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with
> > processName=JobTracker, sessionId= - already initialized
> > 10/10/25 17:10:03 WARN mapred.JobClient: Use GenericOptionsParser for
> > parsing the arguments. Applications should implement Tool for the same.
> > **10/10/25 17:10:08 INFO mapReduceLayer.MapReduceLauncher: 0% complete
> > 10/10/25 17:10:08 ERROR mapReduceLayer.MapReduceLauncher: Map reduce job
> > failed*
> > 10/10/25 17:10:08 *ERROR mapReduceLayer.MapReduceLauncher:
> > java.io.IOException: passwd does not exist**
> >   at
> >
> org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:115)
> >   at
> >
> org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
> >   at
> >
> org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
> >   at
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
> >   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
> >   at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
> >   at
> >
> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
> >   at
> org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
> >   at java.lang.Thread.run(Thread.java:619)
> >
> >
> > /When doing the  same with
> http://pig.apache.org/docs/r0.7.0/tutorial.html
> >   using :
> > $ java -cp $HOME/pigtmp/pig.jar:$HADOOP_CONF_DIR  org.apache.pig.Main
> > $HOME/pigtmp/script1-hadooptest.pig
> >
> > 2010-10-25 17:14:32,651 [main] INFO
> >  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> Connecting
> > to hadoop file system at: file:///
> > 2010-10-25 17:14:32,815 [main] INFO
> >  org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with
> > processName=JobTracker, sessionId=
> > 2010-10-25 17:14:34,312 [main] INFO
> >  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
> > with processName=JobTracker, sessionId= - already initialized
> > 2010-10-25 17:14:34,314 [Thread-4] WARN
>  org.apache.hadoop.mapred.JobClient
> > - Use GenericOptionsParser for parsing the arguments. Applications should
> > implement Tool for the same.
> > **2010-10-25 17:14:39,312 [main] INFO
> >
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > - 0% complete
> > 2010-10-25 17:14:39,313 [main] ERROR
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > - Map reduce job failed
> > 2010-10-25 17:14:39,313 [main] ERROR
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > - java.io.IOException: excite.log.bz2 does not exist**
> >   at
> >
> org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:115)
> >   at
> >
> org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
> >   at
> >
> org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
> >   at
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
> >   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
> >   at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
> >   at
> >
> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
> >   at
> org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
> >   at java.lang.Thread.run(Thread.java:619)
> >
> >
> >
> > Please, can someone help me??
> >
> > Ruth
> >
> >
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>

Re: BUG: PIG HDF, HADOOP MAPREDUCE java.io.IOException: ..... does not exist

Posted by Jeff Zhang <zj...@gmail.com>.
You are still connecting to local file system rather than hdfs
according the following log:

     2010-10-25 17:14:32,651 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to hadoop file system at: file:///

Have you put the core-site.xml on the classpath ?


On Mon, Oct 25, 2010 at 8:22 AM, Ruth Garcia <ru...@yahoo-inc.com> wrote:
> Hello,
> I am having an error that is driving me crazy. Any help will be appreciated.
>
> First, I have configured hadoop and hdfs according to this tutorial (I did
> not created an account hadoop, used mine instead)
> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29
> and I do not have any problem, I could run the wordcount. In other words, I
> could run the following command without problems: $ bin/hadoop jar
> hadoop-0.20.2-examples.jar wordcount gutenberg gutenberg-output
>
> I have also followed the Pig tutorial here
> http://pig.apache.org/docs/r0.7.0/setup.html#Sample+Code   and
> http://pig.apache.org/docs/r0.7.0/tutorial.html .
>
> For both cases, I can run local pig scripts  without problems.
> Nevertheless, with using HADOOP and PIG to run Mapreduce jobs, I have the
> same error...it can not detect the file that is being loaded... I have put
> that file into the hdfs directory (the same used in the wordcount
> directory), I have plases the file to be load everywhere and I still have
> the error that the file to be loaded "does not exist."  For some reason,
> when I am using PIG it seems to me that it tries to detect files from a
> unknown directory (for me). Could someone please help me with this issue??
> The error that I receive for the example
> http://pig.apache.org/docs/r0.7.0/setup.html#Sample+Code   when using : $
> java -cp pig.jar:.:$HADOOPDIR idmapreduce is:
> /
> $ java -cp pig.jar:.:$HADOOPDIR idmapreduce
> 10/10/25 17:10:01 INFO executionengine.HExecutionEngine: Connecting to
> hadoop file system at: file:///
> 10/10/25 17:10:01 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> processName=JobTracker, sessionId=
> 10/10/25 17:10:03 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with
> processName=JobTracker, sessionId= - already initialized
> 10/10/25 17:10:03 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> **10/10/25 17:10:08 INFO mapReduceLayer.MapReduceLauncher: 0% complete
> 10/10/25 17:10:08 ERROR mapReduceLayer.MapReduceLauncher: Map reduce job
> failed*
> 10/10/25 17:10:08 *ERROR mapReduceLayer.MapReduceLauncher:
> java.io.IOException: passwd does not exist**
>   at
> org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:115)
>   at
> org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
>   at
> org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
>   at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
>   at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
>   at
> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>   at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>   at java.lang.Thread.run(Thread.java:619)
>
>
> /When doing the  same with  http://pig.apache.org/docs/r0.7.0/tutorial.html
>   using :
> $ java -cp $HOME/pigtmp/pig.jar:$HADOOP_CONF_DIR  org.apache.pig.Main
> $HOME/pigtmp/script1-hadooptest.pig
>
> 2010-10-25 17:14:32,651 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting
> to hadoop file system at: file:///
> 2010-10-25 17:14:32,815 [main] INFO
>  org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with
> processName=JobTracker, sessionId=
> 2010-10-25 17:14:34,312 [main] INFO
>  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
> with processName=JobTracker, sessionId= - already initialized
> 2010-10-25 17:14:34,314 [Thread-4] WARN  org.apache.hadoop.mapred.JobClient
> - Use GenericOptionsParser for parsing the arguments. Applications should
> implement Tool for the same.
> **2010-10-25 17:14:39,312 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 0% complete
> 2010-10-25 17:14:39,313 [main] ERROR
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Map reduce job failed
> 2010-10-25 17:14:39,313 [main] ERROR
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - java.io.IOException: excite.log.bz2 does not exist**
>   at
> org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:115)
>   at
> org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
>   at
> org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
>   at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
>   at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
>   at
> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>   at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>   at java.lang.Thread.run(Thread.java:619)
>
>
>
> Please, can someone help me??
>
> Ruth
>
>



-- 
Best Regards

Jeff Zhang