Posted to mapreduce-user@hadoop.apache.org by Bejoy KS <be...@gmail.com> on 2011/09/12 11:18:31 UTC

Hadoop Streaming job Fails - Permission Denied error

Hi
      I wanted to try out Hadoop streaming and got the sample Python code for
the mapper and reducer. I copied both onto my local file system (LFS) and
tried running the streaming job as mentioned in the documentation.
Here is the command I used to run the job:

hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar \
  -input /userdata/bejoy/apps/wc/input \
  -output /userdata/bejoy/apps/wc/output \
  -mapper /home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py \
  -reducer /home/cloudera/bejoy/apps/inputs/wc/WcStreamReduce.py

Here, everything other than the input and output paths refers to local file
system locations. However, the job is failing. The error log from the
JobTracker URL is as follows:

java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:386)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 17 more
Caused by: java.lang.RuntimeException: configuration exception
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:230)
    at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
    ... 22 more
Caused by: java.io.IOException: Cannot run program "/home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py": java.io.IOException: error=13, Permission denied
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
    ... 23 more
Caused by: java.io.IOException: java.io.IOException: error=13, Permission denied
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
    at java.lang.ProcessImpl.start(ProcessImpl.java:65)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
    ... 24 more

Based on the error, I checked the permissions of the mapper and reducer, and
issued a chmod 777 on them as well. Still no luck.

The permissions of the files are as follows:
cloudera@cloudera-vm:~$ ls -l /home/cloudera/bejoy/apps/inputs/wc/
-rwxrwxrwx 1 cloudera cloudera  707 2011-09-11 23:42 WcStreamMap.py
-rwxrwxrwx 1 cloudera cloudera 1077 2011-09-11 23:42 WcStreamReduce.py
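
For reference, the scripts in question are the standard streaming word-count
pair. A minimal sketch of what such a mapper and reducer typically look like
(assumed for illustration; the actual contents of WcStreamMap.py and
WcStreamReduce.py are not shown in this thread):

#!/usr/bin/env python
# WcStreamMap.py (sketch): emit "word<TAB>1" for every word read from stdin.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        sys.stdout.write('%s\t1\n' % word)

#!/usr/bin/env python
# WcStreamReduce.py (sketch): input arrives sorted by key, so keep a running
# total per word and flush it whenever the word changes.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip('\n').split('\t', 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            sys.stdout.write('%s\t%d\n' % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    sys.stdout.write('%s\t%d\n' % (current_word, current_count))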

I'm testing this on the Cloudera Demo VM, so the Hadoop setup is in
pseudo-distributed mode. Any help would be highly appreciated.

Thank You

Regards
Bejoy.K.S

Re: Hadoop Streaming job Fails - Permission Denied error

Posted by Jeremy Lewi <je...@lewi.us>.
Bejoy, to redirect stdout, add the lines

import sys
sys.stdout = sys.stderr

to the top of your .py files (i.e., right after the shebang line).
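
For illustration, a mapper with this redirect applied might start like the
following (a sketch, not the actual WcStreamMap.py; note that the real
key/value output then has to be written through the saved sys.__stdout__
handle):

#!/usr/bin/env python
import sys
sys.stdout = sys.stderr  # stray print statements now land in the task's stderr log

# ... all other imports go below the redirect ...

out = sys.__stdout__  # the original stdout, still connected to Hadoop
for line in sys.stdin:
    for word in line.strip().split():
        out.write('%s\t1\n' % word)  # real key/value output only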

J


Re: Hadoop Streaming job Fails - Permission Denied error

Posted by Bejoy KS <be...@gmail.com>.
Hi Harsh
         Thank you for the response. I'm on the Cloudera demo VM. It is on
Hadoop 0.20 and has Python installed. Do I have to do any further
installation/configuration to get Python running?

On Tue, Sep 13, 2011 at 1:36 PM, Harsh J <ha...@cloudera.com> wrote:

> The env binary would be present, but do all your TT nodes have python
> properly installed on them? The env program can't find them and that's
> probably why your scripts with shbang don't run.
>
> On Tue, Sep 13, 2011 at 1:12 PM, Bejoy KS <be...@gmail.com> wrote:
> > Thanks Jeremy. But I didn't follow 'redirect "stdout" to "stderr" at the
> > entry point to your mapper and reducer'.
> > Basically I'm a java hadoop developer and has no idea on python
> programming.
> > Could you please help me with mode details like the line of code i need
> to
> > include to achieve this.
> >
> > Also I tried a still more deep drill down on my error logs and found the
> > following line as well
> >
> > stderr logs
> >
> > /usr/bin/env: python
> > : No such file or directory
> > java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
> > failed with code 127
> >     at
> >
> org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
> >     at
> >
> org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
> >     at
> org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
> >     at
> > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
> >     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:396)
> >     at
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> > log4j:WARN No appenders could be found for logger
> > (org.apache.hadoop.hdfs.DFSClient).
> > log4j:WARN Please initialize the log4j system properly.
> >
> > I verified on the existence of such a directory and it was present
> > '/usr/bin/env' .
> >
> > Could you please provide little more guidance on the same.
> >
> >
> >
> > On Tue, Sep 13, 2011 at 9:06 AM, Jeremy Lewi <je...@lewi.us> wrote:
> >>
> >> Bejoy,
> >> The other problem I typically ran into using python streaming jobs was
> if
> >> my mapper or reducer wrote to stdout. Since hadoop uses stdout to pass
> data
> >> back to Hadoop, any erroneous "print" statements will cause the pipe to
> >> break. The easiest way around this is to redirect "stdout" to "stderr"
> at
> >> the entry point to your mapper and reducer; do this even before you
> import
> >> any modules so that even if those modules call "print" it gets
> redirected.
> >> Note: if your using dumbo (but I don't think you are) the above solution
> >> may not work but I can send you a pointer.
> >> J
> >>
> >> On Mon, Sep 12, 2011 at 8:27 AM, Bejoy KS <be...@gmail.com>
> wrote:
> >>>
> >>> Thanks Jeremy. I tried with your first suggestion and the mappers ran
> >>> into completion. But then the reducers failed with another exception
> related
> >>> to pipes. I believe it may be due to permission issues again. I tried
> >>> setting a few additional config parameters but it didn't do the job.
> Please
> >>> find the command used and the error logs from jobtracker web UI
> >>>
> >>> hadoop  jar
> >>>
> /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar
> >>> -D hadoop.tmp.dir=/home/streaming/tmp/hadoop/ -D
> >>> dfs.data.dir=/home/streaming/tmp -D
> >>> mapred.local.dir=/home/streaming/tmp/local -D
> >>> mapred.system.dir=/home/streaming/tmp/system -D
> >>> mapred.temp.dir=/home/streaming/tmp/temp -input
> >>> /userdata/bejoy/apps/wc/input -output /userdata/bejoy/apps/wc/output
> >>> -mapper /home/streaming/WcStreamMap.py  -reducer
> >>> /home/streaming/WcStreamReduce.py
> >>>
> >>>
> >>> java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
> >>> failed with code 127
> >>>     at
> >>>
> org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
> >>>     at
> >>>
> org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
> >>>     at
> >>> org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
> >>>     at
> >>> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
> >>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
> >>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> >>>     at java.security.AccessController.doPrivileged(Native Method)
> >>>     at javax.security.auth.Subject.doAs(Subject.java:396)
> >>>     at
> >>>
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> >>>
> >>>
> >>> The folder permissions at the time of job execution are as follows
> >>>
> >>> cloudera@cloudera-vm:~$ ls -l  /home/streaming/
> >>> drwxrwxrwx 5 root root 4096 2011-09-12 05:59 tmp
> >>> -rwxrwxrwx 1 root root  707 2011-09-11 23:42 WcStreamMap.py
> >>> -rwxrwxrwx 1 root root 1077 2011-09-11 23:42 WcStreamReduce.py
> >>>
> >>> cloudera@cloudera-vm:~$ ls -l /home/streaming/tmp/
> >>> drwxrwxrwx 2 root root 4096 2011-09-12 06:12 hadoop
> >>> drwxrwxrwx 2 root root 4096 2011-09-12 05:58 local
> >>> drwxrwxrwx 2 root root 4096 2011-09-12 05:59 system
> >>> drwxrwxrwx 2 root root 4096 2011-09-12 05:59 temp
> >>>
> >>> Am I missing some thing here?
> >>>
> >>> It is not for long I'm into Linux so couldn't try your second
> suggestion
> >>> on setting up the Linux task controller.
> >>>
> >>> Thanks a lot
> >>>
> >>> Regards
> >>> Bejoy.K.S
> >>>
> >>>
> >>>
> >>> On Mon, Sep 12, 2011 at 6:20 AM, Jeremy Lewi <je...@lewi.us> wrote:
> >>>>
> >>>> I would suggest you try putting your mapper/reducer py files in a
> >>>> directory that is world readable at every level . i.e /tmp/test. I had
> >>>> similar problems when I was using streaming and I believe my
> workaround was
> >>>> to put the mapper/reducers outside my home directory. The other more
> >>>> involved alternative is to setup the linux task controller so you can
> run
> >>>> your MR jobs as the user who submits the jobs.
> >>>> J
> >>>>
> >>>> On Mon, Sep 12, 2011 at 2:18 AM, Bejoy KS <be...@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> Hi
> >>>>>       I wanted to try out hadoop steaming and got the sample python
> >>>>> code for mapper and reducer. I copied both into my lfs and tried
> running the
> >>>>> steaming job as mention in the documentation.
> >>>>> Here the command i used to run the job
> >>>>>
> >>>>> hadoop  jar
> >>>>>
> /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar
> >>>>> -input /userdata/bejoy/apps/wc/input -output
> /userdata/bejoy/apps/wc/output
> >>>>> -mapper /home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py  -reducer
> >>>>> /home/cloudera/bejoy/apps/inputs/wc/WcStreamReduce.py
> >>>>>
> >>>>> Here other than input and output the rest all are on lfs locations.
> How
> >>>>> ever the job is failing. The error log from the jobtracker url is as
> >>>>>
> >>>>> java.lang.RuntimeException: Error in configuring object
> >>>>>     at
> >>>>>
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> >>>>>     at
> >>>>>
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> >>>>>     at
> >>>>>
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> >>>>>     at
> org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:386)
> >>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
> >>>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> >>>>>     at java.security.AccessController.doPrivileged(Native Method)
> >>>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
> >>>>>     at
> >>>>>
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> >>>>> Caused by: java.lang.reflect.InvocationTargetException
> >>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>>>>     at
> >>>>>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>>>>     at
> >>>>>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
> >>>>>     at
> >>>>>
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> >>>>>     ... 9 more
> >>>>> Caused by: java.lang.RuntimeException: Error in configuring object
> >>>>>     at
> >>>>>
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> >>>>>     at
> >>>>>
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> >>>>>     at
> >>>>>
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> >>>>>     at
> org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
> >>>>>     ... 14 more
> >>>>> Caused by: java.lang.reflect.InvocationTargetException
> >>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>>>>     at
> >>>>>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>>>>     at
> >>>>>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
> >>>>>     at
> >>>>>
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> >>>>>     ... 17 more
> >>>>> Caused by: java.lang.RuntimeException: configuration exception
> >>>>>     at
> >>>>> org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:230)
> >>>>>     at
> >>>>> org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
> >>>>>     ... 22 more
> >>>>> Caused by: java.io.IOException: Cannot run program
> >>>>> "/home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py":
> java.io.IOException:
> >>>>> error=13, Permission denied
> >>>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
> >>>>>     at
> >>>>> org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
> >>>>>     ... 23 more
> >>>>> Caused by: java.io.IOException: java.io.IOException: error=13,
> >>>>> Permission denied
> >>>>>     at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
> >>>>>     at java.lang.ProcessImpl.start(ProcessImpl.java:65)
> >>>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
> >>>>>     ... 24 more
> >>>>>
> >>>>> On the error I checked the permissions of mapper and reducer. Issued
> a
> >>>>> chmod 777 command as well. Still no luck.
> >>>>>
> >>>>> The permission of the files are as follows
> >>>>> cloudera@cloudera-vm:~$ ls -l /home/cloudera/bejoy/apps/inputs/wc/
> >>>>> -rwxrwxrwx 1 cloudera cloudera  707 2011-09-11 23:42 WcStreamMap.py
> >>>>> -rwxrwxrwx 1 cloudera cloudera 1077 2011-09-11 23:42
> WcStreamReduce.py
> >>>>>
> >>>>> I'm testing the same on Cloudera Demo VM. So the hadoop setup would
> be
> >>>>> on pseudo distributed mode. Any help would be highly appreciated.
> >>>>>
> >>>>> Thank You
> >>>>>
> >>>>> Regards
> >>>>> Bejoy.K.S
> >>>>>
> >>>>
> >>>
> >>
> >
> >
>
>
>
> --
> Harsh J
>

Re: Hadoop Streaming job Fails - Permission Denied error

Posted by Harsh J <ha...@cloudera.com>.
The env binary would be present, but do all your TT nodes have Python
properly installed on them? The env program can't find it, and that's
probably why your scripts with a shebang line don't run.
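
For context, the scripts begin with a shebang line like the one below (a
generic illustration, not quoted from the thread), so each task node must be
able to resolve python through /usr/bin/env:

#!/usr/bin/env python
# The kernel hands this file to whatever "python" /usr/bin/env can find on
# the task node's PATH; if none is found, the exec fails and the streaming
# subprocess exits with code 127, matching the logs in this thread.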


-- 
Harsh J

Re: Hadoop Streaming job Fails - Permission Denied error

Posted by Bejoy KS <be...@gmail.com>.
Thanks Jeremy. But I didn't follow 'redirect "stdout" to "stderr" at the
entry point to your mapper and reducer'.
Basically I'm a Java Hadoop developer and have no idea about Python
programming. Could you please help me with more details, like the line of
code I need to include to achieve this?

Also, I drilled down further into my error logs and found the following
lines as well:

stderr logs:

/usr/bin/env: python
: No such file or directory
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
log4j:WARN Please initialize the log4j system properly.

I verified the existence of '/usr/bin/env' and it is present.

Could you please provide a little more guidance on the same?




Re: Hadoop Streaming job Fails - Permission Denied error

Posted by Jeremy Lewi <je...@lewi.us>.
Bejoy,

The other problem I typically ran into with Python streaming jobs was if my
mapper or reducer wrote to stdout. Since streaming uses stdout to pass data
back to Hadoop, any erroneous "print" statements will cause the pipe to
break. The easiest way around this is to redirect "stdout" to "stderr" at
the entry point to your mapper and reducer; do this even before you import
any modules, so that even if those modules call "print" it gets redirected.
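
A minimal sketch of the resulting discipline (hypothetical code, not from
this thread): stdout carries only key/value records, and all diagnostics go
to stderr, which streaming captures in the task's stderr logs.

import sys

for i, line in enumerate(sys.stdin):
    # Diagnostics go to stderr; writing them to stdout would be read by
    # Hadoop as a malformed key/value record and break the pipe.
    sys.stderr.write('processing line %d\n' % i)
    for word in line.strip().split():
        sys.stdout.write('%s\t1\n' % word)  # real output stays on stdout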

Note: if you're using dumbo (but I don't think you are) the above solution
may not work, but I can send you a pointer.

J


Re: Hadoop Streaming job Fails - Permission Denied error

Posted by Bejoy KS <be...@gmail.com>.
Thanks Jeremy. I tried your first suggestion and the mappers ran to
completion. But then the reducers failed with another exception related to
pipes. I believe it may be due to permission issues again. I tried setting a
few additional config parameters, but that didn't do the job. Please find the
command used and the error logs from the JobTracker web UI below.

hadoop  jar
/usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar
-D hadoop.tmp.dir=/home/streaming/tmp/hadoop/ -D
dfs.data.dir=/home/streaming/tmp -D
mapred.local.dir=/home/streaming/tmp/local -D
mapred.system.dir=/home/streaming/tmp/system -D
mapred.temp.dir=/home/streaming/tmp/temp -input
/userdata/bejoy/apps/wc/input -output /userdata/bejoy/apps/wc/output
-mapper /home/streaming/WcStreamMap.py  -reducer
/home/streaming/WcStreamReduce.py


java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
failed with code 127
    at
org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at
org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
    at
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)


The folder permissions at the time of job execution are as follows

cloudera@cloudera-vm:~$ ls -l  /home/streaming/
drwxrwxrwx 5 root root 4096 2011-09-12 05:59 tmp
-rwxrwxrwx 1 root root  707 2011-09-11 23:42 WcStreamMap.py
-rwxrwxrwx 1 root root 1077 2011-09-11 23:42 WcStreamReduce.py

cloudera@cloudera-vm:~$ ls -l /home/streaming/tmp/
drwxrwxrwx 2 root root 4096 2011-09-12 06:12 hadoop
drwxrwxrwx 2 root root 4096 2011-09-12 05:58 local
drwxrwxrwx 2 root root 4096 2011-09-12 05:59 system
drwxrwxrwx 2 root root 4096 2011-09-12 05:59 temp

Am I missing something here?

I haven't been using Linux for long, so I couldn't try your second suggestion
of setting up the Linux task controller.

Thanks a lot

Regards
Bejoy.K.S



On Mon, Sep 12, 2011 at 6:20 AM, Jeremy Lewi <je...@lewi.us> wrote:

> I would suggest you try putting your mapper/reducer .py files in a directory
> that is world-readable at every level, e.g. /tmp/test. I had similar
> problems when I was using streaming, and I believe my workaround was to put
> the mappers/reducers outside my home directory. The other, more involved
> alternative is to set up the Linux task controller so you can run your MR
> jobs as the user who submits them.
>
> J

Re: Hadoop Streaming job Fails - Permission Denied error

Posted by Jeremy Lewi <je...@lewi.us>.
I would suggest you try putting your mapper/reducer .py files in a directory
that is world-readable at every level, e.g. /tmp/test. I had similar
problems when I was using streaming, and I believe my workaround was to put
the mappers/reducers outside my home directory. The other, more involved
alternative is to set up the Linux task controller so you can run your MR
jobs as the user who submits them.
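
For example, something along these lines should do it (the exact paths are
just placeholders):

# copy the scripts somewhere the mapred user can reach at every level
mkdir -p /tmp/test
cp /home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py /tmp/test/
cp /home/cloudera/bejoy/apps/inputs/wc/WcStreamReduce.py /tmp/test/
chmod 755 /tmp/test/WcStreamMap.py /tmp/test/WcStreamReduce.py
# every directory on the path needs the execute bit set for "other"
ls -ld /tmp /tmp/test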

J


Re: Hadoop Streaming job Fails - Permission Denied error

Posted by Shi Yu <sh...@uchicago.edu>.
Just a quick question: have you tried switching off the DFS permission check
in the hdfs-site.xml file?

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
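
As far as I know this is only read at startup, so the NameNode would need a
restart for it to take effect; on the CDH3 VM that should be something like
(exact service name assumed):

sudo service hadoop-0.20-namenode restart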


Shi

On 9/14/2011 8:29 AM, Brock Noland wrote:
> Hi,
>
> This probably belongs on mapreduce-user as opposed to common-user. I
> have BCC'ed the common-user group.
>
> Generally it's a best practice to ship the scripts with the job. Like so:
>
> hadoop  jar
> /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar
> -input /userdata/bejoy/apps/wc/input -output /userdata/bejoy/apps/wc/output
> -mapper WcStreamMap.py  -reducer WcStreamReduce.py
> -file /home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py
> -file /home/cloudera/bejoy/apps/inputs/wc/WcStreamReduce.py
>
> Brock



Re: Hadoop Streaming job Fails - Permission Denied error

Posted by Brock Noland <br...@cloudera.com>.
Hi,

This probably belongs on mapreduce-user as opposed to common-user. I
have BCC'ed the common-user group.

Generally it's a best practice to ship the scripts with the job. Like so:

hadoop  jar
/usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar
-input /userdata/bejoy/apps/wc/input -output /userdata/bejoy/apps/wc/output
-mapper WcStreamMap.py  -reducer WcStreamReduce.py
-file /home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py
-file /home/cloudera/bejoy/apps/inputs/wc/WcStreamReduce.py
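
If it still fails with exit code 127 after that, the child shell most likely
could not find an interpreter for the scripts, i.e. a missing or wrong
shebang line. Checking the first line of each script, or invoking python
explicitly, usually sorts that out (the interpreter path below is an
assumption):

head -1 WcStreamMap.py WcStreamReduce.py  # each should start with something like #!/usr/bin/env python

hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar \
  -input /userdata/bejoy/apps/wc/input -output /userdata/bejoy/apps/wc/output \
  -mapper "python WcStreamMap.py" -reducer "python WcStreamReduce.py" \
  -file /home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py \
  -file /home/cloudera/bejoy/apps/inputs/wc/WcStreamReduce.py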

Brock
