Posted to hdfs-dev@hadoop.apache.org by Chris Nauroth <cn...@hortonworks.com> on 2013/04/22 20:49:43 UTC

Re: Hadoop Streaming job error - Need help urgent

(Moving to user list, hdfs-dev bcc'd.)

Hi Prithvi,

From a quick scan, it looks to me like one of your commands ends up using
"input_path" as a string literal instead of expanding to the value of the
input_path variable.  I've pasted the command below.  Notice that one of
the -file options uses "input_path" instead of "$input_path".

Is that the problem?

Hope this helps,
--Chris



    $hadoop_bin --config $hadoop_config jar $hadoop_streaming \
        -D mapred.task.timeout=0 \
        -D mapred.job.name="BC_N$((num_of_node))_M$((num_of_mapper))" \
        -D mapred.reduce.tasks=$num_of_reducer \
        -input input_BC_N$((num_of_node))_M$((num_of_mapper)) \
        -output $output_path \
        -file brandes_mapper \
        -file src/mslab/BC_reducer.py \
        -file src/mslab/MapReduceUtil.py \
        -file input_path \
        -mapper "./brandes_mapper $input_path $num_of_node" \
        -reducer "./BC_reducer.py"
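
As a quick illustration of the difference (a made-up snippet, not your exact
script):

    input_path=/data/dblp_author_conf_adj.txt
    # Without the $, the shell passes the literal string "input_path":
    echo -file input_path      # prints: -file input_path
    # With the $, the shell substitutes the variable's value:
    echo -file $input_path     # prints: -file /data/dblp_author_conf_adj.txt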



On Mon, Apr 22, 2013 at 10:11 AM, prithvi dammalapati <
d.prithvi999@gmail.com> wrote:

> I have the following Hadoop code to find the betweenness centrality of a
> graph:
>
>     java_home=/usr/lib/jvm/java-1.7.0-openjdk-amd64
>     hadoop_home=/usr/local/hadoop/hadoop-1.0.4
>     hadoop_lib=$hadoop_home/hadoop-core-1.0.4.jar
>     hadoop_bin=$hadoop_home/bin/hadoop
>     hadoop_config=$hadoop_home/conf
>
> hadoop_streaming=$hadoop_home/contrib/streaming/hadoop-streaming-1.0.4.jar
>     #task specific parameters
>     source_code=BetweennessCentrality.java
>     jar_file=BetweennessCentrality.jar
>     main_class=mslab.BetweennessCentrality
>     num_of_node=38012
>     num_of_mapper=100
>     num_of_reducer=8
>     input_path=/data/dblp_author_conf_adj.txt
>     output_path=dblp_bc_N$(($num_of_node))_M$((num_of_mapper))
>     rm build -rf
>     mkdir build
>     $java_home/bin/javac -d build -classpath .:$hadoop_lib
> src/mslab/$source_code
>     rm $jar_file -f
>     $java_home/bin/jar -cf $jar_file -C build/ .
>     $hadoop_bin --config $hadoop_config fs -rmr $output_path
>     $hadoop_bin --config $hadoop_config jar $jar_file $main_class
> $num_of_node       $num_of_mapper
>
>     rm brandes_mapper
>
>     g++ src/mslab/mapred_brandes.cpp -O3 -o brandes_mapper
>     $hadoop_bin --config $hadoop_config jar $hadoop_streaming -D
> mapred.task.timeout=0 -D mapred.job.name="BC_N$((num_of_node))_M$((num_of_mapper))"
> -D mapred.reduce.tasks=$num_of_reducer -input
> input_BC_N$((num_of_node))_M$((num_of_mapper)) -output $output_path -file
> brandes_mapper -file src/mslab/BC_reducer.py -file
> src/mslab/MapReduceUtil.py -file input_path -mapper "./brandes_mapper
> $input_path $num_of_node" -reducer "./BC_reducer.py"
>
> When I run this code in a shell script, I get the following errors:
>
>     Warning: $HADOOP_HOME is deprecated.
>     File: /home/hduser/Downloads/mgmf/trunk/input_path does not exist, or
> is not readable.
>     Streaming Command Failed!
>
> but the file exists at the specified path:
>
>     /Downloads/mgmf/trunk/data$ ls
>     dblp_author_conf_adj.txt
>
> I have also added the input file into HDFS using
>
>     /usr/local/hadoop$ bin/hadoop dfs -copyFromLocal /source /destination
>
> Can someone help me solve this problem?
>
>
> Any help is appreciated,
> Thanks
> Prithvi
>

Re: Hadoop Streaming job error - Need help urgent

Posted by Chris Nauroth <cn...@hortonworks.com>.
OK, great.  It looks like with the change to "$input_path", you've made
progress.

Now it's actually submitting the job, but something is causing the map
tasks to fail.  Usually, this is some kind of bug in user code, so you'll
need to do some further investigation on your side.  I expect the tracking
URL mentioned in the output above will give you some clues.  That should
also steer you towards the individual task log outputs.
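
If the web UI isn't handy, the same per-attempt logs are also written to the
local disk of the node that ran the task.  On a single-node Hadoop 1.x install
they usually end up under the logs/userlogs directory, so something like this
should show the failed mapper's stderr (the attempt id below is a guess based
on your output; use whatever the job tracker reports):

    ls $hadoop_home/logs/userlogs/job_201304221215_0002/
    cat $hadoop_home/logs/userlogs/job_201304221215_0002/attempt_201304221215_0002_m_000006_0/stderr

It can also help to run the mapper by itself on a small slice of the input,
outside of Hadoop, to rule out a plain crash (assuming brandes_mapper reads
its records on stdin):

    head -n 100 dblp_author_conf_adj.txt | ./brandes_mapper $input_path $num_of_node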

--Chris



On Mon, Apr 22, 2013 at 12:04 PM, prithvi dammalapati <
d.prithvi999@gmail.com> wrote:

> java_home=/usr/lib/jvm/java-1.7.0-openjdk-amd64
> hadoop_home=/usr/local/hadoop/hadoop-1.0.4
> hadoop_lib=$hadoop_home/hadoop-core-1.0.4.jar
> hadoop_bin=$hadoop_home/bin/hadoop
> hadoop_config=$hadoop_home/conf
> hadoop_streaming=$hadoop_home/contrib/streaming/hadoop-streaming-1.0.4.jar
> #task specific parameters
> source_code=BetweennessCentrality.java
> jar_file=BetweennessCentrality.jar
> main_class=mslab.BetweennessCentrality
> num_of_node=38012
> num_of_mapper=100
> num_of_reducer=8
> input_path=/data/dblp_author_conf_adj.txt
> output_path=dblp_bc_N$(($num_of_node))_M$((num_of_mapper))
> rm build -rf
> mkdir build
> $java_home/bin/javac -d build -classpath .:$hadoop_lib src/mslab/$source_code
> rm $jar_file -f
> $java_home/bin/jar -cf $jar_file -C build/ .
> $hadoop_bin --config $hadoop_config fs -rmr $output_path
> $hadoop_bin --config $hadoop_config jar $jar_file $main_class $num_of_node       $num_of_mapper
>
> rm brandes_mapper
>
> g++ src/mslab/mapred_brandes.cpp -O3 -o brandes_mapper
> $hadoop_bin --config $hadoop_config jar $hadoop_streaming -D mapred.task.timeout=0 -D mapred.job.name="BC_N$((num_of_node))_M$((num_of_mapper))" -D mapred.reduce.tasks=$num_of_reducer -input input_BC_N$((num_of_node))_M$((num_of_mapper)) -output $output_path -file brandes_mapper -file src/mslab/BC_reducer.py -file src/mslab/MapReduceUtil.py -file $input_path -mapper "./brandes_mapper $input_path $num_of_node" -reducer "./BC_reducer.py"
>
> After running this code, I get the following error:
> 13/04/22 12:29:44 INFO streaming.StreamJob:  map 0%  reduce 0%
> 13/04/22 12:30:01 INFO streaming.StreamJob:  map 20%  reduce 0%
> 13/04/22 12:30:10 INFO streaming.StreamJob:  map 40%  reduce 0%
> 13/04/22 12:30:13 INFO streaming.StreamJob:  map 40%  reduce 2%
> 13/04/22 12:30:16 INFO streaming.StreamJob:  map 40%  reduce 13%
> 13/04/22 12:30:19 INFO streaming.StreamJob:  map 60%  reduce 13%
> 13/04/22 12:30:28 INFO streaming.StreamJob:  map 60%  reduce 17%
> 13/04/22 12:30:31 INFO streaming.StreamJob:  map 60%  reduce 20%
> 13/04/22 12:31:01 INFO streaming.StreamJob:  map 100%  reduce 100%
> 13/04/22 12:31:01 INFO streaming.StreamJob: To kill this job, run:
> 13/04/22 12:31:01 INFO streaming.StreamJob: /usr/local/hadoop/hadoop-1.0.4/libexec/../bin/hadoop job  -Dmapred.job.tracker=localhost:54311 -kill job_201304221215_0002
> 13/04/22 12:31:01 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201304221215_0002
> 13/04/22 12:31:01 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201304221215_0002_m_000006
> 13/04/22 12:31:01 INFO streaming.StreamJob: killJob...
>
> Even if I reduce num_of_node to 10 and num_of_mapper to 10, I get the same error. Can someone help me solve this error?
>
> Any help is appreciated
>
> Thanks
>
> Prithvi
>
>
>
> On Mon, Apr 22, 2013 at 12:49 PM, Chris Nauroth <cn...@hortonworks.com> wrote:
>
>> (Moving to user list, hdfs-dev bcc'd.)
>>
>> Hi Prithvi,
>>
>> From a quick scan, it looks to me like one of your commands ends up using
>> "input_path" as a string literal instead of expanding to the value of the
>> input_path variable.  I've pasted the command below.  Notice that one of
>> the -file options uses "input_path" instead of "$input_path".
>>
>> Is that the problem?
>>
>> Hope this helps,
>> --Chris
>>
>>
>>
>>     $hadoop_bin --config $hadoop_config jar $hadoop_streaming -D
>> mapred.task.timeout=0 -D mapred.job.name="BC_N$((num_of_node))_M$((num_of_mapper))"
>> -D mapred.reduce.tasks=$num_of_reducer -input
>> input_BC_N$((num_of_node))_M$((num_of_mapper)) -output $output_path
>> -file brandes_mapper -file src/mslab/BC_reducer.py -file
>> src/mslab/MapReduceUtil.py -file input_path -mapper "./brandes_mapper
>> $input_path $num_of_node" -reducer "./BC_reducer.py"
>>
>>
>>
>> On Mon, Apr 22, 2013 at 10:11 AM, prithvi dammalapati <
>> d.prithvi999@gmail.com> wrote:
>>
>>> I have the following Hadoop code to find the betweenness centrality of a
>>> graph:
>>>
>>>     java_home=/usr/lib/jvm/java-1.7.0-openjdk-amd64
>>>     hadoop_home=/usr/local/hadoop/hadoop-1.0.4
>>>     hadoop_lib=$hadoop_home/hadoop-core-1.0.4.jar
>>>     hadoop_bin=$hadoop_home/bin/hadoop
>>>     hadoop_config=$hadoop_home/conf
>>>
>>> hadoop_streaming=$hadoop_home/contrib/streaming/hadoop-streaming-1.0.4.jar
>>>     #task specific parameters
>>>     source_code=BetweennessCentrality.java
>>>     jar_file=BetweennessCentrality.jar
>>>     main_class=mslab.BetweennessCentrality
>>>     num_of_node=38012
>>>     num_of_mapper=100
>>>     num_of_reducer=8
>>>     input_path=/data/dblp_author_conf_adj.txt
>>>     output_path=dblp_bc_N$(($num_of_node))_M$((num_of_mapper))
>>>     rm build -rf
>>>     mkdir build
>>>     $java_home/bin/javac -d build -classpath .:$hadoop_lib
>>> src/mslab/$source_code
>>>     rm $jar_file -f
>>>     $java_home/bin/jar -cf $jar_file -C build/ .
>>>     $hadoop_bin --config $hadoop_config fs -rmr $output_path
>>>     $hadoop_bin --config $hadoop_config jar $jar_file $main_class
>>> $num_of_node       $num_of_mapper
>>>
>>>     rm brandes_mapper
>>>
>>>     g++ src/mslab/mapred_brandes.cpp -O3 -o brandes_mapper
>>>     $hadoop_bin --config $hadoop_config jar $hadoop_streaming -D
>>> mapred.task.timeout=0 -D mapred.job.name="BC_N$((num_of_node))_M$((num_of_mapper))"
>>> -D mapred.reduce.tasks=$num_of_reducer -input
>>> input_BC_N$((num_of_node))_M$((num_of_mapper)) -output $output_path -file
>>> brandes_mapper -file src/mslab/BC_reducer.py -file
>>> src/mslab/MapReduceUtil.py -file input_path -mapper "./brandes_mapper
>>> $input_path $num_of_node" -reducer "./BC_reducer.py"
>>>
>>> When I run this code in a shell script, I get the following errors:
>>>
>>>     Warning: $HADOOP_HOME is deprecated.
>>>     File: /home/hduser/Downloads/mgmf/trunk/input_path does not exist,
>>> or is not readable.
>>>     Streaming Command Failed!
>>>
>>> but the file exists at the specified path:
>>>
>>>     /Downloads/mgmf/trunk/data$ ls
>>>     dblp_author_conf_adj.txt
>>>
>>> I have also added the input file into HDFS using
>>>
>>>     /usr/local/hadoop$ bin/hadoop dfs -copyFromLocal /source /destination
>>>
>>> Can someone help me solve this problem?
>>>
>>>
>>> Any help is appreciated,
>>> Thanks
>>> Prithvi
>>>
>>
>>
>

Re: Hadoop Streaming job error - Need help urgent

Posted by prithvi dammalapati <d....@gmail.com>.
java_home=/usr/lib/jvm/java-1.7.0-openjdk-amd64
hadoop_home=/usr/local/hadoop/hadoop-1.0.4
hadoop_lib=$hadoop_home/hadoop-core-1.0.4.jar
hadoop_bin=$hadoop_home/bin/hadoop
hadoop_config=$hadoop_home/conf
hadoop_streaming=$hadoop_home/contrib/streaming/hadoop-streaming-1.0.4.jar
#task specific parameters
source_code=BetweennessCentrality.java
jar_file=BetweennessCentrality.jar
main_class=mslab.BetweennessCentrality
num_of_node=38012
num_of_mapper=100
num_of_reducer=8
input_path=/data/dblp_author_conf_adj.txt
output_path=dblp_bc_N$(($num_of_node))_M$((num_of_mapper))
rm build -rf
mkdir build
$java_home/bin/javac -d build -classpath .:$hadoop_lib src/mslab/$source_code
rm $jar_file -f
$java_home/bin/jar -cf $jar_file -C build/ .
$hadoop_bin --config $hadoop_config fs -rmr $output_path
$hadoop_bin --config $hadoop_config jar $jar_file $main_class $num_of_node $num_of_mapper

rm brandes_mapper

g++ src/mslab/mapred_brandes.cpp -O3 -o brandes_mapper
$hadoop_bin --config $hadoop_config jar $hadoop_streaming \
    -D mapred.task.timeout=0 \
    -D mapred.job.name="BC_N$((num_of_node))_M$((num_of_mapper))" \
    -D mapred.reduce.tasks=$num_of_reducer \
    -input input_BC_N$((num_of_node))_M$((num_of_mapper)) \
    -output $output_path \
    -file brandes_mapper \
    -file src/mslab/BC_reducer.py \
    -file src/mslab/MapReduceUtil.py \
    -file $input_path \
    -mapper "./brandes_mapper $input_path $num_of_node" \
    -reducer "./BC_reducer.py"

After running this code, I get the following error:
13/04/22 12:29:44 INFO streaming.StreamJob:  map 0%  reduce 0%
13/04/22 12:30:01 INFO streaming.StreamJob:  map 20%  reduce 0%
13/04/22 12:30:10 INFO streaming.StreamJob:  map 40%  reduce 0%
13/04/22 12:30:13 INFO streaming.StreamJob:  map 40%  reduce 2%
13/04/22 12:30:16 INFO streaming.StreamJob:  map 40%  reduce 13%
13/04/22 12:30:19 INFO streaming.StreamJob:  map 60%  reduce 13%
13/04/22 12:30:28 INFO streaming.StreamJob:  map 60%  reduce 17%
13/04/22 12:30:31 INFO streaming.StreamJob:  map 60%  reduce 20%
13/04/22 12:31:01 INFO streaming.StreamJob:  map 100%  reduce 100%
13/04/22 12:31:01 INFO streaming.StreamJob: To kill this job, run:
13/04/22 12:31:01 INFO streaming.StreamJob: /usr/local/hadoop/hadoop-1.0.4/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201304221215_0002
13/04/22 12:31:01 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201304221215_0002
13/04/22 12:31:01 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201304221215_0002_m_000006
13/04/22 12:31:01 INFO streaming.StreamJob: killJob...

Even if I reduce num_of_node to 10 and num_of_mapper to 10, I get the
same error. Can someone help me solve this error?

Any help is appreciated

Thanks

Prithvi



On Mon, Apr 22, 2013 at 12:49 PM, Chris Nauroth <cn...@hortonworks.com> wrote:

> (Moving to user list, hdfs-dev bcc'd.)
>
> Hi Prithvi,
>
> From a quick scan, it looks to me like one of your commands ends up using
> "input_path" as a string literal instead of expanding to the value of the
> input_path variable.  I've pasted the command below.  Notice that one of
> the -file options uses "input_path" instead of "$input_path".
>
> Is that the problem?
>
> Hope this helps,
> --Chris
>
>
>
>     $hadoop_bin --config $hadoop_config jar $hadoop_streaming -D
> mapred.task.timeout=0 -D mapred.job.name="BC_N$((num_of_node))_M$((num_of_mapper))"
> -D mapred.reduce.tasks=$num_of_reducer -input
> input_BC_N$((num_of_node))_M$((num_of_mapper)) -output $output_path -file
> brandes_mapper -file src/mslab/BC_reducer.py -file
> src/mslab/MapReduceUtil.py -file input_path -mapper "./brandes_mapper
> $input_path $num_of_node" -reducer "./BC_reducer.py"
>
>
>
> On Mon, Apr 22, 2013 at 10:11 AM, prithvi dammalapati <
> d.prithvi999@gmail.com> wrote:
>
>> I have the following Hadoop code to find the betweenness centrality of a
>> graph:
>>
>>     java_home=/usr/lib/jvm/java-1.7.0-openjdk-amd64
>>     hadoop_home=/usr/local/hadoop/hadoop-1.0.4
>>     hadoop_lib=$hadoop_home/hadoop-core-1.0.4.jar
>>     hadoop_bin=$hadoop_home/bin/hadoop
>>     hadoop_config=$hadoop_home/conf
>>
>> hadoop_streaming=$hadoop_home/contrib/streaming/hadoop-streaming-1.0.4.jar
>>     #task specific parameters
>>     source_code=BetweennessCentrality.java
>>     jar_file=BetweennessCentrality.jar
>>     main_class=mslab.BetweennessCentrality
>>     num_of_node=38012
>>     num_of_mapper=100
>>     num_of_reducer=8
>>     input_path=/data/dblp_author_conf_adj.txt
>>     output_path=dblp_bc_N$(($num_of_node))_M$((num_of_mapper))
>>     rm build -rf
>>     mkdir build
>>     $java_home/bin/javac -d build -classpath .:$hadoop_lib
>> src/mslab/$source_code
>>     rm $jar_file -f
>>     $java_home/bin/jar -cf $jar_file -C build/ .
>>     $hadoop_bin --config $hadoop_config fs -rmr $output_path
>>     $hadoop_bin --config $hadoop_config jar $jar_file $main_class
>> $num_of_node       $num_of_mapper
>>
>>     rm brandes_mapper
>>
>>     g++ src/mslab/mapred_brandes.cpp -O3 -o brandes_mapper
>>     $hadoop_bin --config $hadoop_config jar $hadoop_streaming -D
>> mapred.task.timeout=0 -D mapred.job.name="BC_N$((num_of_node))_M$((num_of_mapper))"
>> -D mapred.reduce.tasks=$num_of_reducer -input
>> input_BC_N$((num_of_node))_M$((num_of_mapper)) -output $output_path -file
>> brandes_mapper -file src/mslab/BC_reducer.py -file
>> src/mslab/MapReduceUtil.py -file input_path -mapper "./brandes_mapper
>> $input_path $num_of_node" -reducer "./BC_reducer.py"
>>
>> When I run this code in a shell script, I get the following errors:
>>
>>     Warning: $HADOOP_HOME is deprecated.
>>     File: /home/hduser/Downloads/mgmf/trunk/input_path does not exist, or
>> is not readable.
>>     Streaming Command Failed!
>>
>> but the file exists at the specified path:
>>
>>     /Downloads/mgmf/trunk/data$ ls
>>     dblp_author_conf_adj.txt
>>
>> I have also added the input file into HDFS using
>>
>>     /usr/local/hadoop$ bin/hadoop dfs -copyFromLocal /source /destination
>>
>> Can someone help me solve this problem?
>>
>>
>> Any help is appreciated,
>> Thanks
>> Prithvi
>>
>
>
