You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hadoop.apache.org by Rahul Singh <sm...@gmail.com> on 2014/04/14 08:38:53 UTC
hadoop inconsistent behaviour
Hi,
I am running a job(wordcount example) on 3 node cluster(1 master and 2
slave), some times the job passes but some times it fails(as reduce fails,
input data few kbs).
I am not able to nail down the reason of this inconsistency.
failed log:
14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
20.0.1.206:8032
14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to process
: 1
14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_1397454060494_0003
14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
application_1397454060494_0003
14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
http://20.0.1.206:8088/proxy/application_1397454060494_0003/
14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running in
uber mode : false
14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
attempt_1397454060494_0003_r_000003_0, Status : FAILED
14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
attempt_1397454060494_0003_r_000001_0, Status : FAILED
14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
attempt_1397454060494_0003_r_000005_0, Status : FAILED
14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
attempt_1397454060494_0003_r_000007_0, Status : FAILED
14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
attempt_1397454060494_0003_r_000003_1, Status : FAILED
14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
attempt_1397454060494_0003_r_000005_1, Status : FAILED
14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
attempt_1397454060494_0003_r_000003_2, Status : FAILED
14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
attempt_1397454060494_0003_r_000005_2, Status : FAILED
14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
with state FAILED due to: Task failed task_1397454060494_0003_r_000003
Job failed as tasks failed. failedMaps:0 failedReduces:1
14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
File System Counters
FILE: Number of bytes read=80
FILE: Number of bytes written=596766
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=175
HDFS: Number of bytes written=28
HDFS: Number of read operations=21
HDFS: Number of large read operations=0
HDFS: Number of write operations=12
Job Counters
Failed reduce tasks=9
Killed reduce tasks=1
Launched map tasks=1
Launched reduce tasks=16
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3477
Total time spent by all reduces in occupied slots (ms)=148867
Total time spent by all map tasks (ms)=3477
Total time spent by all reduce tasks (ms)=148867
Total vcore-seconds taken by all map tasks=3477
Total vcore-seconds taken by all reduce tasks=148867
Total megabyte-seconds taken by all map tasks=3560448
Total megabyte-seconds taken by all reduce tasks=152439808
Map-Reduce Framework
Map input records=3
Map output records=13
Map output bytes=110
Map output materialized bytes=112
Input split bytes=117
Combine input records=13
Combine output records=6
Reduce input groups=4
Reduce shuffle bytes=80
Reduce input records=4
Reduce output records=4
Spilled Records=10
Shuffled Maps =6
Failed Shuffles=0
Merged Map outputs=6
GC time elapsed (ms)=142
CPU time spent (ms)=6420
Physical memory (bytes) snapshot=1100853248
Virtual memory (bytes) snapshot=4468314112
Total committed heap usage (bytes)=1406992384
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=58
File Output Format Counters
Bytes Written=28
Job Passing Logs:
hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
wordcount /user/hduser/input /user/hduser/output_wordcount9
14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
20.0.1.206:8032
14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to process
: 1
14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_1397454060494_0002
14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
application_1397454060494_0002
14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
http://20.0.1.206:8088/proxy/application_1397454060494_0002/
14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running in
uber mode : false
14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
attempt_1397454060494_0002_m_000000_0, Status : FAILED
14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
attempt_1397454060494_0002_m_000000_1, Status : FAILED
14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
attempt_1397454060494_0002_m_000000_2, Status : FAILED
14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
attempt_1397454060494_0002_r_000001_0, Status : FAILED
14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
attempt_1397454060494_0002_r_000003_0, Status : FAILED
14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
attempt_1397454060494_0002_r_000005_0, Status : FAILED
14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
attempt_1397454060494_0002_r_000007_0, Status : FAILED
14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
attempt_1397454060494_0002_r_000005_1, Status : FAILED
14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002 completed
successfully
14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
File System Counters
FILE: Number of bytes read=112
FILE: Number of bytes written=767175
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=175
HDFS: Number of bytes written=40
HDFS: Number of read operations=27
HDFS: Number of large read operations=0
HDFS: Number of write operations=16
Job Counters
Failed map tasks=3
Failed reduce tasks=5
Launched map tasks=4
Launched reduce tasks=13
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=41629
Total time spent by all reduces in occupied slots (ms)=104530
Total time spent by all map tasks (ms)=41629
Total time spent by all reduce tasks (ms)=104530
Total vcore-seconds taken by all map tasks=41629
Total vcore-seconds taken by all reduce tasks=104530
Total megabyte-seconds taken by all map tasks=42628096
Total megabyte-seconds taken by all reduce tasks=107038720
Map-Reduce Framework
Map input records=3
Map output records=13
Map output bytes=110
Map output materialized bytes=112
Input split bytes=117
Combine input records=13
Combine output records=6
Reduce input groups=6
Reduce shuffle bytes=112
Reduce input records=6
Reduce output records=6
Spilled Records=12
Shuffled Maps =8
Failed Shuffles=0
Merged Map outputs=8
GC time elapsed (ms)=186
CPU time spent (ms)=8890
Physical memory (bytes) snapshot=1408913408
Virtual memory (bytes) snapshot=5727019008
Total committed heap usage (bytes)=1808990208
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=58
File Output Format Counters
Bytes Written=40
Thanks and Regards,
-Rahul Singh
Re: hadoop inconsistent behaviour
Posted by Rahul Singh <sm...@gmail.com>.
I cleaned up the log directory before running the job. Now there is no
nodemanager jobs. when i see in userlogs directory i am getting some syslog
files with the following error:
2014-04-14 11:58:23,472 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: poc-hadoop06/127.0.1.1:40221. Already tried 7 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-04-14 11:58:24,474 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: poc-hadoop06/127.0.1.1:40221. Already tried 8 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-04-14 11:58:25,475 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: poc-hadoop06/127.0.1.1:40221. Already tried 9 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-04-14 11:58:25,477 WARN [main] org.apache.hadoop.mapred.YarnChild:
Exception running child : java.net.ConnectException: Call From poc-hadoop06/
127.0.1.1 to poc-hadoop06:40221 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
at org.apache.hadoop.ipc.Client.call(Client.java:1410)
at org.apache.hadoop.ipc.Client.call(Client.java:1359)
at
org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
at $Proxy6.getTask(Unknown Source)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:137)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:601)
at
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:696)
at
org.apache.hadoop.ipc.Client$Connection.access$2700(Client.java:367)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1458)
at org.apache.hadoop.ipc.Client.call(Client.java:1377)
... 4 more
2014-04-14 11:58:25,478 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ReduceTask
metrics system...
2014-04-14 11:58:25,478 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics
system stopped.
2014-04-14 11:58:25,479 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics
system shutdown complete.
On Mon, Apr 14, 2014 at 12:21 PM, Shengjun Xin <sx...@gopivotal.com> wrote:
> You can check the nodemanager log
>
>
> On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <sm...@gmail.com>wrote:
>
>> Hi,
>> I am running a job(wordcount example) on 3 node cluster(1 master and 2
>> slave), some times the job passes but some times it fails(as reduce fails,
>> input data few kbs).
>> I am not able to nail down the reason of this inconsistency.
>>
>>
>>
>> failed log:
>>
>> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
>> 20.0.1.206:8032
>> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1397454060494_0003
>> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
>> application_1397454060494_0003
>> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
>> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
>> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
>> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
>> in uber mode : false
>> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
>> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
>> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
>> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
>> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
>> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_0, Status : FAILED
>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000001_0, Status : FAILED
>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_0, Status : FAILED
>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000007_0, Status : FAILED
>> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
>> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
>> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_1, Status : FAILED
>> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_1, Status : FAILED
>> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_2, Status : FAILED
>> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_2, Status : FAILED
>> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
>> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
>> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
>> Job failed as tasks failed. failedMaps:0 failedReduces:1
>>
>> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
>> File System Counters
>> FILE: Number of bytes read=80
>> FILE: Number of bytes written=596766
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=175
>> HDFS: Number of bytes written=28
>> HDFS: Number of read operations=21
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=12
>> Job Counters
>> Failed reduce tasks=9
>> Killed reduce tasks=1
>> Launched map tasks=1
>> Launched reduce tasks=16
>> Data-local map tasks=1
>> Total time spent by all maps in occupied slots (ms)=3477
>> Total time spent by all reduces in occupied slots (ms)=148867
>> Total time spent by all map tasks (ms)=3477
>> Total time spent by all reduce tasks (ms)=148867
>> Total vcore-seconds taken by all map tasks=3477
>> Total vcore-seconds taken by all reduce tasks=148867
>> Total megabyte-seconds taken by all map tasks=3560448
>> Total megabyte-seconds taken by all reduce tasks=152439808
>> Map-Reduce Framework
>> Map input records=3
>> Map output records=13
>> Map output bytes=110
>> Map output materialized bytes=112
>> Input split bytes=117
>> Combine input records=13
>> Combine output records=6
>> Reduce input groups=4
>> Reduce shuffle bytes=80
>> Reduce input records=4
>> Reduce output records=4
>> Spilled Records=10
>> Shuffled Maps =6
>> Failed Shuffles=0
>> Merged Map outputs=6
>> GC time elapsed (ms)=142
>> CPU time spent (ms)=6420
>> Physical memory (bytes) snapshot=1100853248
>> Virtual memory (bytes) snapshot=4468314112
>> Total committed heap usage (bytes)=1406992384
>> Shuffle Errors
>> BAD_ID=0
>> CONNECTION=0
>> IO_ERROR=0
>> WRONG_LENGTH=0
>> WRONG_MAP=0
>> WRONG_REDUCE=0
>> File Input Format Counters
>> Bytes Read=58
>> File Output Format Counters
>> Bytes Written=28
>>
>> Job Passing Logs:
>> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
>> wordcount /user/hduser/input /user/hduser/output_wordcount9
>> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
>> 20.0.1.206:8032
>> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1397454060494_0002
>> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
>> application_1397454060494_0002
>> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
>> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
>> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
>> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
>> in uber mode : false
>> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
>> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_0, Status : FAILED
>> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_1, Status : FAILED
>> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_2, Status : FAILED
>> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
>> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
>> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
>> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
>> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000001_0, Status : FAILED
>> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000003_0, Status : FAILED
>> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000005_0, Status : FAILED
>> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000007_0, Status : FAILED
>> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
>> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
>> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
>> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000005_1, Status : FAILED
>> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
>> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002
>> completed successfully
>> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
>> File System Counters
>> FILE: Number of bytes read=112
>> FILE: Number of bytes written=767175
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=175
>> HDFS: Number of bytes written=40
>> HDFS: Number of read operations=27
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=16
>> Job Counters
>> Failed map tasks=3
>> Failed reduce tasks=5
>> Launched map tasks=4
>> Launched reduce tasks=13
>> Other local map tasks=3
>> Data-local map tasks=1
>> Total time spent by all maps in occupied slots (ms)=41629
>> Total time spent by all reduces in occupied slots (ms)=104530
>> Total time spent by all map tasks (ms)=41629
>> Total time spent by all reduce tasks (ms)=104530
>> Total vcore-seconds taken by all map tasks=41629
>> Total vcore-seconds taken by all reduce tasks=104530
>> Total megabyte-seconds taken by all map tasks=42628096
>> Total megabyte-seconds taken by all reduce tasks=107038720
>> Map-Reduce Framework
>> Map input records=3
>> Map output records=13
>> Map output bytes=110
>> Map output materialized bytes=112
>> Input split bytes=117
>> Combine input records=13
>> Combine output records=6
>> Reduce input groups=6
>> Reduce shuffle bytes=112
>> Reduce input records=6
>> Reduce output records=6
>> Spilled Records=12
>> Shuffled Maps =8
>> Failed Shuffles=0
>> Merged Map outputs=8
>> GC time elapsed (ms)=186
>> CPU time spent (ms)=8890
>> Physical memory (bytes) snapshot=1408913408
>> Virtual memory (bytes) snapshot=5727019008
>> Total committed heap usage (bytes)=1808990208
>> Shuffle Errors
>> BAD_ID=0
>> CONNECTION=0
>> IO_ERROR=0
>> WRONG_LENGTH=0
>> WRONG_MAP=0
>> WRONG_REDUCE=0
>> File Input Format Counters
>> Bytes Read=58
>> File Output Format Counters
>> Bytes Written=40
>>
>> Thanks and Regards,
>> -Rahul Singh
>>
>
>
>
> --
> Regards
> Shengjun
>
Re: hadoop inconsistent behaviour
Posted by Rahul Singh <sm...@gmail.com>.
I cleaned up the log directory before running the job. Now there is no
nodemanager jobs. when i see in userlogs directory i am getting some syslog
files with the following error:
2014-04-14 11:58:23,472 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: poc-hadoop06/127.0.1.1:40221. Already tried 7 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-04-14 11:58:24,474 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: poc-hadoop06/127.0.1.1:40221. Already tried 8 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-04-14 11:58:25,475 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: poc-hadoop06/127.0.1.1:40221. Already tried 9 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-04-14 11:58:25,477 WARN [main] org.apache.hadoop.mapred.YarnChild:
Exception running child : java.net.ConnectException: Call From poc-hadoop06/
127.0.1.1 to poc-hadoop06:40221 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
at org.apache.hadoop.ipc.Client.call(Client.java:1410)
at org.apache.hadoop.ipc.Client.call(Client.java:1359)
at
org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
at $Proxy6.getTask(Unknown Source)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:137)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:601)
at
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:696)
at
org.apache.hadoop.ipc.Client$Connection.access$2700(Client.java:367)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1458)
at org.apache.hadoop.ipc.Client.call(Client.java:1377)
... 4 more
2014-04-14 11:58:25,478 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ReduceTask
metrics system...
2014-04-14 11:58:25,478 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics
system stopped.
2014-04-14 11:58:25,479 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics
system shutdown complete.
On Mon, Apr 14, 2014 at 12:21 PM, Shengjun Xin <sx...@gopivotal.com> wrote:
> You can check the nodemanager log
>
>
> On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <sm...@gmail.com>wrote:
>
>> Hi,
>> I am running a job(wordcount example) on 3 node cluster(1 master and 2
>> slave), some times the job passes but some times it fails(as reduce fails,
>> input data few kbs).
>> I am not able to nail down the reason of this inconsistency.
>>
>>
>>
>> failed log:
>>
>> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
>> 20.0.1.206:8032
>> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1397454060494_0003
>> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
>> application_1397454060494_0003
>> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
>> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
>> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
>> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
>> in uber mode : false
>> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
>> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
>> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
>> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
>> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
>> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_0, Status : FAILED
>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000001_0, Status : FAILED
>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_0, Status : FAILED
>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000007_0, Status : FAILED
>> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
>> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
>> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_1, Status : FAILED
>> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_1, Status : FAILED
>> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_2, Status : FAILED
>> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_2, Status : FAILED
>> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
>> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
>> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
>> Job failed as tasks failed. failedMaps:0 failedReduces:1
>>
>> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
>> File System Counters
>> FILE: Number of bytes read=80
>> FILE: Number of bytes written=596766
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=175
>> HDFS: Number of bytes written=28
>> HDFS: Number of read operations=21
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=12
>> Job Counters
>> Failed reduce tasks=9
>> Killed reduce tasks=1
>> Launched map tasks=1
>> Launched reduce tasks=16
>> Data-local map tasks=1
>> Total time spent by all maps in occupied slots (ms)=3477
>> Total time spent by all reduces in occupied slots (ms)=148867
>> Total time spent by all map tasks (ms)=3477
>> Total time spent by all reduce tasks (ms)=148867
>> Total vcore-seconds taken by all map tasks=3477
>> Total vcore-seconds taken by all reduce tasks=148867
>> Total megabyte-seconds taken by all map tasks=3560448
>> Total megabyte-seconds taken by all reduce tasks=152439808
>> Map-Reduce Framework
>> Map input records=3
>> Map output records=13
>> Map output bytes=110
>> Map output materialized bytes=112
>> Input split bytes=117
>> Combine input records=13
>> Combine output records=6
>> Reduce input groups=4
>> Reduce shuffle bytes=80
>> Reduce input records=4
>> Reduce output records=4
>> Spilled Records=10
>> Shuffled Maps =6
>> Failed Shuffles=0
>> Merged Map outputs=6
>> GC time elapsed (ms)=142
>> CPU time spent (ms)=6420
>> Physical memory (bytes) snapshot=1100853248
>> Virtual memory (bytes) snapshot=4468314112
>> Total committed heap usage (bytes)=1406992384
>> Shuffle Errors
>> BAD_ID=0
>> CONNECTION=0
>> IO_ERROR=0
>> WRONG_LENGTH=0
>> WRONG_MAP=0
>> WRONG_REDUCE=0
>> File Input Format Counters
>> Bytes Read=58
>> File Output Format Counters
>> Bytes Written=28
>>
>> Job Passing Logs:
>> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
>> wordcount /user/hduser/input /user/hduser/output_wordcount9
>> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
>> 20.0.1.206:8032
>> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1397454060494_0002
>> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
>> application_1397454060494_0002
>> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
>> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
>> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
>> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
>> in uber mode : false
>> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
>> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_0, Status : FAILED
>> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_1, Status : FAILED
>> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_2, Status : FAILED
>> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
>> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
>> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
>> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
>> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000001_0, Status : FAILED
>> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000003_0, Status : FAILED
>> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000005_0, Status : FAILED
>> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000007_0, Status : FAILED
>> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
>> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
>> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
>> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000005_1, Status : FAILED
>> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
>> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002
>> completed successfully
>> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
>> File System Counters
>> FILE: Number of bytes read=112
>> FILE: Number of bytes written=767175
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=175
>> HDFS: Number of bytes written=40
>> HDFS: Number of read operations=27
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=16
>> Job Counters
>> Failed map tasks=3
>> Failed reduce tasks=5
>> Launched map tasks=4
>> Launched reduce tasks=13
>> Other local map tasks=3
>> Data-local map tasks=1
>> Total time spent by all maps in occupied slots (ms)=41629
>> Total time spent by all reduces in occupied slots (ms)=104530
>> Total time spent by all map tasks (ms)=41629
>> Total time spent by all reduce tasks (ms)=104530
>> Total vcore-seconds taken by all map tasks=41629
>> Total vcore-seconds taken by all reduce tasks=104530
>> Total megabyte-seconds taken by all map tasks=42628096
>> Total megabyte-seconds taken by all reduce tasks=107038720
>> Map-Reduce Framework
>> Map input records=3
>> Map output records=13
>> Map output bytes=110
>> Map output materialized bytes=112
>> Input split bytes=117
>> Combine input records=13
>> Combine output records=6
>> Reduce input groups=6
>> Reduce shuffle bytes=112
>> Reduce input records=6
>> Reduce output records=6
>> Spilled Records=12
>> Shuffled Maps =8
>> Failed Shuffles=0
>> Merged Map outputs=8
>> GC time elapsed (ms)=186
>> CPU time spent (ms)=8890
>> Physical memory (bytes) snapshot=1408913408
>> Virtual memory (bytes) snapshot=5727019008
>> Total committed heap usage (bytes)=1808990208
>> Shuffle Errors
>> BAD_ID=0
>> CONNECTION=0
>> IO_ERROR=0
>> WRONG_LENGTH=0
>> WRONG_MAP=0
>> WRONG_REDUCE=0
>> File Input Format Counters
>> Bytes Read=58
>> File Output Format Counters
>> Bytes Written=40
>>
>> Thanks and Regards,
>> -Rahul Singh
>>
>
>
>
> --
> Regards
> Shengjun
>
Re: hadoop inconsistent behaviour
Posted by Rahul Singh <sm...@gmail.com>.
I cleaned up the log directory before running the job. Now there is no
nodemanager jobs. when i see in userlogs directory i am getting some syslog
files with the following error:
2014-04-14 11:58:23,472 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: poc-hadoop06/127.0.1.1:40221. Already tried 7 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-04-14 11:58:24,474 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: poc-hadoop06/127.0.1.1:40221. Already tried 8 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-04-14 11:58:25,475 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: poc-hadoop06/127.0.1.1:40221. Already tried 9 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-04-14 11:58:25,477 WARN [main] org.apache.hadoop.mapred.YarnChild:
Exception running child : java.net.ConnectException: Call From poc-hadoop06/
127.0.1.1 to poc-hadoop06:40221 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
at org.apache.hadoop.ipc.Client.call(Client.java:1410)
at org.apache.hadoop.ipc.Client.call(Client.java:1359)
at
org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
at $Proxy6.getTask(Unknown Source)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:137)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:601)
at
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:696)
at
org.apache.hadoop.ipc.Client$Connection.access$2700(Client.java:367)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1458)
at org.apache.hadoop.ipc.Client.call(Client.java:1377)
... 4 more
2014-04-14 11:58:25,478 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ReduceTask
metrics system...
2014-04-14 11:58:25,478 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics
system stopped.
2014-04-14 11:58:25,479 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics
system shutdown complete.
On Mon, Apr 14, 2014 at 12:21 PM, Shengjun Xin <sx...@gopivotal.com> wrote:
> You can check the nodemanager log
>
>
> On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <sm...@gmail.com>wrote:
>
>> Hi,
>> I am running a job(wordcount example) on 3 node cluster(1 master and 2
>> slave), some times the job passes but some times it fails(as reduce fails,
>> input data few kbs).
>> I am not able to nail down the reason of this inconsistency.
>>
>>
>>
>> failed log:
>>
>> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
>> 20.0.1.206:8032
>> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1397454060494_0003
>> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
>> application_1397454060494_0003
>> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
>> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
>> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
>> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
>> in uber mode : false
>> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
>> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
>> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
>> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
>> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
>> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_0, Status : FAILED
>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000001_0, Status : FAILED
>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_0, Status : FAILED
>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000007_0, Status : FAILED
>> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
>> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
>> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_1, Status : FAILED
>> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_1, Status : FAILED
>> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_2, Status : FAILED
>> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_2, Status : FAILED
>> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
>> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
>> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
>> Job failed as tasks failed. failedMaps:0 failedReduces:1
>>
>> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
>> File System Counters
>> FILE: Number of bytes read=80
>> FILE: Number of bytes written=596766
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=175
>> HDFS: Number of bytes written=28
>> HDFS: Number of read operations=21
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=12
>> Job Counters
>> Failed reduce tasks=9
>> Killed reduce tasks=1
>> Launched map tasks=1
>> Launched reduce tasks=16
>> Data-local map tasks=1
>> Total time spent by all maps in occupied slots (ms)=3477
>> Total time spent by all reduces in occupied slots (ms)=148867
>> Total time spent by all map tasks (ms)=3477
>> Total time spent by all reduce tasks (ms)=148867
>> Total vcore-seconds taken by all map tasks=3477
>> Total vcore-seconds taken by all reduce tasks=148867
>> Total megabyte-seconds taken by all map tasks=3560448
>> Total megabyte-seconds taken by all reduce tasks=152439808
>> Map-Reduce Framework
>> Map input records=3
>> Map output records=13
>> Map output bytes=110
>> Map output materialized bytes=112
>> Input split bytes=117
>> Combine input records=13
>> Combine output records=6
>> Reduce input groups=4
>> Reduce shuffle bytes=80
>> Reduce input records=4
>> Reduce output records=4
>> Spilled Records=10
>> Shuffled Maps =6
>> Failed Shuffles=0
>> Merged Map outputs=6
>> GC time elapsed (ms)=142
>> CPU time spent (ms)=6420
>> Physical memory (bytes) snapshot=1100853248
>> Virtual memory (bytes) snapshot=4468314112
>> Total committed heap usage (bytes)=1406992384
>> Shuffle Errors
>> BAD_ID=0
>> CONNECTION=0
>> IO_ERROR=0
>> WRONG_LENGTH=0
>> WRONG_MAP=0
>> WRONG_REDUCE=0
>> File Input Format Counters
>> Bytes Read=58
>> File Output Format Counters
>> Bytes Written=28
>>
>> Job Passing Logs:
>> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
>> wordcount /user/hduser/input /user/hduser/output_wordcount9
>> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
>> 20.0.1.206:8032
>> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1397454060494_0002
>> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
>> application_1397454060494_0002
>> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
>> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
>> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
>> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
>> in uber mode : false
>> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
>> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_0, Status : FAILED
>> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_1, Status : FAILED
>> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_2, Status : FAILED
>> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
>> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
>> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
>> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
>> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000001_0, Status : FAILED
>> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000003_0, Status : FAILED
>> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000005_0, Status : FAILED
>> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000007_0, Status : FAILED
>> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
>> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
>> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
>> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000005_1, Status : FAILED
>> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
>> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002
>> completed successfully
>> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
>> File System Counters
>> FILE: Number of bytes read=112
>> FILE: Number of bytes written=767175
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=175
>> HDFS: Number of bytes written=40
>> HDFS: Number of read operations=27
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=16
>> Job Counters
>> Failed map tasks=3
>> Failed reduce tasks=5
>> Launched map tasks=4
>> Launched reduce tasks=13
>> Other local map tasks=3
>> Data-local map tasks=1
>> Total time spent by all maps in occupied slots (ms)=41629
>> Total time spent by all reduces in occupied slots (ms)=104530
>> Total time spent by all map tasks (ms)=41629
>> Total time spent by all reduce tasks (ms)=104530
>> Total vcore-seconds taken by all map tasks=41629
>> Total vcore-seconds taken by all reduce tasks=104530
>> Total megabyte-seconds taken by all map tasks=42628096
>> Total megabyte-seconds taken by all reduce tasks=107038720
>> Map-Reduce Framework
>> Map input records=3
>> Map output records=13
>> Map output bytes=110
>> Map output materialized bytes=112
>> Input split bytes=117
>> Combine input records=13
>> Combine output records=6
>> Reduce input groups=6
>> Reduce shuffle bytes=112
>> Reduce input records=6
>> Reduce output records=6
>> Spilled Records=12
>> Shuffled Maps =8
>> Failed Shuffles=0
>> Merged Map outputs=8
>> GC time elapsed (ms)=186
>> CPU time spent (ms)=8890
>> Physical memory (bytes) snapshot=1408913408
>> Virtual memory (bytes) snapshot=5727019008
>> Total committed heap usage (bytes)=1808990208
>> Shuffle Errors
>> BAD_ID=0
>> CONNECTION=0
>> IO_ERROR=0
>> WRONG_LENGTH=0
>> WRONG_MAP=0
>> WRONG_REDUCE=0
>> File Input Format Counters
>> Bytes Read=58
>> File Output Format Counters
>> Bytes Written=40
>>
>> Thanks and Regards,
>> -Rahul Singh
>>
>
>
>
> --
> Regards
> Shengjun
>
Re: hadoop inconsistent behaviour
Posted by Rahul Singh <sm...@gmail.com>.
I cleaned up the log directory before running the job. Now there is no
nodemanager jobs. when i see in userlogs directory i am getting some syslog
files with the following error:
2014-04-14 11:58:23,472 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: poc-hadoop06/127.0.1.1:40221. Already tried 7 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-04-14 11:58:24,474 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: poc-hadoop06/127.0.1.1:40221. Already tried 8 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-04-14 11:58:25,475 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: poc-hadoop06/127.0.1.1:40221. Already tried 9 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-04-14 11:58:25,477 WARN [main] org.apache.hadoop.mapred.YarnChild:
Exception running child : java.net.ConnectException: Call From poc-hadoop06/
127.0.1.1 to poc-hadoop06:40221 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
at org.apache.hadoop.ipc.Client.call(Client.java:1410)
at org.apache.hadoop.ipc.Client.call(Client.java:1359)
at
org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
at $Proxy6.getTask(Unknown Source)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:137)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:601)
at
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:696)
at
org.apache.hadoop.ipc.Client$Connection.access$2700(Client.java:367)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1458)
at org.apache.hadoop.ipc.Client.call(Client.java:1377)
... 4 more
2014-04-14 11:58:25,478 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ReduceTask
metrics system...
2014-04-14 11:58:25,478 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics
system stopped.
2014-04-14 11:58:25,479 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics
system shutdown complete.
On Mon, Apr 14, 2014 at 12:21 PM, Shengjun Xin <sx...@gopivotal.com> wrote:
> You can check the nodemanager log
>
>
> On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <sm...@gmail.com>wrote:
>
>> Hi,
>> I am running a job(wordcount example) on 3 node cluster(1 master and 2
>> slave), some times the job passes but some times it fails(as reduce fails,
>> input data few kbs).
>> I am not able to nail down the reason of this inconsistency.
>>
>>
>>
>> failed log:
>>
>> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
>> 20.0.1.206:8032
>> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1397454060494_0003
>> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
>> application_1397454060494_0003
>> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
>> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
>> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
>> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
>> in uber mode : false
>> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
>> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
>> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
>> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
>> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
>> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_0, Status : FAILED
>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000001_0, Status : FAILED
>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_0, Status : FAILED
>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000007_0, Status : FAILED
>> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
>> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
>> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_1, Status : FAILED
>> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_1, Status : FAILED
>> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_2, Status : FAILED
>> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_2, Status : FAILED
>> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
>> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
>> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
>> Job failed as tasks failed. failedMaps:0 failedReduces:1
>>
>> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
>> File System Counters
>> FILE: Number of bytes read=80
>> FILE: Number of bytes written=596766
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=175
>> HDFS: Number of bytes written=28
>> HDFS: Number of read operations=21
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=12
>> Job Counters
>> Failed reduce tasks=9
>> Killed reduce tasks=1
>> Launched map tasks=1
>> Launched reduce tasks=16
>> Data-local map tasks=1
>> Total time spent by all maps in occupied slots (ms)=3477
>> Total time spent by all reduces in occupied slots (ms)=148867
>> Total time spent by all map tasks (ms)=3477
>> Total time spent by all reduce tasks (ms)=148867
>> Total vcore-seconds taken by all map tasks=3477
>> Total vcore-seconds taken by all reduce tasks=148867
>> Total megabyte-seconds taken by all map tasks=3560448
>> Total megabyte-seconds taken by all reduce tasks=152439808
>> Map-Reduce Framework
>> Map input records=3
>> Map output records=13
>> Map output bytes=110
>> Map output materialized bytes=112
>> Input split bytes=117
>> Combine input records=13
>> Combine output records=6
>> Reduce input groups=4
>> Reduce shuffle bytes=80
>> Reduce input records=4
>> Reduce output records=4
>> Spilled Records=10
>> Shuffled Maps =6
>> Failed Shuffles=0
>> Merged Map outputs=6
>> GC time elapsed (ms)=142
>> CPU time spent (ms)=6420
>> Physical memory (bytes) snapshot=1100853248
>> Virtual memory (bytes) snapshot=4468314112
>> Total committed heap usage (bytes)=1406992384
>> Shuffle Errors
>> BAD_ID=0
>> CONNECTION=0
>> IO_ERROR=0
>> WRONG_LENGTH=0
>> WRONG_MAP=0
>> WRONG_REDUCE=0
>> File Input Format Counters
>> Bytes Read=58
>> File Output Format Counters
>> Bytes Written=28
>>
>> Job Passing Logs:
>> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
>> wordcount /user/hduser/input /user/hduser/output_wordcount9
>> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
>> 20.0.1.206:8032
>> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1397454060494_0002
>> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
>> application_1397454060494_0002
>> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
>> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
>> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
>> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
>> in uber mode : false
>> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
>> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_0, Status : FAILED
>> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_1, Status : FAILED
>> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_2, Status : FAILED
>> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
>> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
>> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
>> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
>> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000001_0, Status : FAILED
>> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000003_0, Status : FAILED
>> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000005_0, Status : FAILED
>> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000007_0, Status : FAILED
>> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
>> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
>> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
>> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000005_1, Status : FAILED
>> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
>> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002
>> completed successfully
>> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
>> File System Counters
>> FILE: Number of bytes read=112
>> FILE: Number of bytes written=767175
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=175
>> HDFS: Number of bytes written=40
>> HDFS: Number of read operations=27
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=16
>> Job Counters
>> Failed map tasks=3
>> Failed reduce tasks=5
>> Launched map tasks=4
>> Launched reduce tasks=13
>> Other local map tasks=3
>> Data-local map tasks=1
>> Total time spent by all maps in occupied slots (ms)=41629
>> Total time spent by all reduces in occupied slots (ms)=104530
>> Total time spent by all map tasks (ms)=41629
>> Total time spent by all reduce tasks (ms)=104530
>> Total vcore-seconds taken by all map tasks=41629
>> Total vcore-seconds taken by all reduce tasks=104530
>> Total megabyte-seconds taken by all map tasks=42628096
>> Total megabyte-seconds taken by all reduce tasks=107038720
>> Map-Reduce Framework
>> Map input records=3
>> Map output records=13
>> Map output bytes=110
>> Map output materialized bytes=112
>> Input split bytes=117
>> Combine input records=13
>> Combine output records=6
>> Reduce input groups=6
>> Reduce shuffle bytes=112
>> Reduce input records=6
>> Reduce output records=6
>> Spilled Records=12
>> Shuffled Maps =8
>> Failed Shuffles=0
>> Merged Map outputs=8
>> GC time elapsed (ms)=186
>> CPU time spent (ms)=8890
>> Physical memory (bytes) snapshot=1408913408
>> Virtual memory (bytes) snapshot=5727019008
>> Total committed heap usage (bytes)=1808990208
>> Shuffle Errors
>> BAD_ID=0
>> CONNECTION=0
>> IO_ERROR=0
>> WRONG_LENGTH=0
>> WRONG_MAP=0
>> WRONG_REDUCE=0
>> File Input Format Counters
>> Bytes Read=58
>> File Output Format Counters
>> Bytes Written=40
>>
>> Thanks and Regards,
>> -Rahul Singh
>>
>
>
>
> --
> Regards
> Shengjun
>
Re: hadoop inconsistent behaviour
Posted by Shengjun Xin <sx...@gopivotal.com>.
You can check the nodemanager log
On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <sm...@gmail.com>wrote:
> Hi,
> I am running a job(wordcount example) on 3 node cluster(1 master and 2
> slave), some times the job passes but some times it fails(as reduce fails,
> input data few kbs).
> I am not able to nail down the reason of this inconsistency.
>
>
>
> failed log:
>
> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
> 20.0.1.206:8032
> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to process
> : 1
> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1397454060494_0003
> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
> application_1397454060494_0003
> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
> in uber mode : false
> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_0, Status : FAILED
> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000001_0, Status : FAILED
> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_0, Status : FAILED
> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000007_0, Status : FAILED
> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_1, Status : FAILED
> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_1, Status : FAILED
> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_2, Status : FAILED
> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_2, Status : FAILED
> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
> Job failed as tasks failed. failedMaps:0 failedReduces:1
>
> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
> File System Counters
> FILE: Number of bytes read=80
> FILE: Number of bytes written=596766
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=175
> HDFS: Number of bytes written=28
> HDFS: Number of read operations=21
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=12
> Job Counters
> Failed reduce tasks=9
> Killed reduce tasks=1
> Launched map tasks=1
> Launched reduce tasks=16
> Data-local map tasks=1
> Total time spent by all maps in occupied slots (ms)=3477
> Total time spent by all reduces in occupied slots (ms)=148867
> Total time spent by all map tasks (ms)=3477
> Total time spent by all reduce tasks (ms)=148867
> Total vcore-seconds taken by all map tasks=3477
> Total vcore-seconds taken by all reduce tasks=148867
> Total megabyte-seconds taken by all map tasks=3560448
> Total megabyte-seconds taken by all reduce tasks=152439808
> Map-Reduce Framework
> Map input records=3
> Map output records=13
> Map output bytes=110
> Map output materialized bytes=112
> Input split bytes=117
> Combine input records=13
> Combine output records=6
> Reduce input groups=4
> Reduce shuffle bytes=80
> Reduce input records=4
> Reduce output records=4
> Spilled Records=10
> Shuffled Maps =6
> Failed Shuffles=0
> Merged Map outputs=6
> GC time elapsed (ms)=142
> CPU time spent (ms)=6420
> Physical memory (bytes) snapshot=1100853248
> Virtual memory (bytes) snapshot=4468314112
> Total committed heap usage (bytes)=1406992384
> Shuffle Errors
> BAD_ID=0
> CONNECTION=0
> IO_ERROR=0
> WRONG_LENGTH=0
> WRONG_MAP=0
> WRONG_REDUCE=0
> File Input Format Counters
> Bytes Read=58
> File Output Format Counters
> Bytes Written=28
>
> Job Passing Logs:
> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
> wordcount /user/hduser/input /user/hduser/output_wordcount9
> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
> 20.0.1.206:8032
> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to process
> : 1
> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1397454060494_0002
> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
> application_1397454060494_0002
> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
> in uber mode : false
> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_0, Status : FAILED
> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_1, Status : FAILED
> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_2, Status : FAILED
> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000001_0, Status : FAILED
> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000003_0, Status : FAILED
> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000005_0, Status : FAILED
> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000007_0, Status : FAILED
> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000005_1, Status : FAILED
> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002 completed
> successfully
> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
> File System Counters
> FILE: Number of bytes read=112
> FILE: Number of bytes written=767175
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=175
> HDFS: Number of bytes written=40
> HDFS: Number of read operations=27
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=16
> Job Counters
> Failed map tasks=3
> Failed reduce tasks=5
> Launched map tasks=4
> Launched reduce tasks=13
> Other local map tasks=3
> Data-local map tasks=1
> Total time spent by all maps in occupied slots (ms)=41629
> Total time spent by all reduces in occupied slots (ms)=104530
> Total time spent by all map tasks (ms)=41629
> Total time spent by all reduce tasks (ms)=104530
> Total vcore-seconds taken by all map tasks=41629
> Total vcore-seconds taken by all reduce tasks=104530
> Total megabyte-seconds taken by all map tasks=42628096
> Total megabyte-seconds taken by all reduce tasks=107038720
> Map-Reduce Framework
> Map input records=3
> Map output records=13
> Map output bytes=110
> Map output materialized bytes=112
> Input split bytes=117
> Combine input records=13
> Combine output records=6
> Reduce input groups=6
> Reduce shuffle bytes=112
> Reduce input records=6
> Reduce output records=6
> Spilled Records=12
> Shuffled Maps =8
> Failed Shuffles=0
> Merged Map outputs=8
> GC time elapsed (ms)=186
> CPU time spent (ms)=8890
> Physical memory (bytes) snapshot=1408913408
> Virtual memory (bytes) snapshot=5727019008
> Total committed heap usage (bytes)=1808990208
> Shuffle Errors
> BAD_ID=0
> CONNECTION=0
> IO_ERROR=0
> WRONG_LENGTH=0
> WRONG_MAP=0
> WRONG_REDUCE=0
> File Input Format Counters
> Bytes Read=58
> File Output Format Counters
> Bytes Written=40
>
> Thanks and Regards,
> -Rahul Singh
>
--
Regards
Shengjun
Re: hadoop inconsistent behaviour
Posted by Gordon Wang <gw...@gopivotal.com>.
You can find the reduce container from RM's web page.
BTW: from above log, you can check if application master crashes.
On Mon, Apr 14, 2014 at 3:12 PM, Rahul Singh <sm...@gmail.com>wrote:
> how do i identify an reduce container? there are multiple container dirs
> in my application id folder in userlogs.
>
>
> On Mon, Apr 14, 2014 at 12:29 PM, Gordon Wang <gw...@gopivotal.com> wrote:
>
>> Hi Rahul,
>>
>> What is the log of reduce container ? Please paste the log and we can see
>> the reason.
>>
>>
>> On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <sm...@gmail.com>wrote:
>>
>>> Hi,
>>> I am running a job(wordcount example) on 3 node cluster(1 master and 2
>>> slave), some times the job passes but some times it fails(as reduce fails,
>>> input data few kbs).
>>> I am not able to nail down the reason of this inconsistency.
>>>
>>>
>>>
>>> failed log:
>>>
>>> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes where
>>> applicable
>>> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
>>> 20.0.1.206:8032
>>> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
>>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for
>>> job: job_1397454060494_0003
>>> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
>>> application_1397454060494_0003
>>> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
>>> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
>>> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
>>> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
>>> in uber mode : false
>>> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
>>> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
>>> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
>>> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
>>> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
>>> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
>>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000003_0, Status : FAILED
>>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000001_0, Status : FAILED
>>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000005_0, Status : FAILED
>>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000007_0, Status : FAILED
>>> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
>>> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
>>> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000003_1, Status : FAILED
>>> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000005_1, Status : FAILED
>>> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000003_2, Status : FAILED
>>> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000005_2, Status : FAILED
>>> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
>>> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
>>> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
>>> Job failed as tasks failed. failedMaps:0 failedReduces:1
>>>
>>> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
>>> File System Counters
>>> FILE: Number of bytes read=80
>>> FILE: Number of bytes written=596766
>>> FILE: Number of read operations=0
>>> FILE: Number of large read operations=0
>>> FILE: Number of write operations=0
>>> HDFS: Number of bytes read=175
>>> HDFS: Number of bytes written=28
>>> HDFS: Number of read operations=21
>>> HDFS: Number of large read operations=0
>>> HDFS: Number of write operations=12
>>> Job Counters
>>> Failed reduce tasks=9
>>> Killed reduce tasks=1
>>> Launched map tasks=1
>>> Launched reduce tasks=16
>>> Data-local map tasks=1
>>> Total time spent by all maps in occupied slots (ms)=3477
>>> Total time spent by all reduces in occupied slots (ms)=148867
>>> Total time spent by all map tasks (ms)=3477
>>> Total time spent by all reduce tasks (ms)=148867
>>> Total vcore-seconds taken by all map tasks=3477
>>> Total vcore-seconds taken by all reduce tasks=148867
>>> Total megabyte-seconds taken by all map tasks=3560448
>>> Total megabyte-seconds taken by all reduce tasks=152439808
>>> Map-Reduce Framework
>>> Map input records=3
>>> Map output records=13
>>> Map output bytes=110
>>> Map output materialized bytes=112
>>> Input split bytes=117
>>> Combine input records=13
>>> Combine output records=6
>>> Reduce input groups=4
>>> Reduce shuffle bytes=80
>>> Reduce input records=4
>>> Reduce output records=4
>>> Spilled Records=10
>>> Shuffled Maps =6
>>> Failed Shuffles=0
>>> Merged Map outputs=6
>>> GC time elapsed (ms)=142
>>> CPU time spent (ms)=6420
>>> Physical memory (bytes) snapshot=1100853248
>>> Virtual memory (bytes) snapshot=4468314112
>>> Total committed heap usage (bytes)=1406992384
>>> Shuffle Errors
>>> BAD_ID=0
>>> CONNECTION=0
>>> IO_ERROR=0
>>> WRONG_LENGTH=0
>>> WRONG_MAP=0
>>> WRONG_REDUCE=0
>>> File Input Format Counters
>>> Bytes Read=58
>>> File Output Format Counters
>>> Bytes Written=28
>>>
>>> Job Passing Logs:
>>> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
>>> wordcount /user/hduser/input /user/hduser/output_wordcount9
>>> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes where
>>> applicable
>>> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
>>> 20.0.1.206:8032
>>> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
>>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for
>>> job: job_1397454060494_0002
>>> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
>>> application_1397454060494_0002
>>> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
>>> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
>>> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
>>> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
>>> in uber mode : false
>>> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
>>> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_m_000000_0, Status : FAILED
>>> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_m_000000_1, Status : FAILED
>>> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_m_000000_2, Status : FAILED
>>> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
>>> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
>>> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
>>> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
>>> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_r_000001_0, Status : FAILED
>>> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_r_000003_0, Status : FAILED
>>> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_r_000005_0, Status : FAILED
>>> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_r_000007_0, Status : FAILED
>>> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
>>> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
>>> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
>>> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_r_000005_1, Status : FAILED
>>> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
>>> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002
>>> completed successfully
>>> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
>>> File System Counters
>>> FILE: Number of bytes read=112
>>> FILE: Number of bytes written=767175
>>> FILE: Number of read operations=0
>>> FILE: Number of large read operations=0
>>> FILE: Number of write operations=0
>>> HDFS: Number of bytes read=175
>>> HDFS: Number of bytes written=40
>>> HDFS: Number of read operations=27
>>> HDFS: Number of large read operations=0
>>> HDFS: Number of write operations=16
>>> Job Counters
>>> Failed map tasks=3
>>> Failed reduce tasks=5
>>> Launched map tasks=4
>>> Launched reduce tasks=13
>>> Other local map tasks=3
>>> Data-local map tasks=1
>>> Total time spent by all maps in occupied slots (ms)=41629
>>> Total time spent by all reduces in occupied slots (ms)=104530
>>> Total time spent by all map tasks (ms)=41629
>>> Total time spent by all reduce tasks (ms)=104530
>>> Total vcore-seconds taken by all map tasks=41629
>>> Total vcore-seconds taken by all reduce tasks=104530
>>> Total megabyte-seconds taken by all map tasks=42628096
>>> Total megabyte-seconds taken by all reduce tasks=107038720
>>> Map-Reduce Framework
>>> Map input records=3
>>> Map output records=13
>>> Map output bytes=110
>>> Map output materialized bytes=112
>>> Input split bytes=117
>>> Combine input records=13
>>> Combine output records=6
>>> Reduce input groups=6
>>> Reduce shuffle bytes=112
>>> Reduce input records=6
>>> Reduce output records=6
>>> Spilled Records=12
>>> Shuffled Maps =8
>>> Failed Shuffles=0
>>> Merged Map outputs=8
>>> GC time elapsed (ms)=186
>>> CPU time spent (ms)=8890
>>> Physical memory (bytes) snapshot=1408913408
>>> Virtual memory (bytes) snapshot=5727019008
>>> Total committed heap usage (bytes)=1808990208
>>> Shuffle Errors
>>> BAD_ID=0
>>> CONNECTION=0
>>> IO_ERROR=0
>>> WRONG_LENGTH=0
>>> WRONG_MAP=0
>>> WRONG_REDUCE=0
>>> File Input Format Counters
>>> Bytes Read=58
>>> File Output Format Counters
>>> Bytes Written=40
>>>
>>> Thanks and Regards,
>>> -Rahul Singh
>>>
>>
>>
>>
>> --
>> Regards
>> Gordon Wang
>>
>
>
--
Regards
Gordon Wang
Re: hadoop inconsistent behaviour
Posted by Gordon Wang <gw...@gopivotal.com>.
You can find the reduce container from RM's web page.
BTW: from above log, you can check if application master crashes.
On Mon, Apr 14, 2014 at 3:12 PM, Rahul Singh <sm...@gmail.com>wrote:
> how do i identify an reduce container? there are multiple container dirs
> in my application id folder in userlogs.
>
>
> On Mon, Apr 14, 2014 at 12:29 PM, Gordon Wang <gw...@gopivotal.com> wrote:
>
>> Hi Rahul,
>>
>> What is the log of reduce container ? Please paste the log and we can see
>> the reason.
>>
>>
>> On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <sm...@gmail.com>wrote:
>>
>>> Hi,
>>> I am running a job(wordcount example) on 3 node cluster(1 master and 2
>>> slave), some times the job passes but some times it fails(as reduce fails,
>>> input data few kbs).
>>> I am not able to nail down the reason of this inconsistency.
>>>
>>>
>>>
>>> failed log:
>>>
>>> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes where
>>> applicable
>>> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
>>> 20.0.1.206:8032
>>> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
>>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for
>>> job: job_1397454060494_0003
>>> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
>>> application_1397454060494_0003
>>> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
>>> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
>>> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
>>> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
>>> in uber mode : false
>>> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
>>> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
>>> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
>>> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
>>> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
>>> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
>>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000003_0, Status : FAILED
>>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000001_0, Status : FAILED
>>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000005_0, Status : FAILED
>>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000007_0, Status : FAILED
>>> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
>>> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
>>> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000003_1, Status : FAILED
>>> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000005_1, Status : FAILED
>>> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000003_2, Status : FAILED
>>> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000005_2, Status : FAILED
>>> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
>>> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
>>> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
>>> Job failed as tasks failed. failedMaps:0 failedReduces:1
>>>
>>> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
>>> File System Counters
>>> FILE: Number of bytes read=80
>>> FILE: Number of bytes written=596766
>>> FILE: Number of read operations=0
>>> FILE: Number of large read operations=0
>>> FILE: Number of write operations=0
>>> HDFS: Number of bytes read=175
>>> HDFS: Number of bytes written=28
>>> HDFS: Number of read operations=21
>>> HDFS: Number of large read operations=0
>>> HDFS: Number of write operations=12
>>> Job Counters
>>> Failed reduce tasks=9
>>> Killed reduce tasks=1
>>> Launched map tasks=1
>>> Launched reduce tasks=16
>>> Data-local map tasks=1
>>> Total time spent by all maps in occupied slots (ms)=3477
>>> Total time spent by all reduces in occupied slots (ms)=148867
>>> Total time spent by all map tasks (ms)=3477
>>> Total time spent by all reduce tasks (ms)=148867
>>> Total vcore-seconds taken by all map tasks=3477
>>> Total vcore-seconds taken by all reduce tasks=148867
>>> Total megabyte-seconds taken by all map tasks=3560448
>>> Total megabyte-seconds taken by all reduce tasks=152439808
>>> Map-Reduce Framework
>>> Map input records=3
>>> Map output records=13
>>> Map output bytes=110
>>> Map output materialized bytes=112
>>> Input split bytes=117
>>> Combine input records=13
>>> Combine output records=6
>>> Reduce input groups=4
>>> Reduce shuffle bytes=80
>>> Reduce input records=4
>>> Reduce output records=4
>>> Spilled Records=10
>>> Shuffled Maps =6
>>> Failed Shuffles=0
>>> Merged Map outputs=6
>>> GC time elapsed (ms)=142
>>> CPU time spent (ms)=6420
>>> Physical memory (bytes) snapshot=1100853248
>>> Virtual memory (bytes) snapshot=4468314112
>>> Total committed heap usage (bytes)=1406992384
>>> Shuffle Errors
>>> BAD_ID=0
>>> CONNECTION=0
>>> IO_ERROR=0
>>> WRONG_LENGTH=0
>>> WRONG_MAP=0
>>> WRONG_REDUCE=0
>>> File Input Format Counters
>>> Bytes Read=58
>>> File Output Format Counters
>>> Bytes Written=28
>>>
>>> Job Passing Logs:
>>> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
>>> wordcount /user/hduser/input /user/hduser/output_wordcount9
>>> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes where
>>> applicable
>>> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
>>> 20.0.1.206:8032
>>> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
>>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for
>>> job: job_1397454060494_0002
>>> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
>>> application_1397454060494_0002
>>> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
>>> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
>>> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
>>> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
>>> in uber mode : false
>>> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
>>> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_m_000000_0, Status : FAILED
>>> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_m_000000_1, Status : FAILED
>>> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_m_000000_2, Status : FAILED
>>> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
>>> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
>>> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
>>> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
>>> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_r_000001_0, Status : FAILED
>>> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_r_000003_0, Status : FAILED
>>> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_r_000005_0, Status : FAILED
>>> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_r_000007_0, Status : FAILED
>>> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
>>> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
>>> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
>>> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_r_000005_1, Status : FAILED
>>> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
>>> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002
>>> completed successfully
>>> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
>>> File System Counters
>>> FILE: Number of bytes read=112
>>> FILE: Number of bytes written=767175
>>> FILE: Number of read operations=0
>>> FILE: Number of large read operations=0
>>> FILE: Number of write operations=0
>>> HDFS: Number of bytes read=175
>>> HDFS: Number of bytes written=40
>>> HDFS: Number of read operations=27
>>> HDFS: Number of large read operations=0
>>> HDFS: Number of write operations=16
>>> Job Counters
>>> Failed map tasks=3
>>> Failed reduce tasks=5
>>> Launched map tasks=4
>>> Launched reduce tasks=13
>>> Other local map tasks=3
>>> Data-local map tasks=1
>>> Total time spent by all maps in occupied slots (ms)=41629
>>> Total time spent by all reduces in occupied slots (ms)=104530
>>> Total time spent by all map tasks (ms)=41629
>>> Total time spent by all reduce tasks (ms)=104530
>>> Total vcore-seconds taken by all map tasks=41629
>>> Total vcore-seconds taken by all reduce tasks=104530
>>> Total megabyte-seconds taken by all map tasks=42628096
>>> Total megabyte-seconds taken by all reduce tasks=107038720
>>> Map-Reduce Framework
>>> Map input records=3
>>> Map output records=13
>>> Map output bytes=110
>>> Map output materialized bytes=112
>>> Input split bytes=117
>>> Combine input records=13
>>> Combine output records=6
>>> Reduce input groups=6
>>> Reduce shuffle bytes=112
>>> Reduce input records=6
>>> Reduce output records=6
>>> Spilled Records=12
>>> Shuffled Maps =8
>>> Failed Shuffles=0
>>> Merged Map outputs=8
>>> GC time elapsed (ms)=186
>>> CPU time spent (ms)=8890
>>> Physical memory (bytes) snapshot=1408913408
>>> Virtual memory (bytes) snapshot=5727019008
>>> Total committed heap usage (bytes)=1808990208
>>> Shuffle Errors
>>> BAD_ID=0
>>> CONNECTION=0
>>> IO_ERROR=0
>>> WRONG_LENGTH=0
>>> WRONG_MAP=0
>>> WRONG_REDUCE=0
>>> File Input Format Counters
>>> Bytes Read=58
>>> File Output Format Counters
>>> Bytes Written=40
>>>
>>> Thanks and Regards,
>>> -Rahul Singh
>>>
>>
>>
>>
>> --
>> Regards
>> Gordon Wang
>>
>
>
--
Regards
Gordon Wang
Re: hadoop inconsistent behaviour
Posted by Gordon Wang <gw...@gopivotal.com>.
You can find the reduce container from RM's web page.
BTW: from above log, you can check if application master crashes.
On Mon, Apr 14, 2014 at 3:12 PM, Rahul Singh <sm...@gmail.com>wrote:
> how do i identify an reduce container? there are multiple container dirs
> in my application id folder in userlogs.
>
>
> On Mon, Apr 14, 2014 at 12:29 PM, Gordon Wang <gw...@gopivotal.com> wrote:
>
>> Hi Rahul,
>>
>> What is the log of reduce container ? Please paste the log and we can see
>> the reason.
>>
>>
>> On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <sm...@gmail.com>wrote:
>>
>>> Hi,
>>> I am running a job(wordcount example) on 3 node cluster(1 master and 2
>>> slave), some times the job passes but some times it fails(as reduce fails,
>>> input data few kbs).
>>> I am not able to nail down the reason of this inconsistency.
>>>
>>>
>>>
>>> failed log:
>>>
>>> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes where
>>> applicable
>>> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
>>> 20.0.1.206:8032
>>> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
>>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for
>>> job: job_1397454060494_0003
>>> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
>>> application_1397454060494_0003
>>> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
>>> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
>>> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
>>> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
>>> in uber mode : false
>>> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
>>> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
>>> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
>>> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
>>> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
>>> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
>>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000003_0, Status : FAILED
>>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000001_0, Status : FAILED
>>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000005_0, Status : FAILED
>>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000007_0, Status : FAILED
>>> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
>>> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
>>> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000003_1, Status : FAILED
>>> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000005_1, Status : FAILED
>>> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000003_2, Status : FAILED
>>> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000005_2, Status : FAILED
>>> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
>>> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
>>> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
>>> Job failed as tasks failed. failedMaps:0 failedReduces:1
>>>
>>> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
>>> File System Counters
>>> FILE: Number of bytes read=80
>>> FILE: Number of bytes written=596766
>>> FILE: Number of read operations=0
>>> FILE: Number of large read operations=0
>>> FILE: Number of write operations=0
>>> HDFS: Number of bytes read=175
>>> HDFS: Number of bytes written=28
>>> HDFS: Number of read operations=21
>>> HDFS: Number of large read operations=0
>>> HDFS: Number of write operations=12
>>> Job Counters
>>> Failed reduce tasks=9
>>> Killed reduce tasks=1
>>> Launched map tasks=1
>>> Launched reduce tasks=16
>>> Data-local map tasks=1
>>> Total time spent by all maps in occupied slots (ms)=3477
>>> Total time spent by all reduces in occupied slots (ms)=148867
>>> Total time spent by all map tasks (ms)=3477
>>> Total time spent by all reduce tasks (ms)=148867
>>> Total vcore-seconds taken by all map tasks=3477
>>> Total vcore-seconds taken by all reduce tasks=148867
>>> Total megabyte-seconds taken by all map tasks=3560448
>>> Total megabyte-seconds taken by all reduce tasks=152439808
>>> Map-Reduce Framework
>>> Map input records=3
>>> Map output records=13
>>> Map output bytes=110
>>> Map output materialized bytes=112
>>> Input split bytes=117
>>> Combine input records=13
>>> Combine output records=6
>>> Reduce input groups=4
>>> Reduce shuffle bytes=80
>>> Reduce input records=4
>>> Reduce output records=4
>>> Spilled Records=10
>>> Shuffled Maps =6
>>> Failed Shuffles=0
>>> Merged Map outputs=6
>>> GC time elapsed (ms)=142
>>> CPU time spent (ms)=6420
>>> Physical memory (bytes) snapshot=1100853248
>>> Virtual memory (bytes) snapshot=4468314112
>>> Total committed heap usage (bytes)=1406992384
>>> Shuffle Errors
>>> BAD_ID=0
>>> CONNECTION=0
>>> IO_ERROR=0
>>> WRONG_LENGTH=0
>>> WRONG_MAP=0
>>> WRONG_REDUCE=0
>>> File Input Format Counters
>>> Bytes Read=58
>>> File Output Format Counters
>>> Bytes Written=28
>>>
>>> Job Passing Logs:
>>> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
>>> wordcount /user/hduser/input /user/hduser/output_wordcount9
>>> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes where
>>> applicable
>>> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
>>> 20.0.1.206:8032
>>> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
>>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for
>>> job: job_1397454060494_0002
>>> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
>>> application_1397454060494_0002
>>> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
>>> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
>>> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
>>> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
>>> in uber mode : false
>>> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
>>> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_m_000000_0, Status : FAILED
>>> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_m_000000_1, Status : FAILED
>>> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_m_000000_2, Status : FAILED
>>> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
>>> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
>>> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
>>> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
>>> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_r_000001_0, Status : FAILED
>>> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_r_000003_0, Status : FAILED
>>> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_r_000005_0, Status : FAILED
>>> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_r_000007_0, Status : FAILED
>>> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
>>> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
>>> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
>>> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_r_000005_1, Status : FAILED
>>> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
>>> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002
>>> completed successfully
>>> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
>>> File System Counters
>>> FILE: Number of bytes read=112
>>> FILE: Number of bytes written=767175
>>> FILE: Number of read operations=0
>>> FILE: Number of large read operations=0
>>> FILE: Number of write operations=0
>>> HDFS: Number of bytes read=175
>>> HDFS: Number of bytes written=40
>>> HDFS: Number of read operations=27
>>> HDFS: Number of large read operations=0
>>> HDFS: Number of write operations=16
>>> Job Counters
>>> Failed map tasks=3
>>> Failed reduce tasks=5
>>> Launched map tasks=4
>>> Launched reduce tasks=13
>>> Other local map tasks=3
>>> Data-local map tasks=1
>>> Total time spent by all maps in occupied slots (ms)=41629
>>> Total time spent by all reduces in occupied slots (ms)=104530
>>> Total time spent by all map tasks (ms)=41629
>>> Total time spent by all reduce tasks (ms)=104530
>>> Total vcore-seconds taken by all map tasks=41629
>>> Total vcore-seconds taken by all reduce tasks=104530
>>> Total megabyte-seconds taken by all map tasks=42628096
>>> Total megabyte-seconds taken by all reduce tasks=107038720
>>> Map-Reduce Framework
>>> Map input records=3
>>> Map output records=13
>>> Map output bytes=110
>>> Map output materialized bytes=112
>>> Input split bytes=117
>>> Combine input records=13
>>> Combine output records=6
>>> Reduce input groups=6
>>> Reduce shuffle bytes=112
>>> Reduce input records=6
>>> Reduce output records=6
>>> Spilled Records=12
>>> Shuffled Maps =8
>>> Failed Shuffles=0
>>> Merged Map outputs=8
>>> GC time elapsed (ms)=186
>>> CPU time spent (ms)=8890
>>> Physical memory (bytes) snapshot=1408913408
>>> Virtual memory (bytes) snapshot=5727019008
>>> Total committed heap usage (bytes)=1808990208
>>> Shuffle Errors
>>> BAD_ID=0
>>> CONNECTION=0
>>> IO_ERROR=0
>>> WRONG_LENGTH=0
>>> WRONG_MAP=0
>>> WRONG_REDUCE=0
>>> File Input Format Counters
>>> Bytes Read=58
>>> File Output Format Counters
>>> Bytes Written=40
>>>
>>> Thanks and Regards,
>>> -Rahul Singh
>>>
>>
>>
>>
>> --
>> Regards
>> Gordon Wang
>>
>
>
--
Regards
Gordon Wang
Re: hadoop inconsistent behaviour
Posted by Gordon Wang <gw...@gopivotal.com>.
You can find the reduce container from RM's web page.
BTW: from above log, you can check if application master crashes.
On Mon, Apr 14, 2014 at 3:12 PM, Rahul Singh <sm...@gmail.com>wrote:
> how do i identify an reduce container? there are multiple container dirs
> in my application id folder in userlogs.
>
>
> On Mon, Apr 14, 2014 at 12:29 PM, Gordon Wang <gw...@gopivotal.com> wrote:
>
>> Hi Rahul,
>>
>> What is the log of reduce container ? Please paste the log and we can see
>> the reason.
>>
>>
>> On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <sm...@gmail.com>wrote:
>>
>>> Hi,
>>> I am running a job(wordcount example) on 3 node cluster(1 master and 2
>>> slave), some times the job passes but some times it fails(as reduce fails,
>>> input data few kbs).
>>> I am not able to nail down the reason of this inconsistency.
>>>
>>>
>>>
>>> failed log:
>>>
>>> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes where
>>> applicable
>>> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
>>> 20.0.1.206:8032
>>> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
>>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for
>>> job: job_1397454060494_0003
>>> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
>>> application_1397454060494_0003
>>> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
>>> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
>>> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
>>> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
>>> in uber mode : false
>>> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
>>> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
>>> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
>>> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
>>> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
>>> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
>>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000003_0, Status : FAILED
>>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000001_0, Status : FAILED
>>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000005_0, Status : FAILED
>>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000007_0, Status : FAILED
>>> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
>>> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
>>> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000003_1, Status : FAILED
>>> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000005_1, Status : FAILED
>>> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000003_2, Status : FAILED
>>> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0003_r_000005_2, Status : FAILED
>>> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
>>> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
>>> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
>>> Job failed as tasks failed. failedMaps:0 failedReduces:1
>>>
>>> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
>>> File System Counters
>>> FILE: Number of bytes read=80
>>> FILE: Number of bytes written=596766
>>> FILE: Number of read operations=0
>>> FILE: Number of large read operations=0
>>> FILE: Number of write operations=0
>>> HDFS: Number of bytes read=175
>>> HDFS: Number of bytes written=28
>>> HDFS: Number of read operations=21
>>> HDFS: Number of large read operations=0
>>> HDFS: Number of write operations=12
>>> Job Counters
>>> Failed reduce tasks=9
>>> Killed reduce tasks=1
>>> Launched map tasks=1
>>> Launched reduce tasks=16
>>> Data-local map tasks=1
>>> Total time spent by all maps in occupied slots (ms)=3477
>>> Total time spent by all reduces in occupied slots (ms)=148867
>>> Total time spent by all map tasks (ms)=3477
>>> Total time spent by all reduce tasks (ms)=148867
>>> Total vcore-seconds taken by all map tasks=3477
>>> Total vcore-seconds taken by all reduce tasks=148867
>>> Total megabyte-seconds taken by all map tasks=3560448
>>> Total megabyte-seconds taken by all reduce tasks=152439808
>>> Map-Reduce Framework
>>> Map input records=3
>>> Map output records=13
>>> Map output bytes=110
>>> Map output materialized bytes=112
>>> Input split bytes=117
>>> Combine input records=13
>>> Combine output records=6
>>> Reduce input groups=4
>>> Reduce shuffle bytes=80
>>> Reduce input records=4
>>> Reduce output records=4
>>> Spilled Records=10
>>> Shuffled Maps =6
>>> Failed Shuffles=0
>>> Merged Map outputs=6
>>> GC time elapsed (ms)=142
>>> CPU time spent (ms)=6420
>>> Physical memory (bytes) snapshot=1100853248
>>> Virtual memory (bytes) snapshot=4468314112
>>> Total committed heap usage (bytes)=1406992384
>>> Shuffle Errors
>>> BAD_ID=0
>>> CONNECTION=0
>>> IO_ERROR=0
>>> WRONG_LENGTH=0
>>> WRONG_MAP=0
>>> WRONG_REDUCE=0
>>> File Input Format Counters
>>> Bytes Read=58
>>> File Output Format Counters
>>> Bytes Written=28
>>>
>>> Job Passing Logs:
>>> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
>>> wordcount /user/hduser/input /user/hduser/output_wordcount9
>>> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes where
>>> applicable
>>> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
>>> 20.0.1.206:8032
>>> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
>>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for
>>> job: job_1397454060494_0002
>>> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
>>> application_1397454060494_0002
>>> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
>>> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
>>> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
>>> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
>>> in uber mode : false
>>> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
>>> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_m_000000_0, Status : FAILED
>>> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_m_000000_1, Status : FAILED
>>> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_m_000000_2, Status : FAILED
>>> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
>>> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
>>> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
>>> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
>>> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_r_000001_0, Status : FAILED
>>> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_r_000003_0, Status : FAILED
>>> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_r_000005_0, Status : FAILED
>>> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_r_000007_0, Status : FAILED
>>> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
>>> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
>>> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
>>> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
>>> attempt_1397454060494_0002_r_000005_1, Status : FAILED
>>> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
>>> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002
>>> completed successfully
>>> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
>>> File System Counters
>>> FILE: Number of bytes read=112
>>> FILE: Number of bytes written=767175
>>> FILE: Number of read operations=0
>>> FILE: Number of large read operations=0
>>> FILE: Number of write operations=0
>>> HDFS: Number of bytes read=175
>>> HDFS: Number of bytes written=40
>>> HDFS: Number of read operations=27
>>> HDFS: Number of large read operations=0
>>> HDFS: Number of write operations=16
>>> Job Counters
>>> Failed map tasks=3
>>> Failed reduce tasks=5
>>> Launched map tasks=4
>>> Launched reduce tasks=13
>>> Other local map tasks=3
>>> Data-local map tasks=1
>>> Total time spent by all maps in occupied slots (ms)=41629
>>> Total time spent by all reduces in occupied slots (ms)=104530
>>> Total time spent by all map tasks (ms)=41629
>>> Total time spent by all reduce tasks (ms)=104530
>>> Total vcore-seconds taken by all map tasks=41629
>>> Total vcore-seconds taken by all reduce tasks=104530
>>> Total megabyte-seconds taken by all map tasks=42628096
>>> Total megabyte-seconds taken by all reduce tasks=107038720
>>> Map-Reduce Framework
>>> Map input records=3
>>> Map output records=13
>>> Map output bytes=110
>>> Map output materialized bytes=112
>>> Input split bytes=117
>>> Combine input records=13
>>> Combine output records=6
>>> Reduce input groups=6
>>> Reduce shuffle bytes=112
>>> Reduce input records=6
>>> Reduce output records=6
>>> Spilled Records=12
>>> Shuffled Maps =8
>>> Failed Shuffles=0
>>> Merged Map outputs=8
>>> GC time elapsed (ms)=186
>>> CPU time spent (ms)=8890
>>> Physical memory (bytes) snapshot=1408913408
>>> Virtual memory (bytes) snapshot=5727019008
>>> Total committed heap usage (bytes)=1808990208
>>> Shuffle Errors
>>> BAD_ID=0
>>> CONNECTION=0
>>> IO_ERROR=0
>>> WRONG_LENGTH=0
>>> WRONG_MAP=0
>>> WRONG_REDUCE=0
>>> File Input Format Counters
>>> Bytes Read=58
>>> File Output Format Counters
>>> Bytes Written=40
>>>
>>> Thanks and Regards,
>>> -Rahul Singh
>>>
>>
>>
>>
>> --
>> Regards
>> Gordon Wang
>>
>
>
--
Regards
Gordon Wang
Re: hadoop inconsistent behaviour
Posted by Rahul Singh <sm...@gmail.com>.
how do i identify an reduce container? there are multiple container dirs in
my application id folder in userlogs.
On Mon, Apr 14, 2014 at 12:29 PM, Gordon Wang <gw...@gopivotal.com> wrote:
> Hi Rahul,
>
> What is the log of reduce container ? Please paste the log and we can see
> the reason.
>
>
> On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <sm...@gmail.com>wrote:
>
>> Hi,
>> I am running a job(wordcount example) on 3 node cluster(1 master and 2
>> slave), some times the job passes but some times it fails(as reduce fails,
>> input data few kbs).
>> I am not able to nail down the reason of this inconsistency.
>>
>>
>>
>> failed log:
>>
>> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
>> 20.0.1.206:8032
>> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1397454060494_0003
>> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
>> application_1397454060494_0003
>> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
>> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
>> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
>> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
>> in uber mode : false
>> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
>> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
>> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
>> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
>> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
>> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_0, Status : FAILED
>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000001_0, Status : FAILED
>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_0, Status : FAILED
>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000007_0, Status : FAILED
>> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
>> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
>> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_1, Status : FAILED
>> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_1, Status : FAILED
>> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_2, Status : FAILED
>> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_2, Status : FAILED
>> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
>> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
>> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
>> Job failed as tasks failed. failedMaps:0 failedReduces:1
>>
>> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
>> File System Counters
>> FILE: Number of bytes read=80
>> FILE: Number of bytes written=596766
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=175
>> HDFS: Number of bytes written=28
>> HDFS: Number of read operations=21
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=12
>> Job Counters
>> Failed reduce tasks=9
>> Killed reduce tasks=1
>> Launched map tasks=1
>> Launched reduce tasks=16
>> Data-local map tasks=1
>> Total time spent by all maps in occupied slots (ms)=3477
>> Total time spent by all reduces in occupied slots (ms)=148867
>> Total time spent by all map tasks (ms)=3477
>> Total time spent by all reduce tasks (ms)=148867
>> Total vcore-seconds taken by all map tasks=3477
>> Total vcore-seconds taken by all reduce tasks=148867
>> Total megabyte-seconds taken by all map tasks=3560448
>> Total megabyte-seconds taken by all reduce tasks=152439808
>> Map-Reduce Framework
>> Map input records=3
>> Map output records=13
>> Map output bytes=110
>> Map output materialized bytes=112
>> Input split bytes=117
>> Combine input records=13
>> Combine output records=6
>> Reduce input groups=4
>> Reduce shuffle bytes=80
>> Reduce input records=4
>> Reduce output records=4
>> Spilled Records=10
>> Shuffled Maps =6
>> Failed Shuffles=0
>> Merged Map outputs=6
>> GC time elapsed (ms)=142
>> CPU time spent (ms)=6420
>> Physical memory (bytes) snapshot=1100853248
>> Virtual memory (bytes) snapshot=4468314112
>> Total committed heap usage (bytes)=1406992384
>> Shuffle Errors
>> BAD_ID=0
>> CONNECTION=0
>> IO_ERROR=0
>> WRONG_LENGTH=0
>> WRONG_MAP=0
>> WRONG_REDUCE=0
>> File Input Format Counters
>> Bytes Read=58
>> File Output Format Counters
>> Bytes Written=28
>>
>> Job Passing Logs:
>> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
>> wordcount /user/hduser/input /user/hduser/output_wordcount9
>> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
>> 20.0.1.206:8032
>> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1397454060494_0002
>> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
>> application_1397454060494_0002
>> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
>> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
>> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
>> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
>> in uber mode : false
>> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
>> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_0, Status : FAILED
>> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_1, Status : FAILED
>> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_2, Status : FAILED
>> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
>> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
>> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
>> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
>> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000001_0, Status : FAILED
>> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000003_0, Status : FAILED
>> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000005_0, Status : FAILED
>> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000007_0, Status : FAILED
>> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
>> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
>> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
>> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000005_1, Status : FAILED
>> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
>> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002
>> completed successfully
>> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
>> File System Counters
>> FILE: Number of bytes read=112
>> FILE: Number of bytes written=767175
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=175
>> HDFS: Number of bytes written=40
>> HDFS: Number of read operations=27
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=16
>> Job Counters
>> Failed map tasks=3
>> Failed reduce tasks=5
>> Launched map tasks=4
>> Launched reduce tasks=13
>> Other local map tasks=3
>> Data-local map tasks=1
>> Total time spent by all maps in occupied slots (ms)=41629
>> Total time spent by all reduces in occupied slots (ms)=104530
>> Total time spent by all map tasks (ms)=41629
>> Total time spent by all reduce tasks (ms)=104530
>> Total vcore-seconds taken by all map tasks=41629
>> Total vcore-seconds taken by all reduce tasks=104530
>> Total megabyte-seconds taken by all map tasks=42628096
>> Total megabyte-seconds taken by all reduce tasks=107038720
>> Map-Reduce Framework
>> Map input records=3
>> Map output records=13
>> Map output bytes=110
>> Map output materialized bytes=112
>> Input split bytes=117
>> Combine input records=13
>> Combine output records=6
>> Reduce input groups=6
>> Reduce shuffle bytes=112
>> Reduce input records=6
>> Reduce output records=6
>> Spilled Records=12
>> Shuffled Maps =8
>> Failed Shuffles=0
>> Merged Map outputs=8
>> GC time elapsed (ms)=186
>> CPU time spent (ms)=8890
>> Physical memory (bytes) snapshot=1408913408
>> Virtual memory (bytes) snapshot=5727019008
>> Total committed heap usage (bytes)=1808990208
>> Shuffle Errors
>> BAD_ID=0
>> CONNECTION=0
>> IO_ERROR=0
>> WRONG_LENGTH=0
>> WRONG_MAP=0
>> WRONG_REDUCE=0
>> File Input Format Counters
>> Bytes Read=58
>> File Output Format Counters
>> Bytes Written=40
>>
>> Thanks and Regards,
>> -Rahul Singh
>>
>
>
>
> --
> Regards
> Gordon Wang
>
Re: hadoop inconsistent behaviour
Posted by Rahul Singh <sm...@gmail.com>.
how do i identify an reduce container? there are multiple container dirs in
my application id folder in userlogs.
On Mon, Apr 14, 2014 at 12:29 PM, Gordon Wang <gw...@gopivotal.com> wrote:
> Hi Rahul,
>
> What is the log of reduce container ? Please paste the log and we can see
> the reason.
>
>
> On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <sm...@gmail.com>wrote:
>
>> Hi,
>> I am running a job(wordcount example) on 3 node cluster(1 master and 2
>> slave), some times the job passes but some times it fails(as reduce fails,
>> input data few kbs).
>> I am not able to nail down the reason of this inconsistency.
>>
>>
>>
>> failed log:
>>
>> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
>> 20.0.1.206:8032
>> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1397454060494_0003
>> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
>> application_1397454060494_0003
>> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
>> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
>> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
>> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
>> in uber mode : false
>> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
>> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
>> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
>> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
>> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
>> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_0, Status : FAILED
>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000001_0, Status : FAILED
>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_0, Status : FAILED
>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000007_0, Status : FAILED
>> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
>> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
>> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_1, Status : FAILED
>> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_1, Status : FAILED
>> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_2, Status : FAILED
>> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_2, Status : FAILED
>> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
>> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
>> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
>> Job failed as tasks failed. failedMaps:0 failedReduces:1
>>
>> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
>> File System Counters
>> FILE: Number of bytes read=80
>> FILE: Number of bytes written=596766
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=175
>> HDFS: Number of bytes written=28
>> HDFS: Number of read operations=21
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=12
>> Job Counters
>> Failed reduce tasks=9
>> Killed reduce tasks=1
>> Launched map tasks=1
>> Launched reduce tasks=16
>> Data-local map tasks=1
>> Total time spent by all maps in occupied slots (ms)=3477
>> Total time spent by all reduces in occupied slots (ms)=148867
>> Total time spent by all map tasks (ms)=3477
>> Total time spent by all reduce tasks (ms)=148867
>> Total vcore-seconds taken by all map tasks=3477
>> Total vcore-seconds taken by all reduce tasks=148867
>> Total megabyte-seconds taken by all map tasks=3560448
>> Total megabyte-seconds taken by all reduce tasks=152439808
>> Map-Reduce Framework
>> Map input records=3
>> Map output records=13
>> Map output bytes=110
>> Map output materialized bytes=112
>> Input split bytes=117
>> Combine input records=13
>> Combine output records=6
>> Reduce input groups=4
>> Reduce shuffle bytes=80
>> Reduce input records=4
>> Reduce output records=4
>> Spilled Records=10
>> Shuffled Maps =6
>> Failed Shuffles=0
>> Merged Map outputs=6
>> GC time elapsed (ms)=142
>> CPU time spent (ms)=6420
>> Physical memory (bytes) snapshot=1100853248
>> Virtual memory (bytes) snapshot=4468314112
>> Total committed heap usage (bytes)=1406992384
>> Shuffle Errors
>> BAD_ID=0
>> CONNECTION=0
>> IO_ERROR=0
>> WRONG_LENGTH=0
>> WRONG_MAP=0
>> WRONG_REDUCE=0
>> File Input Format Counters
>> Bytes Read=58
>> File Output Format Counters
>> Bytes Written=28
>>
>> Job Passing Logs:
>> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
>> wordcount /user/hduser/input /user/hduser/output_wordcount9
>> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
>> 20.0.1.206:8032
>> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1397454060494_0002
>> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
>> application_1397454060494_0002
>> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
>> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
>> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
>> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
>> in uber mode : false
>> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
>> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_0, Status : FAILED
>> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_1, Status : FAILED
>> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_2, Status : FAILED
>> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
>> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
>> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
>> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
>> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000001_0, Status : FAILED
>> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000003_0, Status : FAILED
>> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000005_0, Status : FAILED
>> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000007_0, Status : FAILED
>> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
>> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
>> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
>> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000005_1, Status : FAILED
>> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
>> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002
>> completed successfully
>> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
>> File System Counters
>> FILE: Number of bytes read=112
>> FILE: Number of bytes written=767175
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=175
>> HDFS: Number of bytes written=40
>> HDFS: Number of read operations=27
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=16
>> Job Counters
>> Failed map tasks=3
>> Failed reduce tasks=5
>> Launched map tasks=4
>> Launched reduce tasks=13
>> Other local map tasks=3
>> Data-local map tasks=1
>> Total time spent by all maps in occupied slots (ms)=41629
>> Total time spent by all reduces in occupied slots (ms)=104530
>> Total time spent by all map tasks (ms)=41629
>> Total time spent by all reduce tasks (ms)=104530
>> Total vcore-seconds taken by all map tasks=41629
>> Total vcore-seconds taken by all reduce tasks=104530
>> Total megabyte-seconds taken by all map tasks=42628096
>> Total megabyte-seconds taken by all reduce tasks=107038720
>> Map-Reduce Framework
>> Map input records=3
>> Map output records=13
>> Map output bytes=110
>> Map output materialized bytes=112
>> Input split bytes=117
>> Combine input records=13
>> Combine output records=6
>> Reduce input groups=6
>> Reduce shuffle bytes=112
>> Reduce input records=6
>> Reduce output records=6
>> Spilled Records=12
>> Shuffled Maps =8
>> Failed Shuffles=0
>> Merged Map outputs=8
>> GC time elapsed (ms)=186
>> CPU time spent (ms)=8890
>> Physical memory (bytes) snapshot=1408913408
>> Virtual memory (bytes) snapshot=5727019008
>> Total committed heap usage (bytes)=1808990208
>> Shuffle Errors
>> BAD_ID=0
>> CONNECTION=0
>> IO_ERROR=0
>> WRONG_LENGTH=0
>> WRONG_MAP=0
>> WRONG_REDUCE=0
>> File Input Format Counters
>> Bytes Read=58
>> File Output Format Counters
>> Bytes Written=40
>>
>> Thanks and Regards,
>> -Rahul Singh
>>
>
>
>
> --
> Regards
> Gordon Wang
>
Re: hadoop inconsistent behaviour
Posted by Rahul Singh <sm...@gmail.com>.
how do i identify an reduce container? there are multiple container dirs in
my application id folder in userlogs.
On Mon, Apr 14, 2014 at 12:29 PM, Gordon Wang <gw...@gopivotal.com> wrote:
> Hi Rahul,
>
> What is the log of reduce container ? Please paste the log and we can see
> the reason.
>
>
> On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <sm...@gmail.com>wrote:
>
>> Hi,
>> I am running a job(wordcount example) on 3 node cluster(1 master and 2
>> slave), some times the job passes but some times it fails(as reduce fails,
>> input data few kbs).
>> I am not able to nail down the reason of this inconsistency.
>>
>>
>>
>> failed log:
>>
>> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
>> 20.0.1.206:8032
>> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1397454060494_0003
>> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
>> application_1397454060494_0003
>> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
>> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
>> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
>> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
>> in uber mode : false
>> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
>> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
>> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
>> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
>> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
>> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_0, Status : FAILED
>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000001_0, Status : FAILED
>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_0, Status : FAILED
>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000007_0, Status : FAILED
>> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
>> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
>> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_1, Status : FAILED
>> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_1, Status : FAILED
>> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_2, Status : FAILED
>> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_2, Status : FAILED
>> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
>> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
>> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
>> Job failed as tasks failed. failedMaps:0 failedReduces:1
>>
>> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
>> File System Counters
>> FILE: Number of bytes read=80
>> FILE: Number of bytes written=596766
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=175
>> HDFS: Number of bytes written=28
>> HDFS: Number of read operations=21
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=12
>> Job Counters
>> Failed reduce tasks=9
>> Killed reduce tasks=1
>> Launched map tasks=1
>> Launched reduce tasks=16
>> Data-local map tasks=1
>> Total time spent by all maps in occupied slots (ms)=3477
>> Total time spent by all reduces in occupied slots (ms)=148867
>> Total time spent by all map tasks (ms)=3477
>> Total time spent by all reduce tasks (ms)=148867
>> Total vcore-seconds taken by all map tasks=3477
>> Total vcore-seconds taken by all reduce tasks=148867
>> Total megabyte-seconds taken by all map tasks=3560448
>> Total megabyte-seconds taken by all reduce tasks=152439808
>> Map-Reduce Framework
>> Map input records=3
>> Map output records=13
>> Map output bytes=110
>> Map output materialized bytes=112
>> Input split bytes=117
>> Combine input records=13
>> Combine output records=6
>> Reduce input groups=4
>> Reduce shuffle bytes=80
>> Reduce input records=4
>> Reduce output records=4
>> Spilled Records=10
>> Shuffled Maps =6
>> Failed Shuffles=0
>> Merged Map outputs=6
>> GC time elapsed (ms)=142
>> CPU time spent (ms)=6420
>> Physical memory (bytes) snapshot=1100853248
>> Virtual memory (bytes) snapshot=4468314112
>> Total committed heap usage (bytes)=1406992384
>> Shuffle Errors
>> BAD_ID=0
>> CONNECTION=0
>> IO_ERROR=0
>> WRONG_LENGTH=0
>> WRONG_MAP=0
>> WRONG_REDUCE=0
>> File Input Format Counters
>> Bytes Read=58
>> File Output Format Counters
>> Bytes Written=28
>>
>> Job Passing Logs:
>> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
>> wordcount /user/hduser/input /user/hduser/output_wordcount9
>> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
>> 20.0.1.206:8032
>> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1397454060494_0002
>> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
>> application_1397454060494_0002
>> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
>> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
>> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
>> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
>> in uber mode : false
>> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
>> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_0, Status : FAILED
>> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_1, Status : FAILED
>> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_2, Status : FAILED
>> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
>> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
>> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
>> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
>> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000001_0, Status : FAILED
>> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000003_0, Status : FAILED
>> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000005_0, Status : FAILED
>> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000007_0, Status : FAILED
>> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
>> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
>> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
>> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000005_1, Status : FAILED
>> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
>> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002
>> completed successfully
>> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
>> File System Counters
>> FILE: Number of bytes read=112
>> FILE: Number of bytes written=767175
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=175
>> HDFS: Number of bytes written=40
>> HDFS: Number of read operations=27
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=16
>> Job Counters
>> Failed map tasks=3
>> Failed reduce tasks=5
>> Launched map tasks=4
>> Launched reduce tasks=13
>> Other local map tasks=3
>> Data-local map tasks=1
>> Total time spent by all maps in occupied slots (ms)=41629
>> Total time spent by all reduces in occupied slots (ms)=104530
>> Total time spent by all map tasks (ms)=41629
>> Total time spent by all reduce tasks (ms)=104530
>> Total vcore-seconds taken by all map tasks=41629
>> Total vcore-seconds taken by all reduce tasks=104530
>> Total megabyte-seconds taken by all map tasks=42628096
>> Total megabyte-seconds taken by all reduce tasks=107038720
>> Map-Reduce Framework
>> Map input records=3
>> Map output records=13
>> Map output bytes=110
>> Map output materialized bytes=112
>> Input split bytes=117
>> Combine input records=13
>> Combine output records=6
>> Reduce input groups=6
>> Reduce shuffle bytes=112
>> Reduce input records=6
>> Reduce output records=6
>> Spilled Records=12
>> Shuffled Maps =8
>> Failed Shuffles=0
>> Merged Map outputs=8
>> GC time elapsed (ms)=186
>> CPU time spent (ms)=8890
>> Physical memory (bytes) snapshot=1408913408
>> Virtual memory (bytes) snapshot=5727019008
>> Total committed heap usage (bytes)=1808990208
>> Shuffle Errors
>> BAD_ID=0
>> CONNECTION=0
>> IO_ERROR=0
>> WRONG_LENGTH=0
>> WRONG_MAP=0
>> WRONG_REDUCE=0
>> File Input Format Counters
>> Bytes Read=58
>> File Output Format Counters
>> Bytes Written=40
>>
>> Thanks and Regards,
>> -Rahul Singh
>>
>
>
>
> --
> Regards
> Gordon Wang
>
Re: hadoop inconsistent behaviour
Posted by Rahul Singh <sm...@gmail.com>.
how do i identify an reduce container? there are multiple container dirs in
my application id folder in userlogs.
On Mon, Apr 14, 2014 at 12:29 PM, Gordon Wang <gw...@gopivotal.com> wrote:
> Hi Rahul,
>
> What is the log of reduce container ? Please paste the log and we can see
> the reason.
>
>
> On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <sm...@gmail.com>wrote:
>
>> Hi,
>> I am running a job(wordcount example) on 3 node cluster(1 master and 2
>> slave), some times the job passes but some times it fails(as reduce fails,
>> input data few kbs).
>> I am not able to nail down the reason of this inconsistency.
>>
>>
>>
>> failed log:
>>
>> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
>> 20.0.1.206:8032
>> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1397454060494_0003
>> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
>> application_1397454060494_0003
>> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
>> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
>> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
>> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
>> in uber mode : false
>> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
>> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
>> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
>> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
>> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
>> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_0, Status : FAILED
>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000001_0, Status : FAILED
>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_0, Status : FAILED
>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000007_0, Status : FAILED
>> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
>> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
>> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_1, Status : FAILED
>> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_1, Status : FAILED
>> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_2, Status : FAILED
>> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_2, Status : FAILED
>> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
>> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
>> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
>> Job failed as tasks failed. failedMaps:0 failedReduces:1
>>
>> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
>> File System Counters
>> FILE: Number of bytes read=80
>> FILE: Number of bytes written=596766
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=175
>> HDFS: Number of bytes written=28
>> HDFS: Number of read operations=21
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=12
>> Job Counters
>> Failed reduce tasks=9
>> Killed reduce tasks=1
>> Launched map tasks=1
>> Launched reduce tasks=16
>> Data-local map tasks=1
>> Total time spent by all maps in occupied slots (ms)=3477
>> Total time spent by all reduces in occupied slots (ms)=148867
>> Total time spent by all map tasks (ms)=3477
>> Total time spent by all reduce tasks (ms)=148867
>> Total vcore-seconds taken by all map tasks=3477
>> Total vcore-seconds taken by all reduce tasks=148867
>> Total megabyte-seconds taken by all map tasks=3560448
>> Total megabyte-seconds taken by all reduce tasks=152439808
>> Map-Reduce Framework
>> Map input records=3
>> Map output records=13
>> Map output bytes=110
>> Map output materialized bytes=112
>> Input split bytes=117
>> Combine input records=13
>> Combine output records=6
>> Reduce input groups=4
>> Reduce shuffle bytes=80
>> Reduce input records=4
>> Reduce output records=4
>> Spilled Records=10
>> Shuffled Maps =6
>> Failed Shuffles=0
>> Merged Map outputs=6
>> GC time elapsed (ms)=142
>> CPU time spent (ms)=6420
>> Physical memory (bytes) snapshot=1100853248
>> Virtual memory (bytes) snapshot=4468314112
>> Total committed heap usage (bytes)=1406992384
>> Shuffle Errors
>> BAD_ID=0
>> CONNECTION=0
>> IO_ERROR=0
>> WRONG_LENGTH=0
>> WRONG_MAP=0
>> WRONG_REDUCE=0
>> File Input Format Counters
>> Bytes Read=58
>> File Output Format Counters
>> Bytes Written=28
>>
>> Job Passing Logs:
>> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
>> wordcount /user/hduser/input /user/hduser/output_wordcount9
>> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
>> 20.0.1.206:8032
>> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1397454060494_0002
>> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
>> application_1397454060494_0002
>> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
>> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
>> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
>> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
>> in uber mode : false
>> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
>> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_0, Status : FAILED
>> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_1, Status : FAILED
>> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_2, Status : FAILED
>> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
>> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
>> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
>> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
>> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000001_0, Status : FAILED
>> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000003_0, Status : FAILED
>> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000005_0, Status : FAILED
>> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000007_0, Status : FAILED
>> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
>> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
>> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
>> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000005_1, Status : FAILED
>> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
>> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002
>> completed successfully
>> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
>> File System Counters
>> FILE: Number of bytes read=112
>> FILE: Number of bytes written=767175
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=175
>> HDFS: Number of bytes written=40
>> HDFS: Number of read operations=27
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=16
>> Job Counters
>> Failed map tasks=3
>> Failed reduce tasks=5
>> Launched map tasks=4
>> Launched reduce tasks=13
>> Other local map tasks=3
>> Data-local map tasks=1
>> Total time spent by all maps in occupied slots (ms)=41629
>> Total time spent by all reduces in occupied slots (ms)=104530
>> Total time spent by all map tasks (ms)=41629
>> Total time spent by all reduce tasks (ms)=104530
>> Total vcore-seconds taken by all map tasks=41629
>> Total vcore-seconds taken by all reduce tasks=104530
>> Total megabyte-seconds taken by all map tasks=42628096
>> Total megabyte-seconds taken by all reduce tasks=107038720
>> Map-Reduce Framework
>> Map input records=3
>> Map output records=13
>> Map output bytes=110
>> Map output materialized bytes=112
>> Input split bytes=117
>> Combine input records=13
>> Combine output records=6
>> Reduce input groups=6
>> Reduce shuffle bytes=112
>> Reduce input records=6
>> Reduce output records=6
>> Spilled Records=12
>> Shuffled Maps =8
>> Failed Shuffles=0
>> Merged Map outputs=8
>> GC time elapsed (ms)=186
>> CPU time spent (ms)=8890
>> Physical memory (bytes) snapshot=1408913408
>> Virtual memory (bytes) snapshot=5727019008
>> Total committed heap usage (bytes)=1808990208
>> Shuffle Errors
>> BAD_ID=0
>> CONNECTION=0
>> IO_ERROR=0
>> WRONG_LENGTH=0
>> WRONG_MAP=0
>> WRONG_REDUCE=0
>> File Input Format Counters
>> Bytes Read=58
>> File Output Format Counters
>> Bytes Written=40
>>
>> Thanks and Regards,
>> -Rahul Singh
>>
>
>
>
> --
> Regards
> Gordon Wang
>
Re: hadoop inconsistent behaviour
Posted by Gordon Wang <gw...@gopivotal.com>.
Hi Rahul,
What is the log of reduce container ? Please paste the log and we can see
the reason.
On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <sm...@gmail.com>wrote:
> Hi,
> I am running a job(wordcount example) on 3 node cluster(1 master and 2
> slave), some times the job passes but some times it fails(as reduce fails,
> input data few kbs).
> I am not able to nail down the reason of this inconsistency.
>
>
>
> failed log:
>
> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
> 20.0.1.206:8032
> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to process
> : 1
> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1397454060494_0003
> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
> application_1397454060494_0003
> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
> in uber mode : false
> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_0, Status : FAILED
> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000001_0, Status : FAILED
> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_0, Status : FAILED
> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000007_0, Status : FAILED
> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_1, Status : FAILED
> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_1, Status : FAILED
> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_2, Status : FAILED
> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_2, Status : FAILED
> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
> Job failed as tasks failed. failedMaps:0 failedReduces:1
>
> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
> File System Counters
> FILE: Number of bytes read=80
> FILE: Number of bytes written=596766
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=175
> HDFS: Number of bytes written=28
> HDFS: Number of read operations=21
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=12
> Job Counters
> Failed reduce tasks=9
> Killed reduce tasks=1
> Launched map tasks=1
> Launched reduce tasks=16
> Data-local map tasks=1
> Total time spent by all maps in occupied slots (ms)=3477
> Total time spent by all reduces in occupied slots (ms)=148867
> Total time spent by all map tasks (ms)=3477
> Total time spent by all reduce tasks (ms)=148867
> Total vcore-seconds taken by all map tasks=3477
> Total vcore-seconds taken by all reduce tasks=148867
> Total megabyte-seconds taken by all map tasks=3560448
> Total megabyte-seconds taken by all reduce tasks=152439808
> Map-Reduce Framework
> Map input records=3
> Map output records=13
> Map output bytes=110
> Map output materialized bytes=112
> Input split bytes=117
> Combine input records=13
> Combine output records=6
> Reduce input groups=4
> Reduce shuffle bytes=80
> Reduce input records=4
> Reduce output records=4
> Spilled Records=10
> Shuffled Maps =6
> Failed Shuffles=0
> Merged Map outputs=6
> GC time elapsed (ms)=142
> CPU time spent (ms)=6420
> Physical memory (bytes) snapshot=1100853248
> Virtual memory (bytes) snapshot=4468314112
> Total committed heap usage (bytes)=1406992384
> Shuffle Errors
> BAD_ID=0
> CONNECTION=0
> IO_ERROR=0
> WRONG_LENGTH=0
> WRONG_MAP=0
> WRONG_REDUCE=0
> File Input Format Counters
> Bytes Read=58
> File Output Format Counters
> Bytes Written=28
>
> Job Passing Logs:
> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
> wordcount /user/hduser/input /user/hduser/output_wordcount9
> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
> 20.0.1.206:8032
> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to process
> : 1
> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1397454060494_0002
> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
> application_1397454060494_0002
> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
> in uber mode : false
> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_0, Status : FAILED
> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_1, Status : FAILED
> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_2, Status : FAILED
> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000001_0, Status : FAILED
> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000003_0, Status : FAILED
> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000005_0, Status : FAILED
> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000007_0, Status : FAILED
> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000005_1, Status : FAILED
> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002 completed
> successfully
> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
> File System Counters
> FILE: Number of bytes read=112
> FILE: Number of bytes written=767175
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=175
> HDFS: Number of bytes written=40
> HDFS: Number of read operations=27
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=16
> Job Counters
> Failed map tasks=3
> Failed reduce tasks=5
> Launched map tasks=4
> Launched reduce tasks=13
> Other local map tasks=3
> Data-local map tasks=1
> Total time spent by all maps in occupied slots (ms)=41629
> Total time spent by all reduces in occupied slots (ms)=104530
> Total time spent by all map tasks (ms)=41629
> Total time spent by all reduce tasks (ms)=104530
> Total vcore-seconds taken by all map tasks=41629
> Total vcore-seconds taken by all reduce tasks=104530
> Total megabyte-seconds taken by all map tasks=42628096
> Total megabyte-seconds taken by all reduce tasks=107038720
> Map-Reduce Framework
> Map input records=3
> Map output records=13
> Map output bytes=110
> Map output materialized bytes=112
> Input split bytes=117
> Combine input records=13
> Combine output records=6
> Reduce input groups=6
> Reduce shuffle bytes=112
> Reduce input records=6
> Reduce output records=6
> Spilled Records=12
> Shuffled Maps =8
> Failed Shuffles=0
> Merged Map outputs=8
> GC time elapsed (ms)=186
> CPU time spent (ms)=8890
> Physical memory (bytes) snapshot=1408913408
> Virtual memory (bytes) snapshot=5727019008
> Total committed heap usage (bytes)=1808990208
> Shuffle Errors
> BAD_ID=0
> CONNECTION=0
> IO_ERROR=0
> WRONG_LENGTH=0
> WRONG_MAP=0
> WRONG_REDUCE=0
> File Input Format Counters
> Bytes Read=58
> File Output Format Counters
> Bytes Written=40
>
> Thanks and Regards,
> -Rahul Singh
>
--
Regards
Gordon Wang
Re: hadoop inconsistent behaviour
Posted by Gordon Wang <gw...@gopivotal.com>.
Hi Rahul,
What is the log of reduce container ? Please paste the log and we can see
the reason.
On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <sm...@gmail.com>wrote:
> Hi,
> I am running a job(wordcount example) on 3 node cluster(1 master and 2
> slave), some times the job passes but some times it fails(as reduce fails,
> input data few kbs).
> I am not able to nail down the reason of this inconsistency.
>
>
>
> failed log:
>
> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
> 20.0.1.206:8032
> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to process
> : 1
> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1397454060494_0003
> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
> application_1397454060494_0003
> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
> in uber mode : false
> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_0, Status : FAILED
> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000001_0, Status : FAILED
> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_0, Status : FAILED
> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000007_0, Status : FAILED
> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_1, Status : FAILED
> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_1, Status : FAILED
> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_2, Status : FAILED
> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_2, Status : FAILED
> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
> Job failed as tasks failed. failedMaps:0 failedReduces:1
>
> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
> File System Counters
> FILE: Number of bytes read=80
> FILE: Number of bytes written=596766
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=175
> HDFS: Number of bytes written=28
> HDFS: Number of read operations=21
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=12
> Job Counters
> Failed reduce tasks=9
> Killed reduce tasks=1
> Launched map tasks=1
> Launched reduce tasks=16
> Data-local map tasks=1
> Total time spent by all maps in occupied slots (ms)=3477
> Total time spent by all reduces in occupied slots (ms)=148867
> Total time spent by all map tasks (ms)=3477
> Total time spent by all reduce tasks (ms)=148867
> Total vcore-seconds taken by all map tasks=3477
> Total vcore-seconds taken by all reduce tasks=148867
> Total megabyte-seconds taken by all map tasks=3560448
> Total megabyte-seconds taken by all reduce tasks=152439808
> Map-Reduce Framework
> Map input records=3
> Map output records=13
> Map output bytes=110
> Map output materialized bytes=112
> Input split bytes=117
> Combine input records=13
> Combine output records=6
> Reduce input groups=4
> Reduce shuffle bytes=80
> Reduce input records=4
> Reduce output records=4
> Spilled Records=10
> Shuffled Maps =6
> Failed Shuffles=0
> Merged Map outputs=6
> GC time elapsed (ms)=142
> CPU time spent (ms)=6420
> Physical memory (bytes) snapshot=1100853248
> Virtual memory (bytes) snapshot=4468314112
> Total committed heap usage (bytes)=1406992384
> Shuffle Errors
> BAD_ID=0
> CONNECTION=0
> IO_ERROR=0
> WRONG_LENGTH=0
> WRONG_MAP=0
> WRONG_REDUCE=0
> File Input Format Counters
> Bytes Read=58
> File Output Format Counters
> Bytes Written=28
>
> Job Passing Logs:
> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
> wordcount /user/hduser/input /user/hduser/output_wordcount9
> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
> 20.0.1.206:8032
> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to process
> : 1
> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1397454060494_0002
> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
> application_1397454060494_0002
> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
> in uber mode : false
> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_0, Status : FAILED
> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_1, Status : FAILED
> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_2, Status : FAILED
> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000001_0, Status : FAILED
> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000003_0, Status : FAILED
> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000005_0, Status : FAILED
> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000007_0, Status : FAILED
> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000005_1, Status : FAILED
> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002 completed
> successfully
> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
> File System Counters
> FILE: Number of bytes read=112
> FILE: Number of bytes written=767175
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=175
> HDFS: Number of bytes written=40
> HDFS: Number of read operations=27
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=16
> Job Counters
> Failed map tasks=3
> Failed reduce tasks=5
> Launched map tasks=4
> Launched reduce tasks=13
> Other local map tasks=3
> Data-local map tasks=1
> Total time spent by all maps in occupied slots (ms)=41629
> Total time spent by all reduces in occupied slots (ms)=104530
> Total time spent by all map tasks (ms)=41629
> Total time spent by all reduce tasks (ms)=104530
> Total vcore-seconds taken by all map tasks=41629
> Total vcore-seconds taken by all reduce tasks=104530
> Total megabyte-seconds taken by all map tasks=42628096
> Total megabyte-seconds taken by all reduce tasks=107038720
> Map-Reduce Framework
> Map input records=3
> Map output records=13
> Map output bytes=110
> Map output materialized bytes=112
> Input split bytes=117
> Combine input records=13
> Combine output records=6
> Reduce input groups=6
> Reduce shuffle bytes=112
> Reduce input records=6
> Reduce output records=6
> Spilled Records=12
> Shuffled Maps =8
> Failed Shuffles=0
> Merged Map outputs=8
> GC time elapsed (ms)=186
> CPU time spent (ms)=8890
> Physical memory (bytes) snapshot=1408913408
> Virtual memory (bytes) snapshot=5727019008
> Total committed heap usage (bytes)=1808990208
> Shuffle Errors
> BAD_ID=0
> CONNECTION=0
> IO_ERROR=0
> WRONG_LENGTH=0
> WRONG_MAP=0
> WRONG_REDUCE=0
> File Input Format Counters
> Bytes Read=58
> File Output Format Counters
> Bytes Written=40
>
> Thanks and Regards,
> -Rahul Singh
>
--
Regards
Gordon Wang
Re: hadoop inconsistent behaviour
Posted by Gordon Wang <gw...@gopivotal.com>.
Hi Rahul,
What is the log of reduce container ? Please paste the log and we can see
the reason.
On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <sm...@gmail.com>wrote:
> Hi,
> I am running a job(wordcount example) on 3 node cluster(1 master and 2
> slave), some times the job passes but some times it fails(as reduce fails,
> input data few kbs).
> I am not able to nail down the reason of this inconsistency.
>
>
>
> failed log:
>
> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
> 20.0.1.206:8032
> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to process
> : 1
> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1397454060494_0003
> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
> application_1397454060494_0003
> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
> in uber mode : false
> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_0, Status : FAILED
> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000001_0, Status : FAILED
> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_0, Status : FAILED
> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000007_0, Status : FAILED
> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_1, Status : FAILED
> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_1, Status : FAILED
> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_2, Status : FAILED
> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_2, Status : FAILED
> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
> Job failed as tasks failed. failedMaps:0 failedReduces:1
>
> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
> File System Counters
> FILE: Number of bytes read=80
> FILE: Number of bytes written=596766
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=175
> HDFS: Number of bytes written=28
> HDFS: Number of read operations=21
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=12
> Job Counters
> Failed reduce tasks=9
> Killed reduce tasks=1
> Launched map tasks=1
> Launched reduce tasks=16
> Data-local map tasks=1
> Total time spent by all maps in occupied slots (ms)=3477
> Total time spent by all reduces in occupied slots (ms)=148867
> Total time spent by all map tasks (ms)=3477
> Total time spent by all reduce tasks (ms)=148867
> Total vcore-seconds taken by all map tasks=3477
> Total vcore-seconds taken by all reduce tasks=148867
> Total megabyte-seconds taken by all map tasks=3560448
> Total megabyte-seconds taken by all reduce tasks=152439808
> Map-Reduce Framework
> Map input records=3
> Map output records=13
> Map output bytes=110
> Map output materialized bytes=112
> Input split bytes=117
> Combine input records=13
> Combine output records=6
> Reduce input groups=4
> Reduce shuffle bytes=80
> Reduce input records=4
> Reduce output records=4
> Spilled Records=10
> Shuffled Maps =6
> Failed Shuffles=0
> Merged Map outputs=6
> GC time elapsed (ms)=142
> CPU time spent (ms)=6420
> Physical memory (bytes) snapshot=1100853248
> Virtual memory (bytes) snapshot=4468314112
> Total committed heap usage (bytes)=1406992384
> Shuffle Errors
> BAD_ID=0
> CONNECTION=0
> IO_ERROR=0
> WRONG_LENGTH=0
> WRONG_MAP=0
> WRONG_REDUCE=0
> File Input Format Counters
> Bytes Read=58
> File Output Format Counters
> Bytes Written=28
>
> Job Passing Logs:
> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
> wordcount /user/hduser/input /user/hduser/output_wordcount9
> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
> 20.0.1.206:8032
> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to process
> : 1
> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1397454060494_0002
> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
> application_1397454060494_0002
> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
> in uber mode : false
> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_0, Status : FAILED
> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_1, Status : FAILED
> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_2, Status : FAILED
> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000001_0, Status : FAILED
> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000003_0, Status : FAILED
> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000005_0, Status : FAILED
> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000007_0, Status : FAILED
> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000005_1, Status : FAILED
> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002 completed
> successfully
> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
> File System Counters
> FILE: Number of bytes read=112
> FILE: Number of bytes written=767175
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=175
> HDFS: Number of bytes written=40
> HDFS: Number of read operations=27
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=16
> Job Counters
> Failed map tasks=3
> Failed reduce tasks=5
> Launched map tasks=4
> Launched reduce tasks=13
> Other local map tasks=3
> Data-local map tasks=1
> Total time spent by all maps in occupied slots (ms)=41629
> Total time spent by all reduces in occupied slots (ms)=104530
> Total time spent by all map tasks (ms)=41629
> Total time spent by all reduce tasks (ms)=104530
> Total vcore-seconds taken by all map tasks=41629
> Total vcore-seconds taken by all reduce tasks=104530
> Total megabyte-seconds taken by all map tasks=42628096
> Total megabyte-seconds taken by all reduce tasks=107038720
> Map-Reduce Framework
> Map input records=3
> Map output records=13
> Map output bytes=110
> Map output materialized bytes=112
> Input split bytes=117
> Combine input records=13
> Combine output records=6
> Reduce input groups=6
> Reduce shuffle bytes=112
> Reduce input records=6
> Reduce output records=6
> Spilled Records=12
> Shuffled Maps =8
> Failed Shuffles=0
> Merged Map outputs=8
> GC time elapsed (ms)=186
> CPU time spent (ms)=8890
> Physical memory (bytes) snapshot=1408913408
> Virtual memory (bytes) snapshot=5727019008
> Total committed heap usage (bytes)=1808990208
> Shuffle Errors
> BAD_ID=0
> CONNECTION=0
> IO_ERROR=0
> WRONG_LENGTH=0
> WRONG_MAP=0
> WRONG_REDUCE=0
> File Input Format Counters
> Bytes Read=58
> File Output Format Counters
> Bytes Written=40
>
> Thanks and Regards,
> -Rahul Singh
>
--
Regards
Gordon Wang
Re: hadoop inconsistent behaviour
Posted by Shengjun Xin <sx...@gopivotal.com>.
You can check the nodemanager log
On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <sm...@gmail.com>wrote:
> Hi,
> I am running a job(wordcount example) on 3 node cluster(1 master and 2
> slave), some times the job passes but some times it fails(as reduce fails,
> input data few kbs).
> I am not able to nail down the reason of this inconsistency.
>
>
>
> failed log:
>
> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
> 20.0.1.206:8032
> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to process
> : 1
> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1397454060494_0003
> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
> application_1397454060494_0003
> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
> in uber mode : false
> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_0, Status : FAILED
> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000001_0, Status : FAILED
> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_0, Status : FAILED
> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000007_0, Status : FAILED
> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_1, Status : FAILED
> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_1, Status : FAILED
> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_2, Status : FAILED
> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_2, Status : FAILED
> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
> Job failed as tasks failed. failedMaps:0 failedReduces:1
>
> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
> File System Counters
> FILE: Number of bytes read=80
> FILE: Number of bytes written=596766
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=175
> HDFS: Number of bytes written=28
> HDFS: Number of read operations=21
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=12
> Job Counters
> Failed reduce tasks=9
> Killed reduce tasks=1
> Launched map tasks=1
> Launched reduce tasks=16
> Data-local map tasks=1
> Total time spent by all maps in occupied slots (ms)=3477
> Total time spent by all reduces in occupied slots (ms)=148867
> Total time spent by all map tasks (ms)=3477
> Total time spent by all reduce tasks (ms)=148867
> Total vcore-seconds taken by all map tasks=3477
> Total vcore-seconds taken by all reduce tasks=148867
> Total megabyte-seconds taken by all map tasks=3560448
> Total megabyte-seconds taken by all reduce tasks=152439808
> Map-Reduce Framework
> Map input records=3
> Map output records=13
> Map output bytes=110
> Map output materialized bytes=112
> Input split bytes=117
> Combine input records=13
> Combine output records=6
> Reduce input groups=4
> Reduce shuffle bytes=80
> Reduce input records=4
> Reduce output records=4
> Spilled Records=10
> Shuffled Maps =6
> Failed Shuffles=0
> Merged Map outputs=6
> GC time elapsed (ms)=142
> CPU time spent (ms)=6420
> Physical memory (bytes) snapshot=1100853248
> Virtual memory (bytes) snapshot=4468314112
> Total committed heap usage (bytes)=1406992384
> Shuffle Errors
> BAD_ID=0
> CONNECTION=0
> IO_ERROR=0
> WRONG_LENGTH=0
> WRONG_MAP=0
> WRONG_REDUCE=0
> File Input Format Counters
> Bytes Read=58
> File Output Format Counters
> Bytes Written=28
>
> Job Passing Logs:
> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
> wordcount /user/hduser/input /user/hduser/output_wordcount9
> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
> 20.0.1.206:8032
> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to process
> : 1
> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1397454060494_0002
> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
> application_1397454060494_0002
> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
> in uber mode : false
> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_0, Status : FAILED
> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_1, Status : FAILED
> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_2, Status : FAILED
> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000001_0, Status : FAILED
> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000003_0, Status : FAILED
> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000005_0, Status : FAILED
> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000007_0, Status : FAILED
> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000005_1, Status : FAILED
> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002 completed
> successfully
> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
> File System Counters
> FILE: Number of bytes read=112
> FILE: Number of bytes written=767175
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=175
> HDFS: Number of bytes written=40
> HDFS: Number of read operations=27
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=16
> Job Counters
> Failed map tasks=3
> Failed reduce tasks=5
> Launched map tasks=4
> Launched reduce tasks=13
> Other local map tasks=3
> Data-local map tasks=1
> Total time spent by all maps in occupied slots (ms)=41629
> Total time spent by all reduces in occupied slots (ms)=104530
> Total time spent by all map tasks (ms)=41629
> Total time spent by all reduce tasks (ms)=104530
> Total vcore-seconds taken by all map tasks=41629
> Total vcore-seconds taken by all reduce tasks=104530
> Total megabyte-seconds taken by all map tasks=42628096
> Total megabyte-seconds taken by all reduce tasks=107038720
> Map-Reduce Framework
> Map input records=3
> Map output records=13
> Map output bytes=110
> Map output materialized bytes=112
> Input split bytes=117
> Combine input records=13
> Combine output records=6
> Reduce input groups=6
> Reduce shuffle bytes=112
> Reduce input records=6
> Reduce output records=6
> Spilled Records=12
> Shuffled Maps =8
> Failed Shuffles=0
> Merged Map outputs=8
> GC time elapsed (ms)=186
> CPU time spent (ms)=8890
> Physical memory (bytes) snapshot=1408913408
> Virtual memory (bytes) snapshot=5727019008
> Total committed heap usage (bytes)=1808990208
> Shuffle Errors
> BAD_ID=0
> CONNECTION=0
> IO_ERROR=0
> WRONG_LENGTH=0
> WRONG_MAP=0
> WRONG_REDUCE=0
> File Input Format Counters
> Bytes Read=58
> File Output Format Counters
> Bytes Written=40
>
> Thanks and Regards,
> -Rahul Singh
>
--
Regards
Shengjun
Re: hadoop inconsistent behaviour
Posted by Shengjun Xin <sx...@gopivotal.com>.
You can check the nodemanager log
On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <sm...@gmail.com>wrote:
> Hi,
> I am running a job(wordcount example) on 3 node cluster(1 master and 2
> slave), some times the job passes but some times it fails(as reduce fails,
> input data few kbs).
> I am not able to nail down the reason of this inconsistency.
>
>
>
> failed log:
>
> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
> 20.0.1.206:8032
> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to process
> : 1
> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1397454060494_0003
> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
> application_1397454060494_0003
> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
> in uber mode : false
> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_0, Status : FAILED
> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000001_0, Status : FAILED
> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_0, Status : FAILED
> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000007_0, Status : FAILED
> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_1, Status : FAILED
> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_1, Status : FAILED
> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_2, Status : FAILED
> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_2, Status : FAILED
> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
> Job failed as tasks failed. failedMaps:0 failedReduces:1
>
> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
> File System Counters
> FILE: Number of bytes read=80
> FILE: Number of bytes written=596766
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=175
> HDFS: Number of bytes written=28
> HDFS: Number of read operations=21
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=12
> Job Counters
> Failed reduce tasks=9
> Killed reduce tasks=1
> Launched map tasks=1
> Launched reduce tasks=16
> Data-local map tasks=1
> Total time spent by all maps in occupied slots (ms)=3477
> Total time spent by all reduces in occupied slots (ms)=148867
> Total time spent by all map tasks (ms)=3477
> Total time spent by all reduce tasks (ms)=148867
> Total vcore-seconds taken by all map tasks=3477
> Total vcore-seconds taken by all reduce tasks=148867
> Total megabyte-seconds taken by all map tasks=3560448
> Total megabyte-seconds taken by all reduce tasks=152439808
> Map-Reduce Framework
> Map input records=3
> Map output records=13
> Map output bytes=110
> Map output materialized bytes=112
> Input split bytes=117
> Combine input records=13
> Combine output records=6
> Reduce input groups=4
> Reduce shuffle bytes=80
> Reduce input records=4
> Reduce output records=4
> Spilled Records=10
> Shuffled Maps =6
> Failed Shuffles=0
> Merged Map outputs=6
> GC time elapsed (ms)=142
> CPU time spent (ms)=6420
> Physical memory (bytes) snapshot=1100853248
> Virtual memory (bytes) snapshot=4468314112
> Total committed heap usage (bytes)=1406992384
> Shuffle Errors
> BAD_ID=0
> CONNECTION=0
> IO_ERROR=0
> WRONG_LENGTH=0
> WRONG_MAP=0
> WRONG_REDUCE=0
> File Input Format Counters
> Bytes Read=58
> File Output Format Counters
> Bytes Written=28
>
> Job Passing Logs:
> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
> wordcount /user/hduser/input /user/hduser/output_wordcount9
> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
> 20.0.1.206:8032
> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to process
> : 1
> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1397454060494_0002
> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
> application_1397454060494_0002
> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
> in uber mode : false
> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_0, Status : FAILED
> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_1, Status : FAILED
> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_2, Status : FAILED
> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000001_0, Status : FAILED
> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000003_0, Status : FAILED
> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000005_0, Status : FAILED
> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000007_0, Status : FAILED
> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000005_1, Status : FAILED
> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002 completed
> successfully
> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
> File System Counters
> FILE: Number of bytes read=112
> FILE: Number of bytes written=767175
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=175
> HDFS: Number of bytes written=40
> HDFS: Number of read operations=27
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=16
> Job Counters
> Failed map tasks=3
> Failed reduce tasks=5
> Launched map tasks=4
> Launched reduce tasks=13
> Other local map tasks=3
> Data-local map tasks=1
> Total time spent by all maps in occupied slots (ms)=41629
> Total time spent by all reduces in occupied slots (ms)=104530
> Total time spent by all map tasks (ms)=41629
> Total time spent by all reduce tasks (ms)=104530
> Total vcore-seconds taken by all map tasks=41629
> Total vcore-seconds taken by all reduce tasks=104530
> Total megabyte-seconds taken by all map tasks=42628096
> Total megabyte-seconds taken by all reduce tasks=107038720
> Map-Reduce Framework
> Map input records=3
> Map output records=13
> Map output bytes=110
> Map output materialized bytes=112
> Input split bytes=117
> Combine input records=13
> Combine output records=6
> Reduce input groups=6
> Reduce shuffle bytes=112
> Reduce input records=6
> Reduce output records=6
> Spilled Records=12
> Shuffled Maps =8
> Failed Shuffles=0
> Merged Map outputs=8
> GC time elapsed (ms)=186
> CPU time spent (ms)=8890
> Physical memory (bytes) snapshot=1408913408
> Virtual memory (bytes) snapshot=5727019008
> Total committed heap usage (bytes)=1808990208
> Shuffle Errors
> BAD_ID=0
> CONNECTION=0
> IO_ERROR=0
> WRONG_LENGTH=0
> WRONG_MAP=0
> WRONG_REDUCE=0
> File Input Format Counters
> Bytes Read=58
> File Output Format Counters
> Bytes Written=40
>
> Thanks and Regards,
> -Rahul Singh
>
--
Regards
Shengjun
Re: hadoop inconsistent behaviour
Posted by Gordon Wang <gw...@gopivotal.com>.
Hi Rahul,
What is the log of reduce container ? Please paste the log and we can see
the reason.
On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <sm...@gmail.com>wrote:
> Hi,
> I am running a job(wordcount example) on 3 node cluster(1 master and 2
> slave), some times the job passes but some times it fails(as reduce fails,
> input data few kbs).
> I am not able to nail down the reason of this inconsistency.
>
>
>
> failed log:
>
> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
> 20.0.1.206:8032
> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to process
> : 1
> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1397454060494_0003
> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
> application_1397454060494_0003
> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
> in uber mode : false
> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_0, Status : FAILED
> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000001_0, Status : FAILED
> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_0, Status : FAILED
> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000007_0, Status : FAILED
> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_1, Status : FAILED
> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_1, Status : FAILED
> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_2, Status : FAILED
> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_2, Status : FAILED
> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
> Job failed as tasks failed. failedMaps:0 failedReduces:1
>
> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
> File System Counters
> FILE: Number of bytes read=80
> FILE: Number of bytes written=596766
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=175
> HDFS: Number of bytes written=28
> HDFS: Number of read operations=21
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=12
> Job Counters
> Failed reduce tasks=9
> Killed reduce tasks=1
> Launched map tasks=1
> Launched reduce tasks=16
> Data-local map tasks=1
> Total time spent by all maps in occupied slots (ms)=3477
> Total time spent by all reduces in occupied slots (ms)=148867
> Total time spent by all map tasks (ms)=3477
> Total time spent by all reduce tasks (ms)=148867
> Total vcore-seconds taken by all map tasks=3477
> Total vcore-seconds taken by all reduce tasks=148867
> Total megabyte-seconds taken by all map tasks=3560448
> Total megabyte-seconds taken by all reduce tasks=152439808
> Map-Reduce Framework
> Map input records=3
> Map output records=13
> Map output bytes=110
> Map output materialized bytes=112
> Input split bytes=117
> Combine input records=13
> Combine output records=6
> Reduce input groups=4
> Reduce shuffle bytes=80
> Reduce input records=4
> Reduce output records=4
> Spilled Records=10
> Shuffled Maps =6
> Failed Shuffles=0
> Merged Map outputs=6
> GC time elapsed (ms)=142
> CPU time spent (ms)=6420
> Physical memory (bytes) snapshot=1100853248
> Virtual memory (bytes) snapshot=4468314112
> Total committed heap usage (bytes)=1406992384
> Shuffle Errors
> BAD_ID=0
> CONNECTION=0
> IO_ERROR=0
> WRONG_LENGTH=0
> WRONG_MAP=0
> WRONG_REDUCE=0
> File Input Format Counters
> Bytes Read=58
> File Output Format Counters
> Bytes Written=28
>
> Job Passing Logs:
> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
> wordcount /user/hduser/input /user/hduser/output_wordcount9
> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
> 20.0.1.206:8032
> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to process
> : 1
> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1397454060494_0002
> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
> application_1397454060494_0002
> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
> in uber mode : false
> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_0, Status : FAILED
> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_1, Status : FAILED
> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_2, Status : FAILED
> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000001_0, Status : FAILED
> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000003_0, Status : FAILED
> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000005_0, Status : FAILED
> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000007_0, Status : FAILED
> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000005_1, Status : FAILED
> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002 completed
> successfully
> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
> File System Counters
> FILE: Number of bytes read=112
> FILE: Number of bytes written=767175
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=175
> HDFS: Number of bytes written=40
> HDFS: Number of read operations=27
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=16
> Job Counters
> Failed map tasks=3
> Failed reduce tasks=5
> Launched map tasks=4
> Launched reduce tasks=13
> Other local map tasks=3
> Data-local map tasks=1
> Total time spent by all maps in occupied slots (ms)=41629
> Total time spent by all reduces in occupied slots (ms)=104530
> Total time spent by all map tasks (ms)=41629
> Total time spent by all reduce tasks (ms)=104530
> Total vcore-seconds taken by all map tasks=41629
> Total vcore-seconds taken by all reduce tasks=104530
> Total megabyte-seconds taken by all map tasks=42628096
> Total megabyte-seconds taken by all reduce tasks=107038720
> Map-Reduce Framework
> Map input records=3
> Map output records=13
> Map output bytes=110
> Map output materialized bytes=112
> Input split bytes=117
> Combine input records=13
> Combine output records=6
> Reduce input groups=6
> Reduce shuffle bytes=112
> Reduce input records=6
> Reduce output records=6
> Spilled Records=12
> Shuffled Maps =8
> Failed Shuffles=0
> Merged Map outputs=8
> GC time elapsed (ms)=186
> CPU time spent (ms)=8890
> Physical memory (bytes) snapshot=1408913408
> Virtual memory (bytes) snapshot=5727019008
> Total committed heap usage (bytes)=1808990208
> Shuffle Errors
> BAD_ID=0
> CONNECTION=0
> IO_ERROR=0
> WRONG_LENGTH=0
> WRONG_MAP=0
> WRONG_REDUCE=0
> File Input Format Counters
> Bytes Read=58
> File Output Format Counters
> Bytes Written=40
>
> Thanks and Regards,
> -Rahul Singh
>
--
Regards
Gordon Wang
Re: hadoop inconsistent behaviour
Posted by Shengjun Xin <sx...@gopivotal.com>.
You can check the nodemanager log
On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <sm...@gmail.com>wrote:
> Hi,
> I am running a job(wordcount example) on 3 node cluster(1 master and 2
> slave), some times the job passes but some times it fails(as reduce fails,
> input data few kbs).
> I am not able to nail down the reason of this inconsistency.
>
>
>
> failed log:
>
> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
> 20.0.1.206:8032
> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to process
> : 1
> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1397454060494_0003
> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
> application_1397454060494_0003
> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
> in uber mode : false
> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0%
> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0%
> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13%
> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25%
> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38%
> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50%
> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_0, Status : FAILED
> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000001_0, Status : FAILED
> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_0, Status : FAILED
> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000007_0, Status : FAILED
> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63%
> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75%
> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_1, Status : FAILED
> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_1, Status : FAILED
> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000003_2, Status : FAILED
> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0003_r_000005_2, Status : FAILED
> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100%
> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
> Job failed as tasks failed. failedMaps:0 failedReduces:1
>
> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
> File System Counters
> FILE: Number of bytes read=80
> FILE: Number of bytes written=596766
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=175
> HDFS: Number of bytes written=28
> HDFS: Number of read operations=21
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=12
> Job Counters
> Failed reduce tasks=9
> Killed reduce tasks=1
> Launched map tasks=1
> Launched reduce tasks=16
> Data-local map tasks=1
> Total time spent by all maps in occupied slots (ms)=3477
> Total time spent by all reduces in occupied slots (ms)=148867
> Total time spent by all map tasks (ms)=3477
> Total time spent by all reduce tasks (ms)=148867
> Total vcore-seconds taken by all map tasks=3477
> Total vcore-seconds taken by all reduce tasks=148867
> Total megabyte-seconds taken by all map tasks=3560448
> Total megabyte-seconds taken by all reduce tasks=152439808
> Map-Reduce Framework
> Map input records=3
> Map output records=13
> Map output bytes=110
> Map output materialized bytes=112
> Input split bytes=117
> Combine input records=13
> Combine output records=6
> Reduce input groups=4
> Reduce shuffle bytes=80
> Reduce input records=4
> Reduce output records=4
> Spilled Records=10
> Shuffled Maps =6
> Failed Shuffles=0
> Merged Map outputs=6
> GC time elapsed (ms)=142
> CPU time spent (ms)=6420
> Physical memory (bytes) snapshot=1100853248
> Virtual memory (bytes) snapshot=4468314112
> Total committed heap usage (bytes)=1406992384
> Shuffle Errors
> BAD_ID=0
> CONNECTION=0
> IO_ERROR=0
> WRONG_LENGTH=0
> WRONG_MAP=0
> WRONG_REDUCE=0
> File Input Format Counters
> Bytes Read=58
> File Output Format Counters
> Bytes Written=28
>
> Job Passing Logs:
> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
> wordcount /user/hduser/input /user/hduser/output_wordcount9
> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
> 20.0.1.206:8032
> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to process
> : 1
> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1397454060494_0002
> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
> application_1397454060494_0002
> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
> in uber mode : false
> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0%
> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_0, Status : FAILED
> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_1, Status : FAILED
> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_m_000000_2, Status : FAILED
> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0%
> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13%
> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25%
> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50%
> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000001_0, Status : FAILED
> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000003_0, Status : FAILED
> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000005_0, Status : FAILED
> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000007_0, Status : FAILED
> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63%
> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75%
> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88%
> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
> attempt_1397454060494_0002_r_000005_1, Status : FAILED
> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100%
> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002 completed
> successfully
> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
> File System Counters
> FILE: Number of bytes read=112
> FILE: Number of bytes written=767175
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=175
> HDFS: Number of bytes written=40
> HDFS: Number of read operations=27
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=16
> Job Counters
> Failed map tasks=3
> Failed reduce tasks=5
> Launched map tasks=4
> Launched reduce tasks=13
> Other local map tasks=3
> Data-local map tasks=1
> Total time spent by all maps in occupied slots (ms)=41629
> Total time spent by all reduces in occupied slots (ms)=104530
> Total time spent by all map tasks (ms)=41629
> Total time spent by all reduce tasks (ms)=104530
> Total vcore-seconds taken by all map tasks=41629
> Total vcore-seconds taken by all reduce tasks=104530
> Total megabyte-seconds taken by all map tasks=42628096
> Total megabyte-seconds taken by all reduce tasks=107038720
> Map-Reduce Framework
> Map input records=3
> Map output records=13
> Map output bytes=110
> Map output materialized bytes=112
> Input split bytes=117
> Combine input records=13
> Combine output records=6
> Reduce input groups=6
> Reduce shuffle bytes=112
> Reduce input records=6
> Reduce output records=6
> Spilled Records=12
> Shuffled Maps =8
> Failed Shuffles=0
> Merged Map outputs=8
> GC time elapsed (ms)=186
> CPU time spent (ms)=8890
> Physical memory (bytes) snapshot=1408913408
> Virtual memory (bytes) snapshot=5727019008
> Total committed heap usage (bytes)=1808990208
> Shuffle Errors
> BAD_ID=0
> CONNECTION=0
> IO_ERROR=0
> WRONG_LENGTH=0
> WRONG_MAP=0
> WRONG_REDUCE=0
> File Input Format Counters
> Bytes Read=58
> File Output Format Counters
> Bytes Written=40
>
> Thanks and Regards,
> -Rahul Singh
>
--
Regards
Shengjun