Posted to user@hadoop.apache.org by Robert Dyer <ps...@gmail.com> on 2013/10/22 06:55:45 UTC

Hadoop 2.2.0 MR tasks failing

I recently set up a 2.2.0 test cluster.  For some reason, all of my MR jobs
are failing.  The maps and reduces all run to completion, without any
errors.  Yet the app is marked failed and there is no final output.  Any
ideas?

Application Type: MAPREDUCE
State: FINISHED
FinalStatus: FAILED
Diagnostics: We crashed durring a commit

I noticed this in the logs (but I'm not sure what to make of it):

2013-10-21 23:42:41,379 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 789 for container-id
container_1382415258498_0002_01_000001: 250.4 MB of 2 GB physical
memory used; 2.0 GB of 6 GB virtual memory used
2013-10-21 23:42:41,743 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Exit code from container container_1382415258498_0002_01_000001 is :
255
2013-10-21 23:42:41,744 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Exception from container-launch with container ID:
container_1382415258498_0002_01_000001 and exit code: 255
org.apache.hadoop.util.Shell$ExitCodeException:

2013-10-21 23:42:41,746 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
2013-10-21 23:42:41,747 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Container exited with a non-zero exit code 255
2013-10-21 23:42:41,747 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1382415258498_0002_01_000001 transitioned from
RUNNING to EXITED_WITH_FAILURE
2013-10-21 23:42:41,747 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Cleaning up container container_1382415258498_0002_01_000001
2013-10-21 23:42:41,764 INFO
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Deleting absolute path :
/hadoop/hadoop-2.2.0/cluster-data/usercache/hadoop/appcache/application_1382415258498_0002/container_1382415258498_0002_01_000001
2013-10-21 23:42:41,765 WARN
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger:
USER=hadoop	OPERATION=Container Finished -
Failed	TARGET=ContainerImpl	RESULT=FAILURE	DESCRIPTION=Container
failed with state:
EXITED_WITH_FAILURE	APPID=application_1382415258498_0002	CONTAINERID=container_1382415258498_0002_01_000001
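One clue in the excerpt above: exit code 255 is typically how a process that exited with status -1 appears to its parent (here, the NodeManager), because POSIX exit statuses are truncated to one unsigned byte. So a child JVM that called System.exit(-1) would be reported exactly this way. A minimal illustration in plain Java (not Hadoop code):

```java
// Why an exit status of -1 is reported as 255: POSIX exit statuses
// occupy one unsigned byte, so -1 wraps around to 255.
public class ExitCode {
    public static void main(String[] args) {
        int status = -1;
        // What the parent process observes after waiting on the child:
        int reported = status & 0xFF;
        System.out.println(reported); // prints 255
    }
}
```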

Re: Hadoop 2.2.0 MR tasks failing

Posted by Robert Dyer <ps...@gmail.com>.
So does anyone have any ideas how to track this down?

Is it perhaps an exception somewhere in an output committer that is being
swallowed and not showing up in the logs?
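If a committer exception really is being swallowed, one way to confirm it is to wrap the committer in a delegating class that logs every exception before rethrowing. A hypothetical sketch in plain Java follows; `JobCommitter` is a stand-in interface for illustration only, not Hadoop's actual `OutputCommitter` API:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Stand-in for a job committer; the real Hadoop API differs.
interface JobCommitter {
    void commitJob() throws Exception;
}

// Delegating wrapper: logs any commit failure before rethrowing,
// so the root cause cannot vanish silently.
class LoggingCommitter implements JobCommitter {
    private static final Logger LOG =
        Logger.getLogger(LoggingCommitter.class.getName());
    private final JobCommitter delegate;

    LoggingCommitter(JobCommitter delegate) {
        this.delegate = delegate;
    }

    @Override
    public void commitJob() throws Exception {
        try {
            delegate.commitJob();
        } catch (Exception e) {
            LOG.log(Level.SEVERE, "commitJob failed", e); // surface the cause
            throw e;
        }
    }
}

public class CommitterDemo {
    public static void main(String[] args) {
        // Simulate a committer that fails during job commit.
        JobCommitter failing = () -> {
            throw new IllegalStateException("disk full");
        };
        try {
            new LoggingCommitter(failing).commitJob();
        } catch (Exception e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The same delegation idea could be applied to a real committer so the failure shows up in the application master's log instead of only as a generic "crashed during a commit" diagnostic.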

On Tue, Oct 22, 2013 at 2:19 AM, Robert Dyer <rd...@iastate.edu> wrote:

> The logs for the maps and reduces show nothing useful.  There are a ton of
> warnings about deprecated and final config values, but the task runs and
> seems to finish without error.  The only errors I've found in logs are the
> ones I posted above, which were in the NodeManager log files.
>
> Here's an example map log:
>
> 2013-10-21 23:14:57,241 INFO [main] org.apache.hadoop.mapred.MapTask: Map
> output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> 2013-10-21 23:14:57,337 INFO [main] org.apache.hadoop.mapred.MapTask:
> (EQUATOR) 0 kvi 26214396(104857584)
> 2013-10-21 23:14:57,337 INFO [main] org.apache.hadoop.mapred.MapTask:
> mapreduce.task.io.sort.mb: 100
> 2013-10-21 23:14:57,337 INFO [main] org.apache.hadoop.mapred.MapTask: soft
> limit at 83886080
> 2013-10-21 23:14:57,337 INFO [main] org.apache.hadoop.mapred.MapTask:
> bufstart = 0; bufvoid = 104857600
> 2013-10-21 23:14:57,337 INFO [main] org.apache.hadoop.mapred.MapTask:
> kvstart = 26214396; length = 6553600
> 2013-10-21 23:14:57,392 INFO [main]
> org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &
> initialized native-zlib library
> 2013-10-21 23:14:57,392 INFO [main]
> org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> [.deflate]
> 2013-10-21 23:15:08,610 INFO [main] org.apache.hadoop.mapred.MapTask:
> Starting flush of map output
> 2013-10-21 23:15:08,610 INFO [main] org.apache.hadoop.mapred.MapTask:
> Spilling map output
> 2013-10-21 23:15:08,611 INFO [main] org.apache.hadoop.mapred.MapTask:
> bufstart = 0; bufend = 204512; bufvoid = 104857600
> 2013-10-21 23:15:08,611 INFO [main] org.apache.hadoop.mapred.MapTask:
> kvstart = 26214396(104857584); kvend = 26182336(104729344); length =
> 32061/6553600
> 2013-10-21 23:15:08,722 INFO [main]
> org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
> 2013-10-21 23:15:08,856 INFO [main] org.apache.hadoop.mapred.MapTask:
> Finished spill 0
> 2013-10-21 23:15:08,859 INFO [main] org.apache.hadoop.mapred.Task:
> Task:attempt_1382415258498_0001_m_000014_0 is done. And is in the process
> of committing
> 2013-10-21 23:15:08,896 INFO [main] org.apache.hadoop.mapred.Task: Task
> 'attempt_1382415258498_0001_m_000014_0' done.
>
>
>
> On Tue, Oct 22, 2013 at 12:16 AM, Arun C Murthy <ac...@hortonworks.com> wrote:
>
>> If you follow the links on the web-ui to the logs of the map/reduce
>> tasks, what do you see there?
>>
>> Arun
>>
>> On Oct 21, 2013, at 9:55 PM, Robert Dyer <ps...@gmail.com> wrote:
>>
>> I recently setup a 2.2.0 test cluster.  For some reason, all of my MR
>> jobs are failing.  The maps and reduces all run to completion, without any
>> errors.  Yet the app is marked failed and there is no final output.  Any
>> ideas?
>>
>> Application Type: MAPREDUCE
>> State: FINISHED
>> FinalStatus: FAILED
>> Diagnostics: We crashed durring a commit
>>
>> I notice in the logs this (but not sure what to make of it):
>>
>> 2013-10-21 23:42:41,379 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 789 for container-id container_1382415258498_0002_01_000001: 250.4 MB of 2 GB physical memory used; 2.0 GB of 6 GB virtual memory used
>> 2013-10-21 23:42:41,743 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1382415258498_0002_01_000001 is : 255
>> 2013-10-21 23:42:41,744 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1382415258498_0002_01_000001 and exit code: 255
>> org.apache.hadoop.util.Shell$ExitCodeException:
>>
>> 2013-10-21 23:42:41,746 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
>> 2013-10-21 23:42:41,747 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 255
>> 2013-10-21 23:42:41,747 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1382415258498_0002_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
>> 2013-10-21 23:42:41,747 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1382415258498_0002_01_000001
>> 2013-10-21 23:42:41,764 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /hadoop/hadoop-2.2.0/cluster-data/usercache/hadoop/appcache/application_1382415258498_0002/container_1382415258498_0002_01_000001
>> 2013-10-21 23:42:41,765 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop	OPERATION=Container Finished - Failed	TARGET=ContainerImpl	RESULT=FAILURE	DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE	APPID=application_1382415258498_0002	CONTAINERID=container_1382415258498_0002_01_000001
>>
>>
>>  --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>>
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity
>> to which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.
>
>

Re: Hadoop 2.2.0 MR tasks failing

Posted by Robert Dyer <rd...@iastate.edu>.
The logs for the maps and reduces show nothing useful.  There are a ton of
warnings about deprecated and final config values, but the task runs and
seems to finish without error.  The only errors I've found in logs are the
ones I posted above, which were in the NodeManager log files.

Here's an example map log:

2013-10-21 23:14:57,241 INFO [main] org.apache.hadoop.mapred.MapTask: Map
output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2013-10-21 23:14:57,337 INFO [main] org.apache.hadoop.mapred.MapTask:
(EQUATOR) 0 kvi 26214396(104857584)
2013-10-21 23:14:57,337 INFO [main] org.apache.hadoop.mapred.MapTask:
mapreduce.task.io.sort.mb: 100
2013-10-21 23:14:57,337 INFO [main] org.apache.hadoop.mapred.MapTask: soft
limit at 83886080
2013-10-21 23:14:57,337 INFO [main] org.apache.hadoop.mapred.MapTask:
bufstart = 0; bufvoid = 104857600
2013-10-21 23:14:57,337 INFO [main] org.apache.hadoop.mapred.MapTask:
kvstart = 26214396; length = 6553600
2013-10-21 23:14:57,392 INFO [main]
org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &
initialized native-zlib library
2013-10-21 23:14:57,392 INFO [main]
org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
[.deflate]
2013-10-21 23:15:08,610 INFO [main] org.apache.hadoop.mapred.MapTask:
Starting flush of map output
2013-10-21 23:15:08,610 INFO [main] org.apache.hadoop.mapred.MapTask:
Spilling map output
2013-10-21 23:15:08,611 INFO [main] org.apache.hadoop.mapred.MapTask:
bufstart = 0; bufend = 204512; bufvoid = 104857600
2013-10-21 23:15:08,611 INFO [main] org.apache.hadoop.mapred.MapTask:
kvstart = 26214396(104857584); kvend = 26182336(104729344); length =
32061/6553600
2013-10-21 23:15:08,722 INFO [main]
org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
2013-10-21 23:15:08,856 INFO [main] org.apache.hadoop.mapred.MapTask:
Finished spill 0
2013-10-21 23:15:08,859 INFO [main] org.apache.hadoop.mapred.Task:
Task:attempt_1382415258498_0001_m_000014_0 is done. And is in the process
of committing
2013-10-21 23:15:08,896 INFO [main] org.apache.hadoop.mapred.Task: Task
'attempt_1382415258498_0001_m_000014_0' done.



On Tue, Oct 22, 2013 at 12:16 AM, Arun C Murthy <ac...@hortonworks.com> wrote:

> If you follow the links on the web-ui to the logs of the map/reduce tasks,
> what do you see there?
>
> Arun
>
> On Oct 21, 2013, at 9:55 PM, Robert Dyer <ps...@gmail.com> wrote:
>
> I recently setup a 2.2.0 test cluster.  For some reason, all of my MR jobs
> are failing.  The maps and reduces all run to completion, without any
> errors.  Yet the app is marked failed and there is no final output.  Any
> ideas?
>
> Application Type: MAPREDUCE
> State: FINISHED
> FinalStatus: FAILED
> Diagnostics: We crashed durring a commit
>
> I notice in the logs this (but not sure what to make of it):
>
> 2013-10-21 23:42:41,379 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 789 for container-id container_1382415258498_0002_01_000001: 250.4 MB of 2 GB physical memory used; 2.0 GB of 6 GB virtual memory used
> 2013-10-21 23:42:41,743 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1382415258498_0002_01_000001 is : 255
> 2013-10-21 23:42:41,744 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1382415258498_0002_01_000001 and exit code: 255
> org.apache.hadoop.util.Shell$ExitCodeException:
>
> 2013-10-21 23:42:41,746 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
> 2013-10-21 23:42:41,747 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 255
> 2013-10-21 23:42:41,747 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1382415258498_0002_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
> 2013-10-21 23:42:41,747 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1382415258498_0002_01_000001
> 2013-10-21 23:42:41,764 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /hadoop/hadoop-2.2.0/cluster-data/usercache/hadoop/appcache/application_1382415258498_0002/container_1382415258498_0002_01_000001
> 2013-10-21 23:42:41,765 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop	OPERATION=Container Finished - Failed	TARGET=ContainerImpl	RESULT=FAILURE	DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE	APPID=application_1382415258498_0002	CONTAINERID=container_1382415258498_0002_01_000001
>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>




-- 

Robert Dyer
rdyer@iastate.edu

Re: Hadoop 2.2.0 MR tasks failing

Posted by Robert Dyer <rd...@iastate.edu>.
The logs for the maps and reduces show nothing useful.  There are a ton of
warnings about deprecated and final config values, but the task runs and
seems to finish without error.  The only errors I've found in logs are the
ones I posted above, which were in the NodeManager log files.

Here's an example map log:

2013-10-21 23:14:57,241 INFO [main] org.apache.hadoop.mapred.MapTask: Map
output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2013-10-21 23:14:57,337 INFO [main] org.apache.hadoop.mapred.MapTask:
(EQUATOR) 0 kvi 26214396(104857584)
2013-10-21 23:14:57,337 INFO [main] org.apache.hadoop.mapred.MapTask:
mapreduce.task.io.sort.mb: 100
2013-10-21 23:14:57,337 INFO [main] org.apache.hadoop.mapred.MapTask: soft
limit at 83886080
2013-10-21 23:14:57,337 INFO [main] org.apache.hadoop.mapred.MapTask:
bufstart = 0; bufvoid = 104857600
2013-10-21 23:14:57,337 INFO [main] org.apache.hadoop.mapred.MapTask:
kvstart = 26214396; length = 6553600
2013-10-21 23:14:57,392 INFO [main]
org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &
initialized native-zlib library
2013-10-21 23:14:57,392 INFO [main]
org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
[.deflate]
2013-10-21 23:15:08,610 INFO [main] org.apache.hadoop.mapred.MapTask:
Starting flush of map output
2013-10-21 23:15:08,610 INFO [main] org.apache.hadoop.mapred.MapTask:
Spilling map output
2013-10-21 23:15:08,611 INFO [main] org.apache.hadoop.mapred.MapTask:
bufstart = 0; bufend = 204512; bufvoid = 104857600
2013-10-21 23:15:08,611 INFO [main] org.apache.hadoop.mapred.MapTask:
kvstart = 26214396(104857584); kvend = 26182336(104729344); length =
32061/6553600
2013-10-21 23:15:08,722 INFO [main]
org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
2013-10-21 23:15:08,856 INFO [main] org.apache.hadoop.mapred.MapTask:
Finished spill 0
2013-10-21 23:15:08,859 INFO [main] org.apache.hadoop.mapred.Task:
Task:attempt_1382415258498_0001_m_000014_0 is done. And is in the process
of committing
2013-10-21 23:15:08,896 INFO [main] org.apache.hadoop.mapred.Task: Task
'attempt_1382415258498_0001_m_000014_0' done.
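
Since that log ends cleanly right at the commit handoff, the crash most
likely happens in the ApplicationMaster during job commit, so the AM
container's log is the one worth pulling.  The NodeManager log above shows it
deleting the container directory immediately, so unless log aggregation is on
those logs are gone; as a sketch, enabling it in yarn-site.xml (it is off by
default in 2.2.0) keeps them readable afterwards:

```xml
<!-- yarn-site.xml: keep container logs in HDFS after the app finishes,
     so the AM's commit-time stack trace can later be read with
     "yarn logs -applicationId <appId>" -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
```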



On Tue, Oct 22, 2013 at 12:16 AM, Arun C Murthy <ac...@hortonworks.com> wrote:

> If you follow the links on the web-ui to the logs of the map/reduce tasks,
> what do you see there?
>
> Arun
>
> On Oct 21, 2013, at 9:55 PM, Robert Dyer <ps...@gmail.com> wrote:
>
> I recently set up a 2.2.0 test cluster.  For some reason, all of my MR jobs
> are failing.  The maps and reduces all run to completion, without any
> errors.  Yet the app is marked failed and there is no final output.  Any
> ideas?
>
> Application Type: MAPREDUCE
> State: FINISHED
> FinalStatus: FAILED
> Diagnostics: We crashed durring a commit
>
> I notice in the logs this (but not sure what to make of it):
>
> 2013-10-21 23:42:41,379 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 789 for container-id container_1382415258498_0002_01_000001: 250.4 MB of 2 GB physical memory used; 2.0 GB of 6 GB virtual memory used
> 2013-10-21 23:42:41,743 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1382415258498_0002_01_000001 is : 255
> 2013-10-21 23:42:41,744 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1382415258498_0002_01_000001 and exit code: 255
> org.apache.hadoop.util.Shell$ExitCodeException:
>
> 2013-10-21 23:42:41,746 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
> 2013-10-21 23:42:41,747 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 255
> 2013-10-21 23:42:41,747 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1382415258498_0002_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
> 2013-10-21 23:42:41,747 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1382415258498_0002_01_000001
> 2013-10-21 23:42:41,764 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /hadoop/hadoop-2.2.0/cluster-data/usercache/hadoop/appcache/application_1382415258498_0002/container_1382415258498_0002_01_000001
> 2013-10-21 23:42:41,765 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop	OPERATION=Container Finished - Failed	TARGET=ContainerImpl	RESULT=FAILURE	DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE	APPID=application_1382415258498_0002	CONTAINERID=container_1382415258498_0002_01_000001
>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>




-- 

Robert Dyer
rdyer@iastate.edu


Re: Hadoop 2.2.0 MR tasks failing

Posted by Arun C Murthy <ac...@hortonworks.com>.
If you follow the links on the web-ui to the logs of the map/reduce tasks, what do you see there?

Arun

On Oct 21, 2013, at 9:55 PM, Robert Dyer <ps...@gmail.com> wrote:

> I recently set up a 2.2.0 test cluster.  For some reason, all of my MR jobs are failing.  The maps and reduces all run to completion, without any errors.  Yet the app is marked failed and there is no final output.  Any ideas?
> 
> Application Type: MAPREDUCE
> State: FINISHED
> FinalStatus: FAILED
> Diagnostics: We crashed durring a commit
> 
> I notice in the logs this (but not sure what to make of it):
> 2013-10-21 23:42:41,379 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 789 for container-id container_1382415258498_0002_01_000001: 250.4 MB of 2 GB physical memory used; 2.0 GB of 6 GB virtual memory used
> 2013-10-21 23:42:41,743 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1382415258498_0002_01_000001 is : 255
> 2013-10-21 23:42:41,744 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1382415258498_0002_01_000001 and exit code: 255
> org.apache.hadoop.util.Shell$ExitCodeException: 
> 2013-10-21 23:42:41,746 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 
> 2013-10-21 23:42:41,747 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 255
> 2013-10-21 23:42:41,747 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1382415258498_0002_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
> 2013-10-21 23:42:41,747 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1382415258498_0002_01_000001
> 2013-10-21 23:42:41,764 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /hadoop/hadoop-2.2.0/cluster-data/usercache/hadoop/appcache/application_1382415258498_0002/container_1382415258498_0002_01_000001
> 2013-10-21 23:42:41,765 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop	OPERATION=Container Finished - Failed	TARGET=ContainerImpl	RESULT=FAILURE	DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE	APPID=application_1382415258498_0002	CONTAINERID=container_1382415258498_0002_01_000001
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



