Posted to common-user@hadoop.apache.org by Robert Dyer <ps...@gmail.com> on 2013/11/02 06:46:08 UTC
Re: Hadoop 2.2.0 MR tasks failing

So does anyone have any ideas how to track this down?

Is it perhaps an exception somewhere in an output committer that is being
swallowed and not showing up in the logs?
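
One way to narrow this down is to pull every non-zero container exit out of the NodeManager log and check which container it belongs to. Below is a minimal sketch in Python, not anything shipped with Hadoop: the regex is written against the "Exit code from container ... is : N" lines quoted further down, and the helper names (failed_containers, looks_like_app_master) are made up for illustration. It also assumes that a container id ending in _000001 is the first container of the application attempt, which is normally the MR ApplicationMaster. If that holds here, the process that died is the AM rather than a map or reduce task, so the AM container's own syslog/stderr would be the place to look for a swallowed committer exception.

```python
import re

# Pattern modelled on the DefaultContainerExecutor WARN lines quoted in this thread.
EXIT_RE = re.compile(r"Exit code from container (container_\S+) is : (-?\d+)")

# Two lines copied from the NodeManager log in the original post.
SAMPLE = [
    "2013-10-21 23:42:41,743 WARN org.apache.hadoop.yarn.server.nodemanager."
    "DefaultContainerExecutor: Exit code from container "
    "container_1382415258498_0002_01_000001 is : 255",
    "2013-10-21 23:42:41,747 WARN org.apache.hadoop.yarn.server.nodemanager."
    "containermanager.launcher.ContainerLaunch: Container exited with a "
    "non-zero exit code 255",
]


def failed_containers(lines):
    """Return {container_id: exit_code} for every non-zero exit code found."""
    failures = {}
    for line in lines:
        match = EXIT_RE.search(line)
        if match and int(match.group(2)) != 0:
            failures[match.group(1)] = int(match.group(2))
    return failures


def looks_like_app_master(container_id):
    """Assumption: the first container of an attempt (..._000001) is the AM."""
    return container_id.endswith("_000001")


if __name__ == "__main__":
    for cid, code in failed_containers(SAMPLE).items():
        role = "likely ApplicationMaster" if looks_like_app_master(cid) else "task"
        print(cid, "exit code", code, "(" + role + ")")
```

To run it against a real log, pass an open file handle instead of SAMPLE, for example failed_containers(open(path)) where path points at the NodeManager log on the affected node.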

On Tue, Oct 22, 2013 at 2:19 AM, Robert Dyer <rd...@iastate.edu> wrote:

> The logs for the maps and reduces show nothing useful.  There are a ton of
> warnings about deprecated and final config values, but the task runs and
> seems to finish without error.  The only errors I've found in logs are the
> ones I posted above, which were in the NodeManager log files.
>
> Here's an example map log:
>
> 2013-10-21 23:14:57,241 INFO [main] org.apache.hadoop.mapred.MapTask: Map
> output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> 2013-10-21 23:14:57,337 INFO [main] org.apache.hadoop.mapred.MapTask:
> (EQUATOR) 0 kvi 26214396(104857584)
> 2013-10-21 23:14:57,337 INFO [main] org.apache.hadoop.mapred.MapTask:
> mapreduce.task.io.sort.mb: 100
> 2013-10-21 23:14:57,337 INFO [main] org.apache.hadoop.mapred.MapTask: soft
> limit at 83886080
> 2013-10-21 23:14:57,337 INFO [main] org.apache.hadoop.mapred.MapTask:
> bufstart = 0; bufvoid = 104857600
> 2013-10-21 23:14:57,337 INFO [main] org.apache.hadoop.mapred.MapTask:
> kvstart = 26214396; length = 6553600
> 2013-10-21 23:14:57,392 INFO [main]
> org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &
> initialized native-zlib library
> 2013-10-21 23:14:57,392 INFO [main]
> org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> [.deflate]
> 2013-10-21 23:15:08,610 INFO [main] org.apache.hadoop.mapred.MapTask:
> Starting flush of map output
> 2013-10-21 23:15:08,610 INFO [main] org.apache.hadoop.mapred.MapTask:
> Spilling map output
> 2013-10-21 23:15:08,611 INFO [main] org.apache.hadoop.mapred.MapTask:
> bufstart = 0; bufend = 204512; bufvoid = 104857600
> 2013-10-21 23:15:08,611 INFO [main] org.apache.hadoop.mapred.MapTask:
> kvstart = 26214396(104857584); kvend = 26182336(104729344); length =
> 32061/6553600
> 2013-10-21 23:15:08,722 INFO [main]
> org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
> 2013-10-21 23:15:08,856 INFO [main] org.apache.hadoop.mapred.MapTask:
> Finished spill 0
> 2013-10-21 23:15:08,859 INFO [main] org.apache.hadoop.mapred.Task:
> Task:attempt_1382415258498_0001_m_000014_0 is done. And is in the process
> of committing
> 2013-10-21 23:15:08,896 INFO [main] org.apache.hadoop.mapred.Task: Task
> 'attempt_1382415258498_0001_m_000014_0' done.
>
>
>
> On Tue, Oct 22, 2013 at 12:16 AM, Arun C Murthy <ac...@hortonworks.com> wrote:
>
>> If you follow the links on the web-ui to the logs of the map/reduce
>> tasks, what do you see there?
>>
>> Arun
>>
>> On Oct 21, 2013, at 9:55 PM, Robert Dyer <ps...@gmail.com> wrote:
>>
>> I recently set up a 2.2.0 test cluster.  For some reason, all of my MR
>> jobs are failing.  The maps and reduces all run to completion without any
>> errors, yet the app is marked failed and there is no final output.  Any
>> ideas?
>>
>> Application Type: MAPREDUCE
>> State: FINISHED
>> FinalStatus: FAILED
>> Diagnostics: We crashed durring a commit
>>
>> I notice this in the logs (but I'm not sure what to make of it):
>>
>> 2013-10-21 23:42:41,379 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 789 for container-id container_1382415258498_0002_01_000001: 250.4 MB of 2 GB physical memory used; 2.0 GB of 6 GB virtual memory used
>> 2013-10-21 23:42:41,743 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1382415258498_0002_01_000001 is : 255
>> 2013-10-21 23:42:41,744 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1382415258498_0002_01_000001 and exit code: 255
>> org.apache.hadoop.util.Shell$ExitCodeException:
>>
>> 2013-10-21 23:42:41,746 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
>> 2013-10-21 23:42:41,747 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 255
>> 2013-10-21 23:42:41,747 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1382415258498_0002_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
>> 2013-10-21 23:42:41,747 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1382415258498_0002_01_000001
>> 2013-10-21 23:42:41,764 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /hadoop/hadoop-2.2.0/cluster-data/usercache/hadoop/appcache/application_1382415258498_0002/container_1382415258498_0002_01_000001
>> 2013-10-21 23:42:41,765 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop	OPERATION=Container Finished - Failed	TARGET=ContainerImpl	RESULT=FAILURE	DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE	APPID=application_1382415258498_0002	CONTAINERID=container_1382415258498_0002_01_000001
>>
>>
>>  --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>
>
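
For what it's worth, the sort buffer figures in the quoted map log are self-consistent, which fits the observation that the map tasks themselves look healthy. The sketch below only redoes the arithmetic on numbers copied from that log. Two things in it are assumptions rather than something the log states: a spill threshold of 80% of the buffer, and 16 bytes (four 4-byte ints) of metadata per record. The record count at the end depends on that second assumption, so treat it as an estimate.

```python
# Values copied from the quoted map log.
SORT_MB = 100                 # mapreduce.task.io.sort.mb
LOG_BUFVOID = 104857600       # "bufvoid = 104857600"
LOG_SOFT_LIMIT = 83886080     # "soft limit at 83886080"
LOG_KVSTART = 26214396        # "kvstart = 26214396(104857584)"
LOG_KVSTART_BYTES = 104857584
LOG_KVEND = 26182336          # "kvend = 26182336(104729344)"
LOG_KVEND_BYTES = 104729344
LOG_MAX_LENGTH = 6553600      # "length = 6553600"

# Assumptions (not stated in the log).
SPILL_PERCENT = 80            # spill threshold as a percentage of the buffer
INT_BYTES = 4
META_INTS = 4                 # ints of metadata per record, i.e. 16 bytes

buf_bytes = SORT_MB * 1024 * 1024
soft_limit = buf_bytes * SPILL_PERCENT // 100   # integer maths, no float rounding
kvstart = (buf_bytes - META_INTS * INT_BYTES) // INT_BYTES
max_length = buf_bytes // INT_BYTES // META_INTS

# Metadata written before the first spill, in ints and in bytes.
meta_ints_used = LOG_KVSTART - LOG_KVEND
meta_bytes_used = LOG_KVSTART_BYTES - LOG_KVEND_BYTES

# Estimated records in spill 0, valid only under the 4-ints-per-record assumption.
records_estimate = meta_ints_used // META_INTS + 1

if __name__ == "__main__":
    print("buffer bytes  :", buf_bytes, buf_bytes == LOG_BUFVOID)
    print("soft limit    :", soft_limit, soft_limit == LOG_SOFT_LIMIT)
    print("kvstart       :", kvstart, kvstart == LOG_KVSTART)
    print("max length    :", max_length, max_length == LOG_MAX_LENGTH)
    print("meta ints used:", meta_ints_used, "bytes:", meta_bytes_used)
    print("records (est.):", records_estimate)
```

The "length = 32061/6553600" figure in the spill line is one more than the 32060 ints computed here, so the 204512 bytes of map output in spill 0 are tiny compared with the 100 MB buffer. The single spill is what you would expect, and nothing in these numbers points at the map side.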