Posted to mapreduce-issues@hadoop.apache.org by "Clint Heath (JIRA)" <ji...@apache.org> on 2012/07/19 22:25:35 UTC

[jira] [Commented] (MAPREDUCE-4464) Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418643#comment-13418643 ] 

Clint Heath commented on MAPREDUCE-4464:
----------------------------------------

Sorry, I should have supplied the exception that we encountered when this issue occurred.  As it turned out, every host name in the cluster contained a character that is illegal in DNS host names (the underscore "_"), so the getHost() call returned null and we saw the following.

The mappers reach about 80% completion, at which point all of the reducers begin to throw the following exceptions and die almost immediately. Eventually the whole job fails:

{noformat}
2012-06-26 15:56:02,326 FATAL org.apache.hadoop.mapred.Task: attempt_201206251823_0004_r_000036_1 GetMapEventsThread Ignoring exception : java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2835)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2756)

2012-06-26 15:56:02,356 FATAL org.apache.hadoop.mapred.Task: attempt_201206251823_0004_r_000036_1 GetMapEventsThread Ignoring exception : org.apache.hadoop.ipc.RemoteException: java.io.IOException: JvmValidate Failed. Ignoring request from task: attempt_201206251823_0004_r_000036_1, with JvmId: jvm_201206251823_0004_r_-396118293
    at org.apache.hadoop.mapred.TaskTracker.validateJVM(TaskTracker.java:3468)
    at org.apache.hadoop.mapred.TaskTracker.getMapCompletionEvents(TaskTracker.java:3731)
    at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
    at org.apache.hadoop.ipc.Client.call(Client.java:1107)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
    at $Proxy0.getMapCompletionEvents(Unknown Source)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2798)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2756)

2012-06-26 15:56:02,361 FATAL org.apache.hadoop.mapred.Task: Failed to contact the tasktracker
org.apache.hadoop.ipc.RemoteException: java.io.IOException: JvmValidate Failed. Ignoring request from task: attempt_201206251823_0004_r_000036_1, with JvmId: jvm_201206251823_0004_r_-396118293
    at org.apache.hadoop.mapred.TaskTracker.validateJVM(TaskTracker.java:3468)
    at org.apache.hadoop.mapred.TaskTracker.fatalError(TaskTracker.java:3714)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
    at org.apache.hadoop.ipc.Client.call(Client.java:1107)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
    at $Proxy0.fatalError(Unknown Source)
    at org.apache.hadoop.mapred.Task.reportFatalError(Task.java:294)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2781)
{noformat}
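
For anyone who wants to see the failure mode outside of Hadoop, here is a minimal standalone sketch. It is not the ReduceTask code: the class name, the method names, and the host names node1.example.com and node_1.example.com are all made up for illustration. It only relies on two standard-library behaviours: java.net.URI returns null from getHost() when the host part contains an underscore, and ConcurrentHashMap.get() throws NullPointerException for a null key.

```java
import java.net.URI;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; none of these names come from Hadoop.
final class UnderscoreHostDemo {

    // Returns the host component as java.net.URI parses it (null if it cannot
    // be parsed as a server-based authority, e.g. when it contains '_').
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    // Returns true if using the key for a ConcurrentHashMap lookup throws
    // NullPointerException, which is what happens for a null key.
    static boolean lookupThrowsNpe(String key) {
        ConcurrentHashMap<String, Integer> byHost = new ConcurrentHashMap<>();
        try {
            byHost.get(key);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Illustrative URLs only; substitute a tracker URL from your cluster.
        String good = hostOf("http://node1.example.com:50060");
        String bad = hostOf("http://node_1.example.com:50060");
        System.out.println("legal host name parsed as: " + good);
        System.out.println("underscore host name parsed as: " + bad);
        System.out.println("lookup with that host throws NPE: " + lookupThrowsNpe(bad));
    }
}
```

Note that URI.create() does not reject the underscore URL; it quietly treats the authority as registry-based, which is why the null only shows up later at the map lookup.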
                
> Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()
> -------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4464
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4464
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>    Affects Versions: 1.0.0
>            Reporter: Clint Heath
>            Priority: Minor
>         Attachments: MAPREDUCE-4464.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> If DNS does not resolve hostnames properly, reduce tasks can fail with a very misleading exception.
> As per my peer Ahmed's diagnosis:
> In ReduceTask, it seems that event.getTaskTrackerHttp() returns a malformed URI, and so host from:
> {code}
> String host = u.getHost();
> {code}
> evaluates to null, and the NullPointerException is thrown afterwards by ConcurrentHashMap.get().
> I have written a patch that checks for a null hostname when getHost() is called in the getMapCompletionEvents method and prints an intelligible warning, rather than letting the null propagate until it surfaces later as a confusing and misleading exception.
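
As a rough illustration of the kind of null check described in the issue (this is a sketch under assumptions, not the contents of MAPREDUCE-4464.patch): the class NullHostGuard, its record() and count() methods, and the eventsByHost map are hypothetical stand-ins, and java.util.logging is used here only so the example runs without Hadoop on the classpath.

```java
import java.net.URI;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

// Hypothetical sketch of a null-host guard; not taken from the actual patch.
final class NullHostGuard {
    private static final Logger LOG = Logger.getLogger(NullHostGuard.class.getName());

    // Stand-in for whatever per-host state the reducer keeps.
    private final ConcurrentHashMap<String, Integer> eventsByHost = new ConcurrentHashMap<>();

    // Records one event for the given tracker URL. Returns false, after logging
    // a warning, when no host can be parsed, instead of passing null to the map.
    boolean record(String trackerHttp) {
        String host = URI.create(trackerHttp).getHost();
        if (host == null) {
            LOG.warning("Could not parse a host name from '" + trackerHttp
                    + "'; check that host names use only legal DNS characters");
            return false;
        }
        eventsByHost.merge(host, 1, Integer::sum);
        return true;
    }

    // Number of events recorded for a host (0 if none).
    int count(String host) {
        return eventsByHost.getOrDefault(host, 0);
    }

    public static void main(String[] args) {
        NullHostGuard guard = new NullHostGuard();
        // Illustrative URLs only.
        System.out.println(guard.record("http://node1.example.com:50060"));
        System.out.println(guard.record("http://node_1.example.com:50060"));
    }
}
```

The point of checking at the call site is that the warning can name the offending URL, which the later NullPointerException from the map cannot.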
