You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Tom White (JIRA)" <ji...@apache.org> on 2012/07/19 18:16:35 UTC

[jira] [Created] (MAPREDUCE-4463) JobTracker recovery fails with HDFS permission issue

Tom White created MAPREDUCE-4463:
------------------------------------

             Summary: JobTracker recovery fails with HDFS permission issue
                 Key: MAPREDUCE-4463
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4463
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv1
    Affects Versions: 1.2.0
            Reporter: Tom White
            Assignee: Tom White


Recovery fails when the job user is different to the JT owner (i.e. on anything bigger than a pseudo-distributed cluster).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-4463) JobTracker recovery fails with HDFS permission issue

Posted by "Mayank Bansal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420929#comment-13420929 ] 

Mayank Bansal commented on MAPREDUCE-4463:
------------------------------------------

Thats correct, we would want to shut down the cluster even if thats in safe mode.

Thats a good catch to remove the TIPS which are from recovered jobs only.

+1 

Thanks,
Mayank
                
> JobTracker recovery fails with HDFS permission issue
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4463
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4463
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 1.2.0
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: MAPREDUCE-4463.patch, MAPREDUCE-4463.patch
>
>
> Recovery fails when the job user is different to the JT owner (i.e. on anything bigger than a pseudo-distributed cluster).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-4463) JobTracker recovery fails with HDFS permission issue

Posted by "Mayank Bansal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418520#comment-13418520 ] 

Mayank Bansal commented on MAPREDUCE-4463:
------------------------------------------

It's a good catch I missed it in my original patch.
Thank's tom for fixing it.

Looks good to me just one change I will suggest.

In teardown method if we also can check dfs.isClusterUp()

other then that this patch looks good

+1

Thanks,
Mayank
                
> JobTracker recovery fails with HDFS permission issue
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4463
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4463
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 1.2.0
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: MAPREDUCE-4463.patch
>
>
> Recovery fails when the job user is different to the JT owner (i.e. on anything bigger than a pseudo-distributed cluster).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-4463) JobTracker recovery fails with HDFS permission issue

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421836#comment-13421836 ] 

Tom White commented on MAPREDUCE-4463:
--------------------------------------

Arun, I've seen an occasional race where the TT reinit actually causes removeTaskFromJob() to be called just _after_ the equivalent recovered TIP has started, so the new TIP is erroneously removed from rjob.tasks. This code protects from that happening. Does that sound reasonable?
                
> JobTracker recovery fails with HDFS permission issue
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4463
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4463
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 1.2.0
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: MAPREDUCE-4463.patch, MAPREDUCE-4463.patch
>
>
> Recovery fails when the job user is different to the JT owner (i.e. on anything bigger than a pseudo-distributed cluster).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-4463) JobTracker recovery fails with HDFS permission issue

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421577#comment-13421577 ] 

Alejandro Abdelnur commented on MAPREDUCE-4463:
-----------------------------------------------

+1
                
> JobTracker recovery fails with HDFS permission issue
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4463
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4463
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 1.2.0
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: MAPREDUCE-4463.patch, MAPREDUCE-4463.patch
>
>
> Recovery fails when the job user is different to the JT owner (i.e. on anything bigger than a pseudo-distributed cluster).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-4463) JobTracker recovery fails with HDFS permission issue

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421603#comment-13421603 ] 

Arun C Murthy commented on MAPREDUCE-4463:
------------------------------------------

Tom, I'm not sure we need the check in the TT. The TT should actually 'kill' all it's tasks on a 'reinit' which it should get when the JT reboots.
                
> JobTracker recovery fails with HDFS permission issue
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4463
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4463
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 1.2.0
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: MAPREDUCE-4463.patch, MAPREDUCE-4463.patch
>
>
> Recovery fails when the job user is different to the JT owner (i.e. on anything bigger than a pseudo-distributed cluster).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-4463) JobTracker recovery fails with HDFS permission issue

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-4463:
---------------------------------

    Attachment: MAPREDUCE-4463.patch

The problem is that the recovery code tries to read the job token as the user who submitted the job, rather than the jobtracker user, yet the MR system directory has 700 perms so the user doesn't have read permission. The patch fixes this and adds a test case to run a job as a different user to the jobtracker. I also changed the test to use a mini DFS cluster for all tests (it was using the local filesystem before).
                
> JobTracker recovery fails with HDFS permission issue
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4463
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4463
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 1.2.0
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: MAPREDUCE-4463.patch
>
>
> Recovery fails when the job user is different to the JT owner (i.e. on anything bigger than a pseudo-distributed cluster).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-4463) JobTracker recovery fails with HDFS permission issue

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418404#comment-13418404 ] 

Tom White commented on MAPREDUCE-4463:
--------------------------------------

The stack trace is as follows:

{noformat}
2012-07-18 11:26:05,846 INFO org.apache.hadoop.mapred.JobTracker: Starting the recovery process for 1 jobs ...
2012-07-18 11:26:05,846 INFO org.apache.hadoop.mapred.JobTracker: Submitting job job_201207181120_0001
2012-07-18 11:26:05,878 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:wypoon (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=wypoon, access=EXECUTE, inode="/tmp/mapred/system":mapred:supergroup:drwx------
2012-07-18 11:26:05,879 WARN org.apache.hadoop.mapred.JobTracker: Could not recover job job_201207181120_0001
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=wypoon, access=EXECUTE, inode="/tmp/mapred/system":mapred:supergroup:drwx------
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:914)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:523)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:820)
        at org.apache.hadoop.mapred.JobTracker$RecoveryManager$1.run(JobTracker.java:1523)
        at org.apache.hadoop.mapred.JobTracker$RecoveryManager$1.run(JobTracker.java:1519)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)

        at org.apache.hadoop.mapred.JobTracker$RecoveryManager.recover(JobTracker.java:1519)
        at org.apache.hadoop.mapred.JobTracker.offerService(JobTracker.java:2134)
        at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4493)
Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: Permission denied: user=wypoon, access=EXECUTE, inode="/tmp/mapred/system":mapred:supergroup:drwx------
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:203)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:159)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:125)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5253)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkTraverse(FSNamesystem.java:5232)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:2028)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.getFileInfo(NameNode.java:897)
...
{noformat}

This problem was found by Wing Yew Poon.
                
> JobTracker recovery fails with HDFS permission issue
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4463
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4463
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 1.2.0
>            Reporter: Tom White
>            Assignee: Tom White
>
> Recovery fails when the job user is different to the JT owner (i.e. on anything bigger than a pseudo-distributed cluster).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-4463) JobTracker recovery fails with HDFS permission issue

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-4463:
---------------------------------

    Attachment: MAPREDUCE-4463.patch

Thanks for reviewing the patch Mayank.

I don't think checking for {{dfs.isClusterUp()}} in the test is needed, and in fact it will return false if the cluster is in safemode, in which case we would still want to call shutdown. Other tests just check that the instance variable is non-null.

I've made a change to make the test more robust and also to guard against a race condition where the tasktracker reinitializes and old tasks that are still running interfere with new TIPs in recovered jobs.

                
> JobTracker recovery fails with HDFS permission issue
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4463
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4463
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 1.2.0
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: MAPREDUCE-4463.patch, MAPREDUCE-4463.patch
>
>
> Recovery fails when the job user is different to the JT owner (i.e. on anything bigger than a pseudo-distributed cluster).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira