Posted to common-issues@hadoop.apache.org by "Ivan Mitic (JIRA)" <ji...@apache.org> on 2012/08/27 19:36:07 UTC

[jira] [Created] (HADOOP-8731) Public distributed cache support for Windows

Ivan Mitic created HADOOP-8731:
----------------------------------

             Summary: Public distributed cache support for Windows
                 Key: HADOOP-8731
                 URL: https://issues.apache.org/jira/browse/HADOOP-8731
             Project: Hadoop Common
          Issue Type: Bug
          Components: filecache
            Reporter: Ivan Mitic
            Assignee: Ivan Mitic


A distributed cache file is considered public (sharable between MR jobs) if OTHER has read permissions on the file and +x permissions all the way up in the folder hierarchy. By default, Windows permissions are mapped to "700" all the way up to the drive letter, and it is unreasonable to ask users to change the permission on the whole drive to make the file public. In other words, it is hardly possible to have a public distributed cache on Windows.

To enable the scenario and make it more "Windows friendly", the criteria for when a file is considered public should be relaxed. One proposal is to check only whether the user has given the EVERYONE group permission on the file itself (and discard the +x check on parent folders).

Security considerations for the proposal: Default permissions on Unix platforms are usually "775" or "755" meaning that OTHER users can read and list folders by default. What this also means is that Hadoop users have to explicitly make the files private in order to make them private in the cluster (please correct me if this is not the case in real life!). On Windows, default permissions are "700". This means that by default all files are private. In the new model, if users want to make them public, they have to explicitly add EVERYONE group permissions on the file. 
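
To make the two rules concrete, here is a minimal Java sketch. It is not the code from the attached patch: it models permissions as a plain map from absolute, slash-separated paths to octal modes, and the class name, method names, and the map-based model are all hypothetical and exist only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Hadoop or of the attached patch.
final class PublicCacheCheckSketch {
    private static final int OTHER_READ = 04; // "r" bit for OTHER
    private static final int OTHER_EXEC = 01; // "x" bit for OTHER

    // Current rule: OTHER can read the file and has +x on every ancestor folder.
    static boolean isPublicStrict(String path, Map<String, Integer> modes) {
        if (!otherHas(path, modes, OTHER_READ)) {
            return false;
        }
        for (String dir = parent(path); dir != null; dir = parent(dir)) {
            if (!otherHas(dir, modes, OTHER_EXEC)) {
                return false;
            }
        }
        return true;
    }

    // Proposed rule: only the file's own read bit for OTHER (EVERYONE) matters.
    static boolean isPublicRelaxed(String path, Map<String, Integer> modes) {
        return otherHas(path, modes, OTHER_READ);
    }

    // A path missing from the map is treated as having no permissions.
    private static boolean otherHas(String path, Map<String, Integer> modes, int bit) {
        Integer mode = modes.get(path);
        return mode != null && (mode & bit) != 0;
    }

    // Parent of an absolute, slash-separated path; null once the root is passed.
    private static String parent(String path) {
        int slash = path.lastIndexOf('/');
        if (path.equals("/") || slash < 0) {
            return null;
        }
        return slash == 0 ? "/" : path.substring(0, slash);
    }

    public static void main(String[] args) {
        // Windows-like defaults ("700" everywhere), file explicitly opened to OTHER.
        Map<String, Integer> modes = new LinkedHashMap<>();
        modes.put("/", 0700);
        modes.put("/some", 0700);
        modes.put("/some/path", 0700);
        modes.put("/some/path/file1", 0704);
        System.out.println("strict:  " + isPublicStrict("/some/path/file1", modes));
        System.out.println("relaxed: " + isPublicRelaxed("/some/path/file1", modes));
    }
}
```

With "700" on every ancestor, the strict rule rejects the file even though OTHER can read it, while the relaxed rule accepts it. With Unix-like "755" folders the two rules agree.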

TestTrackerDistributedCacheManager fails because of this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows

Posted by "Ivan Mitic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482238#comment-13482238 ] 

Ivan Mitic commented on HADOOP-8731:
------------------------------------

Thanks a lot Vinod, Bikas for the additional clarifications! I looked through the codebase and things add up now :)

As you pointed out, this is only a problem if the default file system is the local file system (LFS) and we pass in a file:/// URI. This is the case in TestTrackerDistributedCacheManager, hence the test fails. Would it still make sense to go with my fix (I still have to polish it up and address comments), or would you rather look for an alternative that only fixes the test?
                
> Public distributed cache support for Windows
> --------------------------------------------
>
>                 Key: HADOOP-8731
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8731
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: filecache
>            Reporter: Ivan Mitic
>            Assignee: Ivan Mitic
>         Attachments: HADOOP-8731-PublicCache.patch
>
>
> A distributed cache file is considered public (sharable between MR jobs) if OTHER has read permissions on the file and +x permissions all the way up in the folder hierarchy. By default, Windows permissions are mapped to "700" all the way up to the drive letter, and it is unreasonable to ask users to change the permission on the whole drive to make the file public. IOW, it is hardly possible to have public distributed cache on Windows. 
> To enable the scenario and make it more "Windows friendly", the criteria on when a file is considered public should be relaxed. One proposal is to check whether the user has given EVERYONE group permission on the file only (and discard the +x check on parent folders).
> Security considerations for the proposal: Default permissions on Unix platforms are usually "775" or "755" meaning that OTHER users can read and list folders by default. What this also means is that Hadoop users have to explicitly make the files private in order to make them private in the cluster (please correct me if this is not the case in real life!). On Windows, default permissions are "700". This means that by default all files are private. In the new model, if users want to make them public, they have to explicitly add EVERYONE group permissions on the file. 
> TestTrackerDistributedCacheManager fails because of this issue.


[jira] [Assigned] (HADOOP-8731) Public distributed cache support for Windows

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli reassigned HADOOP-8731:
-----------------------------------------------

    Assignee: Vinod Kumar Vavilapalli  (was: Ivan Mitic)
    

[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows

Posted by "Ivan Mitic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485816#comment-13485816 ] 

Ivan Mitic commented on HADOOP-8731:
------------------------------------

Vinod, I prepared the updated patch where I addressed the feedback from above. Let me know if you see some room for improvement. Another alternative I see is to disable the problematic test on Windows, which I'd like to avoid if possible.

bq. I believe TT absolutely needs to set ugo+rx for dirs containing expanded archives. This is needed to address some of the artifacts which retain permissions from the original bits that a user uploads. So let's not move/change that code out of the archives code block.
Fixed.

bq. And for files, can you tell me why the 2nd line in the code-fragment shown below doesn't already do it correctly on Windows? It may in fact be because of some other bug, so asking - is it not enough to set correct permissions on the file itself in case of Windows? 
I commented in code on why this is needed.

                

[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462302#comment-13462302 ] 

Vinod Kumar Vavilapalli commented on HADOOP-8731:
-------------------------------------------------

Tx for the reiteration, Bikas.

Ivan, to clarify, this is how it works:
 - JobClient is given a URI: it could be a local file:/// URI or one on hdfs:///
 - JobClient checks whether the path is on the same FS as the JobTracker's and, if not, copies it. So if you pass file:///tmp/a.jar to -libjars/-files/-archives, JobClient will copy it into the user's jobSubmitDir.
 - So for file:/// paths, the visibility checks are not made. Only in your test-case scenario, where FS = LocalFS, will these checks be made. Hence my comment.
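
The decision described in the list above can be sketched roughly as follows. This is a simplified illustration, not JobClient's actual code: it compares only URI scheme and authority, and the class name, method names, and the hdfs://namenode:8020/ address are hypothetical.

```java
import java.net.URI;

// Hypothetical illustration only; the real JobClient logic is more involved.
final class JobSubmitCopySketch {
    // Treat two URIs as being on the same file system when scheme and
    // authority match (a simplifying assumption made for this sketch).
    static boolean sameFileSystem(URI a, URI b) {
        return same(a.getScheme(), b.getScheme())
            && same(a.getAuthority(), b.getAuthority());
    }

    // An artifact on a different file system is copied into the job submit
    // directory, so the visibility check never sees its original path.
    static boolean needsCopy(URI artifact, URI jobTrackerFs) {
        return !sameFileSystem(artifact, jobTrackerFs);
    }

    // Null-safe, case-insensitive comparison of URI components.
    private static boolean same(String x, String y) {
        return x == null ? y == null : x.equalsIgnoreCase(y);
    }

    public static void main(String[] args) {
        URI jar = URI.create("file:///tmp/a.jar");
        URI hdfs = URI.create("hdfs://namenode:8020/"); // hypothetical cluster FS
        URI local = URI.create("file:///");             // the test-case setup
        System.out.println("cluster on HDFS, copy needed: " + needsCopy(jar, hdfs));
        System.out.println("FS = LocalFS, copy needed:    " + needsCopy(jar, local));
    }
}
```

Only in the second case does the original file:/// path stay in place, which is why the public/private check runs against the local folder hierarchy in the test.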
                

[jira] [Commented] (HADOOP-8731) TestTrackerDistributedCacheManager fails on Windows

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489924#comment-13489924 ] 

Hadoop QA commented on HADOOP-8731:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12551151/HADOOP-8731-PublicCache.2.patch
  against trunk revision .

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1703//console

This message is automatically generated.
                
> TestTrackerDistributedCacheManager fails on Windows
> ---------------------------------------------------
>
>                 Key: HADOOP-8731
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8731
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: filecache
>            Reporter: Ivan Mitic
>            Assignee: Ivan Mitic
>         Attachments: HADOOP-8731-PublicCache.2.patch, HADOOP-8731-PublicCache.patch
>
>
> Jira tracking TestTrackerDistributedCacheManager test failure. 


[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows

Posted by "Ivan Mitic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455944#comment-13455944 ] 

Ivan Mitic commented on HADOOP-8731:
------------------------------------

{quote}Can you please clarify the following scenario so that other folks reading this thread have it easy?
Directory A (perm for user Foo) contains directory B (perm for Everyone)
So contents of A will be private cache and contents of B will be public cache on Windows but not on Linux.
{quote}
Correct. 

The issue we are trying to mitigate on Windows comes from the default permissions. Specifically, default permissions in terms of the Unix mask map to "700" on Windows. This means that by default "others" (the EVERYONE group) do not have "r+x" permissions. This further means that, if we have a path "c:\some\path\file1" and a user wants to upload "file1" to the public distributed cache, they have to change the permissions on the whole drive to do so. Now, to make the scenario more "Windows friendly", we only require the user to change the permissions on "file1" to make it public (more precisely, to give the EVERYONE group read permissions on "file1").

On Unix systems, given that default permissions are usually "775" or "755" the scenario is completely opposite.
                

[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows

Posted by "Bikas Saha (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455024#comment-13455024 ] 

Bikas Saha commented on HADOOP-8731:
------------------------------------

Looks like the chmod fixes an existing generic bug.

Can you please clarify the following scenario so that other folks reading this thread have it easy?
Directory A (perm for user Foo) contains directory B (perm for Everyone)
So contents of A will be private cache and contents of B will be public cache on Windows but not on Linux.

                

[jira] [Updated] (HADOOP-8731) Public distributed cache support for Windows

Posted by "Ivan Mitic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic updated HADOOP-8731:
-------------------------------

    Attachment: HADOOP-8731-PublicCache.2.patch

Attaching updated patch.
                

[jira] [Assigned] (HADOOP-8731) Public distributed cache support for Windows

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli reassigned HADOOP-8731:
-----------------------------------------------

    Assignee: Ivan Mitic  (was: Vinod Kumar Vavilapalli)

Reverting accidental assignment.
                

[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456057#comment-13456057 ] 

Vinod Kumar Vavilapalli commented on HADOOP-8731:
-------------------------------------------------

Apologies for repeating the questions, I overlooked your answers.

There are two cases:
 - In the case of a real cluster with HDFS, the definition of a public dist-cache file is one that is accessible to all users, and HDFS also has POSIX-style permissions. The isPublic() method is eventually used by the JobClient to figure out which of the user-needed artifacts are public and which are not. So in the distributed-cluster case with DFS, this definition of public-cache doesn't need to change irrespective of whether you have Windows or Linux underneath.
 - If you are talking about a distributed MR cluster working on a local filesystem, yes, your changes will be needed, but that mode is not a supported setup anyway and will most likely need many more changes besides yours.

Regarding the permissions related changes:
 - I believe TT absolutely needs to set ugo+rx for dirs containing expanded archives. This is needed to address some of the artifacts which retain permissions from the original bits that a user uploads. So let's not move/change that code out of the archives code block.
 - And for files, can you tell me why the 2nd line in the code-fragment shown below doesn't already do it correctly on Windows? It may in fact be because of some other bug, so asking - is it not enough to set correct permissions on the file itself in case of Windows? 
{code}
 ...
 sourceFs.copyToLocalFile(sourcePath, workFile);
 localFs.setPermission(workFile, permission);
 if (isArchive) {
 ...
{code}
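
As a side illustration of what "set ugo+rx" means for entries of an expanded archive, here is a small Java sketch. It only manipulates permission sets in memory using the standard java.nio.file.attribute types and does not touch the file system or any Hadoop API; the class and method names are hypothetical.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical illustration only; not the TaskTracker's actual code.
final class ArchivePermissionSketch {
    // The bits that "ugo+rx" adds: read and execute for user, group, and other.
    private static final Set<PosixFilePermission> UGO_RX = EnumSet.of(
        PosixFilePermission.OWNER_READ, PosixFilePermission.OWNER_EXECUTE,
        PosixFilePermission.GROUP_READ, PosixFilePermission.GROUP_EXECUTE,
        PosixFilePermission.OTHERS_READ, PosixFilePermission.OTHERS_EXECUTE);

    // Keeps whatever bits the archive entry already carried and adds r+x for all.
    static Set<PosixFilePermission> withReadExecuteForAll(Set<PosixFilePermission> original) {
        Set<PosixFilePermission> result = EnumSet.noneOf(PosixFilePermission.class);
        result.addAll(original);
        result.addAll(UGO_RX);
        return result;
    }

    // Convenience wrapper working on symbolic strings such as "rw-------".
    static String widen(String symbolic) {
        return PosixFilePermissions.toString(
            withReadExecuteForAll(PosixFilePermissions.fromString(symbolic)));
    }

    public static void main(String[] args) {
        // An entry that kept restrictive bits from the uploaded archive.
        System.out.println(widen("rw-------"));
    }
}
```

On a POSIX file system the resulting set could be applied with Files.setPosixFilePermissions; how the equivalent is achieved on Windows is exactly the question raised above and is not covered by this sketch.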
                

[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows

Posted by "Bikas Saha (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454240#comment-13454240 ] 

Bikas Saha commented on HADOOP-8731:
------------------------------------

Can you please explain the chmod() changes in TrackerDistributedCacheManager?
                

[jira] [Updated] (HADOOP-8731) TestTrackerDistributedCacheManager fails on Windows

Posted by "Ivan Mitic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic updated HADOOP-8731:
-------------------------------

    Status: Patch Available  (was: Open)
    
> TestTrackerDistributedCacheManager fails on Windows
> ---------------------------------------------------
>
>                 Key: HADOOP-8731
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8731
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: filecache
>            Reporter: Ivan Mitic
>            Assignee: Ivan Mitic
>         Attachments: HADOOP-8731-PublicCache.2.patch, HADOOP-8731-PublicCache.patch
>
>
> Jira tracking TestTrackerDistributedCacheManager test failure. 


[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456045#comment-13456045 ] 

Vinod Kumar Vavilapalli commented on HADOOP-8731:
-------------------------------------------------

Thanks for the explanation, Ivan.

Patch looks good overall. Two comments:
 - In your java comment for ancestorsHaveExecutePermissions(), please also mention that this change is only needed to enable LocalJobRunner to use Public-dist-cache. I'd also like the subject of this ticket to be changed - "Public dist-cache support for LocalJobRunner on Windows"
 - The changes involving "FileUtil.chmod()" look spurious, can you explain those changes?
                


[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows

Posted by "Ivan Mitic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456134#comment-13456134 ] 

Ivan Mitic commented on HADOOP-8731:
------------------------------------

Thanks Vinod, these are great comments! Some answers below.

{quote}In the case of a real cluster, and with HDFS, the definition of a public dist-cache file is one which is accessible to all users; and HDFS also has POSIX-style permissions. The method isPublic() is eventually used by the JobClient to figure out which of the user-needed artifacts are public and which are not. So in the distributed-cluster case with DFS, this definition of public-cache doesn't need to change irrespective of whether you have Windows or Linux underneath.
{quote}
I agree, the JobClient will evaluate whether a file should be public or private. Now, if I understood things correctly, based on whether the file is marked public or private on the JobClient side, it will be later downloaded from DFS to the public or private LFS location on the TT machine. What we are proposing with this change is to change the logic on the JobClient side that determines whether the file is public or private. Given that all files are by default private on Windows, it would be a real challenge for users to upload a file to the public distributed cache if we keep the old model (see my previous comments). Does this make sense? Please do comment, maybe I just didn't understand correctly how DC works.
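
For readers following the thread, the two rules being compared can be sketched as below. This is only an illustrative model working on POSIX-style mode bits. The class and method names ({{PublicCacheCheck}}, {{isPublicStrict}}, {{isPublicRelaxed}}) are hypothetical and are not taken from the patch or from Hadoop's API.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative model only; names are hypothetical, not Hadoop's API. */
class PublicCacheCheck {
    static final int OTHER_READ = 04;    // o+r bit of a POSIX-style mode
    static final int OTHER_EXECUTE = 01; // o+x bit of a POSIX-style mode

    /** True if every ancestor directory grants execute (traverse) to OTHER. */
    static boolean ancestorsHaveExecute(List<Integer> ancestorModes) {
        for (int mode : ancestorModes) {
            if ((mode & OTHER_EXECUTE) == 0) {
                return false;
            }
        }
        return true;
    }

    /** Rule from the issue description: o+r on the file and o+x on every ancestor. */
    static boolean isPublicStrict(int fileMode, List<Integer> ancestorModes) {
        return (fileMode & OTHER_READ) != 0 && ancestorsHaveExecute(ancestorModes);
    }

    /** Proposed rule: only the file's own OTHER (EVERYONE) read bit is checked. */
    static boolean isPublicRelaxed(int fileMode) {
        return (fileMode & OTHER_READ) != 0;
    }

    public static void main(String[] args) {
        // Windows-like defaults: ancestors map to 700, and the user opened only the file.
        List<Integer> ancestors = Arrays.asList(0700, 0700);
        System.out.println(isPublicStrict(0704, ancestors)); // false
        System.out.println(isPublicRelaxed(0704));           // true
    }
}
```

With Unix-like defaults (file 644, ancestors 755) both rules agree that the file is public, which is why the difference only shows up with the Windows mapping.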

bq. I believe TT absolutely needs to set ugo+rx for dirs containing expanded archives. This is needed to address some of the artifacts which retain permissions from the original bits that a user uploads. So let's not move/change that code out of the archives code block.
Ah, I didn't think of this. I will revert to the original chmod.
bq. And for files, can you tell me why the 2nd line in the code-fragment shown below doesn't already do it correctly on Windows? It may in fact be because of some other bug, so asking - is it not enough to set correct permissions on the file itself in case of Windows? 
You're right, let me debug to see what the problem was here; I made this fix a while back.

                


[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows

Posted by "Ivan Mitic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454352#comment-13454352 ] 

Ivan Mitic commented on HADOOP-8731:
------------------------------------

bq. can you please explain the chmod() changes in TrackerDistributedCacheManager.
Thanks Bikas for reviewing. The issue is that the right permissions are not set on files if I do not make this change. If you take a look at the previous {{FileUtil.chmod()}} call, it only sets permissions for archives, but not for files. Now that I have moved it below, it sets the permissions for both files and archives.
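
The placement change described here can be pictured with the following simplified sketch. It is not the code from TrackerDistributedCacheManager: the class {{ChmodPlacementSketch}} and its methods are hypothetical, and the chmod call is replaced by a list that records which paths would have received ugo+rx.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of where the chmod call sits. */
class ChmodPlacementSketch {
    /** Paths that would receive ugo+rx; a stand-in for a real chmod call. */
    final List<String> chmodded = new ArrayList<>();

    private void chmodReadExecute(String path) {
        chmodded.add(path);
    }

    /** Placement as described for the old code: chmod only inside the archive branch. */
    void localizeBefore(String path, boolean isArchive) {
        if (isArchive) {
            // (unpacking of the archive would happen here)
            chmodReadExecute(path);
        }
    }

    /** Placement as described for the patch: chmod after the branch, for both kinds. */
    void localizeAfter(String path, boolean isArchive) {
        if (isArchive) {
            // (unpacking of the archive would happen here)
        }
        chmodReadExecute(path);
    }
}
```

In this model a plain file passed to {{localizeBefore}} never reaches the chmod stand-in, while {{localizeAfter}} records both plain files and archives.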
                


[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows

Posted by "Ivan Mitic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456050#comment-13456050 ] 

Ivan Mitic commented on HADOOP-8731:
------------------------------------

Thanks for reviewing Vinod!

bq. In your java comment for ancestorsHaveExecutePermissions(), please also mention that this change is only needed to enable LocalJobRunner to use Public-dist-cache. I'd also like the subject of this ticket to be changed - "Public dist-cache support for LocalJobRunner on Windows"
The change does not apply to LocalJobRunner only, but to the distributed cache in general. I tried to explain above what the problem is and how I am trying to solve it; let me know if you need additional clarification.

bq. The changes involving "FileUtil.chmod()" look spurious, can you explain those changes?
Bikas asked the same question above :) Quoting my answer: {quote} The issue is that the right permissions are not set on files if I do not make this change. If you take a look at the previous FileUtil.chmod() call, it only sets permissions for archives, but not for files. Now that I have moved it below, it sets the permissions for both files and archives. {quote}

Let me know if you have additional questions/comments.
                


[jira] [Commented] (HADOOP-8731) Public distributed cache support for Windows

Posted by "Bikas Saha (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456270#comment-13456270 ] 

Bikas Saha commented on HADOOP-8731:
------------------------------------

Ivan, I think Vinod means the following.
In a real distributed cluster where the filesystem is HDFS, the current methods work because dist cache files are on HDFS and HDFS permissions resemble POSIX. isPublic() etc. is called on the HDFS filesystem, so things will work.
When running on the local file system, say in the LocalJobRunner scenario, dist cache files are on the local fs. In that case the current methods like isPublic() do not work on Windows for the reasons you mentioned. This is what is happening in the test.
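
One way to picture this distinction is a helper that applies a relaxed check only when the cache file lives on the local file system of a Windows machine. The sketch below is an assumption about how such scoping could look, not something the thread confirms the patch does. The class {{CacheCheckSelector}}, the method {{needsRelaxedCheck}}, and the host name in the example URI are hypothetical.

```java
import java.net.URI;

/** Hypothetical helper; not part of Hadoop or of the attached patch. */
class CacheCheckSelector {
    /**
     * Returns true only for cache files on the local file system of a
     * Windows machine; HDFS paths keep the existing POSIX-style check.
     */
    static boolean needsRelaxedCheck(URI cacheFile, boolean onWindows) {
        String scheme = cacheFile.getScheme();
        boolean local = scheme == null || "file".equals(scheme);
        return local && onWindows;
    }

    public static void main(String[] args) {
        boolean onWindows = System.getProperty("os.name").startsWith("Windows");
        URI onHdfs = URI.create("hdfs://namenode:8020/cache/a.jar"); // hypothetical host
        URI onLocalFs = URI.create("file:///C:/cache/a.jar");
        System.out.println(needsRelaxedCheck(onHdfs, onWindows)); // false
        System.out.println(needsRelaxedCheck(onLocalFs, onWindows));
    }
}
```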

                


[jira] [Updated] (HADOOP-8731) Public distributed cache support for Windows

Posted by "Ivan Mitic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic updated HADOOP-8731:
-------------------------------

    Attachment: HADOOP-8731-PublicCache.patch

Attaching the patch.
                


[jira] [Updated] (HADOOP-8731) TestTrackerDistributedCacheManager fails on Windows

Posted by "Ivan Mitic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic updated HADOOP-8731:
-------------------------------

    Description: Jira tracking TestTrackerDistributedCacheManager test failure.   (was: A distributed cache file is considered public (sharable between MR jobs) if OTHER has read permissions on the file and +x permissions all the way up in the folder hierarchy. By default, Windows permissions are mapped to "700" all the way up to the drive letter, and it is unreasonable to ask users to change the permission on the whole drive to make the file public. IOW, it is hardly possible to have public distributed cache on Windows. 

To enable the scenario and make it more "Windows friendly", the criteria on when a file is considered public should be relaxed. One proposal is to check whether the user has given EVERYONE group permission on the file only (and discard the +x check on parent folders).

Security considerations for the proposal: Default permissions on Unix platforms are usually "775" or "755" meaning that OTHER users can read and list folders by default. What this also means is that Hadoop users have to explicitly make the files private in order to make them private in the cluster (please correct me if this is not the case in real life!). On Windows, default permissions are "700". This means that by default all files are private. In the new model, if users want to make them public, they have to explicitly add EVERYONE group permissions on the file. 

TestTrackerDistributedCacheManager fails because of this issue.)
        Summary: TestTrackerDistributedCacheManager fails on Windows  (was: Public distributed cache support for Windows)
    
