You are viewing a plain text version of this content. The canonical link for it is here.

Posted to common-issues@hadoop.apache.org by "Kihwal Lee (Created) (JIRA)" <ji...@apache.org> on 2012/02/17 23:12:57 UTC

[jira] [Created] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

User-group mapping cache incorrectly does negative caching on transient failures
--------------------------------------------------------------------------------

                 Key: HADOOP-8088
                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
             Project: Hadoop Common
          Issue Type: Bug
          Components: security
    Affects Versions: 1.0.0, 0.20.205.0, 0.24.0, 0.23.1, 1.1.0
            Reporter: Kihwal Lee
             Fix For: 0.24.0, 1.1.0, 0.23.2


We've seen a case where some getGroups calls fail when the ldap server or the network has a transient issue. Looking the code, the shell-based and the JNI-based implementation swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cach expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based imple also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232267#comment-13232267 ] 

Hudson commented on HADOOP-8088:
--------------------------------

Integrated in Hadoop-Hdfs-trunk #988 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/988/])
    HADOOP-8088. User-group mapping cache incorrectly does negative caching on transient failures (Khiwal Lee via bobby) (Revision 1302062)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1302062
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestGroupsCaching.java

                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Kihwal Lee (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HADOOP-8088:
-------------------------------

    Description: We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.  (was: We've seen a case where some getGroups() calls fail when the ldap server or the network has a transient issue. Looking the code, the shell-based and the JNI-based implementation swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.)
    
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Aaron T. Myers (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232138#comment-13232138 ] 

Aaron T. Myers commented on HADOOP-8088:
----------------------------------------

bq. Hudson appears to be down.

I actually think Jenkins itself is fine, but for some reason test-patch builds aren't getting triggered automatically when new patches are uploaded. I've just kicked Jenkins manually to run test-patch for this JIRA: https://builds.apache.org/view/G-L/view/Hadoop/job/PreCommit-HADOOP-Build/724/

                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232147#comment-13232147 ] 

Hudson commented on HADOOP-8088:
--------------------------------

Integrated in Hadoop-Hdfs-0.23-Commit #691 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/691/])
    svn merge -c 1302062 trunk to branch-0.23 FIXES HADOOP-8088. User-group mapping cache incorrectly does negative caching on transient failures (Khiwal Lee via bobby) (Revision 1302063)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1302063
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestGroupsCaching.java

                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Kihwal Lee (Assigned) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee reassigned HADOOP-8088:
----------------------------------

    Assignee: Kihwal Lee
    
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 1.0.0, 1.1.0, 0.23.1, 0.24.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.2, 0.23.2, 0.24.0
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Kihwal Lee (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HADOOP-8088:
-------------------------------

    Description: We've seen a case where some getGroups() calls fail when the ldap server or the network has a transient issue. Looking the code, the shell-based and the JNI-based implementation swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.  (was: We've seen a case where some getGroups calls fail when the ldap server or the network has a transient issue. Looking the code, the shell-based and the JNI-based implementation swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cach expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based imple also needs an improvement. It should print what exception it encountered instead of just saying one happened.)
    
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network has a transient issue. Looking the code, the shell-based and the JNI-based implementation swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Kihwal Lee (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HADOOP-8088:
-------------------------------

    Attachment: hadoop-8088-trunk.patch
                hadoop-8088-branch-1.patch

The new patch removes the negative caching behavior. The test case was modified accordingly.
                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Robert Joseph Evans (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232155#comment-13232155 ] 

Robert Joseph Evans commented on HADOOP-8088:
---------------------------------------------

All of my tests passed.

Thanks Kihwal.  I just checked this into trunk, branch-0.23, branch-1 and branch-1.0.  I plan to check this into branch-0.23.2 shortly too, and I will resolve the JIRA once I put it into 0.23.2
                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Kihwal Lee (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HADOOP-8088:
-------------------------------

    Attachment: hadoop-8088-trunk.patch

Another patch for TRUNK.
                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Kihwal Lee (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231394#comment-13231394 ] 

Kihwal Lee commented on HADOOP-8088:
------------------------------------

We use a refresh cycle of 4 hrs. It becomes a serious issue when headless users associated with tight SLA jobs get in there. This behavior is not a hypothetical one. We've seen this in production. 

I do agree with you on the problem of having too many configs, but we often end up going this route in (sometimes justifiable) fear of breaking compatibility even if the bug is clearly there and in need of fix. I would appreciate your further input on how we can address the issue with the minimum negative impact. 
                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246334#comment-13246334 ] 

Hudson commented on HADOOP-8088:
--------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #2065 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2065/])
    moved HADOOP-8088 to 0.23.3 (Revision 1309436)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1309436
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt

                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 1.0.0, 1.1.0, 0.23.1, 0.24.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.2, 0.23.3, 0.24.0
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Kihwal Lee (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231574#comment-13231574 ] 

Kihwal Lee commented on HADOOP-8088:
------------------------------------

bq. Thinking about it, it may make sense to not cache negatives at all.

I initially thought so, but was not sure about the original design goal of the group caching. On many systems, nscd or equivalent can do negative caching to reduce unnecessary network traffic to ldap server, etc. So removing the behavior might not be too much of concern.
                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248359#comment-13248359 ] 

Hudson commented on HADOOP-8088:
--------------------------------

Integrated in Hadoop-Mapreduce-trunk #1041 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1041/])
    moved HADOOP-8088 to 0.23.3 (Revision 1309436)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1309436
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt

                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 1.0.0, 1.1.0, 0.23.1, 0.24.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.2, 0.23.3, 2.0.0
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246337#comment-13246337 ] 

Hudson commented on HADOOP-8088:
--------------------------------

Integrated in Hadoop-Common-trunk-Commit #1990 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1990/])
    moved HADOOP-8088 to 0.23.3 (Revision 1309436)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1309436
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt

                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 1.0.0, 1.1.0, 0.23.1, 0.24.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.2, 0.23.3, 0.24.0
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232167#comment-13232167 ] 

Hudson commented on HADOOP-8088:
--------------------------------

Integrated in Hadoop-Mapreduce-0.23-Commit #708 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/708/])
    svn merge -c 1302062 trunk to branch-0.23 FIXES HADOOP-8088. User-group mapping cache incorrectly does negative caching on transient failures (Khiwal Lee via bobby) (Revision 1302063)

     Result = ABORTED
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1302063
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestGroupsCaching.java

                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Kihwal Lee (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HADOOP-8088:
-------------------------------

    Attachment: hadoop-8088-branch-1.patch

For the branch-1 patch

{noformat}
     [exec] -1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 13 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     -1 findbugs.  The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings.
{noformat}

I manually ran findbugs again. There is no difference before/after the patch: Both have the same 217 warnings.
                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232270#comment-13232270 ] 

Hudson commented on HADOOP-8088:
--------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #201 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/201/])
    svn merge -c 1302062 trunk to branch-0.23 FIXES HADOOP-8088. User-group mapping cache incorrectly does negative caching on transient failures (Khiwal Lee via bobby) (Revision 1302063)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1302063
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestGroupsCaching.java

                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Jitendra Nath Pandey (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231583#comment-13231583 ] 

Jitendra Nath Pandey commented on HADOOP-8088:
----------------------------------------------

I agree with devaraj's proposal. Too many queries for a non-existent user should happen only in case of a security attack, but authentication should prevent any rogue user if security is enabled. 
                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Kihwal Lee (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HADOOP-8088:
-------------------------------

    Status: Patch Available  (was: Open)
    
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 1.0.0, 0.23.1, 0.20.205.0, 0.24.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232151#comment-13232151 ] 

Hudson commented on HADOOP-8088:
--------------------------------

Integrated in Hadoop-Common-0.23-Commit #700 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/700/])
    svn merge -c 1302062 trunk to branch-0.23 FIXES HADOOP-8088. User-group mapping cache incorrectly does negative caching on transient failures (Khiwal Lee via bobby) (Revision 1302063)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1302063
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestGroupsCaching.java

                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Robert Joseph Evans (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated HADOOP-8088:
----------------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.24.0)
                   2.0.0
           Status: Resolved  (was: Patch Available)
    
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 1.0.0, 1.1.0, 0.23.1, 0.24.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.2, 0.23.3, 2.0.0
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Aaron T. Myers (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231419#comment-13231419 ] 

Aaron T. Myers commented on HADOOP-8088:
----------------------------------------

bq. I do agree with you on the problem of having too many configs, but we often end up going this route in (sometimes justifiable) fear of breaking compatibility even if the bug is clearly there and in need of fix. I would appreciate your further input on how we can address the issue with the minimum negative impact.

The problem of operability is not in having too many configs, but in having too many configs that one *needs to tweak.* I think having a config for this is fine, as long as we choose the correct default. If after deploying this on a cluster an operator decides they want to change the behavior, they'll be very happy there's a config to do it, so they won't have to recompile/redeploy.
                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237215#comment-13237215 ] 

Hadoop QA commented on HADOOP-8088:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518767/hadoop-8088-trunk.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/760//console

This message is automatically generated.
                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 1.0.0, 1.1.0, 0.23.1, 0.24.0
>            Reporter: Kihwal Lee
>             Fix For: 1.0.2, 0.23.2, 0.24.0
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232277#comment-13232277 ] 

Hudson commented on HADOOP-8088:
--------------------------------

Integrated in Hadoop-Mapreduce-trunk #1023 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1023/])
    HADOOP-8088. User-group mapping cache incorrectly does negative caching on transient failures (Khiwal Lee via bobby) (Revision 1302062)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1302062
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestGroupsCaching.java

                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Devaraj Das (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231463#comment-13231463 ] 

Devaraj Das commented on HADOOP-8088:
-------------------------------------

Thinking about it, it may make sense to not cache negatives at all (I don't think the original implementation aimed to handle this use-case). I can't imagine a situation where we'd have many queries for non-existent users (so we'd still be fine with not overloading the infrastructure, like ldap, which was the primary motivation for the cache). Does that make sense?
                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Matt Foley (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232247#comment-13232247 ] 

Matt Foley commented on HADOOP-8088:
------------------------------------

Just squeaked into 1.0.2-rc1.
                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Robert Joseph Evans (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated HADOOP-8088:
----------------------------------------

    Fix Version/s:     (was: 1.1.0)
                   1.0.2
    
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 0.23.2, 1.0.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248273#comment-13248273 ] 

Hudson commented on HADOOP-8088:
--------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #219 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/219/])
    svn merge -c 1302062 trunk to branch-0.23 FIXES HADOOP-8088. User-group mapping cache incorrectly does negative caching on transient failures (Revision 1309431)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1309431
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestGroupsCaching.java

                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 1.0.0, 1.1.0, 0.23.1, 0.24.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.2, 0.23.3, 2.0.0
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232141#comment-13232141 ] 

Hadoop QA commented on HADOOP-8088:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518767/hadoop-8088-trunk.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/724//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/724//console

This message is automatically generated.
                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232166#comment-13232166 ] 

Hudson commented on HADOOP-8088:
--------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #1905 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1905/])
    HADOOP-8088. User-group mapping cache incorrectly does negative caching on transient failures (Khiwal Lee via bobby) (Revision 1302062)

     Result = ABORTED
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1302062
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestGroupsCaching.java

                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246382#comment-13246382 ] 

Hudson commented on HADOOP-8088:
--------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #2003 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2003/])
    moved HADOOP-8088 to 0.23.3 (Revision 1309436)

     Result = ABORTED
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1309436
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt

                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 1.0.0, 1.1.0, 0.23.1, 0.24.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.2, 0.23.3, 0.24.0
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Devaraj Das (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231371#comment-13231371 ] 

Devaraj Das commented on HADOOP-8088:
-------------------------------------

The config for negative cache seems like an overkill. I'd suggest that we just handle the transient failures better.
                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Jitendra Nath Pandey (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231362#comment-13231362 ] 

Jitendra Nath Pandey commented on HADOOP-8088:
----------------------------------------------

 I am concerned about adding another configuration for negative caching. Every new configuration parameter adds to operational complexity. I was wondering how serious is the issue that empty groups are cached? The groups will be recovered after cache expires. How long is the cache expiry?
                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Robert Joseph Evans (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232135#comment-13232135 ] 

Robert Joseph Evans commented on HADOOP-8088:
---------------------------------------------

Hudson appears to be down.  I looked over the patch and it looks good.  +1 assuming that the tests pass when I run them manually.
                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Robert Joseph Evans (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated HADOOP-8088:
----------------------------------------

    Fix Version/s:     (was: 0.23.2)
                   0.23.3

I pulled this into 0.23.3.  It was listed as being in 0.23.2, but it only made it into branch-0.23, which was renamed to branch-2.
                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 1.0.0, 1.1.0, 0.23.1, 0.24.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.2, 0.23.3, 0.24.0
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Kihwal Lee (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HADOOP-8088:
-------------------------------

    Attachment: hadoop-8088-trunk.patch

Resubmiting the same trunk patch for jenkins to pick up.
                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232271#comment-13232271 ] 

Hudson commented on HADOOP-8088:
--------------------------------

Integrated in Hadoop-Mapreduce-0.23-Build #229 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/229/])
    svn merge -c 1302062 trunk to branch-0.23 FIXES HADOOP-8088. User-group mapping cache incorrectly does negative caching on transient failures (Khiwal Lee via bobby) (Revision 1302063)

     Result = FAILURE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1302063
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestGroupsCaching.java

                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232146#comment-13232146 ] 

Hudson commented on HADOOP-8088:
--------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #1971 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1971/])
    HADOOP-8088. User-group mapping cache incorrectly does negative caching on transient failures (Khiwal Lee via bobby) (Revision 1302062)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1302062
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestGroupsCaching.java

                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248304#comment-13248304 ] 

Hudson commented on HADOOP-8088:
--------------------------------

Integrated in Hadoop-Hdfs-trunk #1006 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1006/])
    moved HADOOP-8088 to 0.23.3 (Revision 1309436)

     Result = FAILURE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1309436
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt

                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 1.0.0, 1.1.0, 0.23.1, 0.24.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.2, 0.23.3, 2.0.0
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232148#comment-13232148 ] 

Hudson commented on HADOOP-8088:
--------------------------------

Integrated in Hadoop-Common-trunk-Commit #1897 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1897/])
    HADOOP-8088. User-group mapping cache incorrectly does negative caching on transient failures (Khiwal Lee via bobby) (Revision 1302062)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1302062
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestGroupsCaching.java

                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent enough to distinguish transient failures from ENOENT. The log message in the jni-based impl also needs an improvement. It should print what exception it encountered instead of just saying one happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira