You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2011/02/25 10:23:38 UTC

[jira] Created: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

getpwuid_r is not thread-safe on RHEL6
--------------------------------------

                 Key: HADOOP-7156
                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
             Project: Hadoop Common
          Issue Type: Bug
    Affects Versions: 0.22.0
         Environment: RHEL 6.0 "Santiago"
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon
            Priority: Critical
             Fix For: 0.22.0


Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):

https://fedorahosted.org/sssd/ticket/640

This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:

*** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3575675676]
/lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
/lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]


-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Greg Roelofs (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004333#comment-13004333 ] 

Greg Roelofs commented on HADOOP-7156:
--------------------------------------



That wasn't my personal experience (with the setup-related ones), but
admittedly it's a tiny sample.


Dude, I'm still running a 2004-era distro on one of my desktop boxes!
Sure, that's unusual, but ancient servers in production definitely
aren't.  The more you have, the harder it is to validate and move to
a new version.  You've got, what, a few hundred servers to worry about?
We've got a few hundred thousand.  (Not all Hadoop-related, I mean, just
in general.)  I have no idea what the overall mix is, but I was aware of
both RHEL 5.x and 4.x subsets up until fairly recently (the latter also
being 2004-era; the former more like 2007, I think), and I wouldn't be
surprised if there were still *BSD boxes lurking somewhere.  I don't
think there are any RHEL 6.x yet.

So no, OS-related docs don't get stale _that_ quickly, at least not for
some of us. :-)



Not at all.  It's simply a statement that those working most closely with
the code and XML configs--i.e., those who _do_ watch JIRAs and the mailing
lists religiously--are more likely to keep things updated if they're
nearby/noticeable.  We all have limited attention-hours to devote to
maintenance, so anything that makes it easier, faster, or more likely to
happen is a Good Thing.

And I _know_ you're aware that the default state for documentation is
"nonexistent."  Especially where developers are concerned.

(Btw, wasn't there a recent suggestion to automatically pull config keys
and descriptions out of the foo-default.xml files and publish them?  With
something like that in place, the docs you want would fall out of this
and similar patches immediately.  One of my mottos for the last few years
has been, "if it can be automated, it should be automated.")

I guess this has drifted a bit; perhaps further discussion should go to
one of the lists?


> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004691#comment-13004691 ] 

Todd Lipcon commented on HADOOP-7156:
-------------------------------------

I will answer all of your questions and hope you will withdraw your -1:

bq.  Are we going to update this list if Debian/Scientific Linux/Mandriva/... are found to be non-POSIX compliant?

I will happily volunteer to +1 and commit a patch that anyone should submit to update the list. Please feel free to do so. The list doesn't claim to be complete, only to provide some examples of systems where it's the case.

bq. If yes, are we going to push a new Apache release when this list is updated?

No, like any other documentation improvement it should wait for the next release. Our definitive documentation lives in the source tree. The fact that some of it is in the wiki doesn't change that.

bq. What does RHEL 6.0 actually mean? If I put a new pam rpm that fixes the issue, am I still running RHEL 6.0?

I would assume RHEL 6.0 means that you are not installing packages that are not part of the RHEL 6.0 release. If you went and installed a random RPM that you built from source or some other vendor, you're no longer running a stock RHEL 6.0. Again, the docs are not meant to be complete. Assumedly if you've updated your pam manually to workaround this issue, you'd know that, and you wouldn't turn on the config option!


And, seriously, with the HADOOP-7115 cache in place, this becomes a really rare race condition. Is it really worth making such a big deal? I'd like to move on to fixing other bugs, please.


> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005861#comment-13005861 ] 

Hudson commented on HADOOP-7156:
--------------------------------

Integrated in Hadoop-Common-22-branch #32 (See [https://hudson.apache.org/hudson/job/Hadoop-Common-22-branch/32/])
    HADOOP-7156. Workaround for unsafe implementations of getpwuid_r. Contributed by Todd Lipcon.


> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-7156:
--------------------------------

    Status: Patch Available  (was: Open)

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004300#comment-13004300 ] 

Allen Wittenauer commented on HADOOP-7156:
------------------------------------------

>How many of the "currently relevant" wiki pages have been updated in 
> that year-plus? 

Considering how little has changed since Apache 0.20.0 was released almost two years, not many because there was no need.

This JIRA is a great example of something that will likely be long fixed before an actual Apache release with the fix sees the light of day.  Which is why I'm mostly opposed to it going in, but not enough to -1 it.  I think we'll look like fools when a year down the road we have a reference to a bug that was long fixed, but whatever.

> Many of us live in the repo, employmentally speaking. 

In other words, "Who gives a damn about the people who are not running some form of trunk and are not watching JIRAs religiously."  

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006028#comment-13006028 ] 

Hudson commented on HADOOP-7156:
--------------------------------

Integrated in Hadoop-Common-trunk #628 (See [https://hudson.apache.org/hudson/job/Hadoop-Common-trunk/628/])
    HADOOP-7156. Workaround for unsafe implementations of getpwuid_r. Contributed by Todd Lipcon.


> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-7156:
--------------------------------

    Attachment: hadoop-7156.txt

Made a wiki page and linked to it.

If you don't like this patch or the wiki page, please edit it and contribute a version you find acceptable. Otherwise I will consider this version acceptable and commit tomorrow.

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005781#comment-13005781 ] 

Todd Lipcon commented on HADOOP-7156:
-------------------------------------

Thanks. Committed to trunk and 22

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004195#comment-13004195 ] 

Allen Wittenauer commented on HADOOP-7156:
------------------------------------------

> have broken implementations of 

Let's call it for what it is: POSIX non-compliance.

> Systems known to exhibit this issue include Red Hat Enterprise Linux 6.0,
> CentOS 6.0, and Vintela Authentication Services.

Why is this text in here?  What happens when RHEL 6 gets this fixed? Do we make a new release to pull the text out?  To me, it makes much more sense to include a link to a wiki page where we can keep the information up-to-date.



> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005788#comment-13005788 ] 

Hudson commented on HADOOP-7156:
--------------------------------

Integrated in Hadoop-Common-trunk-Commit #524 (See [https://hudson.apache.org/hudson/job/Hadoop-Common-trunk-Commit/524/])
    HADOOP-7156. Workaround for unsafe implementations of getpwuid_r. Contributed by Todd Lipcon.


> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-7156:
--------------------------------

    Attachment: hadoop-7156.txt

Here's a patch which implements the configuration option.

I did not add it to core-default since it's a corner case... does that seem reasonable or should it be documented?

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003453#comment-13003453 ] 

Allen Wittenauer commented on HADOOP-7156:
------------------------------------------

If someone wants to use non-conforming software, that is their problem, not ours.  I'd much rather shame a company into fixing it than pretending that everything is a-ok.  Yes, I realize that is idealistic.

In the case of RHEL 6, it is what? 2 months old?  Like any .0, it is going to have growing pains during its first few months that should get fixed by the vendor.  As for VAS, I'm not really sure I care.

In the case of #1, what is the point of mutexes around a thread-safe function?  Why not just call the unsafe one and avoid the extra performance hit on machines that follow POSIX?

In the case of #3, I'll be OK with this if we also add a huge warning in the task log telling the user that their system isn't POSIX compliant and to call their vendor to get it fixed. :p



> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Greg Roelofs (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003796#comment-13003796 ] 

Greg Roelofs commented on HADOOP-7156:
--------------------------------------

bq. I did not add it to core-default since it's a corner case... does that seem reasonable or should it be documented?

Please please please...always document things like this. Picture it from the perspective of some new user who hasn't discovered the lists yet (or hasn't been subscribed for very long, and then only on the -user list(s)): they know they're running RHEL-6 and probably will notice a config item that explicitly mentions it (which I'd recommend, along with CentOS and any other known distro versions), but they're much less likely to know what's up with a random glibc crash several months later.

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Greg Roelofs (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004258#comment-13004258 ] 

Greg Roelofs commented on HADOOP-7156:
--------------------------------------

Doh!  FF crashed while I was replying, sigh.  Switching to e-mail:

bq. In my experience, we do a really bad job of keeping the wiki up to date. Greg, what do you think?

I agree--we're much better at keeping the code up to date (frequently in
parallel across multiple branches ;-) ) than at keeping the wiki current.

I think the XML config text is fine; you could optionally prefix it with
"As of March 2011, systems known to ..." as a hint to users or future versions
of us to recheck it if significant time has passed.  The comment in NativeIO.c
probably should be modified; perhaps "monitor used for working around a bug
in the sssd security daemon, which was observed in getpwuid_r() on RHEL 6.0,"
or words to that effect.  (Need not be that verbose, of course.)

I also agree with Eli that we can leave the workaround disabled for tests.
It might be worthwhile to add a log message at the start that "this test
may fail (crash) with an invalid free() on some systems; see HADOOP-7156
for details."  Again, feel free to word it however you wish.

Trivial grammo:  "workaround" is a noun; the verb form is "work around"
(similar to layout, backup, setup, cleanup, checkin, cutoff, etc.).  The
various variable names would be more proper if they reflected this (e.g.,
WORK_AROUND_NON_THREADSAFE_CALLS_KEY, workAroundNonThreadSafePasswdCalls
[or workAroundNonThreadsafePasswdCalls, since you're using "threadsafe"
as a single word elsewhere]), but I won't fuss if you leave them as is.


> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004684#comment-13004684 ] 

Allen Wittenauer commented on HADOOP-7156:
------------------------------------------

Actually, rather than let you dig Yahoo! in deeper, I'm just going to -1 the patch due to the config description on the grounds that it is both too generic and too specific:

* Are we going to update this list if Debian/Scientific Linux/Mandriva/... are found to be non-POSIX compliant?
* If yes, are we going to push a new Apache release when this list is updated?
* What does RHEL 6.0 actually mean?  If I put a new pam rpm that fixes the issue, am I still running RHEL 6.0?

A living document for when to use this as an option is going to be much better over the long haul than a static file.

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Greg Roelofs (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004284#comment-13004284 ] 

Greg Roelofs commented on HADOOP-7156:
--------------------------------------

bq. Really?  We haven't had a release marked stable in over a year.  (Releases outside Apache do *not* count.)

How many of the "currently relevant" wiki pages have been updated in that
year-plus?  (I haven't checked, but the one or two "getting started" ones
I looked at way back when were incomplete at best, and possibly wrong in
places, so I'm guessing the answer is "not many."  That's certainly been
my repeated experience with internal, corporate wikis.)

In my experience, the farther the docs are from the code, the less likely
they are to be current and correct.  That doesn't mean there can't be
additional forms of non-repo docs, but docs within the source-code repo
should be the priority.  Many of us _live_ in the repo, employmentally
speaking.  (New word.  Cool.)


> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-7156:
--------------------------------

      Resolution: Fixed
    Release Note: Adds a new configuration hadoop.work.around.non.threadsafe.getpwuid which can be used to enable a mutex around this call to workaround thread-unsafe implementations of getpwuid_r. Users should consult http://wiki.apache.org/hadoop/KnownBrokenPwuidImplementations for a list of such systems.
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002046#comment-13002046 ] 

Allen Wittenauer commented on HADOOP-7156:
------------------------------------------

Again, we're patching around a particular bug in a particular OS, just as we did in HADOOP-7115.  This trend is very disturbing.  

If we really want to go down this road, then yes, this should be compile time.  But for now, I'd rather just advertise "RHEL 6.0 is broken; don't use it" just like we do for JREs. 

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002903#comment-13002903 ] 

Todd Lipcon commented on HADOOP-7156:
-------------------------------------

Apparently no one out there who implements pam modules knows that getpwuid_r is supposed to be reentrant :) A user saw this same issue with VAS:

/lib64/libc.so.6[0x3757c7230f]
/lib64/libc.so.6(cfree+0x4b)[0x3757c7276b]
/opt/quest/lib64/libvas.so.4(vassym_sqlite3FreeX+0xe)[0x2aaaab2bd15e]
/opt/quest/lib64/libvas.so.4[0x2aaaab2ac9b9]
/opt/quest/lib64/libvas.so.4(vassym_sqlite3OsClose+0x93)[0x2aaaab2ad6d3]
/opt/quest/lib64/libvas.so.4(vassym_sqlite3pager_close+0x4d)[0x2aaaab2af1cd]
/opt/quest/lib64/libvas.so.4(vassym_sqlite3BtreeClose+0x15)[0x2aaaab29a235]
/opt/quest/lib64/libvas.so.4(vassym_sqlite3_close+0x15b)[0x2aaaab2ab62b]
/opt/quest/lib64/libvas.so.4(vassql_deinit+0xdf)[0x2aaaab3158de]
/opt/quest/lib64/libvas.so.4[0x2aaaab2f3ef9]
/opt/quest/lib64/libvas.so.4(vascache_deinit+0x30)[0x2aaaab2f438f]
/lib64/libnss_vas3.so.2[0x2aaaab027240]
/lib64/libnss_vas3.so.2(_nss_vas3_getXXent_release_tsd+0x10e)[0x2aaaab0284ff]
/lib64/libnss_vas3.so.2[0x2aaaab02c3c0]
/lib64/libnss_vas3.so.2(_nss_vas3_getpwuid_r+0x36)[0x2aaaab02d242]
/lib64/libc.so.6(getpwuid_r+0xa5)[0x3757c99515]


> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004271#comment-13004271 ] 

Allen Wittenauer commented on HADOOP-7156:
------------------------------------------

> That will be RHEL 6.1, not RHEL 6.0. So this should remain accurate.

If I rpm install just a new pam from the Red Hat repo, what version am I running?

> I agree--we're much better at keeping the code up to date (frequently in
> parallel across multiple branches ) than at keeping the wiki current.

Really?  We haven't had a release marked stable in over a year.  (Releases outside Apache do *not* count.)



> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-7156:
--------------------------------

    Attachment: hadoop-7156.txt

Updated version includes docs in core-default.xml

Do you agree with the current test case? It will fail with JVM exit on RHEL6 currently. Should we set the workaround variable to true in the test settings?

Another option is to add a bit of code to apply some regexes to /etc/redhat-release and enable by default if running RHEL or CentOS 6.

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999284#comment-12999284 ] 

Todd Lipcon commented on HADOOP-7156:
-------------------------------------

Filed a RHEL bug at: https://bugzilla.redhat.com/show_bug.cgi?id=680367

In the meantime we might consider adding a mutex around this call which can be #ifdeffed in on this platform as a workaround.

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004197#comment-13004197 ] 

Todd Lipcon commented on HADOOP-7156:
-------------------------------------

bq. What happens when RHEL 6 gets this fixed

That will be RHEL 6.1, not RHEL 6.0. So this should remain accurate.
Happy to change "Vintela Authentication Services" to "some version of Vintela Authentication Services".

bq. it makes much more sense to include a link to a wiki page where we can keep the information up-to-date

In my experience, we do a really bad job of keeping the wiki up to date. Greg, what do you think?

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005493#comment-13005493 ] 

Hadoop QA commented on HADOOP-7156:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12473179/hadoop-7156.txt
  against trunk revision 1080396.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

    +1 system test framework.  The patch passed system test framework compile.

Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/306//testReport/
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/306//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/306//console

This message is automatically generated.

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004720#comment-13004720 ] 

Allen Wittenauer commented on HADOOP-7156:
------------------------------------------

> No, like any other documentation improvement it should wait for the 
> next release. Our definitive documentation lives in the source tree. 
> The fact that some of it is in the wiki doesn't change that.

I agree to a certain extent.  But listing a moving target from suffering from a problem that exists today that may not exist tomorrow for a release of our software that is still months out is a mistake.  CentOS 6.0 doesn't even exist yet in any official form yet we're declaring it bugged. 

> Assumedly if you've updated your pam manually to workaround this issue, 
> you'd know that, and you wouldn't turn on the config option!

What I am applying the 6-updates repo after a kick? Are packages from https://rhn.redhat.com/errata/rhel-server-6-errata.html not considered part of RHEL 6?  I suspect Red Hat would disagree.

I realize that for the majority of committers don't actually use an Apache release so this doesn't seem like a big deal since you folks release seemingly monthly, if not more frequent.  I also realize that in order for the various commercial interests involved it behooves them to get any and all patches committed to show that they are "contributing".  The reality is that by the time this patch shows up in an actual Apache release, it is bound to be outdated.... and that's where I take offense.


> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005535#comment-13005535 ] 

Eli Collins commented on HADOOP-7156:
-------------------------------------

+1  for the latest patch

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037223#comment-13037223 ] 

Allen Wittenauer commented on HADOOP-7156:
------------------------------------------

Red Hat has released a fix for this as part of RHEL 6.1: http://rhn.redhat.com/errata/RHSA-2011-0560.html .

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003787#comment-13003787 ] 

Todd Lipcon commented on HADOOP-7156:
-------------------------------------

I'll assume you're joking about the "huge warning" and implement option #3. Thanks Allen.

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001702#comment-13001702 ] 

Todd Lipcon commented on HADOOP-7156:
-------------------------------------

Another possibility is to look in /etc/redhat-release and match the contents against known-buggy releases.

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003271#comment-13003271 ] 

Todd Lipcon commented on HADOOP-7156:
-------------------------------------

bq. But for now, I'd rather just advertise "RHEL 6.0 is broken; don't use it" just like we do for JREs.

Unfortunately many of us are not in a position to do this - RHEL 6.0 is a must for us, regardless of some bugs it might have. Same with support for Vintela Authentication Services (VAS) which has a similar bug. Asking users to switch their entire OS or auth system is not an option.

I think there are three workable options here from my perspective:

1) Always lock around getpwuid_r. Devaraj is added a cache for this function as part of another JIRA, so it shouldn't be a big performance issue.
2) Add a compile-time macro like LOCK_AROUND_PWUID, which, when set, will add the monitor lock around these calls.
3) Add a runtime Hadoop configuration option like hadoop.workaround.broken.getpwuid, which when enabled adds the lock.

Which, if any, of these seem acceptable to you?

Since we've found that this isn't RHEL6 specific, but in fact occurs with other broken pieces of software as well, I'm leaning towards option #1 or #3.

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103819#comment-13103819 ] 

Scott Carey commented on HADOOP-7156:
-------------------------------------

{quote}Red Hat has released a fix for this as part of RHEL 6.1{quote}
And months later, CentOS 6.1 is not out -- I would venture that many Hadoop users are on CentOS (or SL) and not RHEL.

Thanks for fixing this!  For years to come, I suspect that some will still have RHEL 6.0 or CentOS 6.0 servers floating around that might run into this.

FWIW, we're happily using CentOS 6.0 with Hadoop right now.


> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-7156:
--------------------------------

    Attachment: hadoop-7156.txt

Here's a heinous workaround for this issue:

When initializing the native library, we check for the problematic nss module - if it's on the system, we're likely to run into this issue, so it adds a lock around the getpwuid_r calls.


Do people think this hack is the right approach? My other thought was to add a new boolean config flag like hadoop.nativeio.workaround.hadoop7156 or some nonsense like that, and if that flag is set, add the monitor lock.


> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004956#comment-13004956 ] 

Eli Collins commented on HADOOP-7156:
-------------------------------------

Hey Allen,

If you find the config description "both too generic and too specific" perhaps you could suggest a better one before veto'ing the patch?

Thanks,
Eli

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004209#comment-13004209 ] 

Eli Collins commented on HADOOP-7156:
-------------------------------------

The approach and code look good to me. I would leave as is, no need to add autodetection, but we should document it in core-default so users are aware of the issue. 

I think leaving the workaround disabled for the tests is the right thing, this way it fails on a broken system which will help users identify that they need to enable the workaround.

Also, the description seems appropriate to me, this isn't about POSIX compliance, it's just a bug in a library function, that will probably be fixed.

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005780#comment-13005780 ] 

Allen Wittenauer commented on HADOOP-7156:
------------------------------------------

Removing my -1.

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira