You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by malcolm <ma...@oracle.com> on 2014/12/08 12:30:39 UTC

Solaris Port

I have ported Hadoop  native libraries to Solaris 11 (both Sparc and Intel )
Oracle have agreed to release my changes to the community so that 
Solaris platforms can benefit.
Reading the HowToContribute and GitandHadoop documents, I am not 100% 
clear on how to get my changes into the main tree. I am also a Git(hub) 
newbie, and was using svn previously.

Please let me know if I am going the correct path:

1. I forked Hadoop on Github and downloaded a clone to my development 
machine.

2. The changes I made were to 2.2.0, can I still add changes to this 
branch, and hopefully get them accepted or must I migrate my changes to 
2.6 ? (On the main Hadoop download page, 2.2 is still listed as the GA 
version )

3. I understand that I should create a new branch for my changes, and 
then generate pull requests after uploading them to Github.

4. I also registered  at Jira in the understanding that I need to 
generate a Jira number for my changes, and to name my branch accordingly ?

Does all this make sense ?

Thanks,
Malcolm



Re: Solaris Port

Posted by Steve Loughran <st...@hortonworks.com>.
On 10 December 2014 at 10:31, malcolm <ma...@oracle.com> wrote:

> Also, I have been requested to ensure my port is available on 2.4,
> perceived as a more stable release. If I make changes to this branch are
> they automatically available for 2.6, or will I need multiple JIRAs ?


nobody is backporting patches to ASF 2.4 these days; 2.5.x is getting
security fixes, 2.6 functionality. If you do want it on 2.4 then that will
have to be your own fork of it, with all the test problems that follow.
Look to Apache Bigtop for integration testing there.

w.r.t to 2.6, if the code is the same on both branches, we can usually just
commit to 2.6 then cherry pick forwards to branch-2 & trunk.

if a different patch is needed, then that needs to be a separate JIRA.

One thing we do worry about is testing of this stuff, which we do
automatically with  jenkins:
https://builds.apache.org/view/H-L/view/Hadoop/

There is a solaris machine in the ASF jenkins farm, getting some help
setting this up would be great. You wont have permissions to edit the
settings —if you have a colleague working on ASF projects then they might,
so getting them to help would aid setup.

-Steve

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Solaris Port

Posted by Steve Loughran <st...@hortonworks.com>.
On 14 December 2014 at 16:52, Allen Wittenauer <aw...@altiscale.com> wrote:

> Well, slight correction:  only one thing in the code that has been
> replaced.  There are a two patches waiting to get reviewed and applied that
> fix the rest of the shipping shell code: HADOOP-10788 and HADOOP-11346.
> HDFS-7460 is waiting on HADOOP-10788 since I’m consolidating the tomcat
> driver code, but it will be effectively a clone of HADOOP-10788’s code.
>

looked @ these, and they are beyond my current knowledge of the shell
scripts, so I'm not in a position to review right now, sorry.


>
>         I haven’t looked at all at things like test-patch or other stuff
> in dev-support, which I’m sure are all in an… uhh… “interesting” state.
>


there's been recent work on that script; it's working fairly well right now.

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Solaris Port

Posted by Allen Wittenauer <aw...@altiscale.com>.
On Dec 14, 2014, at 8:38 AM, Allen Wittenauer <aw...@altiscale.com> wrote:

> 
> On Dec 13, 2014, at 4:05 AM, Steve Loughran <st...@hortonworks.com> wrote:
>> 
>> The shell scripts are also undertested and only intermittently maintained
>> —there's been recent work there in HADOOP-9902(?)  which is a great
>> improvement to trunk. If you can help test the CLI on your OS, that will
>> save you support calls.
> 
> 	That reminds me to file a JIRA for this…
> 
> 	AFAIK, there is only one thing that I’ve found breaks on modern-ish Solaris/Illumos:

	Well, slight correction:  only one thing in the code that has been replaced.  There are a two patches waiting to get reviewed and applied that fix the rest of the shipping shell code: HADOOP-10788 and HADOOP-11346. HDFS-7460 is waiting on HADOOP-10788 since I’m consolidating the tomcat driver code, but it will be effectively a clone of HADOOP-10788’s code.  

	I haven’t looked at all at things like test-patch or other stuff in dev-support, which I’m sure are all in an… uhh… “interesting” state.



Re: Solaris Port

Posted by Allen Wittenauer <aw...@altiscale.com>.
On Dec 13, 2014, at 4:05 AM, Steve Loughran <st...@hortonworks.com> wrote:
> 
> The shell scripts are also undertested and only intermittently maintained
> —there's been recent work there in HADOOP-9902(?)  which is a great
> improvement to trunk. If you can help test the CLI on your OS, that will
> save you support calls.

	That reminds me to file a JIRA for this…

	AFAIK, there is only one thing that I’ve found breaks on modern-ish Solaris/Illumos:  xargs -P in the ssh handler.  It’s a GNU-extension that I mistakenly thought was POSIX.  It’s trivially to use the OS check bits to use a non-parallel version of it.  But realistically, users should just install pdsh instead.  (even if they have a working xargs…)

	It’s been a while since I’ve run all of the shell bits on my Illumos box, so other stuff may have snuck in since then.

Re: Solaris Port

Posted by Steve Loughran <st...@hortonworks.com>.
On 12 December 2014 at 04:35, malcolm <ma...@oracle.com> wrote:

> So, turns out that if I had naively changed all calls to terror or
> references to sys_errlist, to using strerror_r, then I would have broken
> code for Windows and HPUX (and possibly other OSes).
>
> If we are to assume that current code runs fine on all platforms (maybe
> even AIX an MacOS, for example), then any change/additions made to the code
> and not ifdeffed appropriately can break on other OSes. On the other hand,
> too many ifdefs can pollute the code source and render it less readable
> (though possibly less important).
>
>
readability is less important that correct functionality. The native
libraries are not the main focus of development, and less people work on
them. That's a strength: those that do are presumably competent, and a
weakness: less people are able to review and devlop it.


> In the general case what are code contributors responsibilities to adding
> code regarding OSes besides Linux ?
> What OSes does jenkins test on ?
>

ASF jenkins currently tests on Debian on a pool of hosts

https://builds.apache.org/view/H-L/view/Hadoop/


There are some windows hosts to which jenkins could also schedule work,
though I've found them problematic (you can't run MiniYARN cluster there,
specifically, due to some FS permissions
https://builds.apache.org/job/slider-develop-windows/ )

There is a Solaris host in the jenkins pool, adding a hadoop-solaris would
be the best way of guaranteeing that regressions in the code which broke
solaris would be picked up early.


> I guess maintainers of code on non-tested platforms are responsible for
> their own testing ?
>
>
My colleagues run jenkins builds and tests on our other supported
platforms, RHEL/CentOS and Windows; regressions which break these will be
caught relatively rapidly. As an example from yesterday,
https://issues.apache.org/jira/browse/HDFS-7514 , "TestTextCommand fails on
Windows". Path setup is always a problem there, usually in tests.

I assume others are doing similar things, especially those using Apache
Bigtop as a test suite for alternative filesystems. The tests there are
what you should really try doing internally.

OS/X is one of the two primary dev platforms, any regression there will be
picked up by the engineers within 24h


> How do we avoid the ping-pong effect, i.e. I make a generic change to code
> which breaks on Windows, then the Windows maintainer reverts changes to
> break on Solaris for example ? Or does this not happen in actuality ?
>

Ideally regressions should be picked up during the review phase

If something does slip through, then the following is likely to take place

   1. A new JIRA is created, with a "caused-by" link pointing to the patch
   which caused the problem.
   2. Either: the patch at fault is reverted, pending a revised, fixed
   version.
   3. A patch is applied to fix the problem. This is likely if the problem
   is trivial.

There isn't likely to be a ping-pong effect as if a patch breaks
compatibility and is reverted. It then effectively responsibility of the
authors of the original patch to fix things and resubmit a working patch.
And, after the first failure, reviews will be a lot stricter.

More succinctly: *stuff which causes regressions doesn't get in*

One thing you can do is set up a Windows VM for build and test -we have the
instructions, I have such a VM to hand. I'd also recommend you have a
kerberized linux VM to test what happens there.

You also have the freedom to review other submitted patches and highlight
where you think they will cause problems on Solaris. Unless/until you
become a committer you don't get the right to say +1 and get the patch in,
but you can certainly flag up where a change isn't compatible, then work
with the patch developers to nurture it into shape.


Where we do have trouble in the ASF is
 -nonstandard JVMs, i.e. anything other than oracle and openjdk. You should
not have that problem.
 -native filesystems that don't get used much (yet)
 -non-HDFS filesystems. This is why we don't support any except the object
store libraries bundled in the hadoop source tree, object stores anyone
with the relevant cloud logins can test from their desk.
 -nonstandard networking setups. Anything to test there and find problems
is welcome.

The shell scripts are also undertested and only intermittently maintained
—there's been recent work there in HADOOP-9902(?)  which is a great
improvement to trunk. If you can help test the CLI on your OS, that will
save you support calls.

-Steve

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Solaris Port

Posted by Steve Loughran <st...@hortonworks.com>.
On 13 December 2014 at 09:29, malcolm <ma...@oracle.com> wrote:

> I am not sure what you mean by a thread-local buffer (in native code). In
> Java this is pretty standard, but I couldn't find any implementation for C
> code.


There's stuff around; I don't know how well it works across different
platforms; certainly windows is very different

https://docs.oracle.com/cd/E18659_01/html/821-1383/bkaeg.html
https://gcc.gnu.org/onlinedocs/gcc-3.3.3/gcc/Thread-Local.html

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Solaris Port SOLVED!

Posted by Charles Lamb <cl...@cloudera.com>.
On 12/16/2014 11:01 AM, malcolm wrote:
> This is weird, Jenkins complains about:
>
> 1. Findbugs , 3 warnings in Java code (which of course I did not touch)

The FB warnings seem to be a recent phenomenon. I have seen them on a 
recent test run of my own and they come and go depending on the run. I 
think they can be safely ignored. However, if you want to be sure, then 
you could do the findbugs run on your local machine both with and 
without your patch applied and compare the results. If you find that 
there's no difference, then just put a comment in the Jira stating that.

> 2. Test failures also with no connection to terror: A java socket 
> timeout,
Yes, probably unrelated. To be sure, run those same tests on your local 
machine and if they pass, then put a comment in the Jira saying that 
they run on your local machine. If they fail, then run them with and 
without the patch to make sure they fail both ways.

Charles


Re: Solaris Port SOLVED!

Posted by Steve Loughran <st...@hortonworks.com>.
On 16 December 2014 at 16:01, malcolm <ma...@oracle.com> wrote:

> 1. Findbugs , 3 warnings in Java code (which of course I did not touch)
> 2. Test failures also with no connection to terror: A java socket timeout,
>

ongoing issues with (1) transition to java 7 builds and (2) some
intermittent tests that need to get fixed. ignore them

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Solaris Port SOLVED!

Posted by malcolm <ma...@oracle.com>.
This is weird, Jenkins complains about:

1. Findbugs , 3 warnings in Java code (which of course I did not touch)
2. Test failures also with no connection to terror: A java socket timeout,

As a newbie, I am not quite sure how to relate to this.
(I could just revert the code back, and see if I get the same errors 
anyway.)

On 12/16/2014 06:57 AM, malcolm wrote:
> Done, and added the comment as you requested.
> I attached a second patch file to the JIRA (with .002 appended as per 
> convention) assuming Jenkins knows to take the latest version, since I 
> understand that I cannot remove the previous patch file .
>
> On 12/16/2014 04:12 AM, Colin McCabe wrote:
>> Thanks, Malcom.  I reviewed it.  The only thing you still have to do
>> is hit "submit patch" to get a Jenkins run.  See our HowToContribute
>> wiki page for more details.
>>
>> wiki.apache.org/hadoop/HowToContribute
>>
>> best,
>> Colin
>>
>> On Sat, Dec 13, 2014 at 9:22 PM, malcolm 
>> <ma...@oracle.com> wrote:
>>> I am checking on the latest release of Solaris 11 and yes, it is still
>>> thread safe (or MT Safe as documented on the man page).
>>>
>>> strerror checks the error code, and returns the same "unknown error" 
>>> string
>>> as terror does, if it receives an invalid code. I checked this on 
>>> Windows,
>>> Solaris and Linux (though my changes only affect Solaris platforms).
>>>
>>> JIRA newbie question:
>>>
>>> I have filed the JIRA attaching the patch  HADOOP-11403 against the 
>>> trunk,
>>> asking for reviewers in the comments section.
>>> Is there any other protocol I should follow ?
>>>
>>> Thanks,
>>> Malcolm
>>>
>>>
>>> On 12/14/2014 01:08 AM, Asokan, M wrote:
>>>> Malcom,
>>>>      That's great! Is strerror() thread-safe in the recent version of
>>>> Solaris?  In any case, to be correct you still need to make sure 
>>>> that the
>>>> code passed to strerror() is a valid one.  For this you need to 
>>>> check errno
>>>> after the call to strerror().  Please check the code snippet I sent 
>>>> earlier
>>>> for HPUX.
>>>>
>>>> -- Asokan
>>>> ________________________________________
>>>> From: malcolm [malcolm.kavalsky@oracle.com]
>>>> Sent: Saturday, December 13, 2014 3:13 PM
>>>> To: common-dev@hadoop.apache.org
>>>> Subject: Re: Solaris Port SOLVED!
>>>>
>>>> Wiping egg off face  ...
>>>>
>>>> After consulting with the Solaris team (and looking at the source code
>>>> and man page) ,  it turns out that strerror itself on Solaris is 
>>>> MT-Safe
>>>> ! (Just like HPUX)
>>>>
>>>> So, after all this effort, all I need to do is modify terror as 
>>>> follows:
>>>>
>>>>       const char* terror(int errnum)
>>>>       {
>>>>
>>>>       #if defined(__sun)
>>>>          return strerror(errnum); //  MT-Safe under Solaris
>>>>       #else
>>>>          if ((errnum < 0) || (errnum >= sys_nerr)) {
>>>>            return "unknown error.";
>>>>          }
>>>>          return sys_errlist[errnum];
>>>>       #endif
>>>>       }
>>>>
>>>> And in two other files where sys_errlist is referenced directly
>>>> (NativeIO and hdfs_http_client.c), I replaced this direct access 
>>>> instead
>>>> with a call to terror.
>>>>
>>>> Thanks for all your help and patience,
>>>>
>>>> I'll file a JIRA asap,
>>>>
>>>> Cheers,
>>>> Malcolm
>>>>
>>>> On 12/13/2014 05:26 PM, malcolm wrote:
>>>>> Thanks Asokan,
>>>>>
>>>>> Looked up Gcc's thread local variables, seems a bit complex though 
>>>>> and
>>>>> quite specific to Gnu.
>>>>>
>>>>> Intialization of the static errlist array should be thread safe i.e.
>>>>> initially the array is nulled out, and afterwards if two threads 
>>>>> write
>>>>> to the same address, then they would be writing the same string.
>>>>>
>>>>> But if we are ok with changing 5 files, not just terror, then I would
>>>>> just remove terror completely and use strerror_r (or the alternatives
>>>>> for Windows and HP_UX) in the caller code instead i.e. using your
>>>>> suggestion for a local buffer in the caller, wherever needed. The 
>>>>> more
>>>>> I think about it, the more this seems to be the right thing to do.
>>>>>
>>>>> Cheers,
>>>>> Malcolm
>>>>>
>>>>>
>>>>> On 12/13/2014 04:38 PM, Asokan, M wrote:
>>>>>> Malcom,
>>>>>>       Gcc supports thread-local variables. See
>>>>>>
>>>>>> https://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Thread-Local.html
>>>>>>
>>>>>> I am not sure about native compilers on Solaris, HPUX, or AIX.
>>>>>>
>>>>>> In any case, I found out that the Windows native code in Hadoop 
>>>>>> seems
>>>>>> to handle error messages properly. Here is what I found:
>>>>>>
>>>>>> $ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grephadoop 
>>>>>> how to
>>>>>> file a jira
>>>>>>
>>>>>> FormatMessage|awk -F: '{print $1}'|sort -u
>>>>>>
>>>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c 
>>>>>>
>>>>>>
>>>>>>
>>>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMappingWin.c 
>>>>>>
>>>>>>
>>>>>>
>>>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c 
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> $ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grep terror|awk
>>>>>> -F: '{print $1}'|sort -u
>>>>>>
>>>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/exception.c 
>>>>>>
>>>>>>
>>>>>>
>>>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/SharedFileDescriptorFactory.c 
>>>>>>
>>>>>>
>>>>>>
>>>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c 
>>>>>>
>>>>>>
>>>>>>
>>>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c 
>>>>>>
>>>>>>
>>>>>>
>>>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMapping.c 
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> This means you need not worry about the Windows version of terror().
>>>>>> You need to change five files that contain UNIX specific native 
>>>>>> code.
>>>>>>
>>>>>> I have a question on your suggested implementation:
>>>>>>
>>>>>> How do you initialize the static errlist array in a thread-safe 
>>>>>> manner?
>>>>>>
>>>>>> ________________________________
>>>>>> Here is another thread-safe implementation that I could come up 
>>>>>> with:
>>>>>>
>>>>>> #include <string.h>
>>>>>> #include <stdlib.h>
>>>>>> #include <errno.h>
>>>>>> #include <stdio.h>
>>>>>>
>>>>>> #define MESSAGE_BUFFER_SIZE 256
>>>>>>
>>>>>> char * getSystemErrorMessage(char * buf, int buf_len, int code) {
>>>>>> #if defined(_HPUX_SOURCE)
>>>>>>      char * msg;
>>>>>>      errno = 0;
>>>>>>      msg = strerror(code);
>>>>>>      if (errno == 0) {
>>>>>>        strncpy(buf, msg, buf_len-1);
>>>>>>        buf[buf_len-1] = '\0';
>>>>>>      } else {
>>>>>>        snprintf(buf, buf_len, "%s %d",
>>>>>>            "Can't get system error message for code", code);
>>>>>>      }
>>>>>> #else
>>>>>>      if (strerror_r(code, buf, buf_len) != 0) {
>>>>>>        snprintf(buf, buf_len, "%s %d",
>>>>>>            "Can't get system error message for code", code);
>>>>>>      }
>>>>>> #endif
>>>>>>      return buf;
>>>>>> }
>>>>>>
>>>>>> #define TERROR(code) \
>>>>>> getSystemErrorMessage(messageBuffer, sizeof(messageBuffer), code)
>>>>>>
>>>>>> int main(int argc, char ** argv) {
>>>>>>      if (argc > 1) {
>>>>>>        char messageBuffer[MESSAGE_BUFFER_SIZE];
>>>>>>        int code = atoi(argv[1]);
>>>>>>
>>>>>>        fprintf(stderr, "System error for code %s: %s\n", argv[1],
>>>>>> TERROR(code));
>>>>>>      }
>>>>>>      return 0;
>>>>>> }
>>>>>>
>>>>>>
>>>>>> This changes terror to a macro TERROR and requires all functions 
>>>>>> that
>>>>>> call TERROR macro to declare the local variable messageBuffer. Since
>>>>>> there are only five files to modify, I think it is not a big effort.
>>>>>> What do you think?
>>>>>>
>>>>>> -- Asokan
>>>>>>
>>>>>> On 12/13/2014 04:29 AM, malcolm wrote:
>>>>>> Colin,
>>>>>>
>>>>>> I am not sure what you mean by a thread-local buffer (in native
>>>>>> code). In Java this is pretty standard, but I couldn't find any
>>>>>> implementation for C code.
>>>>>>
>>>>>> Here is the terror function:
>>>>>>
>>>>>>        const char* terror(int errnum)
>>>>>>        {
>>>>>>          if ((errnum < 0) || (errnum >= sys_nerr)) {
>>>>>>            return "unknown error.";
>>>>>>          }
>>>>>>          return sys_errlist[errnum];
>>>>>>        }
>>>>>>
>>>>>>
>>>>>> The interface is identical to strerror, but the implementation is
>>>>>> actually re-entrant since it returns a pointer to a static string.
>>>>>>
>>>>>> If I understand your suggestion, the new function would look like 
>>>>>> this:
>>>>>>
>>>>>>       const char* terror(int errnum)
>>>>>>       {
>>>>>>          static char result[65];
>>>>>>
>>>>>>          strerror_r(errnum, result, 64);
>>>>>>
>>>>>>          return result;
>>>>>>       }
>>>>>>
>>>>>> No need for snprintf, strerror_r  has the 'n' bounding built-in.
>>>>>>
>>>>>> Of course, this is still non-re-entrant, so unless the caller copies
>>>>>> the returned buffer, before the function is called again, there is a
>>>>>> problem.
>>>>>>
>>>>>> After considerable thought, I have come up with this version of
>>>>>> terror, tested OK on Windows, Linux and Solaris:
>>>>>>
>>>>>>       #if defined(_WIN32)
>>>>>>       #define strerror_r(errno,buf,len) strerror_s(buf,len,errno)
>>>>>>       #endif
>>>>>>
>>>>>>       #define MAX_ERRORS 256
>>>>>>       #define MAX_ERROR_LEN 80
>>>>>>
>>>>>>       char *terror(int errnum)
>>>>>>       {
>>>>>>
>>>>>>          static char errlist[MAX_ERRORS][MAX_ERROR_LEN+1]; // 
>>>>>> cache of
>>>>>>       error messages
>>>>>>
>>>>>>          if ( errnum >= 0 && errnum < MAX_ERRORS )
>>>>>>            {
>>>>>>              if ( errlist[errnum][0] == 0 )
>>>>>>                strerror_r( errnum, errlist[errnum], MAX_ERROR_LEN);
>>>>>>
>>>>>>              return errlist[errnum];
>>>>>>            }
>>>>>>          else
>>>>>>            {
>>>>>>              return "Unknown error";
>>>>>>            }
>>>>>>       }
>>>>>>
>>>>>> This version is portable and re-entrant.
>>>>>>
>>>>>> On windows, the largest errnum is 43, on Ubuntu 14.04 we have 133,
>>>>>> and on Solaris 11.1 we get 151.
>>>>>>
>>>>>> If this is OK with you, I will open a jira for this.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Malcolm
>>>>>>
>>>>>>
>>>>>> On 12/12/2014 11:10 PM, Colin McCabe wrote:
>>>>>> Just use snprintf to copy the error message from strerror_r into a
>>>>>> thread-local buffer of 64 bytes or so.  Then preserve the existing
>>>>>> terror() interface.
>>>>>>
>>>>>> Can you open a jira for this?
>>>>>>
>>>>>> best,
>>>>>> Colin
>>>>>>
>>>>>> On Thu, Dec 11, 2014 at 8:35 PM,
>>>>>> malcolm<ma...@oracle.com> 
>>>>>>
>>>>>> wrote:
>>>>>> So, turns out that if I had naively changed all calls to terror or
>>>>>> references to sys_errlist, to using strerror_r, then I would have 
>>>>>> broken
>>>>>> code for Windows and HPUX (and possibly other OSes).
>>>>>>
>>>>>> If we are to assume that current code runs fine on all platforms
>>>>>> (maybe even
>>>>>> AIX an MacOS, for example), then any change/additions made to the
>>>>>> code and
>>>>>> not ifdeffed appropriately can break on other OSes. On the other
>>>>>> hand,  too
>>>>>> many ifdefs can pollute the code source and render it less readable
>>>>>> (though
>>>>>> possibly less important).
>>>>>>
>>>>>> In the general case what are code contributors responsibilities to
>>>>>> adding
>>>>>> code regarding OSes besides Linux ?
>>>>>> What OSes does jenkins test on ?
>>>>>> I guess maintainers of code on non-tested platforms are 
>>>>>> responsible for
>>>>>> their own testing ?
>>>>>>
>>>>>> How do we avoid the ping-pong effect, i.e. I make a generic 
>>>>>> change to
>>>>>> code
>>>>>> which breaks on Windows, then the Windows maintainer reverts 
>>>>>> changes to
>>>>>> break on Solaris for example ? Or does this not happen in 
>>>>>> actuality ?
>>>>>>
>>>>>>
>>>>>> On 12/11/2014 11:25 PM, Asokan, M wrote:
>>>>>> Hi Malcom,
>>>>>>        The Windows versions of strerror() and strerror_s() 
>>>>>> functions are
>>>>>> probably meant for ANSI C library functions that set errno.  For 
>>>>>> core
>>>>>> Windows API calls (like UNIX system calls), one gets the error 
>>>>>> number by
>>>>>> calling GetLastError() function.  In the code snippet I sent 
>>>>>> earlier,
>>>>>> the
>>>>>> "code" argument is the value returned by GetLastError(). Neither
>>>>>> strerror()
>>>>>> nor strerror_s() will give the correct error message for this error
>>>>>> code.
>>>>>>
>>>>>> You could probably look at libwinutils.c in Hadoop source.  It uses
>>>>>> FormatMessageW (which returns messages in Unicode.)  My requirement
>>>>>> was to
>>>>>> return messages in current system locale.
>>>>>>
>>>>>> -- Asokan
>>>>>> ________________________________________
>>>>>> From: malcolm
>>>>>> [malcolm.kavalsky@oracle.com<ma...@oracle.com>]
>>>>>> Sent: Thursday, December 11, 2014 4:04 PM
>>>>>> To:common-dev@hadoop.apache.org<ma...@hadoop.apache.org> 
>>>>>>
>>>>>> Subject: Re: Solaris Port
>>>>>>
>>>>>> Hi Asok,
>>>>>>
>>>>>> I googled and found that windows has strerror, and strerror_s 
>>>>>> (which is
>>>>>> the strerror_r equivalent).
>>>>>> Is there a reason why you didn't use this call ?
>>>>>>
>>>>>> On 12/11/2014 06:27 PM, Asokan, M wrote:
>>>>>> Hi Malcom,
>>>>>>          Recently, I had to work on a function to get system error
>>>>>> message on
>>>>>> various systems.  Here is the piece of code I came up with. Hope it
>>>>>> helps.
>>>>>>
>>>>>> static void get_system_error_message(char * buf, int buf_len, int 
>>>>>> code)
>>>>>> {
>>>>>> #if defined(_WIN32)
>>>>>>           LPVOID lpMsgBuf;
>>>>>>           DWORD status = 
>>>>>> FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
>>>>>> FORMAT_MESSAGE_FROM_SYSTEM |
>>>>>> FORMAT_MESSAGE_IGNORE_INSERTS,
>>>>>>                                        NULL, code,
>>>>>> MAKELANGID(LANG_NEUTRAL,
>>>>>> SUBLANG_DEFAULT),
>>>>>> /* Default
>>>>>> language */
>>>>>>                                        (LPTSTR) &lpMsgBuf, 0, NULL);
>>>>>>           if (status > 0)
>>>>>>           {
>>>>>>               strncpy(buf, (char *)lpMsgBuf, buf_len-1);
>>>>>>               buf[buf_len-1] = '\0';
>>>>>>               /* Free the buffer returned by system */
>>>>>>               LocalFree(lpMsgBuf);
>>>>>>           }
>>>>>>           else
>>>>>>           {
>>>>>>               _snprintf(buf, buf_len-1 , "%s %d",
>>>>>>                   "Can't get system error message for code", code);
>>>>>>               buf[buf_len-1] = '\0';
>>>>>>           }
>>>>>> #else
>>>>>> #if defined(_HPUX_SOURCE)
>>>>>>           {
>>>>>>               char * msg;
>>>>>>               errno = 0;
>>>>>>               msg = strerror(code);
>>>>>>               if (errno == 0)
>>>>>>               {
>>>>>>                   strncpy(buf, msg, buf_len-1);
>>>>>>                   buf[buf_len-1] = '\0';
>>>>>>               }
>>>>>>               else
>>>>>>               {
>>>>>>                   snprintf(buf, buf_len, "%s %d",
>>>>>>                       "Can't get system error message for code", 
>>>>>> code);
>>>>>>               }
>>>>>>           }
>>>>>> #else
>>>>>>           if (strerror_r(code, buf, buf_len) != 0)
>>>>>>           {
>>>>>>               snprintf(buf, buf_len, "%s %d",
>>>>>>                   "Can't get system error message for code", code);
>>>>>>           }
>>>>>> #endif
>>>>>> #endif
>>>>>> }
>>>>>>
>>>>>> Note that HPUX does not have strerror_r() since strerror() itself is
>>>>>> thread-safe.  Also Windows does not have snprintf(). The equivalent
>>>>>> function _snprintf() has a subtle difference in its interface.
>>>>>>
>>>>>> -- Asokan
>>>>>> ________________________________________
>>>>>> From: malcolm
>>>>>> [malcolm.kavalsky@oracle.com<ma...@oracle.com>]
>>>>>> Sent: Thursday, December 11, 2014 11:02 AM
>>>>>> To:common-dev@hadoop.apache.org<ma...@hadoop.apache.org> 
>>>>>>
>>>>>> Subject: Re: Solaris Port
>>>>>>
>>>>>> Fine with me, I volunteer to do this, if accepted.
>>>>>>
>>>>>> On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
>>>>>> sys_errlist was removed for a reason.  Creating a fake 
>>>>>> sys_errlist on
>>>>>> Solaris will mean the libhadoop.so will need to be tied a 
>>>>>> specific build
>>>>>> (kernel/include pairing) and therefore limits upward
>>>>>> mobility/compatibility.
>>>>>> That doesn’t seem like a very good idea.
>>>>>>
>>>>>> IMO, switching to strerror_r is much preferred, since other than the
>>>>>> brain-dead GNU libc version, is highly portable and should work
>>>>>> regardless
>>>>>> of the kernel or OS in place.
>>>>>>
>>>>>> On Dec 11, 2014, at 5:20 AM,
>>>>>> malcolm<ma...@oracle.com> 
>>>>>>
>>>>>> wrote:
>>>>>>
>>>>>> FYI, there are a couple more files that reference sys_errlist 
>>>>>> directly
>>>>>> (not just terror within exception.c) , but also 
>>>>>> hdfs_http_client.c and
>>>>>> NativeiO.c
>>>>>>
>>>>>> On 12/11/2014 07:38 AM, malcolm wrote:
>>>>>> Hi Colin,
>>>>>>
>>>>>> Exactly, as you noticed, the problem is the thread-local buffer 
>>>>>> needed
>>>>>> to return from terror.
>>>>>> Currently, terror just returns a static string from an array, 
>>>>>> this is
>>>>>> fast, simple and error-proof.
>>>>>>
>>>>>> In order to use strerror_r inside terror,  would require 
>>>>>> allocating a
>>>>>> buffer inside terror  and depend on the caller to free the buffer 
>>>>>> after
>>>>>> using it, or to pass a buffer to terrror (which is basically the 
>>>>>> same as
>>>>>> strerror_r, rendering terror redundant).
>>>>>> Both cases require modification outside terror itself, as far as 
>>>>>> I can
>>>>>> tell, no simple fix. Unless you have an alternative which I haven't
>>>>>> thought
>>>>>> of ?
>>>>>>
>>>>>> As far as I can tell, we have two choices:
>>>>>>
>>>>>> 1. Remove terror and replace calls with strerror_r, passing a buffer
>>>>>> from the callee.
>>>>>>           Advantage: a more modern portable interface.
>>>>>>           Disadvantage: All calls to terror need to be modified, 
>>>>>> though
>>>>>> all seem to be in a few files as far as I can tell.
>>>>>>
>>>>>> 2. Adding a sys_errlist array (ifdeffed for Solaris)
>>>>>>           Advantage: no change to any calls to terror
>>>>>>           Disadvantage: 2 additional files added to source tree 
>>>>>> (.c and
>>>>>> .h) and some minor ifdefs only used for Solaris.
>>>>>>
>>>>>> I think it is more a question of style than anything else, so I 
>>>>>> leave
>>>>>> you to make the call.
>>>>>>
>>>>>> Thanks for your patience,
>>>>>> Malcolm
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 12/10/2014 09:54 PM, Colin McCabe wrote:
>>>>>> On Wed, Dec 10, 2014 at 2:31 AM, malcolm
>>>>>> <ma...@oracle.com> 
>>>>>> wrote:
>>>>>> Hi Colin,
>>>>>>
>>>>>> Thanks for the hints around JIRAs.
>>>>>>
>>>>>> You are correct errno still exists, however sys_errlist does not.
>>>>>>
>>>>>> Hadoop uses a function terror (defined in exception.c) which indexes
>>>>>> sys_errlist by errno to return the error message from the array.
>>>>>> This
>>>>>> function is called 26 times in various places (in 2.2)
>>>>>>
>>>>>> Originally, I thought to replace all calls to terror with strerror,
>>>>>> but
>>>>>> there can be issues with multi-threading (it returns a buffer which
>>>>>> can be
>>>>>> overwritten), so it seemed simpler just to recreate the sys_errlist
>>>>>> message
>>>>>> array.
>>>>>>
>>>>>> There is also a multi-threaded version strerror_r where you pass the
>>>>>> buffer
>>>>>> as a parameter, but this would necessitate changing every call to
>>>>>> terror
>>>>>> with mutiple lines of code.
>>>>>> Why don't you just use strerror_r inside terror()?
>>>>>>
>>>>>> I wrote that code originally.  The reason I didn't want to use
>>>>>> strerror_r there is because GNU libc provides a non-POSIX definition
>>>>>> of strerror_r, and forcing it to use the POSIX one is a pain. But 
>>>>>> you
>>>>>> can do it.  You also will require a thread-local buffer to hold the
>>>>>> return from strerror_r, since it is not guaranteed to be static
>>>>>> (although in practice it is 99% of the time-- another annoyance with
>>>>>> the API).
>>>>>>
>>>>>>
>>>>>> ________________________________
>>>>>>
>>>>>>
>>>>>>
>>>>>> ATTENTION: -----
>>>>>>
>>>>>> The information contained in this message (including any files
>>>>>> transmitted with this message) may contain proprietary, trade 
>>>>>> secret or
>>>>>> other confidential and/or legally privileged information. Any 
>>>>>> pricing
>>>>>> information contained in this message or in any files transmitted
>>>>>> with this
>>>>>> message is always confidential and cannot be shared with any third
>>>>>> parties
>>>>>> without prior written approval from Syncsort. This message is
>>>>>> intended to be
>>>>>> read only by the individual or entity to whom it is addressed or by
>>>>>> their
>>>>>> designee. If the reader of this message is not the intended
>>>>>> recipient, you
>>>>>> are on notice that any use, disclosure, copying or distribution 
>>>>>> of this
>>>>>> message, in any form, is strictly prohibited. If you have 
>>>>>> received this
>>>>>> message in error, please immediately notify the sender and/or
>>>>>> Syncsort and
>>>>>> destroy all copies of this message in your possession, custody or
>>>>>> control.
>>>>>>
>>>>>>
>>>>>>
>>>> ________________________________
>>>>
>>>>
>>>>
>>>> ATTENTION: -----
>>>>
>>>> The information contained in this message (including any files 
>>>> transmitted
>>>> with this message) may contain proprietary, trade secret or other
>>>> confidential and/or legally privileged information. Any pricing 
>>>> information
>>>> contained in this message or in any files transmitted with this 
>>>> message is
>>>> always confidential and cannot be shared with any third parties 
>>>> without
>>>> prior written approval from Syncsort. This message is intended to 
>>>> be read
>>>> only by the individual or entity to whom it is addressed or by their
>>>> designee. If the reader of this message is not the intended 
>>>> recipient, you
>>>> are on notice that any use, disclosure, copying or distribution of 
>>>> this
>>>> message, in any form, is strictly prohibited. If you have received 
>>>> this
>>>> message in error, please immediately notify the sender and/or 
>>>> Syncsort and
>>>> destroy all copies of this message in your possession, custody or 
>>>> control.
>>>
>


Re: Solaris Port SOLVED!

Posted by malcolm <ma...@oracle.com>.
Done, and added the comment as you requested.
I attached a second patch file to the JIRA (with .002 appended as per 
convention) assuming Jenkins knows to take the latest version, since I 
understand that I cannot remove the previous patch file .

On 12/16/2014 04:12 AM, Colin McCabe wrote:
> Thanks, Malcom.  I reviewed it.  The only thing you still have to do
> is hit "submit patch" to get a Jenkins run.  See our HowToContribute
> wiki page for more details.
>
> wiki.apache.org/hadoop/HowToContribute
>
> best,
> Colin
>
> On Sat, Dec 13, 2014 at 9:22 PM, malcolm <ma...@oracle.com> wrote:
>> I am checking on the latest release of Solaris 11 and yes, it is still
>> thread safe (or MT Safe as documented on the man page).
>>
>> strerror checks the error code, and returns the same "unknown error" string
>> as terror does, if it receives an invalid code. I checked this on Windows,
>> Solaris and Linux (though my changes only affect Solaris platforms).
>>
>> JIRA newbie question:
>>
>> I have filed the JIRA attaching the patch  HADOOP-11403 against the trunk,
>> asking for reviewers in the comments section.
>> Is there any other protocol I should follow ?
>>
>> Thanks,
>> Malcolm
>>
>>
>> On 12/14/2014 01:08 AM, Asokan, M wrote:
>>> Malcom,
>>>      That's great! Is strerror() thread-safe in the recent version of
>>> Solaris?  In any case, to be correct you still need to make sure that the
>>> code passed to strerror() is a valid one.  For this you need to check errno
>>> after the call to strerror().  Please check the code snippet I sent earlier
>>> for HPUX.
>>>
>>> -- Asokan
>>> ________________________________________
>>> From: malcolm [malcolm.kavalsky@oracle.com]
>>> Sent: Saturday, December 13, 2014 3:13 PM
>>> To: common-dev@hadoop.apache.org
>>> Subject: Re: Solaris Port SOLVED!
>>>
>>> Wiping egg off face  ...
>>>
>>> After consulting with the Solaris team (and looking at the source code
>>> and man page) ,  it turns out that strerror itself on Solaris is MT-Safe
>>> ! (Just like HPUX)
>>>
>>> So, after all this effort, all I need to do is modify terror as follows:
>>>
>>>       const char* terror(int errnum)
>>>       {
>>>
>>>       #if defined(__sun)
>>>          return strerror(errnum); //  MT-Safe under Solaris
>>>       #else
>>>          if ((errnum < 0) || (errnum >= sys_nerr)) {
>>>            return "unknown error.";
>>>          }
>>>          return sys_errlist[errnum];
>>>       #endif
>>>       }
>>>
>>> And in two other files where sys_errlist is referenced directly
>>> (NativeIO and hdfs_http_client.c), I replaced this direct access instead
>>> with a call to terror.
>>>
>>> Thanks for all your help and patience,
>>>
>>> I'll file a JIRA asap,
>>>
>>> Cheers,
>>> Malcolm
>>>
>>> On 12/13/2014 05:26 PM, malcolm wrote:
>>>> Thanks Asokan,
>>>>
>>>> Looked up Gcc's thread local variables, seems a bit complex though and
>>>> quite specific to Gnu.
>>>>
>>>> Intialization of the static errlist array should be thread safe i.e.
>>>> initially the array is nulled out, and afterwards if two threads write
>>>> to the same address, then they would be writing the same string.
>>>>
>>>> But if we are ok with changing 5 files, not just terror, then I would
>>>> just remove terror completely and use strerror_r (or the alternatives
>>>> for Windows and HP_UX) in the caller code instead i.e. using your
>>>> suggestion for a local buffer in the caller, wherever needed. The more
>>>> I think about it, the more this seems to be the right thing to do.
>>>>
>>>> Cheers,
>>>> Malcolm
>>>>
>>>>
>>>> On 12/13/2014 04:38 PM, Asokan, M wrote:
>>>>> Malcom,
>>>>>       Gcc supports thread-local variables. See
>>>>>
>>>>> https://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Thread-Local.html
>>>>>
>>>>> I am not sure about native compilers on Solaris, HPUX, or AIX.
>>>>>
>>>>> In any case, I found out that the Windows native code in Hadoop seems
>>>>> to handle error messages properly. Here is what I found:
>>>>>
>>>>> $ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grephadoop how to
>>>>> file a jira
>>>>>
>>>>> FormatMessage|awk -F: '{print $1}'|sort -u
>>>>>
>>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c
>>>>>
>>>>>
>>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMappingWin.c
>>>>>
>>>>>
>>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c
>>>>>
>>>>>
>>>>>
>>>>> $ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grep terror|awk
>>>>> -F: '{print $1}'|sort -u
>>>>>
>>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/exception.c
>>>>>
>>>>>
>>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/SharedFileDescriptorFactory.c
>>>>>
>>>>>
>>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c
>>>>>
>>>>>
>>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c
>>>>>
>>>>>
>>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMapping.c
>>>>>
>>>>>
>>>>>
>>>>> This means you need not worry about the Windows version of terror().
>>>>> You need to change five files that contain UNIX specific native code.
>>>>>
>>>>> I have a question on your suggested implementation:
>>>>>
>>>>> How do you initialize the static errlist array in a thread-safe manner?
>>>>>
>>>>> ________________________________
>>>>> Here is another thread-safe implementation that I could come up with:
>>>>>
>>>>> #include <string.h>
>>>>> #include <stdlib.h>
>>>>> #include <errno.h>
>>>>> #include <stdio.h>
>>>>>
>>>>> #define MESSAGE_BUFFER_SIZE 256
>>>>>
>>>>> char * getSystemErrorMessage(char * buf, int buf_len, int code) {
>>>>> #if defined(_HPUX_SOURCE)
>>>>>      char * msg;
>>>>>      errno = 0;
>>>>>      msg = strerror(code);
>>>>>      if (errno == 0) {
>>>>>        strncpy(buf, msg, buf_len-1);
>>>>>        buf[buf_len-1] = '\0';
>>>>>      } else {
>>>>>        snprintf(buf, buf_len, "%s %d",
>>>>>            "Can't get system error message for code", code);
>>>>>      }
>>>>> #else
>>>>>      if (strerror_r(code, buf, buf_len) != 0) {
>>>>>        snprintf(buf, buf_len, "%s %d",
>>>>>            "Can't get system error message for code", code);
>>>>>      }
>>>>> #endif
>>>>>      return buf;
>>>>> }
>>>>>
>>>>> #define TERROR(code) \
>>>>> getSystemErrorMessage(messageBuffer, sizeof(messageBuffer), code)
>>>>>
>>>>> int main(int argc, char ** argv) {
>>>>>      if (argc > 1) {
>>>>>        char messageBuffer[MESSAGE_BUFFER_SIZE];
>>>>>        int code = atoi(argv[1]);
>>>>>
>>>>>        fprintf(stderr, "System error for code %s: %s\n", argv[1],
>>>>> TERROR(code));
>>>>>      }
>>>>>      return 0;
>>>>> }
>>>>>
>>>>>
>>>>> This changes terror to a macro TERROR and requires all functions that
>>>>> call TERROR macro to declare the local variable messageBuffer. Since
>>>>> there are only five files to modify, I think it is not a big effort.
>>>>> What do you think?
>>>>>
>>>>> -- Asokan
>>>>>
>>>>> On 12/13/2014 04:29 AM, malcolm wrote:
>>>>> Colin,
>>>>>
>>>>> I am not sure what you mean by a thread-local buffer (in native
>>>>> code). In Java this is pretty standard, but I couldn't find any
>>>>> implementation for C code.
>>>>>
>>>>> Here is the terror function:
>>>>>
>>>>>        const char* terror(int errnum)
>>>>>        {
>>>>>          if ((errnum < 0) || (errnum >= sys_nerr)) {
>>>>>            return "unknown error.";
>>>>>          }
>>>>>          return sys_errlist[errnum];
>>>>>        }
>>>>>
>>>>>
>>>>> The interface is identical to strerror, but the implementation is
>>>>> actually re-entrant since it returns a pointer to a static string.
>>>>>
>>>>> If I understand your suggestion, the new function would look like this:
>>>>>
>>>>>       const char* terror(int errnum)
>>>>>       {
>>>>>          static char result[65];
>>>>>
>>>>>          strerror_r(errnum, result, 64);
>>>>>
>>>>>          return result;
>>>>>       }
>>>>>
>>>>> No need for snprintf, strerror_r  has the 'n' bounding built-in.
>>>>>
>>>>> Of course, this is still non-re-entrant, so unless the caller copies
>>>>> the returned buffer, before the function is called again, there is a
>>>>> problem.
>>>>>
>>>>> After considerable thought, I have come up with this version of
>>>>> terror, tested OK on Windows, Linux and Solaris:
>>>>>
>>>>>       #if defined(_WIN32)
>>>>>       #define strerror_r(errno,buf,len) strerror_s(buf,len,errno)
>>>>>       #endif
>>>>>
>>>>>       #define MAX_ERRORS 256
>>>>>       #define MAX_ERROR_LEN 80
>>>>>
>>>>>       char *terror(int errnum)
>>>>>       {
>>>>>
>>>>>          static char errlist[MAX_ERRORS][MAX_ERROR_LEN+1]; // cache of
>>>>>       error messages
>>>>>
>>>>>          if ( errnum >= 0 && errnum < MAX_ERRORS )
>>>>>            {
>>>>>              if ( errlist[errnum][0] == 0 )
>>>>>                strerror_r( errnum, errlist[errnum], MAX_ERROR_LEN);
>>>>>
>>>>>              return errlist[errnum];
>>>>>            }
>>>>>          else
>>>>>            {
>>>>>              return "Unknown error";
>>>>>            }
>>>>>       }
>>>>>
>>>>> This version is portable and re-entrant.
>>>>>
>>>>> On windows, the largest errnum is 43, on Ubuntu 14.04 we have 133,
>>>>> and on Solaris 11.1 we get 151.
>>>>>
>>>>> If this is OK with you, I will open a jira for this.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Malcolm
>>>>>
>>>>>
>>>>> On 12/12/2014 11:10 PM, Colin McCabe wrote:
>>>>> Just use snprintf to copy the error message from strerror_r into a
>>>>> thread-local buffer of 64 bytes or so.  Then preserve the existing
>>>>> terror() interface.
>>>>>
>>>>> Can you open a jira for this?
>>>>>
>>>>> best,
>>>>> Colin
>>>>>
>>>>> On Thu, Dec 11, 2014 at 8:35 PM,
>>>>> malcolm<ma...@oracle.com>
>>>>> wrote:
>>>>> So, turns out that if I had naively changed all calls to terror or
>>>>> references to sys_errlist, to using strerror_r, then I would have broken
>>>>> code for Windows and HPUX (and possibly other OSes).
>>>>>
>>>>> If we are to assume that current code runs fine on all platforms
>>>>> (maybe even
>>>>> AIX an MacOS, for example), then any change/additions made to the
>>>>> code and
>>>>> not ifdeffed appropriately can break on other OSes. On the other
>>>>> hand,  too
>>>>> many ifdefs can pollute the code source and render it less readable
>>>>> (though
>>>>> possibly less important).
>>>>>
>>>>> In the general case what are code contributors responsibilities to
>>>>> adding
>>>>> code regarding OSes besides Linux ?
>>>>> What OSes does jenkins test on ?
>>>>> I guess maintainers of code on non-tested platforms are responsible for
>>>>> their own testing ?
>>>>>
>>>>> How do we avoid the ping-pong effect, i.e. I make a generic change to
>>>>> code
>>>>> which breaks on Windows, then the Windows maintainer reverts changes to
>>>>> break on Solaris for example ? Or does this not happen in actuality ?
>>>>>
>>>>>
>>>>> On 12/11/2014 11:25 PM, Asokan, M wrote:
>>>>> Hi Malcom,
>>>>>        The Windows versions of strerror() and strerror_s() functions are
>>>>> probably meant for ANSI C library functions that set errno.  For core
>>>>> Windows API calls (like UNIX system calls), one gets the error number by
>>>>> calling GetLastError() function.  In the code snippet I sent earlier,
>>>>> the
>>>>> "code" argument is the value returned by GetLastError(). Neither
>>>>> strerror()
>>>>> nor strerror_s() will give the correct error message for this error
>>>>> code.
>>>>>
>>>>> You could probably look at libwinutils.c in Hadoop source.  It uses
>>>>> FormatMessageW (which returns messages in Unicode.)  My requirement
>>>>> was to
>>>>> return messages in current system locale.
>>>>>
>>>>> -- Asokan
>>>>> ________________________________________
>>>>> From: malcolm
>>>>> [malcolm.kavalsky@oracle.com<ma...@oracle.com>]
>>>>> Sent: Thursday, December 11, 2014 4:04 PM
>>>>> To:common-dev@hadoop.apache.org<ma...@hadoop.apache.org>
>>>>> Subject: Re: Solaris Port
>>>>>
>>>>> Hi Asok,
>>>>>
>>>>> I googled and found that windows has strerror, and strerror_s (which is
>>>>> the strerror_r equivalent).
>>>>> Is there a reason why you didn't use this call ?
>>>>>
>>>>> On 12/11/2014 06:27 PM, Asokan, M wrote:
>>>>> Hi Malcom,
>>>>>          Recently, I had to work on a function to get system error
>>>>> message on
>>>>> various systems.  Here is the piece of code I came up with. Hope it
>>>>> helps.
>>>>>
>>>>> static void get_system_error_message(char * buf, int buf_len, int code)
>>>>> {
>>>>> #if defined(_WIN32)
>>>>>           LPVOID lpMsgBuf;
>>>>>           DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
>>>>>                                        FORMAT_MESSAGE_FROM_SYSTEM |
>>>>> FORMAT_MESSAGE_IGNORE_INSERTS,
>>>>>                                        NULL, code,
>>>>>                                        MAKELANGID(LANG_NEUTRAL,
>>>>> SUBLANG_DEFAULT),
>>>>>                                                                /* Default
>>>>> language */
>>>>>                                        (LPTSTR) &lpMsgBuf, 0, NULL);
>>>>>           if (status > 0)
>>>>>           {
>>>>>               strncpy(buf, (char *)lpMsgBuf, buf_len-1);
>>>>>               buf[buf_len-1] = '\0';
>>>>>               /* Free the buffer returned by system */
>>>>>               LocalFree(lpMsgBuf);
>>>>>           }
>>>>>           else
>>>>>           {
>>>>>               _snprintf(buf, buf_len-1 , "%s %d",
>>>>>                   "Can't get system error message for code", code);
>>>>>               buf[buf_len-1] = '\0';
>>>>>           }
>>>>> #else
>>>>> #if defined(_HPUX_SOURCE)
>>>>>           {
>>>>>               char * msg;
>>>>>               errno = 0;
>>>>>               msg = strerror(code);
>>>>>               if (errno == 0)
>>>>>               {
>>>>>                   strncpy(buf, msg, buf_len-1);
>>>>>                   buf[buf_len-1] = '\0';
>>>>>               }
>>>>>               else
>>>>>               {
>>>>>                   snprintf(buf, buf_len, "%s %d",
>>>>>                       "Can't get system error message for code", code);
>>>>>               }
>>>>>           }
>>>>> #else
>>>>>           if (strerror_r(code, buf, buf_len) != 0)
>>>>>           {
>>>>>               snprintf(buf, buf_len, "%s %d",
>>>>>                   "Can't get system error message for code", code);
>>>>>           }
>>>>> #endif
>>>>> #endif
>>>>> }
>>>>>
>>>>> Note that HPUX does not have strerror_r() since strerror() itself is
>>>>> thread-safe.  Also Windows does not have snprintf().  The equivalent
>>>>> function _snprintf() has a subtle difference in its interface.
>>>>>
>>>>> -- Asokan
>>>>> ________________________________________
>>>>> From: malcolm
>>>>> [malcolm.kavalsky@oracle.com<ma...@oracle.com>]
>>>>> Sent: Thursday, December 11, 2014 11:02 AM
>>>>> To:common-dev@hadoop.apache.org<ma...@hadoop.apache.org>
>>>>> Subject: Re: Solaris Port
>>>>>
>>>>> Fine with me, I volunteer to do this, if accepted.
>>>>>
>>>>> On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
>>>>> sys_errlist was removed for a reason.  Creating a fake sys_errlist on
>>>>> Solaris will mean the libhadoop.so will need to be tied a specific build
>>>>> (kernel/include pairing) and therefore limits upward
>>>>> mobility/compatibility.
>>>>> That doesn’t seem like a very good idea.
>>>>>
>>>>> IMO, switching to strerror_r is much preferred, since other than the
>>>>> brain-dead GNU libc version, is highly portable and should work
>>>>> regardless
>>>>> of the kernel or OS in place.
>>>>>
>>>>> On Dec 11, 2014, at 5:20 AM,
>>>>> malcolm<ma...@oracle.com>
>>>>> wrote:
>>>>>
>>>>> FYI, there are a couple more files that reference sys_errlist directly
>>>>> (not just terror within exception.c) , but also hdfs_http_client.c and
>>>>> NativeiO.c
>>>>>
>>>>> On 12/11/2014 07:38 AM, malcolm wrote:
>>>>> Hi Colin,
>>>>>
>>>>> Exactly, as you noticed, the problem is the thread-local buffer needed
>>>>> to return from terror.
>>>>> Currently, terror just returns a static string from an array, this is
>>>>> fast, simple and error-proof.
>>>>>
>>>>> In order to use strerror_r inside terror,  would require allocating a
>>>>> buffer inside terror  and depend on the caller to free the buffer after
>>>>> using it, or to pass a buffer to terrror (which is basically the same as
>>>>> strerror_r, rendering terror redundant).
>>>>> Both cases require modification outside terror itself, as far as I can
>>>>> tell, no simple fix. Unless you have an alternative which I haven't
>>>>> thought
>>>>> of ?
>>>>>
>>>>> As far as I can tell, we have two choices:
>>>>>
>>>>> 1. Remove terror and replace calls with strerror_r, passing a buffer
>>>>> from the callee.
>>>>>           Advantage: a more modern portable interface.
>>>>>           Disadvantage: All calls to terror need to be modified, though
>>>>> all seem to be in a few files as far as I can tell.
>>>>>
>>>>> 2. Adding a sys_errlist array (ifdeffed for Solaris)
>>>>>           Advantage: no change to any calls to terror
>>>>>           Disadvantage: 2 additional files added to source tree (.c and
>>>>> .h) and some minor ifdefs only used for Solaris.
>>>>>
>>>>> I think it is more a question of style than anything else, so I leave
>>>>> you to make the call.
>>>>>
>>>>> Thanks for your patience,
>>>>> Malcolm
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 12/10/2014 09:54 PM, Colin McCabe wrote:
>>>>> On Wed, Dec 10, 2014 at 2:31 AM, malcolm
>>>>> <ma...@oracle.com> wrote:
>>>>> Hi Colin,
>>>>>
>>>>> Thanks for the hints around JIRAs.
>>>>>
>>>>> You are correct errno still exists, however sys_errlist does not.
>>>>>
>>>>> Hadoop uses a function terror (defined in exception.c) which indexes
>>>>> sys_errlist by errno to return the error message from the array.
>>>>> This
>>>>> function is called 26 times in various places (in 2.2)
>>>>>
>>>>> Originally, I thought to replace all calls to terror with strerror,
>>>>> but
>>>>> there can be issues with multi-threading (it returns a buffer which
>>>>> can be
>>>>> overwritten), so it seemed simpler just to recreate the sys_errlist
>>>>> message
>>>>> array.
>>>>>
>>>>> There is also a multi-threaded version strerror_r where you pass the
>>>>> buffer
>>>>> as a parameter, but this would necessitate changing every call to
>>>>> terror
>>>>> with mutiple lines of code.
>>>>> Why don't you just use strerror_r inside terror()?
>>>>>
>>>>> I wrote that code originally.  The reason I didn't want to use
>>>>> strerror_r there is because GNU libc provides a non-POSIX definition
>>>>> of strerror_r, and forcing it to use the POSIX one is a pain. But you
>>>>> can do it.  You also will require a thread-local buffer to hold the
>>>>> return from strerror_r, since it is not guaranteed to be static
>>>>> (although in practice it is 99% of the time-- another annoyance with
>>>>> the API).
>>>>>
>>>>>
>>>>> ________________________________
>>>>>
>>>>>
>>>>>
>>>>> ATTENTION: -----
>>>>>
>>>>> The information contained in this message (including any files
>>>>> transmitted with this message) may contain proprietary, trade secret or
>>>>> other confidential and/or legally privileged information. Any pricing
>>>>> information contained in this message or in any files transmitted
>>>>> with this
>>>>> message is always confidential and cannot be shared with any third
>>>>> parties
>>>>> without prior written approval from Syncsort. This message is
>>>>> intended to be
>>>>> read only by the individual or entity to whom it is addressed or by
>>>>> their
>>>>> designee. If the reader of this message is not the intended
>>>>> recipient, you
>>>>> are on notice that any use, disclosure, copying or distribution of this
>>>>> message, in any form, is strictly prohibited. If you have received this
>>>>> message in error, please immediately notify the sender and/or
>>>>> Syncsort and
>>>>> destroy all copies of this message in your possession, custody or
>>>>> control.
>>>>>
>>>>>
>>>>>
>>> ________________________________
>>>
>>>
>>>
>>> ATTENTION: -----
>>>
>>> The information contained in this message (including any files transmitted
>>> with this message) may contain proprietary, trade secret or other
>>> confidential and/or legally privileged information. Any pricing information
>>> contained in this message or in any files transmitted with this message is
>>> always confidential and cannot be shared with any third parties without
>>> prior written approval from Syncsort. This message is intended to be read
>>> only by the individual or entity to whom it is addressed or by their
>>> designee. If the reader of this message is not the intended recipient, you
>>> are on notice that any use, disclosure, copying or distribution of this
>>> message, in any form, is strictly prohibited. If you have received this
>>> message in error, please immediately notify the sender and/or Syncsort and
>>> destroy all copies of this message in your possession, custody or control.
>>


Re: Solaris Port SOLVED!

Posted by Colin McCabe <cm...@alumni.cmu.edu>.
Thanks, Malcom.  I reviewed it.  The only thing you still have to do
is hit "submit patch" to get a Jenkins run.  See our HowToContribute
wiki page for more details.

wiki.apache.org/hadoop/HowToContribute

best,
Colin

On Sat, Dec 13, 2014 at 9:22 PM, malcolm <ma...@oracle.com> wrote:
> I am checking on the latest release of Solaris 11 and yes, it is still
> thread safe (or MT Safe as documented on the man page).
>
> strerror checks the error code, and returns the same "unknown error" string
> as terror does, if it receives an invalid code. I checked this on Windows,
> Solaris and Linux (though my changes only affect Solaris platforms).
>
> JIRA newbie question:
>
> I have filed the JIRA attaching the patch  HADOOP-11403 against the trunk,
> asking for reviewers in the comments section.
> Is there any other protocol I should follow ?
>
> Thanks,
> Malcolm
>
>
> On 12/14/2014 01:08 AM, Asokan, M wrote:
>>
>> Malcom,
>>     That's great! Is strerror() thread-safe in the recent version of
>> Solaris?  In any case, to be correct you still need to make sure that the
>> code passed to strerror() is a valid one.  For this you need to check errno
>> after the call to strerror().  Please check the code snippet I sent earlier
>> for HPUX.
>>
>> -- Asokan
>> ________________________________________
>> From: malcolm [malcolm.kavalsky@oracle.com]
>> Sent: Saturday, December 13, 2014 3:13 PM
>> To: common-dev@hadoop.apache.org
>> Subject: Re: Solaris Port SOLVED!
>>
>> Wiping egg off face  ...
>>
>> After consulting with the Solaris team (and looking at the source code
>> and man page) ,  it turns out that strerror itself on Solaris is MT-Safe
>> ! (Just like HPUX)
>>
>> So, after all this effort, all I need to do is modify terror as follows:
>>
>>      const char* terror(int errnum)
>>      {
>>
>>      #if defined(__sun)
>>         return strerror(errnum); //  MT-Safe under Solaris
>>      #else
>>         if ((errnum < 0) || (errnum >= sys_nerr)) {
>>           return "unknown error.";
>>         }
>>         return sys_errlist[errnum];
>>      #endif
>>      }
>>
>> And in two other files where sys_errlist is referenced directly
>> (NativeIO and hdfs_http_client.c), I replaced this direct access instead
>> with a call to terror.
>>
>> Thanks for all your help and patience,
>>
>> I'll file a JIRA asap,
>>
>> Cheers,
>> Malcolm
>>
>> On 12/13/2014 05:26 PM, malcolm wrote:
>>>
>>> Thanks Asokan,
>>>
>>> Looked up Gcc's thread local variables, seems a bit complex though and
>>> quite specific to Gnu.
>>>
>>> Intialization of the static errlist array should be thread safe i.e.
>>> initially the array is nulled out, and afterwards if two threads write
>>> to the same address, then they would be writing the same string.
>>>
>>> But if we are ok with changing 5 files, not just terror, then I would
>>> just remove terror completely and use strerror_r (or the alternatives
>>> for Windows and HP_UX) in the caller code instead i.e. using your
>>> suggestion for a local buffer in the caller, wherever needed. The more
>>> I think about it, the more this seems to be the right thing to do.
>>>
>>> Cheers,
>>> Malcolm
>>>
>>>
>>> On 12/13/2014 04:38 PM, Asokan, M wrote:
>>>>
>>>> Malcom,
>>>>      Gcc supports thread-local variables. See
>>>>
>>>> https://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Thread-Local.html
>>>>
>>>> I am not sure about native compilers on Solaris, HPUX, or AIX.
>>>>
>>>> In any case, I found out that the Windows native code in Hadoop seems
>>>> to handle error messages properly. Here is what I found:
>>>>
>>>> $ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grephadoop how to
>>>> file a jira
>>>>
>>>> FormatMessage|awk -F: '{print $1}'|sort -u
>>>>
>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c
>>>>
>>>>
>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMappingWin.c
>>>>
>>>>
>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c
>>>>
>>>>
>>>>
>>>> $ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grep terror|awk
>>>> -F: '{print $1}'|sort -u
>>>>
>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/exception.c
>>>>
>>>>
>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/SharedFileDescriptorFactory.c
>>>>
>>>>
>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c
>>>>
>>>>
>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c
>>>>
>>>>
>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMapping.c
>>>>
>>>>
>>>>
>>>> This means you need not worry about the Windows version of terror().
>>>> You need to change five files that contain UNIX specific native code.
>>>>
>>>> I have a question on your suggested implementation:
>>>>
>>>> How do you initialize the static errlist array in a thread-safe manner?
>>>>
>>>> ________________________________
>>>> Here is another thread-safe implementation that I could come up with:
>>>>
>>>> #include <string.h>
>>>> #include <stdlib.h>
>>>> #include <errno.h>
>>>> #include <stdio.h>
>>>>
>>>> #define MESSAGE_BUFFER_SIZE 256
>>>>
>>>> char * getSystemErrorMessage(char * buf, int buf_len, int code) {
>>>> #if defined(_HPUX_SOURCE)
>>>>     char * msg;
>>>>     errno = 0;
>>>>     msg = strerror(code);
>>>>     if (errno == 0) {
>>>>       strncpy(buf, msg, buf_len-1);
>>>>       buf[buf_len-1] = '\0';
>>>>     } else {
>>>>       snprintf(buf, buf_len, "%s %d",
>>>>           "Can't get system error message for code", code);
>>>>     }
>>>> #else
>>>>     if (strerror_r(code, buf, buf_len) != 0) {
>>>>       snprintf(buf, buf_len, "%s %d",
>>>>           "Can't get system error message for code", code);
>>>>     }
>>>> #endif
>>>>     return buf;
>>>> }
>>>>
>>>> #define TERROR(code) \
>>>> getSystemErrorMessage(messageBuffer, sizeof(messageBuffer), code)
>>>>
>>>> int main(int argc, char ** argv) {
>>>>     if (argc > 1) {
>>>>       char messageBuffer[MESSAGE_BUFFER_SIZE];
>>>>       int code = atoi(argv[1]);
>>>>
>>>>       fprintf(stderr, "System error for code %s: %s\n", argv[1],
>>>> TERROR(code));
>>>>     }
>>>>     return 0;
>>>> }
>>>>
>>>>
>>>> This changes terror to a macro TERROR and requires all functions that
>>>> call TERROR macro to declare the local variable messageBuffer. Since
>>>> there are only five files to modify, I think it is not a big effort.
>>>> What do you think?
>>>>
>>>> -- Asokan
>>>>
>>>> On 12/13/2014 04:29 AM, malcolm wrote:
>>>> Colin,
>>>>
>>>> I am not sure what you mean by a thread-local buffer (in native
>>>> code). In Java this is pretty standard, but I couldn't find any
>>>> implementation for C code.
>>>>
>>>> Here is the terror function:
>>>>
>>>>       const char* terror(int errnum)
>>>>       {
>>>>         if ((errnum < 0) || (errnum >= sys_nerr)) {
>>>>           return "unknown error.";
>>>>         }
>>>>         return sys_errlist[errnum];
>>>>       }
>>>>
>>>>
>>>> The interface is identical to strerror, but the implementation is
>>>> actually re-entrant since it returns a pointer to a static string.
>>>>
>>>> If I understand your suggestion, the new function would look like this:
>>>>
>>>>      const char* terror(int errnum)
>>>>      {
>>>>         static char result[65];
>>>>
>>>>         strerror_r(errnum, result, 64);
>>>>
>>>>         return result;
>>>>      }
>>>>
>>>> No need for snprintf, strerror_r  has the 'n' bounding built-in.
>>>>
>>>> Of course, this is still non-re-entrant, so unless the caller copies
>>>> the returned buffer, before the function is called again, there is a
>>>> problem.
>>>>
>>>> After considerable thought, I have come up with this version of
>>>> terror, tested OK on Windows, Linux and Solaris:
>>>>
>>>>      #if defined(_WIN32)
>>>>      #define strerror_r(errno,buf,len) strerror_s(buf,len,errno)
>>>>      #endif
>>>>
>>>>      #define MAX_ERRORS 256
>>>>      #define MAX_ERROR_LEN 80
>>>>
>>>>      char *terror(int errnum)
>>>>      {
>>>>
>>>>         static char errlist[MAX_ERRORS][MAX_ERROR_LEN+1]; // cache of
>>>>      error messages
>>>>
>>>>         if ( errnum >= 0 && errnum < MAX_ERRORS )
>>>>           {
>>>>             if ( errlist[errnum][0] == 0 )
>>>>               strerror_r( errnum, errlist[errnum], MAX_ERROR_LEN);
>>>>
>>>>             return errlist[errnum];
>>>>           }
>>>>         else
>>>>           {
>>>>             return "Unknown error";
>>>>           }
>>>>      }
>>>>
>>>> This version is portable and re-entrant.
>>>>
>>>> On windows, the largest errnum is 43, on Ubuntu 14.04 we have 133,
>>>> and on Solaris 11.1 we get 151.
>>>>
>>>> If this is OK with you, I will open a jira for this.
>>>>
>>>>
>>>> Thanks,
>>>> Malcolm
>>>>
>>>>
>>>> On 12/12/2014 11:10 PM, Colin McCabe wrote:
>>>> Just use snprintf to copy the error message from strerror_r into a
>>>> thread-local buffer of 64 bytes or so.  Then preserve the existing
>>>> terror() interface.
>>>>
>>>> Can you open a jira for this?
>>>>
>>>> best,
>>>> Colin
>>>>
>>>> On Thu, Dec 11, 2014 at 8:35 PM,
>>>> malcolm<ma...@oracle.com>
>>>> wrote:
>>>> So, turns out that if I had naively changed all calls to terror or
>>>> references to sys_errlist, to using strerror_r, then I would have broken
>>>> code for Windows and HPUX (and possibly other OSes).
>>>>
>>>> If we are to assume that current code runs fine on all platforms
>>>> (maybe even
>>>> AIX an MacOS, for example), then any change/additions made to the
>>>> code and
>>>> not ifdeffed appropriately can break on other OSes. On the other
>>>> hand,  too
>>>> many ifdefs can pollute the code source and render it less readable
>>>> (though
>>>> possibly less important).
>>>>
>>>> In the general case what are code contributors responsibilities to
>>>> adding
>>>> code regarding OSes besides Linux ?
>>>> What OSes does jenkins test on ?
>>>> I guess maintainers of code on non-tested platforms are responsible for
>>>> their own testing ?
>>>>
>>>> How do we avoid the ping-pong effect, i.e. I make a generic change to
>>>> code
>>>> which breaks on Windows, then the Windows maintainer reverts changes to
>>>> break on Solaris for example ? Or does this not happen in actuality ?
>>>>
>>>>
>>>> On 12/11/2014 11:25 PM, Asokan, M wrote:
>>>> Hi Malcom,
>>>>       The Windows versions of strerror() and strerror_s() functions are
>>>> probably meant for ANSI C library functions that set errno.  For core
>>>> Windows API calls (like UNIX system calls), one gets the error number by
>>>> calling GetLastError() function.  In the code snippet I sent earlier,
>>>> the
>>>> "code" argument is the value returned by GetLastError(). Neither
>>>> strerror()
>>>> nor strerror_s() will give the correct error message for this error
>>>> code.
>>>>
>>>> You could probably look at libwinutils.c in Hadoop source.  It uses
>>>> FormatMessageW (which returns messages in Unicode.)  My requirement
>>>> was to
>>>> return messages in current system locale.
>>>>
>>>> -- Asokan
>>>> ________________________________________
>>>> From: malcolm
>>>> [malcolm.kavalsky@oracle.com<ma...@oracle.com>]
>>>> Sent: Thursday, December 11, 2014 4:04 PM
>>>> To:common-dev@hadoop.apache.org<ma...@hadoop.apache.org>
>>>> Subject: Re: Solaris Port
>>>>
>>>> Hi Asok,
>>>>
>>>> I googled and found that windows has strerror, and strerror_s (which is
>>>> the strerror_r equivalent).
>>>> Is there a reason why you didn't use this call ?
>>>>
>>>> On 12/11/2014 06:27 PM, Asokan, M wrote:
>>>> Hi Malcom,
>>>>         Recently, I had to work on a function to get system error
>>>> message on
>>>> various systems.  Here is the piece of code I came up with. Hope it
>>>> helps.
>>>>
>>>> static void get_system_error_message(char * buf, int buf_len, int code)
>>>> {
>>>> #if defined(_WIN32)
>>>>          LPVOID lpMsgBuf;
>>>>          DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
>>>>                                       FORMAT_MESSAGE_FROM_SYSTEM |
>>>> FORMAT_MESSAGE_IGNORE_INSERTS,
>>>>                                       NULL, code,
>>>>                                       MAKELANGID(LANG_NEUTRAL,
>>>> SUBLANG_DEFAULT),
>>>>                                                               /* Default
>>>> language */
>>>>                                       (LPTSTR) &lpMsgBuf, 0, NULL);
>>>>          if (status > 0)
>>>>          {
>>>>              strncpy(buf, (char *)lpMsgBuf, buf_len-1);
>>>>              buf[buf_len-1] = '\0';
>>>>              /* Free the buffer returned by system */
>>>>              LocalFree(lpMsgBuf);
>>>>          }
>>>>          else
>>>>          {
>>>>              _snprintf(buf, buf_len-1 , "%s %d",
>>>>                  "Can't get system error message for code", code);
>>>>              buf[buf_len-1] = '\0';
>>>>          }
>>>> #else
>>>> #if defined(_HPUX_SOURCE)
>>>>          {
>>>>              char * msg;
>>>>              errno = 0;
>>>>              msg = strerror(code);
>>>>              if (errno == 0)
>>>>              {
>>>>                  strncpy(buf, msg, buf_len-1);
>>>>                  buf[buf_len-1] = '\0';
>>>>              }
>>>>              else
>>>>              {
>>>>                  snprintf(buf, buf_len, "%s %d",
>>>>                      "Can't get system error message for code", code);
>>>>              }
>>>>          }
>>>> #else
>>>>          if (strerror_r(code, buf, buf_len) != 0)
>>>>          {
>>>>              snprintf(buf, buf_len, "%s %d",
>>>>                  "Can't get system error message for code", code);
>>>>          }
>>>> #endif
>>>> #endif
>>>> }
>>>>
>>>> Note that HPUX does not have strerror_r() since strerror() itself is
>>>> thread-safe.  Also Windows does not have snprintf().  The equivalent
>>>> function _snprintf() has a subtle difference in its interface.
>>>>
>>>> -- Asokan
>>>> ________________________________________
>>>> From: malcolm
>>>> [malcolm.kavalsky@oracle.com<ma...@oracle.com>]
>>>> Sent: Thursday, December 11, 2014 11:02 AM
>>>> To:common-dev@hadoop.apache.org<ma...@hadoop.apache.org>
>>>> Subject: Re: Solaris Port
>>>>
>>>> Fine with me, I volunteer to do this, if accepted.
>>>>
>>>> On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
>>>> sys_errlist was removed for a reason.  Creating a fake sys_errlist on
>>>> Solaris will mean the libhadoop.so will need to be tied a specific build
>>>> (kernel/include pairing) and therefore limits upward
>>>> mobility/compatibility.
>>>> That doesn’t seem like a very good idea.
>>>>
>>>> IMO, switching to strerror_r is much preferred, since other than the
>>>> brain-dead GNU libc version, is highly portable and should work
>>>> regardless
>>>> of the kernel or OS in place.
>>>>
>>>> On Dec 11, 2014, at 5:20 AM,
>>>> malcolm<ma...@oracle.com>
>>>> wrote:
>>>>
>>>> FYI, there are a couple more files that reference sys_errlist directly
>>>> (not just terror within exception.c) , but also hdfs_http_client.c and
>>>> NativeiO.c
>>>>
>>>> On 12/11/2014 07:38 AM, malcolm wrote:
>>>> Hi Colin,
>>>>
>>>> Exactly, as you noticed, the problem is the thread-local buffer needed
>>>> to return from terror.
>>>> Currently, terror just returns a static string from an array, this is
>>>> fast, simple and error-proof.
>>>>
>>>> In order to use strerror_r inside terror,  would require allocating a
>>>> buffer inside terror  and depend on the caller to free the buffer after
>>>> using it, or to pass a buffer to terrror (which is basically the same as
>>>> strerror_r, rendering terror redundant).
>>>> Both cases require modification outside terror itself, as far as I can
>>>> tell, no simple fix. Unless you have an alternative which I haven't
>>>> thought
>>>> of ?
>>>>
>>>> As far as I can tell, we have two choices:
>>>>
>>>> 1. Remove terror and replace calls with strerror_r, passing a buffer
>>>> from the callee.
>>>>          Advantage: a more modern portable interface.
>>>>          Disadvantage: All calls to terror need to be modified, though
>>>> all seem to be in a few files as far as I can tell.
>>>>
>>>> 2. Adding a sys_errlist array (ifdeffed for Solaris)
>>>>          Advantage: no change to any calls to terror
>>>>          Disadvantage: 2 additional files added to source tree (.c and
>>>> .h) and some minor ifdefs only used for Solaris.
>>>>
>>>> I think it is more a question of style than anything else, so I leave
>>>> you to make the call.
>>>>
>>>> Thanks for your patience,
>>>> Malcolm
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 12/10/2014 09:54 PM, Colin McCabe wrote:
>>>> On Wed, Dec 10, 2014 at 2:31 AM, malcolm
>>>> <ma...@oracle.com> wrote:
>>>> Hi Colin,
>>>>
>>>> Thanks for the hints around JIRAs.
>>>>
>>>> You are correct errno still exists, however sys_errlist does not.
>>>>
>>>> Hadoop uses a function terror (defined in exception.c) which indexes
>>>> sys_errlist by errno to return the error message from the array.
>>>> This
>>>> function is called 26 times in various places (in 2.2)
>>>>
>>>> Originally, I thought to replace all calls to terror with strerror,
>>>> but
>>>> there can be issues with multi-threading (it returns a buffer which
>>>> can be
>>>> overwritten), so it seemed simpler just to recreate the sys_errlist
>>>> message
>>>> array.
>>>>
>>>> There is also a multi-threaded version strerror_r where you pass the
>>>> buffer
>>>> as a parameter, but this would necessitate changing every call to
>>>> terror
>>>> with mutiple lines of code.
>>>> Why don't you just use strerror_r inside terror()?
>>>>
>>>> I wrote that code originally.  The reason I didn't want to use
>>>> strerror_r there is because GNU libc provides a non-POSIX definition
>>>> of strerror_r, and forcing it to use the POSIX one is a pain. But you
>>>> can do it.  You also will require a thread-local buffer to hold the
>>>> return from strerror_r, since it is not guaranteed to be static
>>>> (although in practice it is 99% of the time-- another annoyance with
>>>> the API).
>>>>
>>>>
>>>> ________________________________
>>>>
>>>>
>>>>
>>>> ATTENTION: -----
>>>>
>>>> The information contained in this message (including any files
>>>> transmitted with this message) may contain proprietary, trade secret or
>>>> other confidential and/or legally privileged information. Any pricing
>>>> information contained in this message or in any files transmitted
>>>> with this
>>>> message is always confidential and cannot be shared with any third
>>>> parties
>>>> without prior written approval from Syncsort. This message is
>>>> intended to be
>>>> read only by the individual or entity to whom it is addressed or by
>>>> their
>>>> designee. If the reader of this message is not the intended
>>>> recipient, you
>>>> are on notice that any use, disclosure, copying or distribution of this
>>>> message, in any form, is strictly prohibited. If you have received this
>>>> message in error, please immediately notify the sender and/or
>>>> Syncsort and
>>>> destroy all copies of this message in your possession, custody or
>>>> control.
>>>>
>>>>
>>>>
>>
>> ________________________________
>>
>>
>>
>> ATTENTION: -----
>>
>> The information contained in this message (including any files transmitted
>> with this message) may contain proprietary, trade secret or other
>> confidential and/or legally privileged information. Any pricing information
>> contained in this message or in any files transmitted with this message is
>> always confidential and cannot be shared with any third parties without
>> prior written approval from Syncsort. This message is intended to be read
>> only by the individual or entity to whom it is addressed or by their
>> designee. If the reader of this message is not the intended recipient, you
>> are on notice that any use, disclosure, copying or distribution of this
>> message, in any form, is strictly prohibited. If you have received this
>> message in error, please immediately notify the sender and/or Syncsort and
>> destroy all copies of this message in your possession, custody or control.
>
>

Re: Solaris Port SOLVED!

Posted by malcolm <ma...@oracle.com>.
I am checking on the latest release of Solaris 11 and yes, it is still 
thread safe (or MT Safe as documented on the man page).

strerror checks the error code, and returns the same "unknown error" 
string as terror does, if it receives an invalid code. I checked this on 
Windows, Solaris and Linux (though my changes only affect Solaris 
platforms).

JIRA newbie question:

I have filed the JIRA attaching the patch  HADOOP-11403 against the 
trunk, asking for reviewers in the comments section.
Is there any other protocol I should follow ?

Thanks,
Malcolm

On 12/14/2014 01:08 AM, Asokan, M wrote:
> Malcom,
>     That's great! Is strerror() thread-safe in the recent version of Solaris?  In any case, to be correct you still need to make sure that the code passed to strerror() is a valid one.  For this you need to check errno after the call to strerror().  Please check the code snippet I sent earlier for HPUX.
>
> -- Asokan
> ________________________________________
> From: malcolm [malcolm.kavalsky@oracle.com]
> Sent: Saturday, December 13, 2014 3:13 PM
> To: common-dev@hadoop.apache.org
> Subject: Re: Solaris Port SOLVED!
>
> Wiping egg off face  ...
>
> After consulting with the Solaris team (and looking at the source code
> and man page) ,  it turns out that strerror itself on Solaris is MT-Safe
> ! (Just like HPUX)
>
> So, after all this effort, all I need to do is modify terror as follows:
>
>      const char* terror(int errnum)
>      {
>
>      #if defined(__sun)
>         return strerror(errnum); //  MT-Safe under Solaris
>      #else
>         if ((errnum < 0) || (errnum >= sys_nerr)) {
>           return "unknown error.";
>         }
>         return sys_errlist[errnum];
>      #endif
>      }
>
> And in two other files where sys_errlist is referenced directly
> (NativeIO and hdfs_http_client.c), I replaced this direct access instead
> with a call to terror.
>
> Thanks for all your help and patience,
>
> I'll file a JIRA asap,
>
> Cheers,
> Malcolm
>
> On 12/13/2014 05:26 PM, malcolm wrote:
>> Thanks Asokan,
>>
>> Looked up Gcc's thread local variables, seems a bit complex though and
>> quite specific to Gnu.
>>
>> Intialization of the static errlist array should be thread safe i.e.
>> initially the array is nulled out, and afterwards if two threads write
>> to the same address, then they would be writing the same string.
>>
>> But if we are ok with changing 5 files, not just terror, then I would
>> just remove terror completely and use strerror_r (or the alternatives
>> for Windows and HP_UX) in the caller code instead i.e. using your
>> suggestion for a local buffer in the caller, wherever needed. The more
>> I think about it, the more this seems to be the right thing to do.
>>
>> Cheers,
>> Malcolm
>>
>>
>> On 12/13/2014 04:38 PM, Asokan, M wrote:
>>> Malcom,
>>>      Gcc supports thread-local variables. See
>>>
>>> https://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Thread-Local.html
>>>
>>> I am not sure about native compilers on Solaris, HPUX, or AIX.
>>>
>>> In any case, I found out that the Windows native code in Hadoop seems
>>> to handle error messages properly. Here is what I found:
>>>
>>> $ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grephadoop how to file a jira
>>> FormatMessage|awk -F: '{print $1}'|sort -u
>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c
>>>
>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMappingWin.c
>>>
>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c
>>>
>>>
>>>
>>> $ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grep terror|awk
>>> -F: '{print $1}'|sort -u
>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/exception.c
>>>
>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/SharedFileDescriptorFactory.c
>>>
>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c
>>>
>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c
>>>
>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMapping.c
>>>
>>>
>>>
>>> This means you need not worry about the Windows version of terror().
>>> You need to change five files that contain UNIX specific native code.
>>>
>>> I have a question on your suggested implementation:
>>>
>>> How do you initialize the static errlist array in a thread-safe manner?
>>>
>>> ________________________________
>>> Here is another thread-safe implementation that I could come up with:
>>>
>>> #include <string.h>
>>> #include <stdlib.h>
>>> #include <errno.h>
>>> #include <stdio.h>
>>>
>>> #define MESSAGE_BUFFER_SIZE 256
>>>
>>> char * getSystemErrorMessage(char * buf, int buf_len, int code) {
>>> #if defined(_HPUX_SOURCE)
>>>     char * msg;
>>>     errno = 0;
>>>     msg = strerror(code);
>>>     if (errno == 0) {
>>>       strncpy(buf, msg, buf_len-1);
>>>       buf[buf_len-1] = '\0';
>>>     } else {
>>>       snprintf(buf, buf_len, "%s %d",
>>>           "Can't get system error message for code", code);
>>>     }
>>> #else
>>>     if (strerror_r(code, buf, buf_len) != 0) {
>>>       snprintf(buf, buf_len, "%s %d",
>>>           "Can't get system error message for code", code);
>>>     }
>>> #endif
>>>     return buf;
>>> }
>>>
>>> #define TERROR(code) \
>>> getSystemErrorMessage(messageBuffer, sizeof(messageBuffer), code)
>>>
>>> int main(int argc, char ** argv) {
>>>     if (argc > 1) {
>>>       char messageBuffer[MESSAGE_BUFFER_SIZE];
>>>       int code = atoi(argv[1]);
>>>
>>>       fprintf(stderr, "System error for code %s: %s\n", argv[1],
>>> TERROR(code));
>>>     }
>>>     return 0;
>>> }
>>>
>>>
>>> This changes terror to a macro TERROR and requires all functions that
>>> call TERROR macro to declare the local variable messageBuffer. Since
>>> there are only five files to modify, I think it is not a big effort.
>>> What do you think?
>>>
>>> -- Asokan
>>>
>>> On 12/13/2014 04:29 AM, malcolm wrote:
>>> Colin,
>>>
>>> I am not sure what you mean by a thread-local buffer (in native
>>> code). In Java this is pretty standard, but I couldn't find any
>>> implementation for C code.
>>>
>>> Here is the terror function:
>>>
>>>       const char* terror(int errnum)
>>>       {
>>>         if ((errnum < 0) || (errnum >= sys_nerr)) {
>>>           return "unknown error.";
>>>         }
>>>         return sys_errlist[errnum];
>>>       }
>>>
>>>
>>> The interface is identical to strerror, but the implementation is
>>> actually re-entrant since it returns a pointer to a static string.
>>>
>>> If I understand your suggestion, the new function would look like this:
>>>
>>>      const char* terror(int errnum)
>>>      {
>>>         static char result[65];
>>>
>>>         strerror_r(errnum, result, 64);
>>>
>>>         return result;
>>>      }
>>>
>>> No need for snprintf, strerror_r  has the 'n' bounding built-in.
>>>
>>> Of course, this is still non-re-entrant, so unless the caller copies
>>> the returned buffer, before the function is called again, there is a
>>> problem.
>>>
>>> After considerable thought, I have come up with this version of
>>> terror, tested OK on Windows, Linux and Solaris:
>>>
>>>      #if defined(_WIN32)
>>>      #define strerror_r(errno,buf,len) strerror_s(buf,len,errno)
>>>      #endif
>>>
>>>      #define MAX_ERRORS 256
>>>      #define MAX_ERROR_LEN 80
>>>
>>>      char *terror(int errnum)
>>>      {
>>>
>>>         static char errlist[MAX_ERRORS][MAX_ERROR_LEN+1]; // cache of
>>>      error messages
>>>
>>>         if ( errnum >= 0 && errnum < MAX_ERRORS )
>>>           {
>>>             if ( errlist[errnum][0] == 0 )
>>>               strerror_r( errnum, errlist[errnum], MAX_ERROR_LEN);
>>>
>>>             return errlist[errnum];
>>>           }
>>>         else
>>>           {
>>>             return "Unknown error";
>>>           }
>>>      }
>>>
>>> This version is portable and re-entrant.
>>>
>>> On windows, the largest errnum is 43, on Ubuntu 14.04 we have 133,
>>> and on Solaris 11.1 we get 151.
>>>
>>> If this is OK with you, I will open a jira for this.
>>>
>>>
>>> Thanks,
>>> Malcolm
>>>
>>>
>>> On 12/12/2014 11:10 PM, Colin McCabe wrote:
>>> Just use snprintf to copy the error message from strerror_r into a
>>> thread-local buffer of 64 bytes or so.  Then preserve the existing
>>> terror() interface.
>>>
>>> Can you open a jira for this?
>>>
>>> best,
>>> Colin
>>>
>>> On Thu, Dec 11, 2014 at 8:35 PM,
>>> malcolm<ma...@oracle.com>
>>> wrote:
>>> So, turns out that if I had naively changed all calls to terror or
>>> references to sys_errlist, to using strerror_r, then I would have broken
>>> code for Windows and HPUX (and possibly other OSes).
>>>
>>> If we are to assume that current code runs fine on all platforms
>>> (maybe even
>>> AIX an MacOS, for example), then any change/additions made to the
>>> code and
>>> not ifdeffed appropriately can break on other OSes. On the other
>>> hand,  too
>>> many ifdefs can pollute the code source and render it less readable
>>> (though
>>> possibly less important).
>>>
>>> In the general case what are code contributors responsibilities to
>>> adding
>>> code regarding OSes besides Linux ?
>>> What OSes does jenkins test on ?
>>> I guess maintainers of code on non-tested platforms are responsible for
>>> their own testing ?
>>>
>>> How do we avoid the ping-pong effect, i.e. I make a generic change to
>>> code
>>> which breaks on Windows, then the Windows maintainer reverts changes to
>>> break on Solaris for example ? Or does this not happen in actuality ?
>>>
>>>
>>> On 12/11/2014 11:25 PM, Asokan, M wrote:
>>> Hi Malcom,
>>>       The Windows versions of strerror() and strerror_s() functions are
>>> probably meant for ANSI C library functions that set errno.  For core
>>> Windows API calls (like UNIX system calls), one gets the error number by
>>> calling GetLastError() function.  In the code snippet I sent earlier,
>>> the
>>> "code" argument is the value returned by GetLastError(). Neither
>>> strerror()
>>> nor strerror_s() will give the correct error message for this error
>>> code.
>>>
>>> You could probably look at libwinutils.c in Hadoop source.  It uses
>>> FormatMessageW (which returns messages in Unicode.)  My requirement
>>> was to
>>> return messages in current system locale.
>>>
>>> -- Asokan
>>> ________________________________________
>>> From: malcolm
>>> [malcolm.kavalsky@oracle.com<ma...@oracle.com>]
>>> Sent: Thursday, December 11, 2014 4:04 PM
>>> To:common-dev@hadoop.apache.org<ma...@hadoop.apache.org>
>>> Subject: Re: Solaris Port
>>>
>>> Hi Asok,
>>>
>>> I googled and found that windows has strerror, and strerror_s (which is
>>> the strerror_r equivalent).
>>> Is there a reason why you didn't use this call ?
>>>
>>> On 12/11/2014 06:27 PM, Asokan, M wrote:
>>> Hi Malcom,
>>>         Recently, I had to work on a function to get system error
>>> message on
>>> various systems.  Here is the piece of code I came up with. Hope it
>>> helps.
>>>
>>> static void get_system_error_message(char * buf, int buf_len, int code)
>>> {
>>> #if defined(_WIN32)
>>>          LPVOID lpMsgBuf;
>>>          DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
>>>                                       FORMAT_MESSAGE_FROM_SYSTEM |
>>> FORMAT_MESSAGE_IGNORE_INSERTS,
>>>                                       NULL, code,
>>>                                       MAKELANGID(LANG_NEUTRAL,
>>> SUBLANG_DEFAULT),
>>>                                                               /* Default
>>> language */
>>>                                       (LPTSTR) &lpMsgBuf, 0, NULL);
>>>          if (status > 0)
>>>          {
>>>              strncpy(buf, (char *)lpMsgBuf, buf_len-1);
>>>              buf[buf_len-1] = '\0';
>>>              /* Free the buffer returned by system */
>>>              LocalFree(lpMsgBuf);
>>>          }
>>>          else
>>>          {
>>>              _snprintf(buf, buf_len-1 , "%s %d",
>>>                  "Can't get system error message for code", code);
>>>              buf[buf_len-1] = '\0';
>>>          }
>>> #else
>>> #if defined(_HPUX_SOURCE)
>>>          {
>>>              char * msg;
>>>              errno = 0;
>>>              msg = strerror(code);
>>>              if (errno == 0)
>>>              {
>>>                  strncpy(buf, msg, buf_len-1);
>>>                  buf[buf_len-1] = '\0';
>>>              }
>>>              else
>>>              {
>>>                  snprintf(buf, buf_len, "%s %d",
>>>                      "Can't get system error message for code", code);
>>>              }
>>>          }
>>> #else
>>>          if (strerror_r(code, buf, buf_len) != 0)
>>>          {
>>>              snprintf(buf, buf_len, "%s %d",
>>>                  "Can't get system error message for code", code);
>>>          }
>>> #endif
>>> #endif
>>> }
>>>
>>> Note that HPUX does not have strerror_r() since strerror() itself is
>>> thread-safe.  Also Windows does not have snprintf().  The equivalent
>>> function _snprintf() has a subtle difference in its interface.
>>>
>>> -- Asokan
>>> ________________________________________
>>> From: malcolm
>>> [malcolm.kavalsky@oracle.com<ma...@oracle.com>]
>>> Sent: Thursday, December 11, 2014 11:02 AM
>>> To:common-dev@hadoop.apache.org<ma...@hadoop.apache.org>
>>> Subject: Re: Solaris Port
>>>
>>> Fine with me, I volunteer to do this, if accepted.
>>>
>>> On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
>>> sys_errlist was removed for a reason.  Creating a fake sys_errlist on
>>> Solaris will mean the libhadoop.so will need to be tied a specific build
>>> (kernel/include pairing) and therefore limits upward
>>> mobility/compatibility.
>>> That doesn’t seem like a very good idea.
>>>
>>> IMO, switching to strerror_r is much preferred, since other than the
>>> brain-dead GNU libc version, is highly portable and should work
>>> regardless
>>> of the kernel or OS in place.
>>>
>>> On Dec 11, 2014, at 5:20 AM,
>>> malcolm<ma...@oracle.com>
>>> wrote:
>>>
>>> FYI, there are a couple more files that reference sys_errlist directly
>>> (not just terror within exception.c) , but also hdfs_http_client.c and
>>> NativeiO.c
>>>
>>> On 12/11/2014 07:38 AM, malcolm wrote:
>>> Hi Colin,
>>>
>>> Exactly, as you noticed, the problem is the thread-local buffer needed
>>> to return from terror.
>>> Currently, terror just returns a static string from an array, this is
>>> fast, simple and error-proof.
>>>
>>> In order to use strerror_r inside terror,  would require allocating a
>>> buffer inside terror  and depend on the caller to free the buffer after
>>> using it, or to pass a buffer to terrror (which is basically the same as
>>> strerror_r, rendering terror redundant).
>>> Both cases require modification outside terror itself, as far as I can
>>> tell, no simple fix. Unless you have an alternative which I haven't
>>> thought
>>> of ?
>>>
>>> As far as I can tell, we have two choices:
>>>
>>> 1. Remove terror and replace calls with strerror_r, passing a buffer
>>> from the callee.
>>>          Advantage: a more modern portable interface.
>>>          Disadvantage: All calls to terror need to be modified, though
>>> all seem to be in a few files as far as I can tell.
>>>
>>> 2. Adding a sys_errlist array (ifdeffed for Solaris)
>>>          Advantage: no change to any calls to terror
>>>          Disadvantage: 2 additional files added to source tree (.c and
>>> .h) and some minor ifdefs only used for Solaris.
>>>
>>> I think it is more a question of style than anything else, so I leave
>>> you to make the call.
>>>
>>> Thanks for your patience,
>>> Malcolm
>>>
>>>
>>>
>>>
>>>
>>> On 12/10/2014 09:54 PM, Colin McCabe wrote:
>>> On Wed, Dec 10, 2014 at 2:31 AM, malcolm
>>> <ma...@oracle.com> wrote:
>>> Hi Colin,
>>>
>>> Thanks for the hints around JIRAs.
>>>
>>> You are correct errno still exists, however sys_errlist does not.
>>>
>>> Hadoop uses a function terror (defined in exception.c) which indexes
>>> sys_errlist by errno to return the error message from the array.
>>> This
>>> function is called 26 times in various places (in 2.2)
>>>
>>> Originally, I thought to replace all calls to terror with strerror,
>>> but
>>> there can be issues with multi-threading (it returns a buffer which
>>> can be
>>> overwritten), so it seemed simpler just to recreate the sys_errlist
>>> message
>>> array.
>>>
>>> There is also a multi-threaded version strerror_r where you pass the
>>> buffer
>>> as a parameter, but this would necessitate changing every call to
>>> terror
>>> with mutiple lines of code.
>>> Why don't you just use strerror_r inside terror()?
>>>
>>> I wrote that code originally.  The reason I didn't want to use
>>> strerror_r there is because GNU libc provides a non-POSIX definition
>>> of strerror_r, and forcing it to use the POSIX one is a pain. But you
>>> can do it.  You also will require a thread-local buffer to hold the
>>> return from strerror_r, since it is not guaranteed to be static
>>> (although in practice it is 99% of the time-- another annoyance with
>>> the API).
>>>
>>>
>>> ________________________________
>>>
>>>
>>>
>>> ATTENTION: -----
>>>
>>> The information contained in this message (including any files
>>> transmitted with this message) may contain proprietary, trade secret or
>>> other confidential and/or legally privileged information. Any pricing
>>> information contained in this message or in any files transmitted
>>> with this
>>> message is always confidential and cannot be shared with any third
>>> parties
>>> without prior written approval from Syncsort. This message is
>>> intended to be
>>> read only by the individual or entity to whom it is addressed or by
>>> their
>>> designee. If the reader of this message is not the intended
>>> recipient, you
>>> are on notice that any use, disclosure, copying or distribution of this
>>> message, in any form, is strictly prohibited. If you have received this
>>> message in error, please immediately notify the sender and/or
>>> Syncsort and
>>> destroy all copies of this message in your possession, custody or
>>> control.
>>>
>>>
>>>
>
> ________________________________
>
>
>
> ATTENTION: -----
>
> The information contained in this message (including any files transmitted with this message) may contain proprietary, trade secret or other confidential and/or legally privileged information. Any pricing information contained in this message or in any files transmitted with this message is always confidential and cannot be shared with any third parties without prior written approval from Syncsort. This message is intended to be read only by the individual or entity to whom it is addressed or by their designee. If the reader of this message is not the intended recipient, you are on notice that any use, disclosure, copying or distribution of this message, in any form, is strictly prohibited. If you have received this message in error, please immediately notify the sender and/or Syncsort and destroy all copies of this message in your possession, custody or control.


RE: Solaris Port SOLVED!

Posted by "Asokan, M" <ma...@syncsort.com>.
Malcom,
   That's great! Is strerror() thread-safe in the recent version of Solaris?  In any case, to be correct you still need to make sure that the code passed to strerror() is a valid one.  For this you need to check errno after the call to strerror().  Please check the code snippet I sent earlier for HPUX.

-- Asokan
________________________________________
From: malcolm [malcolm.kavalsky@oracle.com]
Sent: Saturday, December 13, 2014 3:13 PM
To: common-dev@hadoop.apache.org
Subject: Re: Solaris Port SOLVED!

Wiping egg off face  ...

After consulting with the Solaris team (and looking at the source code
and man page) ,  it turns out that strerror itself on Solaris is MT-Safe
! (Just like HPUX)

So, after all this effort, all I need to do is modify terror as follows:

    const char* terror(int errnum)
    {

    #if defined(__sun)
       return strerror(errnum); //  MT-Safe under Solaris
    #else
       if ((errnum < 0) || (errnum >= sys_nerr)) {
         return "unknown error.";
       }
       return sys_errlist[errnum];
    #endif
    }

And in two other files where sys_errlist is referenced directly
(NativeIO and hdfs_http_client.c), I replaced this direct access instead
with a call to terror.

Thanks for all your help and patience,

I'll file a JIRA asap,

Cheers,
Malcolm

On 12/13/2014 05:26 PM, malcolm wrote:
> Thanks Asokan,
>
> Looked up Gcc's thread local variables, seems a bit complex though and
> quite specific to Gnu.
>
> Intialization of the static errlist array should be thread safe i.e.
> initially the array is nulled out, and afterwards if two threads write
> to the same address, then they would be writing the same string.
>
> But if we are ok with changing 5 files, not just terror, then I would
> just remove terror completely and use strerror_r (or the alternatives
> for Windows and HP_UX) in the caller code instead i.e. using your
> suggestion for a local buffer in the caller, wherever needed. The more
> I think about it, the more this seems to be the right thing to do.
>
> Cheers,
> Malcolm
>
>
> On 12/13/2014 04:38 PM, Asokan, M wrote:
>> Malcom,
>>     Gcc supports thread-local variables. See
>>
>> https://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Thread-Local.html
>>
>> I am not sure about native compilers on Solaris, HPUX, or AIX.
>>
>> In any case, I found out that the Windows native code in Hadoop seems
>> to handle error messages properly. Here is what I found:
>>
>> $ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grep
>> FormatMessage|awk -F: '{print $1}'|sort -u
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c
>>
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMappingWin.c
>>
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c
>>
>>
>>
>> $ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grep terror|awk
>> -F: '{print $1}'|sort -u
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/exception.c
>>
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/SharedFileDescriptorFactory.c
>>
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c
>>
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c
>>
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMapping.c
>>
>>
>>
>> This means you need not worry about the Windows version of terror().
>> You need to change five files that contain UNIX specific native code.
>>
>> I have a question on your suggested implementation:
>>
>> How do you initialize the static errlist array in a thread-safe manner?
>>
>> ________________________________
>> Here is another thread-safe implementation that I could come up with:
>>
>> #include <string.h>
>> #include <stdlib.h>
>> #include <errno.h>
>> #include <stdio.h>
>>
>> #define MESSAGE_BUFFER_SIZE 256
>>
>> char * getSystemErrorMessage(char * buf, int buf_len, int code) {
>> #if defined(_HPUX_SOURCE)
>>    char * msg;
>>    errno = 0;
>>    msg = strerror(code);
>>    if (errno == 0) {
>>      strncpy(buf, msg, buf_len-1);
>>      buf[buf_len-1] = '\0';
>>    } else {
>>      snprintf(buf, buf_len, "%s %d",
>>          "Can't get system error message for code", code);
>>    }
>> #else
>>    if (strerror_r(code, buf, buf_len) != 0) {
>>      snprintf(buf, buf_len, "%s %d",
>>          "Can't get system error message for code", code);
>>    }
>> #endif
>>    return buf;
>> }
>>
>> #define TERROR(code) \
>> getSystemErrorMessage(messageBuffer, sizeof(messageBuffer), code)
>>
>> int main(int argc, char ** argv) {
>>    if (argc > 1) {
>>      char messageBuffer[MESSAGE_BUFFER_SIZE];
>>      int code = atoi(argv[1]);
>>
>>      fprintf(stderr, "System error for code %s: %s\n", argv[1],
>> TERROR(code));
>>    }
>>    return 0;
>> }
>>
>>
>> This changes terror to a macro TERROR and requires all functions that
>> call TERROR macro to declare the local variable messageBuffer. Since
>> there are only five files to modify, I think it is not a big effort.
>> What do you think?
>>
>> -- Asokan
>>
>> On 12/13/2014 04:29 AM, malcolm wrote:
>> Colin,
>>
>> I am not sure what you mean by a thread-local buffer (in native
>> code). In Java this is pretty standard, but I couldn't find any
>> implementation for C code.
>>
>> Here is the terror function:
>>
>>      const char* terror(int errnum)
>>      {
>>        if ((errnum < 0) || (errnum >= sys_nerr)) {
>>          return "unknown error.";
>>        }
>>        return sys_errlist[errnum];
>>      }
>>
>>
>> The interface is identical to strerror, but the implementation is
>> actually re-entrant since it returns a pointer to a static string.
>>
>> If I understand your suggestion, the new function would look like this:
>>
>>     const char* terror(int errnum)
>>     {
>>        static char result[65];
>>
>>        strerror_r(errnum, result, 64);
>>
>>        return result;
>>     }
>>
>> No need for snprintf, strerror_r  has the 'n' bounding built-in.
>>
>> Of course, this is still non-re-entrant, so unless the caller copies
>> the returned buffer, before the function is called again, there is a
>> problem.
>>
>> After considerable thought, I have come up with this version of
>> terror, tested OK on Windows, Linux and Solaris:
>>
>>     #if defined(_WIN32)
>>     #define strerror_r(errno,buf,len) strerror_s(buf,len,errno)
>>     #endif
>>
>>     #define MAX_ERRORS 256
>>     #define MAX_ERROR_LEN 80
>>
>>     char *terror(int errnum)
>>     {
>>
>>        static char errlist[MAX_ERRORS][MAX_ERROR_LEN+1]; // cache of
>>     error messages
>>
>>        if ( errnum >= 0 && errnum < MAX_ERRORS )
>>          {
>>            if ( errlist[errnum][0] == 0 )
>>              strerror_r( errnum, errlist[errnum], MAX_ERROR_LEN);
>>
>>            return errlist[errnum];
>>          }
>>        else
>>          {
>>            return "Unknown error";
>>          }
>>     }
>>
>> This version is portable and re-entrant.
>>
>> On windows, the largest errnum is 43, on Ubuntu 14.04 we have 133,
>> and on Solaris 11.1 we get 151.
>>
>> If this is OK with you, I will open a jira for this.
>>
>>
>> Thanks,
>> Malcolm
>>
>>
>> On 12/12/2014 11:10 PM, Colin McCabe wrote:
>> Just use snprintf to copy the error message from strerror_r into a
>> thread-local buffer of 64 bytes or so.  Then preserve the existing
>> terror() interface.
>>
>> Can you open a jira for this?
>>
>> best,
>> Colin
>>
>> On Thu, Dec 11, 2014 at 8:35 PM,
>> malcolm<ma...@oracle.com>
>> wrote:
>> So, turns out that if I had naively changed all calls to terror or
>> references to sys_errlist, to using strerror_r, then I would have broken
>> code for Windows and HPUX (and possibly other OSes).
>>
>> If we are to assume that current code runs fine on all platforms
>> (maybe even
>> AIX an MacOS, for example), then any change/additions made to the
>> code and
>> not ifdeffed appropriately can break on other OSes. On the other
>> hand,  too
>> many ifdefs can pollute the code source and render it less readable
>> (though
>> possibly less important).
>>
>> In the general case what are code contributors responsibilities to
>> adding
>> code regarding OSes besides Linux ?
>> What OSes does jenkins test on ?
>> I guess maintainers of code on non-tested platforms are responsible for
>> their own testing ?
>>
>> How do we avoid the ping-pong effect, i.e. I make a generic change to
>> code
>> which breaks on Windows, then the Windows maintainer reverts changes to
>> break on Solaris for example ? Or does this not happen in actuality ?
>>
>>
>> On 12/11/2014 11:25 PM, Asokan, M wrote:
>> Hi Malcom,
>>      The Windows versions of strerror() and strerror_s() functions are
>> probably meant for ANSI C library functions that set errno.  For core
>> Windows API calls (like UNIX system calls), one gets the error number by
>> calling GetLastError() function.  In the code snippet I sent earlier,
>> the
>> "code" argument is the value returned by GetLastError(). Neither
>> strerror()
>> nor strerror_s() will give the correct error message for this error
>> code.
>>
>> You could probably look at libwinutils.c in Hadoop source.  It uses
>> FormatMessageW (which returns messages in Unicode.)  My requirement
>> was to
>> return messages in current system locale.
>>
>> -- Asokan
>> ________________________________________
>> From: malcolm
>> [malcolm.kavalsky@oracle.com<ma...@oracle.com>]
>> Sent: Thursday, December 11, 2014 4:04 PM
>> To:common-dev@hadoop.apache.org<ma...@hadoop.apache.org>
>> Subject: Re: Solaris Port
>>
>> Hi Asok,
>>
>> I googled and found that windows has strerror, and strerror_s (which is
>> the strerror_r equivalent).
>> Is there a reason why you didn't use this call ?
>>
>> On 12/11/2014 06:27 PM, Asokan, M wrote:
>> Hi Malcom,
>>        Recently, I had to work on a function to get system error
>> message on
>> various systems.  Here is the piece of code I came up with. Hope it
>> helps.
>>
>> static void get_system_error_message(char * buf, int buf_len, int code)
>> {
>> #if defined(_WIN32)
>>         LPVOID lpMsgBuf;
>>         DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
>>                                      FORMAT_MESSAGE_FROM_SYSTEM |
>> FORMAT_MESSAGE_IGNORE_INSERTS,
>>                                      NULL, code,
>>                                      MAKELANGID(LANG_NEUTRAL,
>> SUBLANG_DEFAULT),
>>                                                              /* Default
>> language */
>>                                      (LPTSTR) &lpMsgBuf, 0, NULL);
>>         if (status > 0)
>>         {
>>             strncpy(buf, (char *)lpMsgBuf, buf_len-1);
>>             buf[buf_len-1] = '\0';
>>             /* Free the buffer returned by system */
>>             LocalFree(lpMsgBuf);
>>         }
>>         else
>>         {
>>             _snprintf(buf, buf_len-1 , "%s %d",
>>                 "Can't get system error message for code", code);
>>             buf[buf_len-1] = '\0';
>>         }
>> #else
>> #if defined(_HPUX_SOURCE)
>>         {
>>             char * msg;
>>             errno = 0;
>>             msg = strerror(code);
>>             if (errno == 0)
>>             {
>>                 strncpy(buf, msg, buf_len-1);
>>                 buf[buf_len-1] = '\0';
>>             }
>>             else
>>             {
>>                 snprintf(buf, buf_len, "%s %d",
>>                     "Can't get system error message for code", code);
>>             }
>>         }
>> #else
>>         if (strerror_r(code, buf, buf_len) != 0)
>>         {
>>             snprintf(buf, buf_len, "%s %d",
>>                 "Can't get system error message for code", code);
>>         }
>> #endif
>> #endif
>> }
>>
>> Note that HPUX does not have strerror_r() since strerror() itself is
>> thread-safe.  Also Windows does not have snprintf().  The equivalent
>> function _snprintf() has a subtle difference in its interface.
>>
>> -- Asokan
>> ________________________________________
>> From: malcolm
>> [malcolm.kavalsky@oracle.com<ma...@oracle.com>]
>> Sent: Thursday, December 11, 2014 11:02 AM
>> To:common-dev@hadoop.apache.org<ma...@hadoop.apache.org>
>> Subject: Re: Solaris Port
>>
>> Fine with me, I volunteer to do this, if accepted.
>>
>> On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
>> sys_errlist was removed for a reason.  Creating a fake sys_errlist on
>> Solaris will mean the libhadoop.so will need to be tied a specific build
>> (kernel/include pairing) and therefore limits upward
>> mobility/compatibility.
>> That doesn’t seem like a very good idea.
>>
>> IMO, switching to strerror_r is much preferred, since other than the
>> brain-dead GNU libc version, is highly portable and should work
>> regardless
>> of the kernel or OS in place.
>>
>> On Dec 11, 2014, at 5:20 AM,
>> malcolm<ma...@oracle.com>
>> wrote:
>>
>> FYI, there are a couple more files that reference sys_errlist directly
>> (not just terror within exception.c) , but also hdfs_http_client.c and
>> NativeiO.c
>>
>> On 12/11/2014 07:38 AM, malcolm wrote:
>> Hi Colin,
>>
>> Exactly, as you noticed, the problem is the thread-local buffer needed
>> to return from terror.
>> Currently, terror just returns a static string from an array, this is
>> fast, simple and error-proof.
>>
>> In order to use strerror_r inside terror,  would require allocating a
>> buffer inside terror  and depend on the caller to free the buffer after
>> using it, or to pass a buffer to terrror (which is basically the same as
>> strerror_r, rendering terror redundant).
>> Both cases require modification outside terror itself, as far as I can
>> tell, no simple fix. Unless you have an alternative which I haven't
>> thought
>> of ?
>>
>> As far as I can tell, we have two choices:
>>
>> 1. Remove terror and replace calls with strerror_r, passing a buffer
>> from the callee.
>>         Advantage: a more modern portable interface.
>>         Disadvantage: All calls to terror need to be modified, though
>> all seem to be in a few files as far as I can tell.
>>
>> 2. Adding a sys_errlist array (ifdeffed for Solaris)
>>         Advantage: no change to any calls to terror
>>         Disadvantage: 2 additional files added to source tree (.c and
>> .h) and some minor ifdefs only used for Solaris.
>>
>> I think it is more a question of style than anything else, so I leave
>> you to make the call.
>>
>> Thanks for your patience,
>> Malcolm
>>
>>
>>
>>
>>
>> On 12/10/2014 09:54 PM, Colin McCabe wrote:
>> On Wed, Dec 10, 2014 at 2:31 AM, malcolm
>> <ma...@oracle.com> wrote:
>> Hi Colin,
>>
>> Thanks for the hints around JIRAs.
>>
>> You are correct errno still exists, however sys_errlist does not.
>>
>> Hadoop uses a function terror (defined in exception.c) which indexes
>> sys_errlist by errno to return the error message from the array.
>> This
>> function is called 26 times in various places (in 2.2)
>>
>> Originally, I thought to replace all calls to terror with strerror,
>> but
>> there can be issues with multi-threading (it returns a buffer which
>> can be
>> overwritten), so it seemed simpler just to recreate the sys_errlist
>> message
>> array.
>>
>> There is also a multi-threaded version strerror_r where you pass the
>> buffer
>> as a parameter, but this would necessitate changing every call to
>> terror
>> with mutiple lines of code.
>> Why don't you just use strerror_r inside terror()?
>>
>> I wrote that code originally.  The reason I didn't want to use
>> strerror_r there is because GNU libc provides a non-POSIX definition
>> of strerror_r, and forcing it to use the POSIX one is a pain. But you
>> can do it.  You also will require a thread-local buffer to hold the
>> return from strerror_r, since it is not guaranteed to be static
>> (although in practice it is 99% of the time-- another annoyance with
>> the API).
>>
>>
>> ________________________________
>>
>>
>>
>> ATTENTION: -----
>>
>> The information contained in this message (including any files
>> transmitted with this message) may contain proprietary, trade secret or
>> other confidential and/or legally privileged information. Any pricing
>> information contained in this message or in any files transmitted
>> with this
>> message is always confidential and cannot be shared with any third
>> parties
>> without prior written approval from Syncsort. This message is
>> intended to be
>> read only by the individual or entity to whom it is addressed or by
>> their
>> designee. If the reader of this message is not the intended
>> recipient, you
>> are on notice that any use, disclosure, copying or distribution of this
>> message, in any form, is strictly prohibited. If you have received this
>> message in error, please immediately notify the sender and/or
>> Syncsort and
>> destroy all copies of this message in your possession, custody or
>> control.
>>
>>
>>
>


________________________________



ATTENTION: -----

The information contained in this message (including any files transmitted with this message) may contain proprietary, trade secret or other confidential and/or legally privileged information. Any pricing information contained in this message or in any files transmitted with this message is always confidential and cannot be shared with any third parties without prior written approval from Syncsort. This message is intended to be read only by the individual or entity to whom it is addressed or by their designee. If the reader of this message is not the intended recipient, you are on notice that any use, disclosure, copying or distribution of this message, in any form, is strictly prohibited. If you have received this message in error, please immediately notify the sender and/or Syncsort and destroy all copies of this message in your possession, custody or control.

Re: Solaris Port SOLVED!

Posted by malcolm <ma...@oracle.com>.
Wiping egg off face  ...

After consulting with the Solaris team (and looking at the source code 
and man page) ,  it turns out that strerror itself on Solaris is MT-Safe 
! (Just like HPUX)

So, after all this effort, all I need to do is modify terror as follows:

    const char* terror(int errnum)
    {

    #if defined(__sun)
       return strerror(errnum); //  MT-Safe under Solaris
    #else
       if ((errnum < 0) || (errnum >= sys_nerr)) {
         return "unknown error.";
       }
       return sys_errlist[errnum];
    #endif
    }

And in two other files where sys_errlist is referenced directly 
(NativeIO and hdfs_http_client.c), I replaced this direct access instead 
with a call to terror.

Thanks for all your help and patience,

I'll file a JIRA asap,

Cheers,
Malcolm

On 12/13/2014 05:26 PM, malcolm wrote:
> Thanks Asokan,
>
> Looked up Gcc's thread local variables, seems a bit complex though and 
> quite specific to Gnu.
>
> Intialization of the static errlist array should be thread safe i.e. 
> initially the array is nulled out, and afterwards if two threads write 
> to the same address, then they would be writing the same string.
>
> But if we are ok with changing 5 files, not just terror, then I would 
> just remove terror completely and use strerror_r (or the alternatives 
> for Windows and HP_UX) in the caller code instead i.e. using your 
> suggestion for a local buffer in the caller, wherever needed. The more 
> I think about it, the more this seems to be the right thing to do.
>
> Cheers,
> Malcolm
>
>
> On 12/13/2014 04:38 PM, Asokan, M wrote:
>> Malcom,
>>     Gcc supports thread-local variables. See
>>
>> https://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Thread-Local.html
>>
>> I am not sure about native compilers on Solaris, HPUX, or AIX.
>>
>> In any case, I found out that the Windows native code in Hadoop seems 
>> to handle error messages properly. Here is what I found:
>>
>> $ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grep 
>> FormatMessage|awk -F: '{print $1}'|sort -u
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c 
>>
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMappingWin.c 
>>
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c 
>>
>>
>>
>> $ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grep terror|awk 
>> -F: '{print $1}'|sort -u
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/exception.c 
>>
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/SharedFileDescriptorFactory.c 
>>
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c 
>>
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c 
>>
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMapping.c 
>>
>>
>>
>> This means you need not worry about the Windows version of terror(). 
>> You need to change five files that contain UNIX specific native code.
>>
>> I have a question on your suggested implementation:
>>
>> How do you initialize the static errlist array in a thread-safe manner?
>>
>> ________________________________
>> Here is another thread-safe implementation that I could come up with:
>>
>> #include <string.h>
>> #include <stdlib.h>
>> #include <errno.h>
>> #include <stdio.h>
>>
>> #define MESSAGE_BUFFER_SIZE 256
>>
>> char * getSystemErrorMessage(char * buf, int buf_len, int code) {
>> #if defined(_HPUX_SOURCE)
>>    char * msg;
>>    errno = 0;
>>    msg = strerror(code);
>>    if (errno == 0) {
>>      strncpy(buf, msg, buf_len-1);
>>      buf[buf_len-1] = '\0';
>>    } else {
>>      snprintf(buf, buf_len, "%s %d",
>>          "Can't get system error message for code", code);
>>    }
>> #else
>>    if (strerror_r(code, buf, buf_len) != 0) {
>>      snprintf(buf, buf_len, "%s %d",
>>          "Can't get system error message for code", code);
>>    }
>> #endif
>>    return buf;
>> }
>>
>> #define TERROR(code) \
>> getSystemErrorMessage(messageBuffer, sizeof(messageBuffer), code)
>>
>> int main(int argc, char ** argv) {
>>    if (argc > 1) {
>>      char messageBuffer[MESSAGE_BUFFER_SIZE];
>>      int code = atoi(argv[1]);
>>
>>      fprintf(stderr, "System error for code %s: %s\n", argv[1], 
>> TERROR(code));
>>    }
>>    return 0;
>> }
>>
>>
>> This changes terror to a macro TERROR and requires all functions that 
>> call TERROR macro to declare the local variable messageBuffer. Since 
>> there are only five files to modify, I think it is not a big effort. 
>> What do you think?
>>
>> -- Asokan
>>
>> On 12/13/2014 04:29 AM, malcolm wrote:
>> Colin,
>>
>> I am not sure what you mean by a thread-local buffer (in native 
>> code). In Java this is pretty standard, but I couldn't find any 
>> implementation for C code.
>>
>> Here is the terror function:
>>
>>      const char* terror(int errnum)
>>      {
>>        if ((errnum < 0) || (errnum >= sys_nerr)) {
>>          return "unknown error.";
>>        }
>>        return sys_errlist[errnum];
>>      }
>>
>>
>> The interface is identical to strerror, but the implementation is 
>> actually re-entrant since it returns a pointer to a static string.
>>
>> If I understand your suggestion, the new function would look like this:
>>
>>     const char* terror(int errnum)
>>     {
>>        static char result[65];
>>
>>        strerror_r(errnum, result, 64);
>>
>>        return result;
>>     }
>>
>> No need for snprintf, strerror_r  has the 'n' bounding built-in.
>>
>> Of course, this is still non-re-entrant, so unless the caller copies 
>> the returned buffer, before the function is called again, there is a 
>> problem.
>>
>> After considerable thought, I have come up with this version of 
>> terror, tested OK on Windows, Linux and Solaris:
>>
>>     #if defined(_WIN32)
>>     #define strerror_r(errno,buf,len) strerror_s(buf,len,errno)
>>     #endif
>>
>>     #define MAX_ERRORS 256
>>     #define MAX_ERROR_LEN 80
>>
>>     char *terror(int errnum)
>>     {
>>
>>        static char errlist[MAX_ERRORS][MAX_ERROR_LEN+1]; // cache of
>>     error messages
>>
>>        if ( errnum >= 0 && errnum < MAX_ERRORS )
>>          {
>>            if ( errlist[errnum][0] == 0 )
>>              strerror_r( errnum, errlist[errnum], MAX_ERROR_LEN);
>>
>>            return errlist[errnum];
>>          }
>>        else
>>          {
>>            return "Unknown error";
>>          }
>>     }
>>
>> This version is portable and re-entrant.
>>
>> On windows, the largest errnum is 43, on Ubuntu 14.04 we have 133, 
>> and on Solaris 11.1 we get 151.
>>
>> If this is OK with you, I will open a jira for this.
>>
>>
>> Thanks,
>> Malcolm
>>
>>
>> On 12/12/2014 11:10 PM, Colin McCabe wrote:
>> Just use snprintf to copy the error message from strerror_r into a
>> thread-local buffer of 64 bytes or so.  Then preserve the existing
>> terror() interface.
>>
>> Can you open a jira for this?
>>
>> best,
>> Colin
>>
>> On Thu, Dec 11, 2014 at 8:35 PM, 
>> malcolm<ma...@oracle.com> 
>> wrote:
>> So, turns out that if I had naively changed all calls to terror or
>> references to sys_errlist, to using strerror_r, then I would have broken
>> code for Windows and HPUX (and possibly other OSes).
>>
>> If we are to assume that current code runs fine on all platforms 
>> (maybe even
>> AIX an MacOS, for example), then any change/additions made to the 
>> code and
>> not ifdeffed appropriately can break on other OSes. On the other 
>> hand,  too
>> many ifdefs can pollute the code source and render it less readable 
>> (though
>> possibly less important).
>>
>> In the general case what are code contributors responsibilities to 
>> adding
>> code regarding OSes besides Linux ?
>> What OSes does jenkins test on ?
>> I guess maintainers of code on non-tested platforms are responsible for
>> their own testing ?
>>
>> How do we avoid the ping-pong effect, i.e. I make a generic change to 
>> code
>> which breaks on Windows, then the Windows maintainer reverts changes to
>> break on Solaris for example ? Or does this not happen in actuality ?
>>
>>
>> On 12/11/2014 11:25 PM, Asokan, M wrote:
>> Hi Malcom,
>>      The Windows versions of strerror() and strerror_s() functions are
>> probably meant for ANSI C library functions that set errno.  For core
>> Windows API calls (like UNIX system calls), one gets the error number by
>> calling GetLastError() function.  In the code snippet I sent earlier, 
>> the
>> "code" argument is the value returned by GetLastError(). Neither 
>> strerror()
>> nor strerror_s() will give the correct error message for this error 
>> code.
>>
>> You could probably look at libwinutils.c in Hadoop source.  It uses
>> FormatMessageW (which returns messages in Unicode.)  My requirement 
>> was to
>> return messages in current system locale.
>>
>> -- Asokan
>> ________________________________________
>> From: malcolm 
>> [malcolm.kavalsky@oracle.com<ma...@oracle.com>]
>> Sent: Thursday, December 11, 2014 4:04 PM
>> To:common-dev@hadoop.apache.org<ma...@hadoop.apache.org>
>> Subject: Re: Solaris Port
>>
>> Hi Asok,
>>
>> I googled and found that windows has strerror, and strerror_s (which is
>> the strerror_r equivalent).
>> Is there a reason why you didn't use this call ?
>>
>> On 12/11/2014 06:27 PM, Asokan, M wrote:
>> Hi Malcom,
>>        Recently, I had to work on a function to get system error 
>> message on
>> various systems.  Here is the piece of code I came up with. Hope it 
>> helps.
>>
>> static void get_system_error_message(char * buf, int buf_len, int code)
>> {
>> #if defined(_WIN32)
>>         LPVOID lpMsgBuf;
>>         DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
>>                                      FORMAT_MESSAGE_FROM_SYSTEM |
>> FORMAT_MESSAGE_IGNORE_INSERTS,
>>                                      NULL, code,
>>                                      MAKELANGID(LANG_NEUTRAL,
>> SUBLANG_DEFAULT),
>>                                                              /* Default
>> language */
>>                                      (LPTSTR) &lpMsgBuf, 0, NULL);
>>         if (status > 0)
>>         {
>>             strncpy(buf, (char *)lpMsgBuf, buf_len-1);
>>             buf[buf_len-1] = '\0';
>>             /* Free the buffer returned by system */
>>             LocalFree(lpMsgBuf);
>>         }
>>         else
>>         {
>>             _snprintf(buf, buf_len-1 , "%s %d",
>>                 "Can't get system error message for code", code);
>>             buf[buf_len-1] = '\0';
>>         }
>> #else
>> #if defined(_HPUX_SOURCE)
>>         {
>>             char * msg;
>>             errno = 0;
>>             msg = strerror(code);
>>             if (errno == 0)
>>             {
>>                 strncpy(buf, msg, buf_len-1);
>>                 buf[buf_len-1] = '\0';
>>             }
>>             else
>>             {
>>                 snprintf(buf, buf_len, "%s %d",
>>                     "Can't get system error message for code", code);
>>             }
>>         }
>> #else
>>         if (strerror_r(code, buf, buf_len) != 0)
>>         {
>>             snprintf(buf, buf_len, "%s %d",
>>                 "Can't get system error message for code", code);
>>         }
>> #endif
>> #endif
>> }
>>
>> Note that HPUX does not have strerror_r() since strerror() itself is
>> thread-safe.  Also Windows does not have snprintf().  The equivalent
>> function _snprintf() has a subtle difference in its interface.
>>
>> -- Asokan
>> ________________________________________
>> From: malcolm 
>> [malcolm.kavalsky@oracle.com<ma...@oracle.com>]
>> Sent: Thursday, December 11, 2014 11:02 AM
>> To:common-dev@hadoop.apache.org<ma...@hadoop.apache.org>
>> Subject: Re: Solaris Port
>>
>> Fine with me, I volunteer to do this, if accepted.
>>
>> On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
>> sys_errlist was removed for a reason.  Creating a fake sys_errlist on
>> Solaris will mean the libhadoop.so will need to be tied a specific build
>> (kernel/include pairing) and therefore limits upward 
>> mobility/compatibility.
>> That doesn’t seem like a very good idea.
>>
>> IMO, switching to strerror_r is much preferred, since other than the
>> brain-dead GNU libc version, is highly portable and should work 
>> regardless
>> of the kernel or OS in place.
>>
>> On Dec 11, 2014, at 5:20 AM, 
>> malcolm<ma...@oracle.com>
>> wrote:
>>
>> FYI, there are a couple more files that reference sys_errlist directly
>> (not just terror within exception.c) , but also hdfs_http_client.c and
>> NativeiO.c
>>
>> On 12/11/2014 07:38 AM, malcolm wrote:
>> Hi Colin,
>>
>> Exactly, as you noticed, the problem is the thread-local buffer needed
>> to return from terror.
>> Currently, terror just returns a static string from an array, this is
>> fast, simple and error-proof.
>>
>> In order to use strerror_r inside terror,  would require allocating a
>> buffer inside terror  and depend on the caller to free the buffer after
>> using it, or to pass a buffer to terrror (which is basically the same as
>> strerror_r, rendering terror redundant).
>> Both cases require modification outside terror itself, as far as I can
>> tell, no simple fix. Unless you have an alternative which I haven't 
>> thought
>> of ?
>>
>> As far as I can tell, we have two choices:
>>
>> 1. Remove terror and replace calls with strerror_r, passing a buffer
>> from the callee.
>>         Advantage: a more modern portable interface.
>>         Disadvantage: All calls to terror need to be modified, though
>> all seem to be in a few files as far as I can tell.
>>
>> 2. Adding a sys_errlist array (ifdeffed for Solaris)
>>         Advantage: no change to any calls to terror
>>         Disadvantage: 2 additional files added to source tree (.c and
>> .h) and some minor ifdefs only used for Solaris.
>>
>> I think it is more a question of style than anything else, so I leave
>> you to make the call.
>>
>> Thanks for your patience,
>> Malcolm
>>
>>
>>
>>
>>
>> On 12/10/2014 09:54 PM, Colin McCabe wrote:
>> On Wed, Dec 10, 2014 at 2:31 AM, malcolm
>> <ma...@oracle.com> wrote:
>> Hi Colin,
>>
>> Thanks for the hints around JIRAs.
>>
>> You are correct errno still exists, however sys_errlist does not.
>>
>> Hadoop uses a function terror (defined in exception.c) which indexes
>> sys_errlist by errno to return the error message from the array.
>> This
>> function is called 26 times in various places (in 2.2)
>>
>> Originally, I thought to replace all calls to terror with strerror,
>> but
>> there can be issues with multi-threading (it returns a buffer which
>> can be
>> overwritten), so it seemed simpler just to recreate the sys_errlist
>> message
>> array.
>>
>> There is also a multi-threaded version strerror_r where you pass the
>> buffer
>> as a parameter, but this would necessitate changing every call to
>> terror
>> with mutiple lines of code.
>> Why don't you just use strerror_r inside terror()?
>>
>> I wrote that code originally.  The reason I didn't want to use
>> strerror_r there is because GNU libc provides a non-POSIX definition
>> of strerror_r, and forcing it to use the POSIX one is a pain. But you
>> can do it.  You also will require a thread-local buffer to hold the
>> return from strerror_r, since it is not guaranteed to be static
>> (although in practice it is 99% of the time-- another annoyance with
>> the API).
>>
>>
>> ________________________________
>>
>>
>>
>> ATTENTION: -----
>>
>> The information contained in this message (including any files
>> transmitted with this message) may contain proprietary, trade secret or
>> other confidential and/or legally privileged information. Any pricing
>> information contained in this message or in any files transmitted 
>> with this
>> message is always confidential and cannot be shared with any third 
>> parties
>> without prior written approval from Syncsort. This message is 
>> intended to be
>> read only by the individual or entity to whom it is addressed or by 
>> their
>> designee. If the reader of this message is not the intended 
>> recipient, you
>> are on notice that any use, disclosure, copying or distribution of this
>> message, in any form, is strictly prohibited. If you have received this
>> message in error, please immediately notify the sender and/or 
>> Syncsort and
>> destroy all copies of this message in your possession, custody or 
>> control.
>>
>>
>>
>


Re: Solaris Port

Posted by malcolm <ma...@oracle.com>.
Thanks Asokan,

Looked up Gcc's thread local variables, seems a bit complex though and 
quite specific to Gnu.

Intialization of the static errlist array should be thread safe i.e. 
initially the array is nulled out, and afterwards if two threads write 
to the same address, then they would be writing the same string.

But if we are ok with changing 5 files, not just terror, then I would 
just remove terror completely and use strerror_r (or the alternatives 
for Windows and HP_UX) in the caller code instead i.e. using your 
suggestion for a local buffer in the caller, wherever needed. The more I 
think about it, the more this seems to be the right thing to do.

Cheers,
Malcolm


On 12/13/2014 04:38 PM, Asokan, M wrote:
> Malcom,
>     Gcc supports thread-local variables. See
>
> https://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Thread-Local.html
>
> I am not sure about native compilers on Solaris, HPUX, or AIX.
>
> In any case, I found out that the Windows native code in Hadoop seems to handle error messages properly. Here is what I found:
>
> $ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grep FormatMessage|awk -F: '{print $1}'|sort -u
> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c
> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMappingWin.c
> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c
>
>
> $ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grep terror|awk -F: '{print $1}'|sort -u
> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/exception.c
> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/SharedFileDescriptorFactory.c
> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c
> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c
> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMapping.c
>
>
> This means you need not worry about the Windows version of terror(). You need to change five files that contain UNIX specific native code.
>
> I have a question on your suggested implementation:
>
> How do you initialize the static errlist array in a thread-safe manner?
>
> ________________________________
> Here is another thread-safe implementation that I could come up with:
>
> #include <string.h>
> #include <stdlib.h>
> #include <errno.h>
> #include <stdio.h>
>
> #define MESSAGE_BUFFER_SIZE 256
>
> char * getSystemErrorMessage(char * buf, int buf_len, int code) {
> #if defined(_HPUX_SOURCE)
>    char * msg;
>    errno = 0;
>    msg = strerror(code);
>    if (errno == 0) {
>      strncpy(buf, msg, buf_len-1);
>      buf[buf_len-1] = '\0';
>    } else {
>      snprintf(buf, buf_len, "%s %d",
>          "Can't get system error message for code", code);
>    }
> #else
>    if (strerror_r(code, buf, buf_len) != 0) {
>      snprintf(buf, buf_len, "%s %d",
>          "Can't get system error message for code", code);
>    }
> #endif
>    return buf;
> }
>
> #define TERROR(code) \
> getSystemErrorMessage(messageBuffer, sizeof(messageBuffer), code)
>
> int main(int argc, char ** argv) {
>    if (argc > 1) {
>      char messageBuffer[MESSAGE_BUFFER_SIZE];
>      int code = atoi(argv[1]);
>
>      fprintf(stderr, "System error for code %s: %s\n", argv[1],  TERROR(code));
>    }
>    return 0;
> }
>
>
> This changes terror to a macro TERROR and requires all functions that call TERROR macro to declare the local variable messageBuffer. Since there are only five files to modify, I think it is not a big effort. What do you think?
>
> -- Asokan
>
> On 12/13/2014 04:29 AM, malcolm wrote:
> Colin,
>
> I am not sure what you mean by a thread-local buffer (in native code). In Java this is pretty standard, but I couldn't find any implementation for C code.
>
> Here is the terror function:
>
>      const char* terror(int errnum)
>      {
>        if ((errnum < 0) || (errnum >= sys_nerr)) {
>          return "unknown error.";
>        }
>        return sys_errlist[errnum];
>      }
>
>
> The interface is identical to strerror, but the implementation is actually re-entrant since it returns a pointer to a static string.
>
> If I understand your suggestion, the new function would look like this:
>
>     const char* terror(int errnum)
>     {
>        static char result[65];
>
>        strerror_r(errnum, result, 64);
>
>        return result;
>     }
>
> No need for snprintf, strerror_r  has the 'n' bounding built-in.
>
> Of course, this is still non-re-entrant, so unless the caller copies the returned buffer, before the function is called again, there is a problem.
>
> After considerable thought, I have come up with this version of terror, tested OK on Windows, Linux and Solaris:
>
>     #if defined(_WIN32)
>     #define strerror_r(errno,buf,len) strerror_s(buf,len,errno)
>     #endif
>
>     #define MAX_ERRORS 256
>     #define MAX_ERROR_LEN 80
>
>     char *terror(int errnum)
>     {
>
>        static char errlist[MAX_ERRORS][MAX_ERROR_LEN+1]; // cache of
>     error messages
>
>        if ( errnum >= 0 && errnum < MAX_ERRORS )
>          {
>            if ( errlist[errnum][0] == 0 )
>              strerror_r( errnum, errlist[errnum], MAX_ERROR_LEN);
>
>            return errlist[errnum];
>          }
>        else
>          {
>            return "Unknown error";
>          }
>     }
>
> This version is portable and re-entrant.
>
> On windows, the largest errnum is 43, on Ubuntu 14.04 we have 133, and on Solaris 11.1 we get 151.
>
> If this is OK with you, I will open a jira for this.
>
>
> Thanks,
> Malcolm
>
>
> On 12/12/2014 11:10 PM, Colin McCabe wrote:
> Just use snprintf to copy the error message from strerror_r into a
> thread-local buffer of 64 bytes or so.  Then preserve the existing
> terror() interface.
>
> Can you open a jira for this?
>
> best,
> Colin
>
> On Thu, Dec 11, 2014 at 8:35 PM, malcolm<ma...@oracle.com>  wrote:
> So, turns out that if I had naively changed all calls to terror or
> references to sys_errlist, to using strerror_r, then I would have broken
> code for Windows and HPUX (and possibly other OSes).
>
> If we are to assume that current code runs fine on all platforms (maybe even
> AIX an MacOS, for example), then any change/additions made to the code and
> not ifdeffed appropriately can break on other OSes. On the other hand,  too
> many ifdefs can pollute the code source and render it less readable (though
> possibly less important).
>
> In the general case what are code contributors responsibilities to adding
> code regarding OSes besides Linux ?
> What OSes does jenkins test on ?
> I guess maintainers of code on non-tested platforms are responsible for
> their own testing ?
>
> How do we avoid the ping-pong effect, i.e. I make a generic change to code
> which breaks on Windows, then the Windows maintainer reverts changes to
> break on Solaris for example ? Or does this not happen in actuality ?
>
>
> On 12/11/2014 11:25 PM, Asokan, M wrote:
> Hi Malcom,
>      The Windows versions of strerror() and strerror_s() functions are
> probably meant for ANSI C library functions that set errno.  For core
> Windows API calls (like UNIX system calls), one gets the error number by
> calling GetLastError() function.  In the code snippet I sent earlier, the
> "code" argument is the value returned by GetLastError().  Neither strerror()
> nor strerror_s() will give the correct error message for this error code.
>
> You could probably look at libwinutils.c in Hadoop source.  It uses
> FormatMessageW (which returns messages in Unicode.)  My requirement was to
> return messages in current system locale.
>
> -- Asokan
> ________________________________________
> From: malcolm [malcolm.kavalsky@oracle.com<ma...@oracle.com>]
> Sent: Thursday, December 11, 2014 4:04 PM
> To:common-dev@hadoop.apache.org<ma...@hadoop.apache.org>
> Subject: Re: Solaris Port
>
> Hi Asok,
>
> I googled and found that windows has strerror, and strerror_s (which is
> the strerror_r equivalent).
> Is there a reason why you didn't use this call ?
>
> On 12/11/2014 06:27 PM, Asokan, M wrote:
> Hi Malcom,
>        Recently, I had to work on a function to get system error message on
> various systems.  Here is the piece of code I came up with.  Hope it helps.
>
> static void get_system_error_message(char * buf, int buf_len, int code)
> {
> #if defined(_WIN32)
>         LPVOID lpMsgBuf;
>         DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
>                                      FORMAT_MESSAGE_FROM_SYSTEM |
>                                      FORMAT_MESSAGE_IGNORE_INSERTS,
>                                      NULL, code,
>                                      MAKELANGID(LANG_NEUTRAL,
> SUBLANG_DEFAULT),
>                                                              /* Default
> language */
>                                      (LPTSTR) &lpMsgBuf, 0, NULL);
>         if (status > 0)
>         {
>             strncpy(buf, (char *)lpMsgBuf, buf_len-1);
>             buf[buf_len-1] = '\0';
>             /* Free the buffer returned by system */
>             LocalFree(lpMsgBuf);
>         }
>         else
>         {
>             _snprintf(buf, buf_len-1 , "%s %d",
>                 "Can't get system error message for code", code);
>             buf[buf_len-1] = '\0';
>         }
> #else
> #if defined(_HPUX_SOURCE)
>         {
>             char * msg;
>             errno = 0;
>             msg = strerror(code);
>             if (errno == 0)
>             {
>                 strncpy(buf, msg, buf_len-1);
>                 buf[buf_len-1] = '\0';
>             }
>             else
>             {
>                 snprintf(buf, buf_len, "%s %d",
>                     "Can't get system error message for code", code);
>             }
>         }
> #else
>         if (strerror_r(code, buf, buf_len) != 0)
>         {
>             snprintf(buf, buf_len, "%s %d",
>                 "Can't get system error message for code", code);
>         }
> #endif
> #endif
> }
>
> Note that HPUX does not have strerror_r() since strerror() itself is
> thread-safe.  Also Windows does not have snprintf().  The equivalent
> function _snprintf() has a subtle difference in its interface.
>
> -- Asokan
> ________________________________________
> From: malcolm [malcolm.kavalsky@oracle.com<ma...@oracle.com>]
> Sent: Thursday, December 11, 2014 11:02 AM
> To:common-dev@hadoop.apache.org<ma...@hadoop.apache.org>
> Subject: Re: Solaris Port
>
> Fine with me, I volunteer to do this, if accepted.
>
> On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
> sys_errlist was removed for a reason.  Creating a fake sys_errlist on
> Solaris will mean the libhadoop.so will need to be tied a specific build
> (kernel/include pairing) and therefore limits upward mobility/compatibility.
> That doesn’t seem like a very good idea.
>
> IMO, switching to strerror_r is much preferred, since other than the
> brain-dead GNU libc version, is highly portable and should work regardless
> of the kernel or OS in place.
>
> On Dec 11, 2014, at 5:20 AM, malcolm<ma...@oracle.com>
> wrote:
>
> FYI, there are a couple more files that reference sys_errlist directly
> (not just terror within exception.c) , but also hdfs_http_client.c and
> NativeiO.c
>
> On 12/11/2014 07:38 AM, malcolm wrote:
> Hi Colin,
>
> Exactly, as you noticed, the problem is the thread-local buffer needed
> to return from terror.
> Currently, terror just returns a static string from an array, this is
> fast, simple and error-proof.
>
> In order to use strerror_r inside terror,  would require allocating a
> buffer inside terror  and depend on the caller to free the buffer after
> using it, or to pass a buffer to terrror (which is basically the same as
> strerror_r, rendering terror redundant).
> Both cases require modification outside terror itself, as far as I can
> tell, no simple fix. Unless you have an alternative which I haven't thought
> of ?
>
> As far as I can tell, we have two choices:
>
> 1. Remove terror and replace calls with strerror_r, passing a buffer
> from the callee.
>         Advantage: a more modern portable interface.
>         Disadvantage: All calls to terror need to be modified, though
> all seem to be in a few files as far as I can tell.
>
> 2. Adding a sys_errlist array (ifdeffed for Solaris)
>         Advantage: no change to any calls to terror
>         Disadvantage: 2 additional files added to source tree (.c and
> .h) and some minor ifdefs only used for Solaris.
>
> I think it is more a question of style than anything else, so I leave
> you to make the call.
>
> Thanks for your patience,
> Malcolm
>
>
>
>
>
> On 12/10/2014 09:54 PM, Colin McCabe wrote:
> On Wed, Dec 10, 2014 at 2:31 AM, malcolm
> <ma...@oracle.com>  wrote:
> Hi Colin,
>
> Thanks for the hints around JIRAs.
>
> You are correct errno still exists, however sys_errlist does not.
>
> Hadoop uses a function terror (defined in exception.c) which indexes
> sys_errlist by errno to return the error message from the array.
> This
> function is called 26 times in various places (in 2.2)
>
> Originally, I thought to replace all calls to terror with strerror,
> but
> there can be issues with multi-threading (it returns a buffer which
> can be
> overwritten), so it seemed simpler just to recreate the sys_errlist
> message
> array.
>
> There is also a multi-threaded version strerror_r where you pass the
> buffer
> as a parameter, but this would necessitate changing every call to
> terror
> with mutiple lines of code.
> Why don't you just use strerror_r inside terror()?
>
> I wrote that code originally.  The reason I didn't want to use
> strerror_r there is because GNU libc provides a non-POSIX definition
> of strerror_r, and forcing it to use the POSIX one is a pain. But you
> can do it.  You also will require a thread-local buffer to hold the
> return from strerror_r, since it is not guaranteed to be static
> (although in practice it is 99% of the time-- another annoyance with
> the API).
>
>
> ________________________________
>
>
>
> ATTENTION: -----
>
> The information contained in this message (including any files
> transmitted with this message) may contain proprietary, trade secret or
> other confidential and/or legally privileged information. Any pricing
> information contained in this message or in any files transmitted with this
> message is always confidential and cannot be shared with any third parties
> without prior written approval from Syncsort. This message is intended to be
> read only by the individual or entity to whom it is addressed or by their
> designee. If the reader of this message is not the intended recipient, you
> are on notice that any use, disclosure, copying or distribution of this
> message, in any form, is strictly prohibited. If you have received this
> message in error, please immediately notify the sender and/or Syncsort and
> destroy all copies of this message in your possession, custody or control.
>
>
>


Re: Solaris Port

Posted by "Asokan, M" <ma...@syncsort.com>.
Malcom,
   Gcc supports thread-local variables. See

https://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Thread-Local.html

I am not sure about native compilers on Solaris, HPUX, or AIX.

In any case, I found out that the Windows native code in Hadoop seems to handle error messages properly. Here is what I found:

$ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grep FormatMessage|awk -F: '{print $1}'|sort -u
/home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c
/home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMappingWin.c
/home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c


$ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grep terror|awk -F: '{print $1}'|sort -u
/home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/exception.c
/home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/SharedFileDescriptorFactory.c
/home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c
/home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c
/home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMapping.c


This means you need not worry about the Windows version of terror(). You need to change five files that contain UNIX specific native code.

I have a question on your suggested implementation:

How do you initialize the static errlist array in a thread-safe manner?

________________________________
Here is another thread-safe implementation that I could come up with:

#include <string.h>
#include <stdlib.h>
#include <errno.h>
#include <stdio.h>

#define MESSAGE_BUFFER_SIZE 256

char * getSystemErrorMessage(char * buf, int buf_len, int code) {
#if defined(_HPUX_SOURCE)
  char * msg;
  errno = 0;
  msg = strerror(code);
  if (errno == 0) {
    strncpy(buf, msg, buf_len-1);
    buf[buf_len-1] = '\0';
  } else {
    snprintf(buf, buf_len, "%s %d",
        "Can't get system error message for code", code);
  }
#else
  if (strerror_r(code, buf, buf_len) != 0) {
    snprintf(buf, buf_len, "%s %d",
        "Can't get system error message for code", code);
  }
#endif
  return buf;
}

#define TERROR(code) \
getSystemErrorMessage(messageBuffer, sizeof(messageBuffer), code)

int main(int argc, char ** argv) {
  if (argc > 1) {
    char messageBuffer[MESSAGE_BUFFER_SIZE];
    int code = atoi(argv[1]);

    fprintf(stderr, "System error for code %s: %s\n", argv[1],  TERROR(code));
  }
  return 0;
}


This changes terror to a macro TERROR and requires all functions that call TERROR macro to declare the local variable messageBuffer. Since there are only five files to modify, I think it is not a big effort. What do you think?

-- Asokan

On 12/13/2014 04:29 AM, malcolm wrote:
Colin,

I am not sure what you mean by a thread-local buffer (in native code). In Java this is pretty standard, but I couldn't find any implementation for C code.

Here is the terror function:

    const char* terror(int errnum)
    {
      if ((errnum < 0) || (errnum >= sys_nerr)) {
        return "unknown error.";
      }
      return sys_errlist[errnum];
    }


The interface is identical to strerror, but the implementation is actually re-entrant since it returns a pointer to a static string.

If I understand your suggestion, the new function would look like this:

   const char* terror(int errnum)
   {
      static char result[65];

      strerror_r(errnum, result, 64);

      return result;
   }

No need for snprintf, strerror_r  has the 'n' bounding built-in.

Of course, this is still non-re-entrant, so unless the caller copies the returned buffer, before the function is called again, there is a problem.

After considerable thought, I have come up with this version of terror, tested OK on Windows, Linux and Solaris:

   #if defined(_WIN32)
   #define strerror_r(errno,buf,len) strerror_s(buf,len,errno)
   #endif

   #define MAX_ERRORS 256
   #define MAX_ERROR_LEN 80

   char *terror(int errnum)
   {

      static char errlist[MAX_ERRORS][MAX_ERROR_LEN+1]; // cache of
   error messages

      if ( errnum >= 0 && errnum < MAX_ERRORS )
        {
          if ( errlist[errnum][0] == 0 )
            strerror_r( errnum, errlist[errnum], MAX_ERROR_LEN);

          return errlist[errnum];
        }
      else
        {
          return "Unknown error";
        }
   }

This version is portable and re-entrant.

On windows, the largest errnum is 43, on Ubuntu 14.04 we have 133, and on Solaris 11.1 we get 151.

If this is OK with you, I will open a jira for this.


Thanks,
Malcolm


On 12/12/2014 11:10 PM, Colin McCabe wrote:
Just use snprintf to copy the error message from strerror_r into a
thread-local buffer of 64 bytes or so.  Then preserve the existing
terror() interface.

Can you open a jira for this?

best,
Colin

On Thu, Dec 11, 2014 at 8:35 PM, malcolm<ma...@oracle.com>  wrote:
So, turns out that if I had naively changed all calls to terror or
references to sys_errlist, to using strerror_r, then I would have broken
code for Windows and HPUX (and possibly other OSes).

If we are to assume that current code runs fine on all platforms (maybe even
AIX an MacOS, for example), then any change/additions made to the code and
not ifdeffed appropriately can break on other OSes. On the other hand,  too
many ifdefs can pollute the code source and render it less readable (though
possibly less important).

In the general case what are code contributors responsibilities to adding
code regarding OSes besides Linux ?
What OSes does jenkins test on ?
I guess maintainers of code on non-tested platforms are responsible for
their own testing ?

How do we avoid the ping-pong effect, i.e. I make a generic change to code
which breaks on Windows, then the Windows maintainer reverts changes to
break on Solaris for example ? Or does this not happen in actuality ?


On 12/11/2014 11:25 PM, Asokan, M wrote:
Hi Malcom,
    The Windows versions of strerror() and strerror_s() functions are
probably meant for ANSI C library functions that set errno.  For core
Windows API calls (like UNIX system calls), one gets the error number by
calling GetLastError() function.  In the code snippet I sent earlier, the
"code" argument is the value returned by GetLastError().  Neither strerror()
nor strerror_s() will give the correct error message for this error code.

You could probably look at libwinutils.c in Hadoop source.  It uses
FormatMessageW (which returns messages in Unicode.)  My requirement was to
return messages in current system locale.

-- Asokan
________________________________________
From: malcolm [malcolm.kavalsky@oracle.com<ma...@oracle.com>]
Sent: Thursday, December 11, 2014 4:04 PM
To:common-dev@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: Solaris Port

Hi Asok,

I googled and found that windows has strerror, and strerror_s (which is
the strerror_r equivalent).
Is there a reason why you didn't use this call ?

On 12/11/2014 06:27 PM, Asokan, M wrote:
Hi Malcom,
      Recently, I had to work on a function to get system error message on
various systems.  Here is the piece of code I came up with.  Hope it helps.

static void get_system_error_message(char * buf, int buf_len, int code)
{
#if defined(_WIN32)
       LPVOID lpMsgBuf;
       DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
                                    FORMAT_MESSAGE_FROM_SYSTEM |
                                    FORMAT_MESSAGE_IGNORE_INSERTS,
                                    NULL, code,
                                    MAKELANGID(LANG_NEUTRAL,
SUBLANG_DEFAULT),
                                                            /* Default
language */
                                    (LPTSTR) &lpMsgBuf, 0, NULL);
       if (status > 0)
       {
           strncpy(buf, (char *)lpMsgBuf, buf_len-1);
           buf[buf_len-1] = '\0';
           /* Free the buffer returned by system */
           LocalFree(lpMsgBuf);
       }
       else
       {
           _snprintf(buf, buf_len-1 , "%s %d",
               "Can't get system error message for code", code);
           buf[buf_len-1] = '\0';
       }
#else
#if defined(_HPUX_SOURCE)
       {
           char * msg;
           errno = 0;
           msg = strerror(code);
           if (errno == 0)
           {
               strncpy(buf, msg, buf_len-1);
               buf[buf_len-1] = '\0';
           }
           else
           {
               snprintf(buf, buf_len, "%s %d",
                   "Can't get system error message for code", code);
           }
       }
#else
       if (strerror_r(code, buf, buf_len) != 0)
       {
           snprintf(buf, buf_len, "%s %d",
               "Can't get system error message for code", code);
       }
#endif
#endif
}

Note that HPUX does not have strerror_r() since strerror() itself is
thread-safe.  Also Windows does not have snprintf().  The equivalent
function _snprintf() has a subtle difference in its interface.

-- Asokan
________________________________________
From: malcolm [malcolm.kavalsky@oracle.com<ma...@oracle.com>]
Sent: Thursday, December 11, 2014 11:02 AM
To:common-dev@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: Solaris Port

Fine with me, I volunteer to do this, if accepted.

On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
sys_errlist was removed for a reason.  Creating a fake sys_errlist on
Solaris will mean the libhadoop.so will need to be tied a specific build
(kernel/include pairing) and therefore limits upward mobility/compatibility.
That doesn’t seem like a very good idea.

IMO, switching to strerror_r is much preferred, since other than the
brain-dead GNU libc version, is highly portable and should work regardless
of the kernel or OS in place.

On Dec 11, 2014, at 5:20 AM, malcolm<ma...@oracle.com>
wrote:

FYI, there are a couple more files that reference sys_errlist directly
(not just terror within exception.c) , but also hdfs_http_client.c and
NativeiO.c

On 12/11/2014 07:38 AM, malcolm wrote:
Hi Colin,

Exactly, as you noticed, the problem is the thread-local buffer needed
to return from terror.
Currently, terror just returns a static string from an array, this is
fast, simple and error-proof.

In order to use strerror_r inside terror,  would require allocating a
buffer inside terror  and depend on the caller to free the buffer after
using it, or to pass a buffer to terrror (which is basically the same as
strerror_r, rendering terror redundant).
Both cases require modification outside terror itself, as far as I can
tell, no simple fix. Unless you have an alternative which I haven't thought
of ?

As far as I can tell, we have two choices:

1. Remove terror and replace calls with strerror_r, passing a buffer
from the callee.
       Advantage: a more modern portable interface.
       Disadvantage: All calls to terror need to be modified, though
all seem to be in a few files as far as I can tell.

2. Adding a sys_errlist array (ifdeffed for Solaris)
       Advantage: no change to any calls to terror
       Disadvantage: 2 additional files added to source tree (.c and
.h) and some minor ifdefs only used for Solaris.

I think it is more a question of style than anything else, so I leave
you to make the call.

Thanks for your patience,
Malcolm





On 12/10/2014 09:54 PM, Colin McCabe wrote:
On Wed, Dec 10, 2014 at 2:31 AM, malcolm
<ma...@oracle.com>  wrote:
Hi Colin,

Thanks for the hints around JIRAs.

You are correct errno still exists, however sys_errlist does not.

Hadoop uses a function terror (defined in exception.c) which indexes
sys_errlist by errno to return the error message from the array.
This
function is called 26 times in various places (in 2.2)

Originally, I thought to replace all calls to terror with strerror,
but
there can be issues with multi-threading (it returns a buffer which
can be
overwritten), so it seemed simpler just to recreate the sys_errlist
message
array.

There is also a multi-threaded version strerror_r where you pass the
buffer
as a parameter, but this would necessitate changing every call to
terror
with mutiple lines of code.
Why don't you just use strerror_r inside terror()?

I wrote that code originally.  The reason I didn't want to use
strerror_r there is because GNU libc provides a non-POSIX definition
of strerror_r, and forcing it to use the POSIX one is a pain. But you
can do it.  You also will require a thread-local buffer to hold the
return from strerror_r, since it is not guaranteed to be static
(although in practice it is 99% of the time-- another annoyance with
the API).


________________________________



ATTENTION: -----

The information contained in this message (including any files
transmitted with this message) may contain proprietary, trade secret or
other confidential and/or legally privileged information. Any pricing
information contained in this message or in any files transmitted with this
message is always confidential and cannot be shared with any third parties
without prior written approval from Syncsort. This message is intended to be
read only by the individual or entity to whom it is addressed or by their
designee. If the reader of this message is not the intended recipient, you
are on notice that any use, disclosure, copying or distribution of this
message, in any form, is strictly prohibited. If you have received this
message in error, please immediately notify the sender and/or Syncsort and
destroy all copies of this message in your possession, custody or control.




Re: Solaris Port

Posted by malcolm <ma...@oracle.com>.
Colin,

I am not sure what you mean by a thread-local buffer (in native code). 
In Java this is pretty standard, but I couldn't find any implementation 
for C code.

Here is the terror function:

     const char* terror(int errnum)
     {
       if ((errnum < 0) || (errnum >= sys_nerr)) {
         return "unknown error.";
       }
       return sys_errlist[errnum];
     }


The interface is identical to strerror, but the implementation is 
actually re-entrant since it returns a pointer to a static string.

If I understand your suggestion, the new function would look like this:

    const char* terror(int errnum)
    {
       static char result[65];

       strerror_r(errnum, result, 64);

       return result;
    }

No need for snprintf, strerror_r  has the 'n' bounding built-in.

Of course, this is still non-re-entrant, so unless the caller copies the 
returned buffer, before the function is called again, there is a problem.

After considerable thought, I have come up with this version of terror, 
tested OK on Windows, Linux and Solaris:

    #if defined(_WIN32)
    #define strerror_r(errno,buf,len) strerror_s(buf,len,errno)
    #endif

    #define MAX_ERRORS 256
    #define MAX_ERROR_LEN 80

    char *terror(int errnum)
    {

       static char errlist[MAX_ERRORS][MAX_ERROR_LEN+1]; // cache of
    error messages

       if ( errnum >= 0 && errnum < MAX_ERRORS )
         {
           if ( errlist[errnum][0] == 0 )
             strerror_r( errnum, errlist[errnum], MAX_ERROR_LEN);

           return errlist[errnum];
         }
       else
         {
           return "Unknown error";
         }
    }

This version is portable and re-entrant.

On windows, the largest errnum is 43, on Ubuntu 14.04 we have 133, and 
on Solaris 11.1 we get 151.

If this is OK with you, I will open a jira for this.


Thanks,
Malcolm


On 12/12/2014 11:10 PM, Colin McCabe wrote:
> Just use snprintf to copy the error message from strerror_r into a
> thread-local buffer of 64 bytes or so.  Then preserve the existing
> terror() interface.
>
> Can you open a jira for this?
>
> best,
> Colin
>
> On Thu, Dec 11, 2014 at 8:35 PM, malcolm<ma...@oracle.com>  wrote:
>> So, turns out that if I had naively changed all calls to terror or
>> references to sys_errlist, to using strerror_r, then I would have broken
>> code for Windows and HPUX (and possibly other OSes).
>>
>> If we are to assume that current code runs fine on all platforms (maybe even
>> AIX an MacOS, for example), then any change/additions made to the code and
>> not ifdeffed appropriately can break on other OSes. On the other hand,  too
>> many ifdefs can pollute the code source and render it less readable (though
>> possibly less important).
>>
>> In the general case what are code contributors responsibilities to adding
>> code regarding OSes besides Linux ?
>> What OSes does jenkins test on ?
>> I guess maintainers of code on non-tested platforms are responsible for
>> their own testing ?
>>
>> How do we avoid the ping-pong effect, i.e. I make a generic change to code
>> which breaks on Windows, then the Windows maintainer reverts changes to
>> break on Solaris for example ? Or does this not happen in actuality ?
>>
>>
>> On 12/11/2014 11:25 PM, Asokan, M wrote:
>>> Hi Malcom,
>>>     The Windows versions of strerror() and strerror_s() functions are
>>> probably meant for ANSI C library functions that set errno.  For core
>>> Windows API calls (like UNIX system calls), one gets the error number by
>>> calling GetLastError() function.  In the code snippet I sent earlier, the
>>> "code" argument is the value returned by GetLastError().  Neither strerror()
>>> nor strerror_s() will give the correct error message for this error code.
>>>
>>> You could probably look at libwinutils.c in Hadoop source.  It uses
>>> FormatMessageW (which returns messages in Unicode.)  My requirement was to
>>> return messages in current system locale.
>>>
>>> -- Asokan
>>> ________________________________________
>>> From: malcolm [malcolm.kavalsky@oracle.com]
>>> Sent: Thursday, December 11, 2014 4:04 PM
>>> To:common-dev@hadoop.apache.org
>>> Subject: Re: Solaris Port
>>>
>>> Hi Asok,
>>>
>>> I googled and found that windows has strerror, and strerror_s (which is
>>> the strerror_r equivalent).
>>> Is there a reason why you didn't use this call ?
>>>
>>> On 12/11/2014 06:27 PM, Asokan, M wrote:
>>>> Hi Malcom,
>>>>       Recently, I had to work on a function to get system error message on
>>>> various systems.  Here is the piece of code I came up with.  Hope it helps.
>>>>
>>>> static void get_system_error_message(char * buf, int buf_len, int code)
>>>> {
>>>> #if defined(_WIN32)
>>>>        LPVOID lpMsgBuf;
>>>>        DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
>>>>                                     FORMAT_MESSAGE_FROM_SYSTEM |
>>>>                                     FORMAT_MESSAGE_IGNORE_INSERTS,
>>>>                                     NULL, code,
>>>>                                     MAKELANGID(LANG_NEUTRAL,
>>>> SUBLANG_DEFAULT),
>>>>                                                             /* Default
>>>> language */
>>>>                                     (LPTSTR) &lpMsgBuf, 0, NULL);
>>>>        if (status > 0)
>>>>        {
>>>>            strncpy(buf, (char *)lpMsgBuf, buf_len-1);
>>>>            buf[buf_len-1] = '\0';
>>>>            /* Free the buffer returned by system */
>>>>            LocalFree(lpMsgBuf);
>>>>        }
>>>>        else
>>>>        {
>>>>            _snprintf(buf, buf_len-1 , "%s %d",
>>>>                "Can't get system error message for code", code);
>>>>            buf[buf_len-1] = '\0';
>>>>        }
>>>> #else
>>>> #if defined(_HPUX_SOURCE)
>>>>        {
>>>>            char * msg;
>>>>            errno = 0;
>>>>            msg = strerror(code);
>>>>            if (errno == 0)
>>>>            {
>>>>                strncpy(buf, msg, buf_len-1);
>>>>                buf[buf_len-1] = '\0';
>>>>            }
>>>>            else
>>>>            {
>>>>                snprintf(buf, buf_len, "%s %d",
>>>>                    "Can't get system error message for code", code);
>>>>            }
>>>>        }
>>>> #else
>>>>        if (strerror_r(code, buf, buf_len) != 0)
>>>>        {
>>>>            snprintf(buf, buf_len, "%s %d",
>>>>                "Can't get system error message for code", code);
>>>>        }
>>>> #endif
>>>> #endif
>>>> }
>>>>
>>>> Note that HPUX does not have strerror_r() since strerror() itself is
>>>> thread-safe.  Also Windows does not have snprintf().  The equivalent
>>>> function _snprintf() has a subtle difference in its interface.
>>>>
>>>> -- Asokan
>>>> ________________________________________
>>>> From: malcolm [malcolm.kavalsky@oracle.com]
>>>> Sent: Thursday, December 11, 2014 11:02 AM
>>>> To:common-dev@hadoop.apache.org
>>>> Subject: Re: Solaris Port
>>>>
>>>> Fine with me, I volunteer to do this, if accepted.
>>>>
>>>> On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
>>>>> sys_errlist was removed for a reason.  Creating a fake sys_errlist on
>>>>> Solaris will mean the libhadoop.so will need to be tied a specific build
>>>>> (kernel/include pairing) and therefore limits upward mobility/compatibility.
>>>>> That doesn’t seem like a very good idea.
>>>>>
>>>>> IMO, switching to strerror_r is much preferred, since other than the
>>>>> brain-dead GNU libc version, is highly portable and should work regardless
>>>>> of the kernel or OS in place.
>>>>>
>>>>> On Dec 11, 2014, at 5:20 AM, malcolm<ma...@oracle.com>
>>>>> wrote:
>>>>>
>>>>>> FYI, there are a couple more files that reference sys_errlist directly
>>>>>> (not just terror within exception.c) , but also hdfs_http_client.c and
>>>>>> NativeiO.c
>>>>>>
>>>>>> On 12/11/2014 07:38 AM, malcolm wrote:
>>>>>>> Hi Colin,
>>>>>>>
>>>>>>> Exactly, as you noticed, the problem is the thread-local buffer needed
>>>>>>> to return from terror.
>>>>>>> Currently, terror just returns a static string from an array, this is
>>>>>>> fast, simple and error-proof.
>>>>>>>
>>>>>>> In order to use strerror_r inside terror,  would require allocating a
>>>>>>> buffer inside terror  and depend on the caller to free the buffer after
>>>>>>> using it, or to pass a buffer to terrror (which is basically the same as
>>>>>>> strerror_r, rendering terror redundant).
>>>>>>> Both cases require modification outside terror itself, as far as I can
>>>>>>> tell, no simple fix. Unless you have an alternative which I haven't thought
>>>>>>> of ?
>>>>>>>
>>>>>>> As far as I can tell, we have two choices:
>>>>>>>
>>>>>>> 1. Remove terror and replace calls with strerror_r, passing a buffer
>>>>>>> from the callee.
>>>>>>>        Advantage: a more modern portable interface.
>>>>>>>        Disadvantage: All calls to terror need to be modified, though
>>>>>>> all seem to be in a few files as far as I can tell.
>>>>>>>
>>>>>>> 2. Adding a sys_errlist array (ifdeffed for Solaris)
>>>>>>>        Advantage: no change to any calls to terror
>>>>>>>        Disadvantage: 2 additional files added to source tree (.c and
>>>>>>> .h) and some minor ifdefs only used for Solaris.
>>>>>>>
>>>>>>> I think it is more a question of style than anything else, so I leave
>>>>>>> you to make the call.
>>>>>>>
>>>>>>> Thanks for your patience,
>>>>>>> Malcolm
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 12/10/2014 09:54 PM, Colin McCabe wrote:
>>>>>>>> On Wed, Dec 10, 2014 at 2:31 AM, malcolm
>>>>>>>> <ma...@oracle.com>  wrote:
>>>>>>>>> Hi Colin,
>>>>>>>>>
>>>>>>>>> Thanks for the hints around JIRAs.
>>>>>>>>>
>>>>>>>>> You are correct errno still exists, however sys_errlist does not.
>>>>>>>>>
>>>>>>>>> Hadoop uses a function terror (defined in exception.c) which indexes
>>>>>>>>> sys_errlist by errno to return the error message from the array.
>>>>>>>>> This
>>>>>>>>> function is called 26 times in various places (in 2.2)
>>>>>>>>>
>>>>>>>>> Originally, I thought to replace all calls to terror with strerror,
>>>>>>>>> but
>>>>>>>>> there can be issues with multi-threading (it returns a buffer which
>>>>>>>>> can be
>>>>>>>>> overwritten), so it seemed simpler just to recreate the sys_errlist
>>>>>>>>> message
>>>>>>>>> array.
>>>>>>>>>
>>>>>>>>> There is also a multi-threaded version strerror_r where you pass the
>>>>>>>>> buffer
>>>>>>>>> as a parameter, but this would necessitate changing every call to
>>>>>>>>> terror
>>>>>>>>> with mutiple lines of code.
>>>>>>>> Why don't you just use strerror_r inside terror()?
>>>>>>>>
>>>>>>>> I wrote that code originally.  The reason I didn't want to use
>>>>>>>> strerror_r there is because GNU libc provides a non-POSIX definition
>>>>>>>> of strerror_r, and forcing it to use the POSIX one is a pain. But you
>>>>>>>> can do it.  You also will require a thread-local buffer to hold the
>>>>>>>> return from strerror_r, since it is not guaranteed to be static
>>>>>>>> (although in practice it is 99% of the time-- another annoyance with
>>>>>>>> the API).
>>>>>>>>
>>>>>>>>
>>>> ________________________________
>>>>
>>>>
>>>>
>>>> ATTENTION: -----
>>>>
>>>> The information contained in this message (including any files
>>>> transmitted with this message) may contain proprietary, trade secret or
>>>> other confidential and/or legally privileged information. Any pricing
>>>> information contained in this message or in any files transmitted with this
>>>> message is always confidential and cannot be shared with any third parties
>>>> without prior written approval from Syncsort. This message is intended to be
>>>> read only by the individual or entity to whom it is addressed or by their
>>>> designee. If the reader of this message is not the intended recipient, you
>>>> are on notice that any use, disclosure, copying or distribution of this
>>>> message, in any form, is strictly prohibited. If you have received this
>>>> message in error, please immediately notify the sender and/or Syncsort and
>>>> destroy all copies of this message in your possession, custody or control.


Re: Solaris Port

Posted by Colin McCabe <cm...@alumni.cmu.edu>.
Just use snprintf to copy the error message from strerror_r into a
thread-local buffer of 64 bytes or so.  Then preserve the existing
terror() interface.

Can you open a jira for this?

best,
Colin

On Thu, Dec 11, 2014 at 8:35 PM, malcolm <ma...@oracle.com> wrote:
> So, turns out that if I had naively changed all calls to terror or
> references to sys_errlist, to using strerror_r, then I would have broken
> code for Windows and HPUX (and possibly other OSes).
>
> If we are to assume that current code runs fine on all platforms (maybe even
> AIX an MacOS, for example), then any change/additions made to the code and
> not ifdeffed appropriately can break on other OSes. On the other hand,  too
> many ifdefs can pollute the code source and render it less readable (though
> possibly less important).
>
> In the general case what are code contributors responsibilities to adding
> code regarding OSes besides Linux ?
> What OSes does jenkins test on ?
> I guess maintainers of code on non-tested platforms are responsible for
> their own testing ?
>
> How do we avoid the ping-pong effect, i.e. I make a generic change to code
> which breaks on Windows, then the Windows maintainer reverts changes to
> break on Solaris for example ? Or does this not happen in actuality ?
>
>
> On 12/11/2014 11:25 PM, Asokan, M wrote:
>>
>> Hi Malcom,
>>    The Windows versions of strerror() and strerror_s() functions are
>> probably meant for ANSI C library functions that set errno.  For core
>> Windows API calls (like UNIX system calls), one gets the error number by
>> calling GetLastError() function.  In the code snippet I sent earlier, the
>> "code" argument is the value returned by GetLastError().  Neither strerror()
>> nor strerror_s() will give the correct error message for this error code.
>>
>> You could probably look at libwinutils.c in Hadoop source.  It uses
>> FormatMessageW (which returns messages in Unicode.)  My requirement was to
>> return messages in current system locale.
>>
>> -- Asokan
>> ________________________________________
>> From: malcolm [malcolm.kavalsky@oracle.com]
>> Sent: Thursday, December 11, 2014 4:04 PM
>> To: common-dev@hadoop.apache.org
>> Subject: Re: Solaris Port
>>
>> Hi Asok,
>>
>> I googled and found that windows has strerror, and strerror_s (which is
>> the strerror_r equivalent).
>> Is there a reason why you didn't use this call ?
>>
>> On 12/11/2014 06:27 PM, Asokan, M wrote:
>>>
>>> Hi Malcom,
>>>      Recently, I had to work on a function to get system error message on
>>> various systems.  Here is the piece of code I came up with.  Hope it helps.
>>>
>>> static void get_system_error_message(char * buf, int buf_len, int code)
>>> {
>>> #if defined(_WIN32)
>>>       LPVOID lpMsgBuf;
>>>       DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
>>>                                    FORMAT_MESSAGE_FROM_SYSTEM |
>>>                                    FORMAT_MESSAGE_IGNORE_INSERTS,
>>>                                    NULL, code,
>>>                                    MAKELANGID(LANG_NEUTRAL,
>>> SUBLANG_DEFAULT),
>>>                                                            /* Default
>>> language */
>>>                                    (LPTSTR) &lpMsgBuf, 0, NULL);
>>>       if (status > 0)
>>>       {
>>>           strncpy(buf, (char *)lpMsgBuf, buf_len-1);
>>>           buf[buf_len-1] = '\0';
>>>           /* Free the buffer returned by system */
>>>           LocalFree(lpMsgBuf);
>>>       }
>>>       else
>>>       {
>>>           _snprintf(buf, buf_len-1 , "%s %d",
>>>               "Can't get system error message for code", code);
>>>           buf[buf_len-1] = '\0';
>>>       }
>>> #else
>>> #if defined(_HPUX_SOURCE)
>>>       {
>>>           char * msg;
>>>           errno = 0;
>>>           msg = strerror(code);
>>>           if (errno == 0)
>>>           {
>>>               strncpy(buf, msg, buf_len-1);
>>>               buf[buf_len-1] = '\0';
>>>           }
>>>           else
>>>           {
>>>               snprintf(buf, buf_len, "%s %d",
>>>                   "Can't get system error message for code", code);
>>>           }
>>>       }
>>> #else
>>>       if (strerror_r(code, buf, buf_len) != 0)
>>>       {
>>>           snprintf(buf, buf_len, "%s %d",
>>>               "Can't get system error message for code", code);
>>>       }
>>> #endif
>>> #endif
>>> }
>>>
>>> Note that HPUX does not have strerror_r() since strerror() itself is
>>> thread-safe.  Also Windows does not have snprintf().  The equivalent
>>> function _snprintf() has a subtle difference in its interface.
>>>
>>> -- Asokan
>>> ________________________________________
>>> From: malcolm [malcolm.kavalsky@oracle.com]
>>> Sent: Thursday, December 11, 2014 11:02 AM
>>> To: common-dev@hadoop.apache.org
>>> Subject: Re: Solaris Port
>>>
>>> Fine with me, I volunteer to do this, if accepted.
>>>
>>> On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
>>>>
>>>> sys_errlist was removed for a reason.  Creating a fake sys_errlist on
>>>> Solaris will mean the libhadoop.so will need to be tied a specific build
>>>> (kernel/include pairing) and therefore limits upward mobility/compatibility.
>>>> That doesn’t seem like a very good idea.
>>>>
>>>> IMO, switching to strerror_r is much preferred, since other than the
>>>> brain-dead GNU libc version, is highly portable and should work regardless
>>>> of the kernel or OS in place.
>>>>
>>>> On Dec 11, 2014, at 5:20 AM, malcolm <ma...@oracle.com>
>>>> wrote:
>>>>
>>>>> FYI, there are a couple more files that reference sys_errlist directly
>>>>> (not just terror within exception.c) , but also hdfs_http_client.c and
>>>>> NativeiO.c
>>>>>
>>>>> On 12/11/2014 07:38 AM, malcolm wrote:
>>>>>>
>>>>>> Hi Colin,
>>>>>>
>>>>>> Exactly, as you noticed, the problem is the thread-local buffer needed
>>>>>> to return from terror.
>>>>>> Currently, terror just returns a static string from an array, this is
>>>>>> fast, simple and error-proof.
>>>>>>
>>>>>> In order to use strerror_r inside terror,  would require allocating a
>>>>>> buffer inside terror  and depend on the caller to free the buffer after
>>>>>> using it, or to pass a buffer to terrror (which is basically the same as
>>>>>> strerror_r, rendering terror redundant).
>>>>>> Both cases require modification outside terror itself, as far as I can
>>>>>> tell, no simple fix. Unless you have an alternative which I haven't thought
>>>>>> of ?
>>>>>>
>>>>>> As far as I can tell, we have two choices:
>>>>>>
>>>>>> 1. Remove terror and replace calls with strerror_r, passing a buffer
>>>>>> from the callee.
>>>>>>       Advantage: a more modern portable interface.
>>>>>>       Disadvantage: All calls to terror need to be modified, though
>>>>>> all seem to be in a few files as far as I can tell.
>>>>>>
>>>>>> 2. Adding a sys_errlist array (ifdeffed for Solaris)
>>>>>>       Advantage: no change to any calls to terror
>>>>>>       Disadvantage: 2 additional files added to source tree (.c and
>>>>>> .h) and some minor ifdefs only used for Solaris.
>>>>>>
>>>>>> I think it is more a question of style than anything else, so I leave
>>>>>> you to make the call.
>>>>>>
>>>>>> Thanks for your patience,
>>>>>> Malcolm
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 12/10/2014 09:54 PM, Colin McCabe wrote:
>>>>>>>
>>>>>>> On Wed, Dec 10, 2014 at 2:31 AM, malcolm
>>>>>>> <ma...@oracle.com> wrote:
>>>>>>>>
>>>>>>>> Hi Colin,
>>>>>>>>
>>>>>>>> Thanks for the hints around JIRAs.
>>>>>>>>
>>>>>>>> You are correct errno still exists, however sys_errlist does not.
>>>>>>>>
>>>>>>>> Hadoop uses a function terror (defined in exception.c) which indexes
>>>>>>>> sys_errlist by errno to return the error message from the array.
>>>>>>>> This
>>>>>>>> function is called 26 times in various places (in 2.2)
>>>>>>>>
>>>>>>>> Originally, I thought to replace all calls to terror with strerror,
>>>>>>>> but
>>>>>>>> there can be issues with multi-threading (it returns a buffer which
>>>>>>>> can be
>>>>>>>> overwritten), so it seemed simpler just to recreate the sys_errlist
>>>>>>>> message
>>>>>>>> array.
>>>>>>>>
>>>>>>>> There is also a multi-threaded version strerror_r where you pass the
>>>>>>>> buffer
>>>>>>>> as a parameter, but this would necessitate changing every call to
>>>>>>>> terror
>>>>>>>> with mutiple lines of code.
>>>>>>>
>>>>>>> Why don't you just use strerror_r inside terror()?
>>>>>>>
>>>>>>> I wrote that code originally.  The reason I didn't want to use
>>>>>>> strerror_r there is because GNU libc provides a non-POSIX definition
>>>>>>> of strerror_r, and forcing it to use the POSIX one is a pain. But you
>>>>>>> can do it.  You also will require a thread-local buffer to hold the
>>>>>>> return from strerror_r, since it is not guaranteed to be static
>>>>>>> (although in practice it is 99% of the time-- another annoyance with
>>>>>>> the API).
>>>>>>>
>>>>>>>
>>> ________________________________
>>>
>>>
>>>
>>> ATTENTION: -----
>>>
>>> The information contained in this message (including any files
>>> transmitted with this message) may contain proprietary, trade secret or
>>> other confidential and/or legally privileged information. Any pricing
>>> information contained in this message or in any files transmitted with this
>>> message is always confidential and cannot be shared with any third parties
>>> without prior written approval from Syncsort. This message is intended to be
>>> read only by the individual or entity to whom it is addressed or by their
>>> designee. If the reader of this message is not the intended recipient, you
>>> are on notice that any use, disclosure, copying or distribution of this
>>> message, in any form, is strictly prohibited. If you have received this
>>> message in error, please immediately notify the sender and/or Syncsort and
>>> destroy all copies of this message in your possession, custody or control.
>
>

Re: Solaris Port

Posted by malcolm <ma...@oracle.com>.
So, turns out that if I had naively changed all calls to terror or 
references to sys_errlist, to using strerror_r, then I would have broken 
code for Windows and HPUX (and possibly other OSes).

If we are to assume that current code runs fine on all platforms (maybe 
even AIX an MacOS, for example), then any change/additions made to the 
code and not ifdeffed appropriately can break on other OSes. On the 
other hand,  too many ifdefs can pollute the code source and render it 
less readable (though possibly less important).

In the general case what are code contributors responsibilities to 
adding code regarding OSes besides Linux ?
What OSes does jenkins test on ?
I guess maintainers of code on non-tested platforms are responsible for 
their own testing ?

How do we avoid the ping-pong effect, i.e. I make a generic change to 
code which breaks on Windows, then the Windows maintainer reverts 
changes to break on Solaris for example ? Or does this not happen in 
actuality ?

On 12/11/2014 11:25 PM, Asokan, M wrote:
> Hi Malcom,
>    The Windows versions of strerror() and strerror_s() functions are probably meant for ANSI C library functions that set errno.  For core Windows API calls (like UNIX system calls), one gets the error number by calling GetLastError() function.  In the code snippet I sent earlier, the "code" argument is the value returned by GetLastError().  Neither strerror() nor strerror_s() will give the correct error message for this error code.
>
> You could probably look at libwinutils.c in Hadoop source.  It uses FormatMessageW (which returns messages in Unicode.)  My requirement was to return messages in current system locale.
>
> -- Asokan
> ________________________________________
> From: malcolm [malcolm.kavalsky@oracle.com]
> Sent: Thursday, December 11, 2014 4:04 PM
> To: common-dev@hadoop.apache.org
> Subject: Re: Solaris Port
>
> Hi Asok,
>
> I googled and found that windows has strerror, and strerror_s (which is
> the strerror_r equivalent).
> Is there a reason why you didn't use this call ?
>
> On 12/11/2014 06:27 PM, Asokan, M wrote:
>> Hi Malcom,
>>      Recently, I had to work on a function to get system error message on various systems.  Here is the piece of code I came up with.  Hope it helps.
>>
>> static void get_system_error_message(char * buf, int buf_len, int code)
>> {
>> #if defined(_WIN32)
>>       LPVOID lpMsgBuf;
>>       DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
>>                                    FORMAT_MESSAGE_FROM_SYSTEM |
>>                                    FORMAT_MESSAGE_IGNORE_INSERTS,
>>                                    NULL, code,
>>                                    MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
>>                                                            /* Default language */
>>                                    (LPTSTR) &lpMsgBuf, 0, NULL);
>>       if (status > 0)
>>       {
>>           strncpy(buf, (char *)lpMsgBuf, buf_len-1);
>>           buf[buf_len-1] = '\0';
>>           /* Free the buffer returned by system */
>>           LocalFree(lpMsgBuf);
>>       }
>>       else
>>       {
>>           _snprintf(buf, buf_len-1 , "%s %d",
>>               "Can't get system error message for code", code);
>>           buf[buf_len-1] = '\0';
>>       }
>> #else
>> #if defined(_HPUX_SOURCE)
>>       {
>>           char * msg;
>>           errno = 0;
>>           msg = strerror(code);
>>           if (errno == 0)
>>           {
>>               strncpy(buf, msg, buf_len-1);
>>               buf[buf_len-1] = '\0';
>>           }
>>           else
>>           {
>>               snprintf(buf, buf_len, "%s %d",
>>                   "Can't get system error message for code", code);
>>           }
>>       }
>> #else
>>       if (strerror_r(code, buf, buf_len) != 0)
>>       {
>>           snprintf(buf, buf_len, "%s %d",
>>               "Can't get system error message for code", code);
>>       }
>> #endif
>> #endif
>> }
>>
>> Note that HPUX does not have strerror_r() since strerror() itself is thread-safe.  Also Windows does not have snprintf().  The equivalent function _snprintf() has a subtle difference in its interface.
>>
>> -- Asokan
>> ________________________________________
>> From: malcolm [malcolm.kavalsky@oracle.com]
>> Sent: Thursday, December 11, 2014 11:02 AM
>> To: common-dev@hadoop.apache.org
>> Subject: Re: Solaris Port
>>
>> Fine with me, I volunteer to do this, if accepted.
>>
>> On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
>>> sys_errlist was removed for a reason.  Creating a fake sys_errlist on Solaris will mean the libhadoop.so will need to be tied a specific build (kernel/include pairing) and therefore limits upward mobility/compatibility.  That doesn’t seem like a very good idea.
>>>
>>> IMO, switching to strerror_r is much preferred, since other than the brain-dead GNU libc version, is highly portable and should work regardless of the kernel or OS in place.
>>>
>>> On Dec 11, 2014, at 5:20 AM, malcolm <ma...@oracle.com> wrote:
>>>
>>>> FYI, there are a couple more files that reference sys_errlist directly (not just terror within exception.c) , but also hdfs_http_client.c and NativeiO.c
>>>>
>>>> On 12/11/2014 07:38 AM, malcolm wrote:
>>>>> Hi Colin,
>>>>>
>>>>> Exactly, as you noticed, the problem is the thread-local buffer needed to return from terror.
>>>>> Currently, terror just returns a static string from an array, this is fast, simple and error-proof.
>>>>>
>>>>> In order to use strerror_r inside terror,  would require allocating a buffer inside terror  and depend on the caller to free the buffer after using it, or to pass a buffer to terrror (which is basically the same as strerror_r, rendering terror redundant).
>>>>> Both cases require modification outside terror itself, as far as I can tell, no simple fix. Unless you have an alternative which I haven't thought of ?
>>>>>
>>>>> As far as I can tell, we have two choices:
>>>>>
>>>>> 1. Remove terror and replace calls with strerror_r, passing a buffer from the callee.
>>>>>       Advantage: a more modern portable interface.
>>>>>       Disadvantage: All calls to terror need to be modified, though all seem to be in a few files as far as I can tell.
>>>>>
>>>>> 2. Adding a sys_errlist array (ifdeffed for Solaris)
>>>>>       Advantage: no change to any calls to terror
>>>>>       Disadvantage: 2 additional files added to source tree (.c and .h) and some minor ifdefs only used for Solaris.
>>>>>
>>>>> I think it is more a question of style than anything else, so I leave you to make the call.
>>>>>
>>>>> Thanks for your patience,
>>>>> Malcolm
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 12/10/2014 09:54 PM, Colin McCabe wrote:
>>>>>> On Wed, Dec 10, 2014 at 2:31 AM, malcolm <ma...@oracle.com> wrote:
>>>>>>> Hi Colin,
>>>>>>>
>>>>>>> Thanks for the hints around JIRAs.
>>>>>>>
>>>>>>> You are correct errno still exists, however sys_errlist does not.
>>>>>>>
>>>>>>> Hadoop uses a function terror (defined in exception.c) which indexes
>>>>>>> sys_errlist by errno to return the error message from the array. This
>>>>>>> function is called 26 times in various places (in 2.2)
>>>>>>>
>>>>>>> Originally, I thought to replace all calls to terror with strerror, but
>>>>>>> there can be issues with multi-threading (it returns a buffer which can be
>>>>>>> overwritten), so it seemed simpler just to recreate the sys_errlist message
>>>>>>> array.
>>>>>>>
>>>>>>> There is also a multi-threaded version strerror_r where you pass the buffer
>>>>>>> as a parameter, but this would necessitate changing every call to terror
>>>>>>> with mutiple lines of code.
>>>>>> Why don't you just use strerror_r inside terror()?
>>>>>>
>>>>>> I wrote that code originally.  The reason I didn't want to use
>>>>>> strerror_r there is because GNU libc provides a non-POSIX definition
>>>>>> of strerror_r, and forcing it to use the POSIX one is a pain. But you
>>>>>> can do it.  You also will require a thread-local buffer to hold the
>>>>>> return from strerror_r, since it is not guaranteed to be static
>>>>>> (although in practice it is 99% of the time-- another annoyance with
>>>>>> the API).
>>>>>>
>>>>>>
>> ________________________________
>>
>>
>>
>> ATTENTION: -----
>>
>> The information contained in this message (including any files transmitted with this message) may contain proprietary, trade secret or other confidential and/or legally privileged information. Any pricing information contained in this message or in any files transmitted with this message is always confidential and cannot be shared with any third parties without prior written approval from Syncsort. This message is intended to be read only by the individual or entity to whom it is addressed or by their designee. If the reader of this message is not the intended recipient, you are on notice that any use, disclosure, copying or distribution of this message, in any form, is strictly prohibited. If you have received this message in error, please immediately notify the sender and/or Syncsort and destroy all copies of this message in your possession, custody or control.


RE: Solaris Port

Posted by "Asokan, M" <ma...@syncsort.com>.
Hi Malcom,
  The Windows versions of strerror() and strerror_s() functions are probably meant for ANSI C library functions that set errno.  For core Windows API calls (like UNIX system calls), one gets the error number by calling GetLastError() function.  In the code snippet I sent earlier, the "code" argument is the value returned by GetLastError().  Neither strerror() nor strerror_s() will give the correct error message for this error code.

You could probably look at libwinutils.c in Hadoop source.  It uses FormatMessageW (which returns messages in Unicode.)  My requirement was to return messages in current system locale.

-- Asokan
________________________________________
From: malcolm [malcolm.kavalsky@oracle.com]
Sent: Thursday, December 11, 2014 4:04 PM
To: common-dev@hadoop.apache.org
Subject: Re: Solaris Port

Hi Asok,

I googled and found that windows has strerror, and strerror_s (which is
the strerror_r equivalent).
Is there a reason why you didn't use this call ?

On 12/11/2014 06:27 PM, Asokan, M wrote:
> Hi Malcom,
>     Recently, I had to work on a function to get system error message on various systems.  Here is the piece of code I came up with.  Hope it helps.
>
> static void get_system_error_message(char * buf, int buf_len, int code)
> {
> #if defined(_WIN32)
>      LPVOID lpMsgBuf;
>      DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
>                                   FORMAT_MESSAGE_FROM_SYSTEM |
>                                   FORMAT_MESSAGE_IGNORE_INSERTS,
>                                   NULL, code,
>                                   MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
>                                                           /* Default language */
>                                   (LPTSTR) &lpMsgBuf, 0, NULL);
>      if (status > 0)
>      {
>          strncpy(buf, (char *)lpMsgBuf, buf_len-1);
>          buf[buf_len-1] = '\0';
>          /* Free the buffer returned by system */
>          LocalFree(lpMsgBuf);
>      }
>      else
>      {
>          _snprintf(buf, buf_len-1 , "%s %d",
>              "Can't get system error message for code", code);
>          buf[buf_len-1] = '\0';
>      }
> #else
> #if defined(_HPUX_SOURCE)
>      {
>          char * msg;
>          errno = 0;
>          msg = strerror(code);
>          if (errno == 0)
>          {
>              strncpy(buf, msg, buf_len-1);
>              buf[buf_len-1] = '\0';
>          }
>          else
>          {
>              snprintf(buf, buf_len, "%s %d",
>                  "Can't get system error message for code", code);
>          }
>      }
> #else
>      if (strerror_r(code, buf, buf_len) != 0)
>      {
>          snprintf(buf, buf_len, "%s %d",
>              "Can't get system error message for code", code);
>      }
> #endif
> #endif
> }
>
> Note that HPUX does not have strerror_r() since strerror() itself is thread-safe.  Also Windows does not have snprintf().  The equivalent function _snprintf() has a subtle difference in its interface.
>
> -- Asokan
> ________________________________________
> From: malcolm [malcolm.kavalsky@oracle.com]
> Sent: Thursday, December 11, 2014 11:02 AM
> To: common-dev@hadoop.apache.org
> Subject: Re: Solaris Port
>
> Fine with me, I volunteer to do this, if accepted.
>
> On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
>> sys_errlist was removed for a reason.  Creating a fake sys_errlist on Solaris will mean the libhadoop.so will need to be tied a specific build (kernel/include pairing) and therefore limits upward mobility/compatibility.  That doesn’t seem like a very good idea.
>>
>> IMO, switching to strerror_r is much preferred, since other than the brain-dead GNU libc version, is highly portable and should work regardless of the kernel or OS in place.
>>
>> On Dec 11, 2014, at 5:20 AM, malcolm <ma...@oracle.com> wrote:
>>
>>> FYI, there are a couple more files that reference sys_errlist directly (not just terror within exception.c) , but also hdfs_http_client.c and NativeiO.c
>>>
>>> On 12/11/2014 07:38 AM, malcolm wrote:
>>>> Hi Colin,
>>>>
>>>> Exactly, as you noticed, the problem is the thread-local buffer needed to return from terror.
>>>> Currently, terror just returns a static string from an array, this is fast, simple and error-proof.
>>>>
>>>> In order to use strerror_r inside terror,  would require allocating a buffer inside terror  and depend on the caller to free the buffer after using it, or to pass a buffer to terrror (which is basically the same as strerror_r, rendering terror redundant).
>>>> Both cases require modification outside terror itself, as far as I can tell, no simple fix. Unless you have an alternative which I haven't thought of ?
>>>>
>>>> As far as I can tell, we have two choices:
>>>>
>>>> 1. Remove terror and replace calls with strerror_r, passing a buffer from the callee.
>>>>      Advantage: a more modern portable interface.
>>>>      Disadvantage: All calls to terror need to be modified, though all seem to be in a few files as far as I can tell.
>>>>
>>>> 2. Adding a sys_errlist array (ifdeffed for Solaris)
>>>>      Advantage: no change to any calls to terror
>>>>      Disadvantage: 2 additional files added to source tree (.c and .h) and some minor ifdefs only used for Solaris.
>>>>
>>>> I think it is more a question of style than anything else, so I leave you to make the call.
>>>>
>>>> Thanks for your patience,
>>>> Malcolm
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 12/10/2014 09:54 PM, Colin McCabe wrote:
>>>>> On Wed, Dec 10, 2014 at 2:31 AM, malcolm <ma...@oracle.com> wrote:
>>>>>> Hi Colin,
>>>>>>
>>>>>> Thanks for the hints around JIRAs.
>>>>>>
>>>>>> You are correct errno still exists, however sys_errlist does not.
>>>>>>
>>>>>> Hadoop uses a function terror (defined in exception.c) which indexes
>>>>>> sys_errlist by errno to return the error message from the array. This
>>>>>> function is called 26 times in various places (in 2.2)
>>>>>>
>>>>>> Originally, I thought to replace all calls to terror with strerror, but
>>>>>> there can be issues with multi-threading (it returns a buffer which can be
>>>>>> overwritten), so it seemed simpler just to recreate the sys_errlist message
>>>>>> array.
>>>>>>
>>>>>> There is also a multi-threaded version strerror_r where you pass the buffer
>>>>>> as a parameter, but this would necessitate changing every call to terror
>>>>>> with mutiple lines of code.
>>>>> Why don't you just use strerror_r inside terror()?
>>>>>
>>>>> I wrote that code originally.  The reason I didn't want to use
>>>>> strerror_r there is because GNU libc provides a non-POSIX definition
>>>>> of strerror_r, and forcing it to use the POSIX one is a pain. But you
>>>>> can do it.  You also will require a thread-local buffer to hold the
>>>>> return from strerror_r, since it is not guaranteed to be static
>>>>> (although in practice it is 99% of the time-- another annoyance with
>>>>> the API).
>>>>>
>>>>>
>
> ________________________________
>
>
>
> ATTENTION: -----
>
> The information contained in this message (including any files transmitted with this message) may contain proprietary, trade secret or other confidential and/or legally privileged information. Any pricing information contained in this message or in any files transmitted with this message is always confidential and cannot be shared with any third parties without prior written approval from Syncsort. This message is intended to be read only by the individual or entity to whom it is addressed or by their designee. If the reader of this message is not the intended recipient, you are on notice that any use, disclosure, copying or distribution of this message, in any form, is strictly prohibited. If you have received this message in error, please immediately notify the sender and/or Syncsort and destroy all copies of this message in your possession, custody or control.


Re: Solaris Port

Posted by malcolm <ma...@oracle.com>.
Hi Asok,

I googled and found that windows has strerror, and strerror_s (which is 
the strerror_r equivalent).
Is there a reason why you didn't use this call ?

On 12/11/2014 06:27 PM, Asokan, M wrote:
> Hi Malcom,
>     Recently, I had to work on a function to get system error message on various systems.  Here is the piece of code I came up with.  Hope it helps.
>
> static void get_system_error_message(char * buf, int buf_len, int code)
> {
> #if defined(_WIN32)
>      LPVOID lpMsgBuf;
>      DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
>                                   FORMAT_MESSAGE_FROM_SYSTEM |
>                                   FORMAT_MESSAGE_IGNORE_INSERTS,
>                                   NULL, code,
>                                   MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
>                                                           /* Default language */
>                                   (LPTSTR) &lpMsgBuf, 0, NULL);
>      if (status > 0)
>      {
>          strncpy(buf, (char *)lpMsgBuf, buf_len-1);
>          buf[buf_len-1] = '\0';
>          /* Free the buffer returned by system */
>          LocalFree(lpMsgBuf);
>      }
>      else
>      {
>          _snprintf(buf, buf_len-1 , "%s %d",
>              "Can't get system error message for code", code);
>          buf[buf_len-1] = '\0';
>      }
> #else
> #if defined(_HPUX_SOURCE)
>      {
>          char * msg;
>          errno = 0;
>          msg = strerror(code);
>          if (errno == 0)
>          {
>              strncpy(buf, msg, buf_len-1);
>              buf[buf_len-1] = '\0';
>          }
>          else
>          {
>              snprintf(buf, buf_len, "%s %d",
>                  "Can't get system error message for code", code);
>          }
>      }
> #else
>      if (strerror_r(code, buf, buf_len) != 0)
>      {
>          snprintf(buf, buf_len, "%s %d",
>              "Can't get system error message for code", code);
>      }
> #endif
> #endif
> }
>
> Note that HPUX does not have strerror_r() since strerror() itself is thread-safe.  Also Windows does not have snprintf().  The equivalent function _snprintf() has a subtle difference in its interface.
>
> -- Asokan
> ________________________________________
> From: malcolm [malcolm.kavalsky@oracle.com]
> Sent: Thursday, December 11, 2014 11:02 AM
> To: common-dev@hadoop.apache.org
> Subject: Re: Solaris Port
>
> Fine with me, I volunteer to do this, if accepted.
>
> On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
>> sys_errlist was removed for a reason.  Creating a fake sys_errlist on Solaris will mean the libhadoop.so will need to be tied a specific build (kernel/include pairing) and therefore limits upward mobility/compatibility.  That doesn’t seem like a very good idea.
>>
>> IMO, switching to strerror_r is much preferred, since other than the brain-dead GNU libc version, is highly portable and should work regardless of the kernel or OS in place.
>>
>> On Dec 11, 2014, at 5:20 AM, malcolm <ma...@oracle.com> wrote:
>>
>>> FYI, there are a couple more files that reference sys_errlist directly (not just terror within exception.c) , but also hdfs_http_client.c and NativeiO.c
>>>
>>> On 12/11/2014 07:38 AM, malcolm wrote:
>>>> Hi Colin,
>>>>
>>>> Exactly, as you noticed, the problem is the thread-local buffer needed to return from terror.
>>>> Currently, terror just returns a static string from an array, this is fast, simple and error-proof.
>>>>
>>>> In order to use strerror_r inside terror,  would require allocating a buffer inside terror  and depend on the caller to free the buffer after using it, or to pass a buffer to terrror (which is basically the same as strerror_r, rendering terror redundant).
>>>> Both cases require modification outside terror itself, as far as I can tell, no simple fix. Unless you have an alternative which I haven't thought of ?
>>>>
>>>> As far as I can tell, we have two choices:
>>>>
>>>> 1. Remove terror and replace calls with strerror_r, passing a buffer from the callee.
>>>>      Advantage: a more modern portable interface.
>>>>      Disadvantage: All calls to terror need to be modified, though all seem to be in a few files as far as I can tell.
>>>>
>>>> 2. Adding a sys_errlist array (ifdeffed for Solaris)
>>>>      Advantage: no change to any calls to terror
>>>>      Disadvantage: 2 additional files added to source tree (.c and .h) and some minor ifdefs only used for Solaris.
>>>>
>>>> I think it is more a question of style than anything else, so I leave you to make the call.
>>>>
>>>> Thanks for your patience,
>>>> Malcolm
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 12/10/2014 09:54 PM, Colin McCabe wrote:
>>>>> On Wed, Dec 10, 2014 at 2:31 AM, malcolm <ma...@oracle.com> wrote:
>>>>>> Hi Colin,
>>>>>>
>>>>>> Thanks for the hints around JIRAs.
>>>>>>
>>>>>> You are correct errno still exists, however sys_errlist does not.
>>>>>>
>>>>>> Hadoop uses a function terror (defined in exception.c) which indexes
>>>>>> sys_errlist by errno to return the error message from the array. This
>>>>>> function is called 26 times in various places (in 2.2)
>>>>>>
>>>>>> Originally, I thought to replace all calls to terror with strerror, but
>>>>>> there can be issues with multi-threading (it returns a buffer which can be
>>>>>> overwritten), so it seemed simpler just to recreate the sys_errlist message
>>>>>> array.
>>>>>>
>>>>>> There is also a multi-threaded version strerror_r where you pass the buffer
>>>>>> as a parameter, but this would necessitate changing every call to terror
>>>>>> with mutiple lines of code.
>>>>> Why don't you just use strerror_r inside terror()?
>>>>>
>>>>> I wrote that code originally.  The reason I didn't want to use
>>>>> strerror_r there is because GNU libc provides a non-POSIX definition
>>>>> of strerror_r, and forcing it to use the POSIX one is a pain. But you
>>>>> can do it.  You also will require a thread-local buffer to hold the
>>>>> return from strerror_r, since it is not guaranteed to be static
>>>>> (although in practice it is 99% of the time-- another annoyance with
>>>>> the API).
>>>>>
>>>>>
>
> ________________________________
>
>
>
> ATTENTION: -----
>
> The information contained in this message (including any files transmitted with this message) may contain proprietary, trade secret or other confidential and/or legally privileged information. Any pricing information contained in this message or in any files transmitted with this message is always confidential and cannot be shared with any third parties without prior written approval from Syncsort. This message is intended to be read only by the individual or entity to whom it is addressed or by their designee. If the reader of this message is not the intended recipient, you are on notice that any use, disclosure, copying or distribution of this message, in any form, is strictly prohibited. If you have received this message in error, please immediately notify the sender and/or Syncsort and destroy all copies of this message in your possession, custody or control.


RE: Solaris Port

Posted by "Asokan, M" <ma...@syncsort.com>.
Hi Malcom,
   Recently, I had to work on a function to get system error message on various systems.  Here is the piece of code I came up with.  Hope it helps.

static void get_system_error_message(char * buf, int buf_len, int code)
{
#if defined(_WIN32)
    LPVOID lpMsgBuf;
    DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
                                 FORMAT_MESSAGE_FROM_SYSTEM |
                                 FORMAT_MESSAGE_IGNORE_INSERTS,
                                 NULL, code,
                                 MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
                                                         /* Default language */
                                 (LPTSTR) &lpMsgBuf, 0, NULL);
    if (status > 0)
    {
        strncpy(buf, (char *)lpMsgBuf, buf_len-1);
        buf[buf_len-1] = '\0';
        /* Free the buffer returned by system */
        LocalFree(lpMsgBuf);
    }
    else
    {
        _snprintf(buf, buf_len-1 , "%s %d",
            "Can't get system error message for code", code);
        buf[buf_len-1] = '\0';
    }
#else
#if defined(_HPUX_SOURCE)
    {
        char * msg;
        errno = 0;
        msg = strerror(code);
        if (errno == 0)
        {
            strncpy(buf, msg, buf_len-1);
            buf[buf_len-1] = '\0';
        }
        else
        {
            snprintf(buf, buf_len, "%s %d",
                "Can't get system error message for code", code);
        }
    }
#else
    if (strerror_r(code, buf, buf_len) != 0)
    {
        snprintf(buf, buf_len, "%s %d",
            "Can't get system error message for code", code);
    }
#endif
#endif
}

Note that HPUX does not have strerror_r() since strerror() itself is thread-safe.  Also Windows does not have snprintf().  The equivalent function _snprintf() has a subtle difference in its interface.

-- Asokan
________________________________________
From: malcolm [malcolm.kavalsky@oracle.com]
Sent: Thursday, December 11, 2014 11:02 AM
To: common-dev@hadoop.apache.org
Subject: Re: Solaris Port

Fine with me, I volunteer to do this, if accepted.

On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
> sys_errlist was removed for a reason.  Creating a fake sys_errlist on Solaris will mean the libhadoop.so will need to be tied a specific build (kernel/include pairing) and therefore limits upward mobility/compatibility.  That doesn’t seem like a very good idea.
>
> IMO, switching to strerror_r is much preferred, since other than the brain-dead GNU libc version, is highly portable and should work regardless of the kernel or OS in place.
>
> On Dec 11, 2014, at 5:20 AM, malcolm <ma...@oracle.com> wrote:
>
>> FYI, there are a couple more files that reference sys_errlist directly (not just terror within exception.c) , but also hdfs_http_client.c and NativeiO.c
>>
>> On 12/11/2014 07:38 AM, malcolm wrote:
>>> Hi Colin,
>>>
>>> Exactly, as you noticed, the problem is the thread-local buffer needed to return from terror.
>>> Currently, terror just returns a static string from an array, this is fast, simple and error-proof.
>>>
>>> In order to use strerror_r inside terror,  would require allocating a buffer inside terror  and depend on the caller to free the buffer after using it, or to pass a buffer to terrror (which is basically the same as strerror_r, rendering terror redundant).
>>> Both cases require modification outside terror itself, as far as I can tell, no simple fix. Unless you have an alternative which I haven't thought of ?
>>>
>>> As far as I can tell, we have two choices:
>>>
>>> 1. Remove terror and replace calls with strerror_r, passing a buffer from the callee.
>>>     Advantage: a more modern portable interface.
>>>     Disadvantage: All calls to terror need to be modified, though all seem to be in a few files as far as I can tell.
>>>
>>> 2. Adding a sys_errlist array (ifdeffed for Solaris)
>>>     Advantage: no change to any calls to terror
>>>     Disadvantage: 2 additional files added to source tree (.c and .h) and some minor ifdefs only used for Solaris.
>>>
>>> I think it is more a question of style than anything else, so I leave you to make the call.
>>>
>>> Thanks for your patience,
>>> Malcolm
>>>
>>>
>>>
>>>
>>>
>>> On 12/10/2014 09:54 PM, Colin McCabe wrote:
>>>> On Wed, Dec 10, 2014 at 2:31 AM, malcolm <ma...@oracle.com> wrote:
>>>>> Hi Colin,
>>>>>
>>>>> Thanks for the hints around JIRAs.
>>>>>
>>>>> You are correct errno still exists, however sys_errlist does not.
>>>>>
>>>>> Hadoop uses a function terror (defined in exception.c) which indexes
>>>>> sys_errlist by errno to return the error message from the array. This
>>>>> function is called 26 times in various places (in 2.2)
>>>>>
>>>>> Originally, I thought to replace all calls to terror with strerror, but
>>>>> there can be issues with multi-threading (it returns a buffer which can be
>>>>> overwritten), so it seemed simpler just to recreate the sys_errlist message
>>>>> array.
>>>>>
>>>>> There is also a multi-threaded version strerror_r where you pass the buffer
>>>>> as a parameter, but this would necessitate changing every call to terror
>>>>> with mutiple lines of code.
>>>> Why don't you just use strerror_r inside terror()?
>>>>
>>>> I wrote that code originally.  The reason I didn't want to use
>>>> strerror_r there is because GNU libc provides a non-POSIX definition
>>>> of strerror_r, and forcing it to use the POSIX one is a pain. But you
>>>> can do it.  You also will require a thread-local buffer to hold the
>>>> return from strerror_r, since it is not guaranteed to be static
>>>> (although in practice it is 99% of the time-- another annoyance with
>>>> the API).
>>>>
>>>>


________________________________



ATTENTION: -----

The information contained in this message (including any files transmitted with this message) may contain proprietary, trade secret or other confidential and/or legally privileged information. Any pricing information contained in this message or in any files transmitted with this message is always confidential and cannot be shared with any third parties without prior written approval from Syncsort. This message is intended to be read only by the individual or entity to whom it is addressed or by their designee. If the reader of this message is not the intended recipient, you are on notice that any use, disclosure, copying or distribution of this message, in any form, is strictly prohibited. If you have received this message in error, please immediately notify the sender and/or Syncsort and destroy all copies of this message in your possession, custody or control.

Re: Solaris Port

Posted by malcolm <ma...@oracle.com>.
Fine with me, I volunteer to do this, if accepted.

On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
> sys_errlist was removed for a reason.  Creating a fake sys_errlist on Solaris will mean the libhadoop.so will need to be tied a specific build (kernel/include pairing) and therefore limits upward mobility/compatibility.  That doesn’t seem like a very good idea.
>
> IMO, switching to strerror_r is much preferred, since other than the brain-dead GNU libc version, is highly portable and should work regardless of the kernel or OS in place.
>
> On Dec 11, 2014, at 5:20 AM, malcolm <ma...@oracle.com> wrote:
>
>> FYI, there are a couple more files that reference sys_errlist directly (not just terror within exception.c) , but also hdfs_http_client.c and NativeiO.c
>>
>> On 12/11/2014 07:38 AM, malcolm wrote:
>>> Hi Colin,
>>>
>>> Exactly, as you noticed, the problem is the thread-local buffer needed to return from terror.
>>> Currently, terror just returns a static string from an array, this is fast, simple and error-proof.
>>>
>>> In order to use strerror_r inside terror,  would require allocating a buffer inside terror  and depend on the caller to free the buffer after using it, or to pass a buffer to terrror (which is basically the same as strerror_r, rendering terror redundant).
>>> Both cases require modification outside terror itself, as far as I can tell, no simple fix. Unless you have an alternative which I haven't thought of ?
>>>
>>> As far as I can tell, we have two choices:
>>>
>>> 1. Remove terror and replace calls with strerror_r, passing a buffer from the callee.
>>>     Advantage: a more modern portable interface.
>>>     Disadvantage: All calls to terror need to be modified, though all seem to be in a few files as far as I can tell.
>>>
>>> 2. Adding a sys_errlist array (ifdeffed for Solaris)
>>>     Advantage: no change to any calls to terror
>>>     Disadvantage: 2 additional files added to source tree (.c and .h) and some minor ifdefs only used for Solaris.
>>>
>>> I think it is more a question of style than anything else, so I leave you to make the call.
>>>
>>> Thanks for your patience,
>>> Malcolm
>>>
>>>
>>>
>>>
>>>
>>> On 12/10/2014 09:54 PM, Colin McCabe wrote:
>>>> On Wed, Dec 10, 2014 at 2:31 AM, malcolm <ma...@oracle.com> wrote:
>>>>> Hi Colin,
>>>>>
>>>>> Thanks for the hints around JIRAs.
>>>>>
>>>>> You are correct errno still exists, however sys_errlist does not.
>>>>>
>>>>> Hadoop uses a function terror (defined in exception.c) which indexes
>>>>> sys_errlist by errno to return the error message from the array. This
>>>>> function is called 26 times in various places (in 2.2)
>>>>>
>>>>> Originally, I thought to replace all calls to terror with strerror, but
>>>>> there can be issues with multi-threading (it returns a buffer which can be
>>>>> overwritten), so it seemed simpler just to recreate the sys_errlist message
>>>>> array.
>>>>>
>>>>> There is also a multi-threaded version strerror_r where you pass the buffer
>>>>> as a parameter, but this would necessitate changing every call to terror
>>>>> with mutiple lines of code.
>>>> Why don't you just use strerror_r inside terror()?
>>>>
>>>> I wrote that code originally.  The reason I didn't want to use
>>>> strerror_r there is because GNU libc provides a non-POSIX definition
>>>> of strerror_r, and forcing it to use the POSIX one is a pain. But you
>>>> can do it.  You also will require a thread-local buffer to hold the
>>>> return from strerror_r, since it is not guaranteed to be static
>>>> (although in practice it is 99% of the time-- another annoyance with
>>>> the API).
>>>>
>>>>


Re: Solaris Port

Posted by Allen Wittenauer <aw...@altiscale.com>.
sys_errlist was removed for a reason.  Creating a fake sys_errlist on Solaris will mean the libhadoop.so will need to be tied a specific build (kernel/include pairing) and therefore limits upward mobility/compatibility.  That doesn’t seem like a very good idea.  

IMO, switching to strerror_r is much preferred, since other than the brain-dead GNU libc version, is highly portable and should work regardless of the kernel or OS in place.

On Dec 11, 2014, at 5:20 AM, malcolm <ma...@oracle.com> wrote:

> FYI, there are a couple more files that reference sys_errlist directly (not just terror within exception.c) , but also hdfs_http_client.c and NativeiO.c
> 
> On 12/11/2014 07:38 AM, malcolm wrote:
>> Hi Colin,
>> 
>> Exactly, as you noticed, the problem is the thread-local buffer needed to return from terror.
>> Currently, terror just returns a static string from an array, this is fast, simple and error-proof.
>> 
>> In order to use strerror_r inside terror,  would require allocating a buffer inside terror  and depend on the caller to free the buffer after using it, or to pass a buffer to terrror (which is basically the same as strerror_r, rendering terror redundant).
>> Both cases require modification outside terror itself, as far as I can tell, no simple fix. Unless you have an alternative which I haven't thought of ?
>> 
>> As far as I can tell, we have two choices:
>> 
>> 1. Remove terror and replace calls with strerror_r, passing a buffer from the callee.
>>    Advantage: a more modern portable interface.
>>    Disadvantage: All calls to terror need to be modified, though all seem to be in a few files as far as I can tell.
>> 
>> 2. Adding a sys_errlist array (ifdeffed for Solaris)
>>    Advantage: no change to any calls to terror
>>    Disadvantage: 2 additional files added to source tree (.c and .h) and some minor ifdefs only used for Solaris.
>> 
>> I think it is more a question of style than anything else, so I leave you to make the call.
>> 
>> Thanks for your patience,
>> Malcolm
>> 
>> 
>> 
>> 
>> 
>> On 12/10/2014 09:54 PM, Colin McCabe wrote:
>>> On Wed, Dec 10, 2014 at 2:31 AM, malcolm <ma...@oracle.com> wrote:
>>>> Hi Colin,
>>>> 
>>>> Thanks for the hints around JIRAs.
>>>> 
>>>> You are correct errno still exists, however sys_errlist does not.
>>>> 
>>>> Hadoop uses a function terror (defined in exception.c) which indexes
>>>> sys_errlist by errno to return the error message from the array. This
>>>> function is called 26 times in various places (in 2.2)
>>>> 
>>>> Originally, I thought to replace all calls to terror with strerror, but
>>>> there can be issues with multi-threading (it returns a buffer which can be
>>>> overwritten), so it seemed simpler just to recreate the sys_errlist message
>>>> array.
>>>> 
>>>> There is also a multi-threaded version strerror_r where you pass the buffer
>>>> as a parameter, but this would necessitate changing every call to terror
>>>> with mutiple lines of code.
>>> Why don't you just use strerror_r inside terror()?
>>> 
>>> I wrote that code originally.  The reason I didn't want to use
>>> strerror_r there is because GNU libc provides a non-POSIX definition
>>> of strerror_r, and forcing it to use the POSIX one is a pain. But you
>>> can do it.  You also will require a thread-local buffer to hold the
>>> return from strerror_r, since it is not guaranteed to be static
>>> (although in practice it is 99% of the time-- another annoyance with
>>> the API).
>>> 
>>> 
>> 
> 


Re: Solaris Port

Posted by malcolm <ma...@oracle.com>.
FYI, there are a couple more files that reference sys_errlist directly 
(not just terror within exception.c) , but also hdfs_http_client.c and 
NativeiO.c

On 12/11/2014 07:38 AM, malcolm wrote:
> Hi Colin,
>
> Exactly, as you noticed, the problem is the thread-local buffer needed 
> to return from terror.
> Currently, terror just returns a static string from an array, this is 
> fast, simple and error-proof.
>
> In order to use strerror_r inside terror,  would require allocating a 
> buffer inside terror  and depend on the caller to free the buffer 
> after using it, or to pass a buffer to terrror (which is basically the 
> same as strerror_r, rendering terror redundant).
> Both cases require modification outside terror itself, as far as I can 
> tell, no simple fix. Unless you have an alternative which I haven't 
> thought of ?
>
> As far as I can tell, we have two choices:
>
> 1. Remove terror and replace calls with strerror_r, passing a buffer 
> from the callee.
>     Advantage: a more modern portable interface.
>     Disadvantage: All calls to terror need to be modified, though all 
> seem to be in a few files as far as I can tell.
>
> 2. Adding a sys_errlist array (ifdeffed for Solaris)
>     Advantage: no change to any calls to terror
>     Disadvantage: 2 additional files added to source tree (.c and .h) 
> and some minor ifdefs only used for Solaris.
>
> I think it is more a question of style than anything else, so I leave 
> you to make the call.
>
> Thanks for your patience,
> Malcolm
>
>
>
>
>
> On 12/10/2014 09:54 PM, Colin McCabe wrote:
>> On Wed, Dec 10, 2014 at 2:31 AM, malcolm 
>> <ma...@oracle.com> wrote:
>>> Hi Colin,
>>>
>>> Thanks for the hints around JIRAs.
>>>
>>> You are correct errno still exists, however sys_errlist does not.
>>>
>>> Hadoop uses a function terror (defined in exception.c) which indexes
>>> sys_errlist by errno to return the error message from the array. This
>>> function is called 26 times in various places (in 2.2)
>>>
>>> Originally, I thought to replace all calls to terror with strerror, but
>>> there can be issues with multi-threading (it returns a buffer which 
>>> can be
>>> overwritten), so it seemed simpler just to recreate the sys_errlist 
>>> message
>>> array.
>>>
>>> There is also a multi-threaded version strerror_r where you pass the 
>>> buffer
>>> as a parameter, but this would necessitate changing every call to 
>>> terror
>>> with mutiple lines of code.
>> Why don't you just use strerror_r inside terror()?
>>
>> I wrote that code originally.  The reason I didn't want to use
>> strerror_r there is because GNU libc provides a non-POSIX definition
>> of strerror_r, and forcing it to use the POSIX one is a pain. But you
>> can do it.  You also will require a thread-local buffer to hold the
>> return from strerror_r, since it is not guaranteed to be static
>> (although in practice it is 99% of the time-- another annoyance with
>> the API).
>>
>>
>


Re: Solaris Port

Posted by malcolm <ma...@oracle.com>.
Hi Colin,

Exactly, as you noticed, the problem is the thread-local buffer needed 
to return from terror.
Currently, terror just returns a static string from an array, this is 
fast, simple and error-proof.

In order to use strerror_r inside terror,  would require allocating a 
buffer inside terror  and depend on the caller to free the buffer after 
using it, or to pass a buffer to terrror (which is basically the same as 
strerror_r, rendering terror redundant).
Both cases require modification outside terror itself, as far as I can 
tell, no simple fix. Unless you have an alternative which I haven't 
thought of ?

As far as I can tell, we have two choices:

1. Remove terror and replace calls with strerror_r, passing a buffer 
from the callee.
     Advantage: a more modern portable interface.
     Disadvantage: All calls to terror need to be modified, though all 
seem to be in a few files as far as I can tell.

2. Adding a sys_errlist array (ifdeffed for Solaris)
     Advantage: no change to any calls to terror
     Disadvantage: 2 additional files added to source tree (.c and .h) 
and some minor ifdefs only used for Solaris.

I think it is more a question of style than anything else, so I leave 
you to make the call.

Thanks for your patience,
Malcolm





On 12/10/2014 09:54 PM, Colin McCabe wrote:
> On Wed, Dec 10, 2014 at 2:31 AM, malcolm <ma...@oracle.com> wrote:
>> Hi Colin,
>>
>> Thanks for the hints around JIRAs.
>>
>> You are correct errno still exists, however sys_errlist does not.
>>
>> Hadoop uses a function terror (defined in exception.c) which indexes
>> sys_errlist by errno to return the error message from the array. This
>> function is called 26 times in various places (in 2.2)
>>
>> Originally, I thought to replace all calls to terror with strerror, but
>> there can be issues with multi-threading (it returns a buffer which can be
>> overwritten), so it seemed simpler just to recreate the sys_errlist message
>> array.
>>
>> There is also a multi-threaded version strerror_r where you pass the buffer
>> as a parameter, but this would necessitate changing every call to terror
>> with mutiple lines of code.
> Why don't you just use strerror_r inside terror()?
>
> I wrote that code originally.  The reason I didn't want to use
> strerror_r there is because GNU libc provides a non-POSIX definition
> of strerror_r, and forcing it to use the POSIX one is a pain.  But you
> can do it.  You also will require a thread-local buffer to hold the
> return from strerror_r, since it is not guaranteed to be static
> (although in practice it is 99% of the time-- another annoyance with
> the API).
>
>


Re: Solaris Port

Posted by Colin McCabe <cm...@alumni.cmu.edu>.
On Wed, Dec 10, 2014 at 2:31 AM, malcolm <ma...@oracle.com> wrote:
> Hi Colin,
>
> Thanks for the hints around JIRAs.
>
> You are correct errno still exists, however sys_errlist does not.
>
> Hadoop uses a function terror (defined in exception.c) which indexes
> sys_errlist by errno to return the error message from the array. This
> function is called 26 times in various places (in 2.2)
>
> Originally, I thought to replace all calls to terror with strerror, but
> there can be issues with multi-threading (it returns a buffer which can be
> overwritten), so it seemed simpler just to recreate the sys_errlist message
> array.
>
> There is also a multi-threaded version strerror_r where you pass the buffer
> as a parameter, but this would necessitate changing every call to terror
> with mutiple lines of code.

Why don't you just use strerror_r inside terror()?

I wrote that code originally.  The reason I didn't want to use
strerror_r there is because GNU libc provides a non-POSIX definition
of strerror_r, and forcing it to use the POSIX one is a pain.  But you
can do it.  You also will require a thread-local buffer to hold the
return from strerror_r, since it is not guaranteed to be static
(although in practice it is 99% of the time-- another annoyance with
the API).

>
> Sorry, I wasn't clear.
>
> Also, I have been requested to ensure my port is available on 2.4, perceived
> as a more stable release. If I make changes to this branch are they
> automatically available for 2.6, or will I need multiple JIRAs ?

As Steve commented, 2.4 is pretty much "done."  I don't think this
kind of thing would get backported there.  Even getting it in a 2.6
release seems unlikely to me (but I haven't thought about it that
hard).  I would just get the work done, and let it show up in the
release it's ready in.

cheers,
Colin

>
> Thanks,
> Malcolm
>
>
> On 12/10/2014 10:45 AM, Colin McCabe wrote:
>>
>> Hi Malcolm,
>>
>> In general we file JIRAs for particular issues.  So if one issue is
>> handling errlist on Solaris, that might be one JIRA.  Another issue
>> might be handling socket write timeouts on Solaris.  And so on.  Most
>> of these should probably be HADOOP tickets since they sound like they
>> are mostly in the generic hadoop-common code.
>>
>> "solaris does not have errno" seems like a bold statement.  errno is
>> part of POSIX, and Solaris is a POSIX os, right?  Am I way off base on
>> this?
>> I googled around and one of the first results I found talked about
>> errno values on Solaris.
>>
>> http://www.pixelstech.net/article/1413273556-A-trick-of-building-multithreaded-application-on-Solaris
>>   Perhaps I misunderstood what you meant by this statement.
>>
>> Anyway, please file JIRAs for any portability improvements you can think
>> of!
>>
>> best,
>> Colin
>>
>> On Mon, Dec 8, 2014 at 9:09 PM, malcolm <ma...@oracle.com>
>> wrote:
>>>
>>> Hi Colin,
>>>
>>> A short summary of my changes are as follows:
>>>
>>> - Native C source files: added 5,  modified 6, requiring also changes to
>>> CMakeLists.txt. Of course, all changes are "ifdeffed" for Solaris
>>> appropriately and new files, are prefixed with solaris_ as well.
>>>
>>> For example, Solaris does not have errno, or errlist any more which are
>>> used
>>> quite a lot in hadoop native code. I could have replaced all calls to use
>>> strerror instead which would be compatible with Linux, however in the
>>> interests of making minimal changes, I recreated and added these files
>>> from
>>> a running Solaris machine instead.
>>>
>>> Another issue is that Solaris doesn't have the timeout option for
>>> sockets,
>>> so I had to write my own solaris_read routine with timeout and added it
>>> to
>>> DomainSocket.c . A few issues with lz4 on Sparc needed modification, and
>>> some other OS specific issues: getgrouplist, container-executer (from
>>> yarn).
>>>
>>> - Some very minor changes were made to some Java source files (mainly
>>> tests
>>> to get them to pass on Solaris)
>>>
>>> The above changes were made to 2.2, I will recheck everything against the
>>> latest trunk, maybe some fixes aren't needed any more.
>>>
>>> I have generated a single patch file with all changes. Perhaps it would
>>> be
>>> better to file multiple JIRAs for each change, perhaps grouped, one per
>>> issue ? Or should I file a JIRA for each modified source file ?
>>>
>>> Thank you,
>>> Malcolm
>>>
>>>
>>> On 12/08/2014 09:53 PM, Colin McCabe wrote:
>>>>
>>>> Hi Malcolm,
>>>>
>>>> It's great that you are going to contribute!  Please make your patches
>>>> against trunk.
>>>>
>>>> 2.2 is fairly old at this point.  It hasn't been the focus of
>>>> development in more than a year.
>>>>
>>>> We don't use github or pull requests.
>>>>
>>>> Check the section on http://wiki.apache.org/hadoop/HowToContribute
>>>> that talks about "Contributing your work".  Excerpt:
>>>> "Finally, patches should be attached to an issue report in Jira via
>>>> the Attach File link on the issue's Jira. Please add a comment that
>>>> asks for a code review following our code review checklist. Please
>>>> note that the attachment should be granted license to ASF for
>>>> inclusion in ASF works (as per the Apache License §5)."
>>>>
>>>> As this says, you attach the patch file to a JIRA that you have
>>>> created, and then hit "submit patch."
>>>>
>>>> I don't think a branch is required for this work since it is just
>>>> build fixes, right?
>>>>
>>>> best,
>>>> Colin
>>>>
>>>>
>>>> On Mon, Dec 8, 2014 at 3:30 AM, malcolm <ma...@oracle.com>
>>>> wrote:
>>>>>
>>>>> I have ported Hadoop  native libraries to Solaris 11 (both Sparc and
>>>>> Intel )
>>>>> Oracle have agreed to release my changes to the community so that
>>>>> Solaris
>>>>> platforms can benefit.
>>>>> Reading the HowToContribute and GitandHadoop documents, I am not 100%
>>>>> clear
>>>>> on how to get my changes into the main tree. I am also a Git(hub)
>>>>> newbie,
>>>>> and was using svn previously.
>>>>>
>>>>> Please let me know if I am going the correct path:
>>>>>
>>>>> 1. I forked Hadoop on Github and downloaded a clone to my development
>>>>> machine.
>>>>>
>>>>> 2. The changes I made were to 2.2.0, can I still add changes to this
>>>>> branch,
>>>>> and hopefully get them accepted or must I migrate my changes to 2.6 ?
>>>>> (On
>>>>> the main Hadoop download page, 2.2 is still listed as the GA version )
>>>>>
>>>>> 3. I understand that I should create a new branch for my changes, and
>>>>> then
>>>>> generate pull requests after uploading them to Github.
>>>>>
>>>>> 4. I also registered  at Jira in the understanding that I need to
>>>>> generate a
>>>>> Jira number for my changes, and to name my branch accordingly ?
>>>>>
>>>>> Does all this make sense ?
>>>>>
>>>>> Thanks,
>>>>> Malcolm
>>>>>
>>>>>
>

Re: Solaris Port

Posted by malcolm <ma...@oracle.com>.
Hi Colin,

Thanks for the hints around JIRAs.

You are correct errno still exists, however sys_errlist does not.

Hadoop uses a function terror (defined in exception.c) which indexes 
sys_errlist by errno to return the error message from the array. This 
function is called 26 times in various places (in 2.2)

Originally, I thought to replace all calls to terror with strerror, but 
there can be issues with multi-threading (it returns a buffer which can 
be overwritten), so it seemed simpler just to recreate the sys_errlist 
message array.

There is also a multi-threaded version strerror_r where you pass the 
buffer as a parameter, but this would necessitate changing every call to 
terror with mutiple lines of code.

Sorry, I wasn't clear.

Also, I have been requested to ensure my port is available on 2.4, 
perceived as a more stable release. If I make changes to this branch are 
they automatically available for 2.6, or will I need multiple JIRAs ?

Thanks,
Malcolm

On 12/10/2014 10:45 AM, Colin McCabe wrote:
> Hi Malcolm,
>
> In general we file JIRAs for particular issues.  So if one issue is
> handling errlist on Solaris, that might be one JIRA.  Another issue
> might be handling socket write timeouts on Solaris.  And so on.  Most
> of these should probably be HADOOP tickets since they sound like they
> are mostly in the generic hadoop-common code.
>
> "solaris does not have errno" seems like a bold statement.  errno is
> part of POSIX, and Solaris is a POSIX os, right?  Am I way off base on
> this?
> I googled around and one of the first results I found talked about
> errno values on Solaris.
> http://www.pixelstech.net/article/1413273556-A-trick-of-building-multithreaded-application-on-Solaris
>   Perhaps I misunderstood what you meant by this statement.
>
> Anyway, please file JIRAs for any portability improvements you can think of!
>
> best,
> Colin
>
> On Mon, Dec 8, 2014 at 9:09 PM, malcolm <ma...@oracle.com> wrote:
>> Hi Colin,
>>
>> A short summary of my changes are as follows:
>>
>> - Native C source files: added 5,  modified 6, requiring also changes to
>> CMakeLists.txt. Of course, all changes are "ifdeffed" for Solaris
>> appropriately and new files, are prefixed with solaris_ as well.
>>
>> For example, Solaris does not have errno, or errlist any more which are used
>> quite a lot in hadoop native code. I could have replaced all calls to use
>> strerror instead which would be compatible with Linux, however in the
>> interests of making minimal changes, I recreated and added these files from
>> a running Solaris machine instead.
>>
>> Another issue is that Solaris doesn't have the timeout option for sockets,
>> so I had to write my own solaris_read routine with timeout and added it to
>> DomainSocket.c . A few issues with lz4 on Sparc needed modification, and
>> some other OS specific issues: getgrouplist, container-executer (from yarn).
>>
>> - Some very minor changes were made to some Java source files (mainly tests
>> to get them to pass on Solaris)
>>
>> The above changes were made to 2.2, I will recheck everything against the
>> latest trunk, maybe some fixes aren't needed any more.
>>
>> I have generated a single patch file with all changes. Perhaps it would be
>> better to file multiple JIRAs for each change, perhaps grouped, one per
>> issue ? Or should I file a JIRA for each modified source file ?
>>
>> Thank you,
>> Malcolm
>>
>>
>> On 12/08/2014 09:53 PM, Colin McCabe wrote:
>>> Hi Malcolm,
>>>
>>> It's great that you are going to contribute!  Please make your patches
>>> against trunk.
>>>
>>> 2.2 is fairly old at this point.  It hasn't been the focus of
>>> development in more than a year.
>>>
>>> We don't use github or pull requests.
>>>
>>> Check the section on http://wiki.apache.org/hadoop/HowToContribute
>>> that talks about "Contributing your work".  Excerpt:
>>> "Finally, patches should be attached to an issue report in Jira via
>>> the Attach File link on the issue's Jira. Please add a comment that
>>> asks for a code review following our code review checklist. Please
>>> note that the attachment should be granted license to ASF for
>>> inclusion in ASF works (as per the Apache License §5)."
>>>
>>> As this says, you attach the patch file to a JIRA that you have
>>> created, and then hit "submit patch."
>>>
>>> I don't think a branch is required for this work since it is just
>>> build fixes, right?
>>>
>>> best,
>>> Colin
>>>
>>>
>>> On Mon, Dec 8, 2014 at 3:30 AM, malcolm <ma...@oracle.com>
>>> wrote:
>>>> I have ported Hadoop  native libraries to Solaris 11 (both Sparc and
>>>> Intel )
>>>> Oracle have agreed to release my changes to the community so that Solaris
>>>> platforms can benefit.
>>>> Reading the HowToContribute and GitandHadoop documents, I am not 100%
>>>> clear
>>>> on how to get my changes into the main tree. I am also a Git(hub) newbie,
>>>> and was using svn previously.
>>>>
>>>> Please let me know if I am going the correct path:
>>>>
>>>> 1. I forked Hadoop on Github and downloaded a clone to my development
>>>> machine.
>>>>
>>>> 2. The changes I made were to 2.2.0, can I still add changes to this
>>>> branch,
>>>> and hopefully get them accepted or must I migrate my changes to 2.6 ? (On
>>>> the main Hadoop download page, 2.2 is still listed as the GA version )
>>>>
>>>> 3. I understand that I should create a new branch for my changes, and
>>>> then
>>>> generate pull requests after uploading them to Github.
>>>>
>>>> 4. I also registered  at Jira in the understanding that I need to
>>>> generate a
>>>> Jira number for my changes, and to name my branch accordingly ?
>>>>
>>>> Does all this make sense ?
>>>>
>>>> Thanks,
>>>> Malcolm
>>>>
>>>>


Re: Solaris Port

Posted by Colin McCabe <cm...@alumni.cmu.edu>.
Hi Malcolm,

In general we file JIRAs for particular issues.  So if one issue is
handling errlist on Solaris, that might be one JIRA.  Another issue
might be handling socket write timeouts on Solaris.  And so on.  Most
of these should probably be HADOOP tickets since they sound like they
are mostly in the generic hadoop-common code.

"solaris does not have errno" seems like a bold statement.  errno is
part of POSIX, and Solaris is a POSIX os, right?  Am I way off base on
this?
I googled around and one of the first results I found talked about
errno values on Solaris.
http://www.pixelstech.net/article/1413273556-A-trick-of-building-multithreaded-application-on-Solaris
 Perhaps I misunderstood what you meant by this statement.

Anyway, please file JIRAs for any portability improvements you can think of!

best,
Colin

On Mon, Dec 8, 2014 at 9:09 PM, malcolm <ma...@oracle.com> wrote:
> Hi Colin,
>
> A short summary of my changes are as follows:
>
> - Native C source files: added 5,  modified 6, requiring also changes to
> CMakeLists.txt. Of course, all changes are "ifdeffed" for Solaris
> appropriately and new files, are prefixed with solaris_ as well.
>
> For example, Solaris does not have errno, or errlist any more which are used
> quite a lot in hadoop native code. I could have replaced all calls to use
> strerror instead which would be compatible with Linux, however in the
> interests of making minimal changes, I recreated and added these files from
> a running Solaris machine instead.
>
> Another issue is that Solaris doesn't have the timeout option for sockets,
> so I had to write my own solaris_read routine with timeout and added it to
> DomainSocket.c . A few issues with lz4 on Sparc needed modification, and
> some other OS specific issues: getgrouplist, container-executer (from yarn).
>
> - Some very minor changes were made to some Java source files (mainly tests
> to get them to pass on Solaris)
>
> The above changes were made to 2.2, I will recheck everything against the
> latest trunk, maybe some fixes aren't needed any more.
>
> I have generated a single patch file with all changes. Perhaps it would be
> better to file multiple JIRAs for each change, perhaps grouped, one per
> issue ? Or should I file a JIRA for each modified source file ?
>
> Thank you,
> Malcolm
>
>
> On 12/08/2014 09:53 PM, Colin McCabe wrote:
>>
>> Hi Malcolm,
>>
>> It's great that you are going to contribute!  Please make your patches
>> against trunk.
>>
>> 2.2 is fairly old at this point.  It hasn't been the focus of
>> development in more than a year.
>>
>> We don't use github or pull requests.
>>
>> Check the section on http://wiki.apache.org/hadoop/HowToContribute
>> that talks about "Contributing your work".  Excerpt:
>> "Finally, patches should be attached to an issue report in Jira via
>> the Attach File link on the issue's Jira. Please add a comment that
>> asks for a code review following our code review checklist. Please
>> note that the attachment should be granted license to ASF for
>> inclusion in ASF works (as per the Apache License §5)."
>>
>> As this says, you attach the patch file to a JIRA that you have
>> created, and then hit "submit patch."
>>
>> I don't think a branch is required for this work since it is just
>> build fixes, right?
>>
>> best,
>> Colin
>>
>>
>> On Mon, Dec 8, 2014 at 3:30 AM, malcolm <ma...@oracle.com>
>> wrote:
>>>
>>> I have ported Hadoop  native libraries to Solaris 11 (both Sparc and
>>> Intel )
>>> Oracle have agreed to release my changes to the community so that Solaris
>>> platforms can benefit.
>>> Reading the HowToContribute and GitandHadoop documents, I am not 100%
>>> clear
>>> on how to get my changes into the main tree. I am also a Git(hub) newbie,
>>> and was using svn previously.
>>>
>>> Please let me know if I am going the correct path:
>>>
>>> 1. I forked Hadoop on Github and downloaded a clone to my development
>>> machine.
>>>
>>> 2. The changes I made were to 2.2.0, can I still add changes to this
>>> branch,
>>> and hopefully get them accepted or must I migrate my changes to 2.6 ? (On
>>> the main Hadoop download page, 2.2 is still listed as the GA version )
>>>
>>> 3. I understand that I should create a new branch for my changes, and
>>> then
>>> generate pull requests after uploading them to Github.
>>>
>>> 4. I also registered  at Jira in the understanding that I need to
>>> generate a
>>> Jira number for my changes, and to name my branch accordingly ?
>>>
>>> Does all this make sense ?
>>>
>>> Thanks,
>>> Malcolm
>>>
>>>
>

Re: Solaris Port

Posted by malcolm <ma...@oracle.com>.
Hi Colin,

A short summary of my changes are as follows:

- Native C source files: added 5,  modified 6, requiring also changes to 
CMakeLists.txt. Of course, all changes are "ifdeffed" for Solaris 
appropriately and new files, are prefixed with solaris_ as well.

For example, Solaris does not have errno, or errlist any more which are 
used quite a lot in hadoop native code. I could have replaced all calls 
to use strerror instead which would be compatible with Linux, however in 
the interests of making minimal changes, I recreated and added these 
files from a running Solaris machine instead.

Another issue is that Solaris doesn't have the timeout option for 
sockets, so I had to write my own solaris_read routine with timeout and 
added it to DomainSocket.c . A few issues with lz4 on Sparc needed 
modification, and some other OS specific issues: getgrouplist, 
container-executer (from yarn).

- Some very minor changes were made to some Java source files (mainly 
tests to get them to pass on Solaris)

The above changes were made to 2.2, I will recheck everything against 
the latest trunk, maybe some fixes aren't needed any more.

I have generated a single patch file with all changes. Perhaps it would 
be better to file multiple JIRAs for each change, perhaps grouped, one 
per issue ? Or should I file a JIRA for each modified source file ?

Thank you,
Malcolm

On 12/08/2014 09:53 PM, Colin McCabe wrote:
> Hi Malcolm,
>
> It's great that you are going to contribute!  Please make your patches
> against trunk.
>
> 2.2 is fairly old at this point.  It hasn't been the focus of
> development in more than a year.
>
> We don't use github or pull requests.
>
> Check the section on http://wiki.apache.org/hadoop/HowToContribute
> that talks about "Contributing your work".  Excerpt:
> "Finally, patches should be attached to an issue report in Jira via
> the Attach File link on the issue's Jira. Please add a comment that
> asks for a code review following our code review checklist. Please
> note that the attachment should be granted license to ASF for
> inclusion in ASF works (as per the Apache License §5)."
>
> As this says, you attach the patch file to a JIRA that you have
> created, and then hit "submit patch."
>
> I don't think a branch is required for this work since it is just
> build fixes, right?
>
> best,
> Colin
>
>
> On Mon, Dec 8, 2014 at 3:30 AM, malcolm <ma...@oracle.com> wrote:
>> I have ported Hadoop  native libraries to Solaris 11 (both Sparc and Intel )
>> Oracle have agreed to release my changes to the community so that Solaris
>> platforms can benefit.
>> Reading the HowToContribute and GitandHadoop documents, I am not 100% clear
>> on how to get my changes into the main tree. I am also a Git(hub) newbie,
>> and was using svn previously.
>>
>> Please let me know if I am going the correct path:
>>
>> 1. I forked Hadoop on Github and downloaded a clone to my development
>> machine.
>>
>> 2. The changes I made were to 2.2.0, can I still add changes to this branch,
>> and hopefully get them accepted or must I migrate my changes to 2.6 ? (On
>> the main Hadoop download page, 2.2 is still listed as the GA version )
>>
>> 3. I understand that I should create a new branch for my changes, and then
>> generate pull requests after uploading them to Github.
>>
>> 4. I also registered  at Jira in the understanding that I need to generate a
>> Jira number for my changes, and to name my branch accordingly ?
>>
>> Does all this make sense ?
>>
>> Thanks,
>> Malcolm
>>
>>


Re: Solaris Port

Posted by Colin McCabe <cm...@alumni.cmu.edu>.
Hi Malcolm,

It's great that you are going to contribute!  Please make your patches
against trunk.

2.2 is fairly old at this point.  It hasn't been the focus of
development in more than a year.

We don't use github or pull requests.

Check the section on http://wiki.apache.org/hadoop/HowToContribute
that talks about "Contributing your work".  Excerpt:
"Finally, patches should be attached to an issue report in Jira via
the Attach File link on the issue's Jira. Please add a comment that
asks for a code review following our code review checklist. Please
note that the attachment should be granted license to ASF for
inclusion in ASF works (as per the Apache License §5)."

As this says, you attach the patch file to a JIRA that you have
created, and then hit "submit patch."

I don't think a branch is required for this work since it is just
build fixes, right?

best,
Colin


On Mon, Dec 8, 2014 at 3:30 AM, malcolm <ma...@oracle.com> wrote:
> I have ported Hadoop  native libraries to Solaris 11 (both Sparc and Intel )
> Oracle have agreed to release my changes to the community so that Solaris
> platforms can benefit.
> Reading the HowToContribute and GitandHadoop documents, I am not 100% clear
> on how to get my changes into the main tree. I am also a Git(hub) newbie,
> and was using svn previously.
>
> Please let me know if I am going the correct path:
>
> 1. I forked Hadoop on Github and downloaded a clone to my development
> machine.
>
> 2. The changes I made were to 2.2.0, can I still add changes to this branch,
> and hopefully get them accepted or must I migrate my changes to 2.6 ? (On
> the main Hadoop download page, 2.2 is still listed as the GA version )
>
> 3. I understand that I should create a new branch for my changes, and then
> generate pull requests after uploading them to Github.
>
> 4. I also registered  at Jira in the understanding that I need to generate a
> Jira number for my changes, and to name my branch accordingly ?
>
> Does all this make sense ?
>
> Thanks,
> Malcolm
>
>