Posted to common-issues@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2016/05/25 19:40:12 UTC
[jira] [Comment Edited] (HADOOP-12488) DomainSocket: Solaris does not support timeouts on AF_UNIX sockets
[ https://issues.apache.org/jira/browse/HADOOP-12488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300713#comment-15300713 ]
Allen Wittenauer edited comment on HADOOP-12488 at 5/25/16 7:39 PM:
--------------------------------------------------------------------
hadoop-common-project/hadoop-common/src/check_unix_sock_timeouts.c really does need an ASF license header. Also, is it possible to move this into src/main/native?
was (Author: aw):
hadoop-common-project/hadoop-common/src/check_unix_sock_timeouts.c really does need an ASF license header.
> DomainSocket: Solaris does not support timeouts on AF_UNIX sockets
> ------------------------------------------------------------------
>
> Key: HADOOP-12488
> URL: https://issues.apache.org/jira/browse/HADOOP-12488
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: net
> Affects Versions: 2.7.1
> Environment: Solaris
> Reporter: Alan Burlison
> Assignee: Alan Burlison
> Attachments: HADOOP-12488.001.patch, HADOOP-12488.002.patch
>
>
> From the hadoop-common-dev mailing list:
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201509.mbox/%3C560B99F6.6010408@oracle.com%3E
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201510.mbox/%3C560EA6BF.2070001@oracle.com%3E
> {quote}
> Now that the Hadoop native code builds on Solaris I've been chipping
> away at all the test failures. About 50% of the failures involve
> DomainSocket, either directly or indirectly. That seems to be mainly
> because the tests use DomainSocket to do single-node testing, whereas in
> production it seems that DomainSocket is less commonly used
> (https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html).
> The particular problem on Solaris is that socket read/write timeouts
> (the SO_SNDTIMEO and SO_RCVTIMEO socket options) are not supported for
> UNIX domain (PF_UNIX) sockets. Those options are however supported for
> PF_INET sockets. That's because the socket implementation on Solaris is
> split roughly into two parts, for inet sockets and for STREAMS sockets,
> and the STREAMS implementation lacks support for SO_SNDTIMEO and
> SO_RCVTIMEO. As an aside, performance of sockets that use loopback or
> the host's own IP is slightly better than that of UNIX domain sockets on
> Solaris.
> I'm investigating getting timeouts supported for PF_UNIX sockets added
> to Solaris, but in the meantime I'm also looking at how this might be
> worked around in Hadoop. One way would be to implement timeouts by
> wrapping all the read/write/send/recv etc calls in DomainSocket.c with
> either poll() or select().
> The basic idea is to add two new fields to DomainSocket.c to hold the
> read/write timeouts. On platforms that support SO_SNDTIMEO and
> SO_RCVTIMEO these would be unused as setsockopt() would be used to set
> the socket timeouts. On platforms such as Solaris the JNI code would use
> the values to implement the timeouts appropriately.
> To prevent the code in DomainSocket.c becoming a #ifdef hairball, the
> current socket IO function calls such as accept(), send(), read() etc
> would be replaced with macros such as HD_ACCEPT. On platforms that
> provide timeouts these would just expand to the normal socket functions;
> on platforms that don't support timeouts they would expand to wrappers
> that implement timeouts for them.
> The only caveat is that all code that does anything to a PF_UNIX
> socket would *always* have to do so via DomainSocket. As far as I can
> tell that's not an issue, but it would have to be borne in mind if any
> changes were made in this area.
> Before I set about doing this, does the approach seem reasonable?
> {quote}
> {quote}
> Unfortunately it's not as simple as I'd hoped. For some reason I don't
> really understand, nearly all the JNI methods are declared as static and
> therefore don't get a "this" pointer and as a consequence all the class
> data members that are needed by the JNI code have to be passed in as
> parameters. That also means it's not possible to store the timeouts in
> the DomainSocket fields from within the JNI code. Most of the JNI
> methods should be instance methods rather than static ones, but making
> that change would require some significant surgery to DomainSocket.
> {quote}
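
For illustration only, the poll()-based workaround proposed in the quoted description might look roughly like the sketch below. This is not taken from the attached patches. The helper name hd_recv_timeout, the macro HD_RECV (modelled on the HD_ACCEPT name mentioned above) and the build flag HD_NO_UNIX_SOCK_TIMEOUTS are all hypothetical, and the timeout is passed in as a parameter, which is an assumption made because the description notes that the static JNI methods cannot read it from DomainSocket fields.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from the HADOOP-12488 patches): emulate a
 * receive timeout by waiting in poll() before calling recv().
 * timeout_ms <= 0 means "no timeout", matching the convention that a
 * zero SO_RCVTIMEO disables the timeout.
 * On timeout it returns -1 with errno set to EAGAIN, which is what
 * recv() reports when SO_RCVTIMEO expires on platforms that support it.
 */
static ssize_t hd_recv_timeout(int fd, void *buf, size_t len, int flags,
                               int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Simplification: an EINTR restarts the wait with the full timeout. */
    do {
        rc = poll(&pfd, 1, timeout_ms > 0 ? timeout_ms : -1);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0) {
        return -1;              /* poll() failed; errno is already set */
    }
    if (rc == 0) {
        errno = EAGAIN;         /* timed out with nothing to read */
        return -1;
    }
    /* Data, EOF or an error is pending; recv() reports which. */
    return recv(fd, buf, len, flags);
}

/*
 * Hypothetical dispatch macro: call sites use HD_RECV everywhere, and
 * only the definition differs per platform, which keeps #ifdefs out of
 * the socket I/O code itself.
 */
#ifdef HD_NO_UNIX_SOCK_TIMEOUTS
#define HD_RECV(fd, buf, len, flags, tmo) \
    hd_recv_timeout((fd), (buf), (len), (flags), (tmo))
#else
#define HD_RECV(fd, buf, len, flags, tmo) \
    recv((fd), (buf), (len), (flags))
#endif
```

A call site would then read something like HD_RECV(fd, buf, sizeof(buf), 0, timeout_ms) on every platform. The send side could presumably be handled the same way by polling for POLLOUT before send().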
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)