Posted to issues@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2015/01/10 01:15:34 UTC

[jira] [Commented] (MESOS-2079) IO.Write test is flaky on OS X 10.10.

    [ https://issues.apache.org/jira/browse/MESOS-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272124#comment-14272124 ] 

Benjamin Mahler commented on MESOS-2079:
----------------------------------------

It appears that on my laptop, we fairly consistently lose the race here:

{code}
    // Do a write but ignore SIGPIPE so we can return an error when
    // writing to a pipe or socket where the reading end is closed.
    // TODO(benh): The 'suppress' macro failed to work on OS X as it
    // appears that signal delivery was happening asynchronously.
    // That is, the signal would not appear to be pending when the
    // 'suppress' block was closed thus the destructor for
    // 'Suppressor' was not waiting/removing the signal via 'sigwait'.
    // It also appeared that the signal would be delivered to another
    // thread even if it remained blocked in this thread. The
    // workaround here is to check explicitly for EPIPE and then do
    // 'sigwait' regardless of what 'os::signals::pending' returns. We
    // don't have that luxury with 'Suppressor' and arbitrary signals
    // because we don't always have something like EPIPE to tell us
    // that a signal is (or will soon be) pending.
    bool pending = os::signals::pending(SIGPIPE);
    bool unblock = !pending ? os::signals::block(SIGPIPE) : false;

    ssize_t length = ::write(fd, data, size);

    // Save the errno so we can restore it after doing sig* functions
    // below.
    int errno_ = errno;

    // XXX: We receive EPIPE, but before we can call sigwait to capture it
    // per the TODO above, SIGPIPE is delivered to another thread.

    if (length < 0 && errno == EPIPE && !pending) {
      sigset_t mask;
      sigemptyset(&mask);
      sigaddset(&mask, SIGPIPE);

      int result;
      do {
        int ignored;
        // XXX: Too late!
        result = sigwait(&mask, &ignored);
      } while (result == -1 && errno == EINTR);
    }

    if (unblock) {
      os::signals::unblock(SIGPIPE);
    }
{code}
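
For reproducing the EPIPE path in isolation, here is a minimal single-threaded sketch of the block / write / sigwait pattern quoted above. It uses plain POSIX calls rather than {{os::signals}}, and {{WriteResult}} and {{writeWithoutSigpipe}} are hypothetical names, not libprocess APIs. Unlike the quoted code, it only calls {{sigwait}} when {{sigpending}} actually reports SIGPIPE, so it cannot hang; for the same reason it does not address the OS X race where the signal ends up on another thread.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

// Hypothetical result type for this sketch (not part of libprocess).
struct WriteResult {
  ssize_t length;  // Return value of ::write.
  int error;       // errno saved right after ::write (0 on success).
  bool consumed;   // True if a pending SIGPIPE was removed via sigwait.
};

// Hypothetical helper: write to 'fd' with SIGPIPE blocked in the calling
// thread, so a closed reading end surfaces as EPIPE instead of a signal.
WriteResult writeWithoutSigpipe(int fd, const void* data, size_t size) {
  sigset_t mask;
  sigemptyset(&mask);
  sigaddset(&mask, SIGPIPE);

  // If SIGPIPE was already pending before the write, leave it alone.
  sigset_t before;
  sigpending(&before);
  bool pending = sigismember(&before, SIGPIPE) == 1;

  // Block SIGPIPE in this thread and remember the previous mask.
  sigset_t previous;
  pthread_sigmask(SIG_BLOCK, &mask, &previous);

  ssize_t length = ::write(fd, data, size);
  int error = length < 0 ? errno : 0;

  bool consumed = false;
  if (length < 0 && error == EPIPE && !pending) {
    // Assumption of this sketch: only wait when the signal is visibly
    // pending, so we never block if it was delivered to another thread.
    sigset_t after;
    sigpending(&after);
    if (sigismember(&after, SIGPIPE) == 1) {
      int signo = 0;
      // Note: sigwait returns an error number on failure, not -1.
      int result = sigwait(&mask, &signo);
      assert(result != 0 || signo == SIGPIPE);
      consumed = (result == 0);
    }
  }

  // Restore the caller's signal mask and the errno from ::write.
  pthread_sigmask(SIG_SETMASK, &previous, nullptr);
  errno = error;
  return WriteResult{length, error, consumed};
}
```

Calling this on the write end of a pipe whose read end has been closed should return {{length == -1}} with {{error == EPIPE}}. Whether {{consumed}} is set depends on the platform reporting the signal as pending at that point, which is exactly what appears unreliable on OS X here.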

> IO.Write test is flaky on OS X 10.10.
> -------------------------------------
>
>                 Key: MESOS-2079
>                 URL: https://issues.apache.org/jira/browse/MESOS-2079
>             Project: Mesos
>          Issue Type: Task
>          Components: libprocess, technical debt, test
>         Environment: OS X 10.10
> {noformat}
> $ clang++ --version
> Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)
> Target: x86_64-apple-darwin14.0.0
> Thread model: posix
> {noformat}
>            Reporter: Benjamin Mahler
>              Labels: flaky
>
> [~benjaminhindman]: If I recall correctly, this is related to MESOS-1658. Unfortunately, we don't have a stacktrace for SIGPIPE currently:
> {noformat}
> [ RUN      ] IO.Write
> make[5]: *** [check-local] Broken pipe: 13
> {noformat}
> Running in gdb, the SIGPIPE always seems to be delivered here:
> {code}
> Program received signal SIGPIPE, Broken pipe.
> [Switching to process 56827 thread 0x60b]
> 0x00007fff9a011132 in __psynch_cvwait ()
> (gdb) where
> #0  0x00007fff9a011132 in __psynch_cvwait ()
> #1  0x00007fff903e7ea0 in _pthread_cond_wait ()
> #2  0x000000010062f27c in Gate::arrive (this=0x101908a10, old=14780) at gate.hpp:82
> #3  0x0000000100600888 in process::schedule (arg=0x0) at src/process.cpp:1373
> #4  0x00007fff903e72fc in _pthread_body ()
> #5  0x00007fff903e7279 in _pthread_start ()
> #6  0x00007fff903e54b1 in thread_start ()
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)