Posted to dev@mesos.apache.org by Benjamin Mahler <be...@gmail.com> on 2015/12/02 23:09:05 UTC

Re: [jira] [Comment Edited] (MESOS-2079) IO.Write test is flaky on OS X 10.10.

Yeah, I chatted with Alexander in person to clarify what the actual
semantics of SIGPIPE are. We should be good to go here. Sorry for the delay;
I will get back to these patches.

On Fri, Nov 20, 2015 at 8:52 AM, James Peach <jo...@gmail.com> wrote:

>
> > On Nov 11, 2015, at 12:44 AM, Alexander Rojas <al...@mesosphere.io>
> wrote:
> >
> > What I meant is that we may not care about SIGPIPE (which tells us a pipe
> was broken) because we will be notified when we try to write into it anyway
> (on the writing side) and we will get an EOF on the reading side.
> >
> > The only thing I could see us caring about SIGPIPE is if we want to know
> as soon as the pipe breaks that the event happened.
>
> So it sounds like there is no objection to this change? Can we land these
> changes now?
>
> >> On 06 Nov 2015, at 19:10, Benjamin Mahler <be...@gmail.com>
> wrote:
> >>
> >> To answer your questions:
> >>
> >> We use pipes when we need to communicate across the process boundary
> after
> >> a fork. Look for Subprocess::IO::Pipe for examples. There is plenty of
> code
> >> using pipes.
> >>
> >> Sockets aren't an issue as one can avoid SIGPIPE across OS X
> (SO_NOSIGPIPE)
> >> and Linux (MSG_NOSIGNAL).
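
For readers following along, here is a minimal sketch of the two socket-level
options mentioned above. The helper names `disable_sigpipe` and
`send_nosigpipe` are hypothetical and not taken from libprocess; the sketch
only assumes the standard `SO_NOSIGPIPE` socket option (OS X/BSD) and the
`MSG_NOSIGNAL` send flag (Linux).

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: on platforms that define SO_NOSIGPIPE (OS X, BSDs),
// mark the socket so that writes to it never raise SIGPIPE.
// Returns 0 on success, -1 on failure (errno set by setsockopt).
int disable_sigpipe(int fd) {
#if defined(SO_NOSIGPIPE)
  int on = 1;
  return setsockopt(fd, SOL_SOCKET, SO_NOSIGPIPE, &on, sizeof(on));
#else
  (void)fd;  // Nothing to do here; see send_nosigpipe() below.
  return 0;
#endif
}

// Hypothetical helper: on platforms that define MSG_NOSIGNAL (Linux),
// suppress SIGPIPE for this one send() call. Elsewhere this relies on
// disable_sigpipe() having been called on the socket beforehand.
ssize_t send_nosigpipe(int fd, const void* data, size_t size) {
#if defined(MSG_NOSIGNAL)
  return send(fd, data, size, MSG_NOSIGNAL);
#else
  return send(fd, data, size, 0);
#endif
}
```

With either mechanism in place, a send to a socket whose peer has gone away
is expected to fail with -1 and `errno` set to `EPIPE` rather than raising the
signal. Neither option applies to file descriptors created with pipe().
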
> >>
> >> I'm a bit confused by your comment about the timing of SIGPIPE, which
> seems
> >> to suggest that the raising of SIGPIPE is not tied to the bad write
> call.
> >> Why do you think this?
> >>
> >> On Fri, Nov 6, 2015 at 4:37 AM, Alexander Rojas <
> alexander@mesosphere.io>
> >> wrote:
> >>
> >>> I have multiple questions here
> >>>
> >>> 1. Why do we use pipes at all? or is SIGPIPE raised also when writing
> into
> >>> sockets? which leads me to:
> >>> 2. Do we use it only in test cases or is there something actively using
> >>> pipes?
> >>>
> >>> SIGPIPE itself is a weird signal, since a failed call to `write`
> returns
> >>> -1 and sets `errno` to `EPIPE` so there are two ways to deal with
> errors
> >>> when the reading process is no longer reading, one is handling the
> return
> >>> value+errno (which usually means ignoring the SIGPIPE) and the second
> is
> >>> ignoring the return value and handling SIGPIPE. The difference is that
> >>> SIGPIPE is raised as soon as the OS realizes the pipe is broken while
> the
> >>> error on the write happens when you actually try to write on the pipe.
> >>>
> >>> All in all, I prefer to ignore the signal and deal with the return
> value
> >>> of `write`.
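
As an illustration of the approach preferred above (ignore the signal and
check the return value), here is a minimal sketch. The helper names
`ignore_sigpipe` and `write_to_closed_pipe` are hypothetical; only the
standard signal(), pipe(), write() and close() calls are assumed.

```cpp
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Hypothetical helper: ignore SIGPIPE for the whole process, so that a
// write to a broken pipe reports EPIPE instead of terminating the process.
void ignore_sigpipe() {
  signal(SIGPIPE, SIG_IGN);
}

// Hypothetical demonstration: create a pipe, close its read end, then
// write to it. Returns the errno of the failed write (or of pipe() if
// the pipe could not be created), or 0 if the write unexpectedly succeeded.
int write_to_closed_pipe() {
  int fds[2];
  if (pipe(fds) != 0) {
    return errno;
  }

  close(fds[0]);  // No reader remains, so the pipe is now "broken".

  char byte = 'x';
  ssize_t n = write(fds[1], &byte, 1);
  int error = (n == -1) ? errno : 0;

  close(fds[1]);
  return error;
}
```

Calling `ignore_sigpipe()` once at startup and then `write_to_closed_pipe()`
should yield `EPIPE`. Without the first call, the default disposition of
SIGPIPE would terminate the process at the write.
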
> >>>
> >>>> On 06 Nov 2015, at 03:27, Benjamin Mahler <be...@gmail.com>
> >>> wrote:
> >>>>
> >>>> Just want to surface this up to the dev@ thread to raise some
> awareness.
> >>>> Recently with the SIGPIPE bug from libev [1], we've revisited whether
> it
> >>>> makes sense to continue down the path of leaving SIGPIPE unblocked and
> >>>> trying to handle it case by case.
> >>>>
> >>>> We originally wanted users of libprocess to decide on their own
> whether
> >>>> they want to ignore SIGPIPE. However, we'd like to reconsider:
> >>>>
> >>>> (a) The amount of code that is needed to work around SIGPIPE is
> >>>> substantial, especially because on OS X SIGPIPE appears to not be
> >>> delivered
> >>>> synchronously [2]. Also, it is not possible to create pipes that don't
> >>>> surface SIGPIPE (unlike sockets), so in order to safely write to a
> pipe
> >>> we
> >>>> need to wrap write() calls with signal suppression blocks (which we
> don't
> >>>> do in general!). You can get a sense of the code from [3] and [4].
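
To give a rough idea of what such a suppression block involves, here is a
simplified sketch. It is not the stout code linked in [4]; the function name
`write_suppressing_sigpipe` is hypothetical, and the sketch assumes that
SIGPIPE is already pending on the calling thread by the time write()
returns, which is exactly the assumption that appears not to hold on OS X.

```cpp
#include <cerrno>
#include <cstddef>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

// Hypothetical wrapper: write to fd without letting SIGPIPE reach the
// process, even when SIGPIPE is left at its default disposition.
ssize_t write_suppressing_sigpipe(int fd, const void* data, size_t size) {
  sigset_t pipe_set, old_mask, pending;
  sigemptyset(&pipe_set);
  sigaddset(&pipe_set, SIGPIPE);

  // Remember whether a SIGPIPE was already pending (blocked by the
  // caller), so that we do not swallow a signal we did not cause.
  sigpending(&pending);
  bool already_pending = sigismember(&pending, SIGPIPE) == 1;

  // Block SIGPIPE for this thread for the duration of the write.
  pthread_sigmask(SIG_BLOCK, &pipe_set, &old_mask);

  ssize_t n = write(fd, data, size);
  int saved_errno = errno;

  if (n == -1 && saved_errno == EPIPE && !already_pending) {
    // Consume the SIGPIPE generated by our own write, if it is pending.
    sigpending(&pending);
    if (sigismember(&pending, SIGPIPE) == 1) {
      int sig;
      sigwait(&pipe_set, &sig);
    }
  }

  // Restore the caller's signal mask and the errno from write().
  pthread_sigmask(SIG_SETMASK, &old_mask, nullptr);
  errno = saved_errno;
  return n;
}
```

Every write() to a pipe would need this kind of wrapping, which is the
overhead that ignoring SIGPIPE globally avoids.
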
> >>>>
> >>>> (b) SIGPIPE seems to be more of a legacy mechanism to shut down a set
> of
> >>>> piped programs and the general recommendation seems to be to not
> bother
> >>>> with it and ignore it. Programs can handle EPIPE as they would with
> other
> >>>> errors.
> >>>>
> >>>> Would love to hear if there are any concerns. I will be glad to
> shepherd
> >>>> James' changes here.
> >>>>
> >>>> [1] https://issues.apache.org/jira/browse/MESOS-2768
> >>>> [2] https://issues.apache.org/jira/browse/MESOS-2079
> >>>> [3] https://reviews.apache.org/r/39940/diff/1#index_header
> >>>> [4]
> >>>>
> >>>
> https://github.com/apache/mesos/blob/0.25.0/3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/signals.hpp#L101
> >>>>
> >>>> On Wed, Nov 4, 2015 at 9:20 AM, James Peach (JIRA) <ji...@apache.org>
> >>> wrote:
> >>>>
> >>>>>
> >>>>>  [
> >>>>>
> >>>
> https://issues.apache.org/jira/browse/MESOS-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14989947#comment-14989947
> >>>>> ]
> >>>>>
> >>>>> James Peach edited comment on MESOS-2079 at 11/4/15 5:19 PM:
> >>>>> -------------------------------------------------------------
> >>>>>
> >>>>> These patches globally ignore {{SIGPIPE}} during libprocess
> >>> initialization,
> >>>>> document {{SIGPIPE}} behavior a bit more, and remove various signal
> >>>>> manipulations that were formerly necessary for disabling {{SIGPIPE}}
> >>>>> delivery.
> >>>>>
> >>>>> https://reviews.apache.org/r/39938/
> >>>>> https://reviews.apache.org/r/39940/
> >>>>> https://reviews.apache.org/r/39941/
> >>>>>
> >>>>>
> >>>>>
> >>>>> was (Author: jamespeach):
> >>>>> https://reviews.apache.org/r/39938/
> >>>>> https://reviews.apache.org/r/39940/
> >>>>> https://reviews.apache.org/r/39941/
> >>>>>
> >>>>>
> >>>>>> IO.Write test is flaky on OS X 10.10.
> >>>>>> -------------------------------------
> >>>>>>
> >>>>>>              Key: MESOS-2079
> >>>>>>              URL: https://issues.apache.org/jira/browse/MESOS-2079
> >>>>>>          Project: Mesos
> >>>>>>       Issue Type: Task
> >>>>>>       Components: libprocess, technical debt, test
> >>>>>>      Environment: OS X 10.10
> >>>>>> {noformat}
> >>>>>> $ clang++ --version
> >>>>>> Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)
> >>>>>> Target: x86_64-apple-darwin14.0.0
> >>>>>> Thread model: posix
> >>>>>> {noformat}
> >>>>>>         Reporter: Benjamin Mahler
> >>>>>>         Assignee: James Peach
> >>>>>>           Labels: flaky
> >>>>>>
> >>>>>> [~benjaminhindman]: If I recall correctly, this is related to
> >>>>> MESOS-1658. Unfortunately, we don't have a stacktrace for SIGPIPE
> >>> currently:
> >>>>>> {noformat}
> >>>>>> [ RUN      ] IO.Write
> >>>>>> make[5]: *** [check-local] Broken pipe: 13
> >>>>>> {noformat}
> >>>>>> Running in gdb, seems to always occur here:
> >>>>>> {code}
> >>>>>> Program received signal SIGPIPE, Broken pipe.
> >>>>>> [Switching to process 56827 thread 0x60b]
> >>>>>> 0x00007fff9a011132 in __psynch_cvwait ()
> >>>>>> (gdb) where
> >>>>>> #0  0x00007fff9a011132 in __psynch_cvwait ()
> >>>>>> #1  0x00007fff903e7ea0 in _pthread_cond_wait ()
> >>>>>> #2  0x000000010062f27c in Gate::arrive (this=0x101908a10,
> old=14780) at
> >>>>> gate.hpp:82
> >>>>>> #3  0x0000000100600888 in process::schedule (arg=0x0) at
> >>>>> src/process.cpp:1373
> >>>>>> #4  0x00007fff903e72fc in _pthread_body ()
> >>>>>> #5  0x00007fff903e7279 in _pthread_start ()
> >>>>>> #6  0x00007fff903e54b1 in thread_start ()
> >>>>>> {code}
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> This message was sent by Atlassian JIRA
> >>>>> (v6.3.4#6332)
> >>>>>
> >>>
> >>>
> >
>
>