Posted to dev@mesos.apache.org by Charles-François Natali <cf...@gmail.com> on 2019/10/19 11:34:50 UTC

[MESOS-10007] random "Failed to get exit status for Command" for short-lived commands

Hi,

I'm wondering if there's anything I could do to help
https://issues.apache.org/jira/browse/MESOS-10007 move forward?

Basically, it's a race condition in libprocess/command executor that causes
spurious errors to be reported for short-lived tasks.
I've got a detailed code path of the race and a repro, but I'm not
sure what the best way to fix it is. Any suggestions?

Cheers,

Charles

Re: [MESOS-10007] random "Failed to get exit status for Command" for short-lived commands

Posted by Benjamin Mahler <bm...@apache.org>.
Hi Charles, thanks for the thorough ticket and for surfacing it here for
attention. It didn't get spotted amongst the JIRA noise.

I replied on the ticket with a patch that should fix the issue. We can
discuss further in the ticket.

Ben
