Posted to issues@mesos.apache.org by "Neil Conway (JIRA)" <ji...@apache.org> on 2015/11/23 22:02:11 UTC

[jira] [Comment Edited] (MESOS-3991) CHECK shouldn't be an assert in a production environment.

    [ https://issues.apache.org/jira/browse/MESOS-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023060#comment-15023060 ] 

Neil Conway edited comment on MESOS-3991 at 11/23/15 9:02 PM:
--------------------------------------------------------------

Definitely something to discuss, although I also think it is not a blocker.

The classical argument for _not_ doing this is that, if a CHECK fails, you can't necessarily continue execution safely. By failing the assertion and bailing out, you avoid possibly corrupting distributed state or causing worse downstream problems. Since Mesos should always be run under a process supervisor in production, the real problem with the current behavior (IMO) arises mostly when the CHECK failure (a) is relatively innocuous and (b) occurs repeatedly. That is the case for the floating point precision problem, but not for many other CHECKs in the source code.
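
To make the trade-off concrete, here is a minimal C++ sketch of the two styles. It is illustrative only and is not the code from MESOS-3552: the names checkOrAbort, remainingAfter, and withinTolerance are hypothetical, and the 1e-9 tolerance is an assumed value. The first helper mimics a fatal CHECK (log and abort); the last one is a recoverable alternative that reports a problem to the caller instead of killing the process.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for a fatal CHECK: print the message and abort
// the whole process when the condition does not hold.
void checkOrAbort(bool condition, const char* message) {
  if (!condition) {
    std::fprintf(stderr, "Check failed: %s\n", message);
    std::abort();
  }
}

// Hypothetical helper: what is left of a reservation after releasing
// two amounts. With IEEE 754 doubles, remainingAfter(0.3, 0.1, 0.2)
// comes out slightly below zero instead of exactly zero.
double remainingAfter(double reserved, double first, double second) {
  return reserved - first - second;
}

// Hypothetical recoverable check: treat values within a small tolerance
// of zero as non-negative, and return false (rather than aborting) when
// the value is genuinely negative so the caller can reject the request.
bool withinTolerance(double value, double epsilon = 1e-9) {
  return value >= -epsilon;
}
```

With these definitions, checkOrAbort(remainingAfter(0.3, 0.1, 0.2) >= 0.0, "remaining >= 0") would abort on rounding noise alone, while withinTolerance accepts the same value and still rejects a clearly negative one. That is the "innocuous but repeatable" case; a check guarding truly corrupt state is a different matter, and aborting may still be the safer choice there.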


was (Author: neilc):
Definitely something to discuss, although I also think it is definitely not a blocker.

The classical argument for _not_ doing this is that, if a CHECK fails, you can't necessarily continue execution safely. By failing the assertion and bailing out, you avoid possibly corrupting distributed state or causing worse downstream problems. Since Mesos should always be run under a process supervisor in production, the real problem with the current behavior (IMO) arises mostly when the CHECK failure (a) is relatively innocuous and (b) occurs repeatedly. That is the case for the floating point precision problem, but not for many other CHECKs in the source code.

> CHECK shouldn't be an assert in a production environment.
> ---------------------------------------------------------
>
>                 Key: MESOS-3991
>                 URL: https://issues.apache.org/jira/browse/MESOS-3991
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Gabriel Hartmann
>
> For example:
> In this issue, some very error-prone double math causes the Mesos master to crash when presented with a resource RESERVE Operation of the right form. On-demand DoS!
> https://issues.apache.org/jira/browse/MESOS-3552


