Posted to issues@mesos.apache.org by "Marco Massenzio (JIRA)" <ji...@apache.org> on 2015/11/25 01:29:10 UTC

[jira] [Comment Edited] (MESOS-3552) CHECK failure due to floating point precision on reservation request

    [ https://issues.apache.org/jira/browse/MESOS-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025816#comment-15025816 ] 

Marco Massenzio edited comment on MESOS-3552 at 11/25/15 12:28 AM:
-------------------------------------------------------------------

Since this issue has been present in Mesos for a long time, and the {{0.26}} release is already in progress, I have removed it as a {{0.26}} blocker.
While it's not great that this slipped out of this release, I also believe that not rushing it through gives us an opportunity to address the problem "properly" (e.g., by using fixed point for resources) in time for {{0.27}}.
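
To make the fixed-point idea concrete, here is a minimal sketch. It assumes a granularity of 0.001 CPU, and the {{MilliCpus}} type and its members are hypothetical names for illustration, not existing Mesos code:

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical fixed-point CPU quantity (not a Mesos type): stores
// thousandths of a CPU as an integer, so addition, subtraction and
// equality are exact.
struct MilliCpus {
  std::int64_t value;  // 1 CPU == 1000 units (assumed granularity)

  // Rounds a double (e.g. 0.1) to the nearest 0.001 CPU.
  static MilliCpus fromDouble(double cpus) {
    return MilliCpus{static_cast<std::int64_t>(std::llround(cpus * 1000.0))};
  }

  double toDouble() const { return static_cast<double>(value) / 1000.0; }
};

inline MilliCpus operator+(MilliCpus a, MilliCpus b) {
  return MilliCpus{a.value + b.value};
}

inline MilliCpus operator-(MilliCpus a, MilliCpus b) {
  return MilliCpus{a.value - b.value};
}

inline bool operator==(MilliCpus a, MilliCpus b) {
  return a.value == b.value;
}
```

With this representation, reserving and unreserving 0.1 CPU any number of times brings the total back to exactly the starting value, because only integers are added and subtracted. The trade-off is that any request finer than the chosen granularity gets rounded on the way in.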

As a halfway measure, we could simply replace the {{CHECK()}} calls that crash Mesos with {{LOG(ERROR)}} and return an error to the caller. This won't solve the cases where the cause is actually a rounding error, but at least we don't risk introducing regressions/bugs this close to the release.
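
A rough sketch of what that replacement could look like follows. It uses only the standard library so that it stands alone: {{validateCpus}} is a hypothetical helper, {{std::cerr}} stands in for {{LOG(ERROR)}}, and the bool-plus-message return is an assumption rather than the error type Mesos would actually use:

```cpp
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper (not Mesos code): instead of aborting the process
// the way a failed CHECK would, it logs the mismatch and reports the
// failure to the caller.
// Returns true when the totals match; otherwise returns false and
// stores a description of the mismatch in *error.
bool validateCpus(double resultCpus, double expectedCpus, std::string* error) {
  if (resultCpus != expectedCpus) {
    std::ostringstream message;
    // 17 significant digits are enough to tell any two doubles apart.
    message << std::setprecision(17) << "CPU total mismatch: got "
            << resultCpus << ", expected " << expectedCpus;
    *error = message.str();
    std::cerr << "ERROR: " << *error << std::endl;  // stand-in for LOG(ERROR)
    return false;
  }
  return true;
}
```

The comparison itself is deliberately unchanged, so a rounding error would still be rejected; the only difference is that the caller gets an error back instead of the process crashing.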

Finally, in my opinion we should consolidate this ticket and MESOS-1187 into one (probably by closing this one as a duplicate of that older one) so that we don't have a "split brain" conversation. However, I wouldn't want to do that and risk losing valuable information in this one. Does anyone have suggestions on how to do this cleanly?
(Or do people feel that simply linking them and closing this one as a duplicate would be sufficient?)

Just to be perfectly clear: I fully agree this is an important issue to address; I'm just suggesting that it should not block the release.
If people feel strongly about this, let's have a conversation via either hangout or email.

Thanks, everyone, for looking into this!


> CHECK failure due to floating point precision on reservation request
> --------------------------------------------------------------------
>
>                 Key: MESOS-3552
>                 URL: https://issues.apache.org/jira/browse/MESOS-3552
>             Project: Mesos
>          Issue Type: Improvement
>          Components: master
>            Reporter: Mandeep Chadha
>            Assignee: Mandeep Chadha
>              Labels: mesosphere, tech-debt
>
> The result.cpus() == cpus() check is failing because it compares two doubles for exact equality ( double == double ).
> Root cause:
> The framework requested a 0.1 cpu reservation for the first task. So far so good. The next Reserve operation led to double arithmetic that produced the following values:
>  result.cpus() : 23.9999999999999964472863211995 cpus() : 24
> And the check ( result.cpus() == cpus() ) failed.
>  The double arithmetic operations caused result.cpus() to be 23.9999999999999964472863211995, and hence ( 23.9999999999999964472863211995 == 24 ) failed.
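
The failing comparison described above can be reproduced with the reported value alone. The sketch below also shows a tolerance-based comparison as one possible alternative; the {{almostEqual}} helper and its default epsilon of 1e-9 are assumptions for illustration, not something this ticket or Mesos defines:

```cpp
#include <cmath>

// The value of result.cpus() reported in this ticket.
const double kReportedCpus = 23.9999999999999964472863211995;

// Exact comparison, as in the failing check: false for the reported value.
inline bool exactlyEqual(double a, double b) {
  return a == b;
}

// Hypothetical helper: treats two CPU quantities as equal when they
// differ by less than epsilon (an assumed tolerance).
inline bool almostEqual(double a, double b, double epsilon = 1e-9) {
  return std::fabs(a - b) < epsilon;
}
```

Here {{exactlyEqual(kReportedCpus, 24.0)}} is false while {{almostEqual(kReportedCpus, 24.0)}} is true. A tolerance avoids the crash but still leaves the accumulated error in the stored value, which is why an exact representation may be the cleaner long-term fix.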


