Posted to issues@mesos.apache.org by "Joris Van Remoortere (JIRA)" <ji...@apache.org> on 2015/12/04 18:31:11 UTC

[jira] [Commented] (MESOS-4059) Investigate remaining flakiness in MasterMaintenanceTest.InverseOffersFilters

    [ https://issues.apache.org/jira/browse/MESOS-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041812#comment-15041812 ] 

Joris Van Remoortere commented on MESOS-4059:
---------------------------------------------

{code}
commit fe4be25fa6011787751547b06f70676fd79bb87b
Author: Neil Conway <ne...@gmail.com>
Date:   Fri Dec 4 11:54:18 2015 -0500

    Fixed flakiness in MasterMaintenanceTest.InverseOffersFilters.
    
    There were two problems:
    
    (1) After launching two tasks, we assumed that we would see TASK_RUNNING
        updates for the tasks in the same order they were launched. This is
        not guaranteed, so adjust the test to handle TASK_RUNNING updates in
        the order they are received.
    
    (2) The test used this pattern:
    
            Mesos m;
            Call c;
    
            m.send(c);
            Clock::settle();
            // Trigger a new batch allocation that reflects the call
            Clock::advance();
    
        However, this is actually unsafe (see MESOS-3760): the send() call
        might not have reached the master by the time `Clock::settle()`
        happens. This was fixed by blocking using `FUTURE_DISPATCH` on the
        downstream logic in the allocator that is invoked to handle the
        delivered event.
    
    Review: https://reviews.apache.org/r/40935
{code}
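
Point (2) of the commit message can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions: FakeMaster, Observed, expectHandled() and awaitHandled() are hypothetical names invented for this example and are not libprocess or Mesos APIs. Settling runs only the events that are already queued, so a call that is still in flight is missed, whereas waiting on a handle that the downstream handler completes cannot return early.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>

// Hypothetical handle that becomes ready once the handler has run.
struct Observed {
  bool ready = false;
  std::string call;
};

// Hypothetical stand-in for the master/allocator; not a Mesos class.
class FakeMaster {
public:
  // The call is sent but has not yet arrived at the master.
  void send(const std::string& call) { inFlight_.push_back(call); }

  // The "network" moves one in-flight call into the master's queue.
  bool deliverOne() {
    if (inFlight_.empty()) return false;
    queue_.push_back(inFlight_.front());
    inFlight_.pop_front();
    return true;
  }

  // Analogue of settling: runs only events that are already queued.
  void settle() {
    while (!queue_.empty()) {
      std::string call = queue_.front();
      queue_.pop_front();
      handle(call);
    }
  }

  // Analogue of intercepting the downstream handler.
  std::shared_ptr<Observed> expectHandled() {
    observer_ = std::make_shared<Observed>();
    return observer_;
  }

  int handledCount() const { return handledCount_; }

private:
  void handle(const std::string& call) {
    ++handledCount_;
    if (observer_) {
      observer_->ready = true;
      observer_->call = call;
      observer_.reset();
    }
  }

  std::deque<std::string> inFlight_;
  std::deque<std::string> queue_;
  std::shared_ptr<Observed> observer_;
  int handledCount_ = 0;
};

// Keeps delivering and settling until the handler has actually run.
bool awaitHandled(FakeMaster& master, const std::shared_ptr<Observed>& observed) {
  master.settle();
  while (!observed->ready) {
    if (!master.deliverOne()) return false;  // nothing left in flight
    master.settle();
  }
  return true;
}

// Unsafe pattern: settle immediately after send.
int handledAfterSettleOnly() {
  FakeMaster master;
  master.send("call-1");
  master.settle();  // the call is still in flight, so nothing is handled
  return master.handledCount();
}

// Safer pattern: block on the handler itself.
std::string handledAfterWaiting() {
  FakeMaster master;
  std::shared_ptr<Observed> observed = master.expectHandled();
  master.send("call-1");
  bool ok = awaitHandled(master, observed);
  assert(ok);
  (void)ok;
  return observed->call;
}
```

In this simulation handledAfterSettleOnly() reports that no call was handled, while handledAfterWaiting() returns only after the handler has seen the call. The real test runs on libprocess with actual concurrency, so the simulation shows the ordering problem only, not the actual fix.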

> Investigate remaining flakiness in MasterMaintenanceTest.InverseOffersFilters
> -----------------------------------------------------------------------------
>
>                 Key: MESOS-4059
>                 URL: https://issues.apache.org/jira/browse/MESOS-4059
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Neil Conway
>            Assignee: Neil Conway
>            Priority: Minor
>              Labels: flaky-test, mesosphere
>
> Per comments in MESOS-3916, the fix for that issue reduced the flakiness, but some intermittent test failures still occur and should be investigated.
> *Flakiness in task acknowledgment*
> {code}
> I1203 18:25:04.609817 28732 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 6afd012e-8e88-41b2-8239-a9b852d07ca1) for task 26305fdd-edb0-4764-8b8a-2558f2b2d81b of framework c7900911-cc7a-4dde-92e7-48fe82cddd9e-0000
> W1203 18:25:04.610076 28732 status_update_manager.cpp:762] Unexpected status update acknowledgement (received 6afd012e-8e88-41b2-8239-a9b852d07ca1, expecting 82fc7a7b-e64a-4f4d-ab74-76abac42b4e6) for update TASK_RUNNING (UUID: 82fc7a7b-e64a-4f4d-ab74-76abac42b4e6) for task 26305fdd-edb0-4764-8b8a-2558f2b2d81b of framework c7900911-cc7a-4dde-92e7-48fe82cddd9e-0000
> E1203 18:25:04.610339 28736 slave.cpp:2339] Failed to handle status update acknowledgement (UUID: 6afd012e-8e88-41b2-8239-a9b852d07ca1) for task 26305fdd-edb0-4764-8b8a-2558f2b2d81b of framework c7900911-cc7a-4dde-92e7-48fe82cddd9e-0000: Duplicate acknowledgemen
> {code}
> This is a race between [launching and acknowledging two tasks|https://github.com/apache/mesos/blob/75aaaacb89fa961b249c9ab7fa0f45dfa9d415a5/src/tests/master_maintenance_tests.cpp#L1486-L1517]. The status updates for the tasks are not necessarily received in the order in which the tasks were launched.
> *Flakiness in first inverse offer filter*
> See [this comment in MESOS-3916|https://issues.apache.org/jira/browse/MESOS-3916?focusedCommentId=15027478&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15027478] for the explanation.  The related logs are above the comment.
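
The task acknowledgment race in the quoted description can be sketched as follows. This is an illustration under assumptions, not the test's actual code: StatusUpdate, FakeUpdateTracker and both helper functions are hypothetical names, and the tracker only mimics the idea that an acknowledgement is accepted when its UUID matches the pending update for that task. Pairing the Nth received update with the Nth launched task produces mismatched acknowledgements when updates arrive out of order, while acknowledging each update with its own task ID and UUID does not.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified status update.
struct StatusUpdate {
  std::string taskId;
  std::string uuid;
};

// Hypothetical tracker: one pending update UUID per task.
class FakeUpdateTracker {
public:
  void sent(const StatusUpdate& update) {
    pending_[update.taskId] = update.uuid;
  }

  // Accepts the acknowledgement only if the UUID matches the pending update.
  bool acknowledge(const std::string& taskId, const std::string& uuid) {
    std::map<std::string, std::string>::iterator it = pending_.find(taskId);
    if (it == pending_.end() || it->second != uuid) return false;
    pending_.erase(it);
    return true;
  }

private:
  std::map<std::string, std::string> pending_;
};

// Fragile: assumes the i-th update received belongs to the i-th task launched.
int ackAssumingLaunchOrder(const std::vector<std::string>& launched,
                           const std::vector<StatusUpdate>& arrivals) {
  assert(launched.size() == arrivals.size());
  FakeUpdateTracker tracker;
  for (const StatusUpdate& update : arrivals) tracker.sent(update);

  int accepted = 0;
  for (std::size_t i = 0; i < arrivals.size(); ++i) {
    if (tracker.acknowledge(launched[i], arrivals[i].uuid)) ++accepted;
  }
  return accepted;
}

// Robust: acknowledges each update using the task ID it actually carries.
int ackInArrivalOrder(const std::vector<StatusUpdate>& arrivals) {
  FakeUpdateTracker tracker;
  for (const StatusUpdate& update : arrivals) tracker.sent(update);

  int accepted = 0;
  for (const StatusUpdate& update : arrivals) {
    if (tracker.acknowledge(update.taskId, update.uuid)) ++accepted;
  }
  return accepted;
}
```

With two tasks whose updates arrive in the opposite order from launch, the first helper has every acknowledgement rejected and the second has every acknowledgement accepted.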


