Posted to issues@mesos.apache.org by "Michael Park (JIRA)" <ji...@apache.org> on 2015/12/04 10:06:11 UTC

[jira] [Commented] (MESOS-4067) ReservationTest.ACLMultipleOperations is flaky

    [ https://issues.apache.org/jira/browse/MESOS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041306#comment-15041306 ] 

Michael Park commented on MESOS-4067:
-------------------------------------

I was able to figure out one issue (not sure if there are more issues, or if the subsequent failures all stem from this one):

{code}
  // Attempt to unreserve an invalid set of resources (not dynamically
  // reserved), reserve the second set, and launch a task.
  driver.acceptOffers({offer.id()},
      {UNRESERVE(unreserved1),
       RESERVE(dynamicallyReserved2),
       LAUNCH({taskInfo})},
      filters);

  // Wait for TASK_FINISHED update ack.
  AWAIT_READY(statusUpdateAcknowledgement);
  EXPECT_EQ(TASK_FINISHED, statusUpdateAcknowledgement.get().state());

  // In the next offer, expect to find both sets of reserved
  // resources, since the Unreserve operation should fail.
  AWAIT_READY(offers);

  ASSERT_EQ(1u, offers.get().size());
  offer = offers.get()[0];

  EXPECT_TRUE(
      Resources(offer.resources()).contains(
          dynamicallyReserved1 +
          dynamicallyReserved2 +
          unreserved2));
{code}

The intention here seems to be: perform an {{acceptOffers}} with a sequence of operations that includes a task launch, wait until the task has finished and its resources have therefore been recovered, then expect all of the available resources to arrive in a single offer.

The issue is that with an {{allocation_interval}} of 50ms, the allocator can make an offer with the currently available resources while the task is still being launched, running, etc. This premature offer is picked up by our {{EXPECT_CALL}} for {{resourceOffers}}, so the offer we inspect does not contain {{dynamicallyReserved1 + dynamicallyReserved2 + unreserved2}} and the expectation fails.
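
To make the race concrete, here is a toy simulation. It is not Mesos code: {{SimulatedOffer}}, {{simulateOffers}}, and the integer CPU counts are all made up for illustration, and the model assumes that resources handed out in an earlier offer are not offered again. It only shows why the first offer seen after the accept can be partial when the allocation interval is shorter than the task's lifetime.

```cpp
#include <vector>

// Hypothetical, simplified model of periodic allocation; not Mesos code.
struct SimulatedOffer {
  int timeMs;  // When the offer was made.
  int cpus;    // CPUs contained in the offer.
};

// At every allocation tick, offer whatever is free: not used by the running
// task and not already handed out in an earlier offer.
// Assumes `allocationIntervalMs` is positive.
std::vector<SimulatedOffer> simulateOffers(
    int allocationIntervalMs,
    int taskEndMs,
    int totalCpus,
    int taskCpus,
    int horizonMs) {
  std::vector<SimulatedOffer> offers;
  int alreadyOffered = 0;

  for (int now = allocationIntervalMs; now <= horizonMs;
       now += allocationIntervalMs) {
    // The task holds `taskCpus` from time 0 until `taskEndMs`.
    const int inUse = (now < taskEndMs) ? taskCpus : 0;
    const int available = totalCpus - inUse - alreadyOffered;

    if (available > 0) {
      offers.push_back({now, available});
      alreadyOffered += available;
    }
  }

  return offers;
}
```

In this model, a 50ms interval with a task that ends at 120ms produces a partial offer at 50ms and the remainder at 150ms, whereas an interval longer than the simulated horizon produces no periodic offers at all.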

A few possible approaches in my preferred order:
# We may not need all of these moving parts, and could possibly use just one set of resources instead of three. Refer to {{ReservationTest.ReserveAndLaunchThenUnreserve}} for an example.
# Effectively turn periodic allocation off (e.g., {{allocation_interval=1000s}}) and use {{reviveOffers}} to control the offers manually. Refer to {{ReservationEndpointsTest.ReserveAvailableAndOfferedResources}} for an example.
# Instead of a simple {{FutureArg<1>(offers)}} as the action for {{EXPECT_CALL}} of {{resourceOffers}}, perhaps we can aggregate the offers instead. This one feels like it could get pretty tricky.
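
As a rough sketch of what the third approach might involve, the idea is to accumulate resources across successive offers until the expected set is covered. Everything here is hypothetical: {{ResourceMap}} is a plain-map stand-in for {{Resources}}, and {{OfferAggregator}} and {{offersNeeded}} are invented names. Wiring this into the {{resourceOffers}} expectation (and deciding how long to wait) is the tricky part and is not shown.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for Mesos `Resources`: resource name -> quantity.
using ResourceMap = std::map<std::string, double>;

// Hypothetical helper that accumulates resources across successive offers.
class OfferAggregator {
public:
  // Record the resources of one offer.
  void add(const ResourceMap& offer) {
    for (const auto& entry : offer) {
      total_[entry.first] += entry.second;
    }
  }

  // True once the accumulated resources cover everything in `expected`.
  bool contains(const ResourceMap& expected) const {
    for (const auto& entry : expected) {
      const auto it = total_.find(entry.first);
      if (it == total_.end() || it->second < entry.second) {
        return false;
      }
    }
    return true;
  }

private:
  ResourceMap total_;
};

// Returns how many offers were consumed before `expected` was covered,
// or -1 if the offers never add up to it.
int offersNeeded(
    const std::vector<ResourceMap>& offers,
    const ResourceMap& expected) {
  OfferAggregator aggregator;
  int count = 0;

  for (const auto& offer : offers) {
    aggregator.add(offer);
    ++count;
    if (aggregator.contains(expected)) {
      return count;
    }
  }

  return -1;
}
```

With this shape, a premature partial offer followed by a later offer carrying the recovered resources would satisfy the check together, at the cost of the test no longer asserting that everything arrives in one offer.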

[~greggomann], [~jieyu] What are your thoughts?

> ReservationTest.ACLMultipleOperations is flaky
> ----------------------------------------------
>
>                 Key: MESOS-4067
>                 URL: https://issues.apache.org/jira/browse/MESOS-4067
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Michael Park
>              Labels: flaky, mesosphere
>
> Observed from the CI: https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/1319/changes


