Posted to issues@mesos.apache.org by "Guangya Liu (JIRA)" <ji...@apache.org> on 2016/03/02 02:56:18 UTC
[jira] [Assigned] (MESOS-4831) Master sometimes sends two inverse offers after the agent goes into maintenance.
[ https://issues.apache.org/jira/browse/MESOS-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guangya Liu reassigned MESOS-4831:
----------------------------------
Assignee: Guangya Liu
> Master sometimes sends two inverse offers after the agent goes into maintenance.
> --------------------------------------------------------------------------------
>
> Key: MESOS-4831
> URL: https://issues.apache.org/jira/browse/MESOS-4831
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 0.27.0
> Reporter: Anand Mazumdar
> Assignee: Guangya Liu
> Labels: maintenance, mesosphere
>
> This showed up on ASF CI for {{MasterMaintenanceTest.PendingUnavailabilityTest}}:
> https://builds.apache.org/job/Mesos/1748/COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/consoleFull
> {code}
> I0229 11:08:57.027559 668 hierarchical.cpp:1437] No resources available to allocate!
> I0229 11:08:57.027745 668 hierarchical.cpp:1150] Performed allocation for slave fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b-S0 in 272747ns
> I0229 11:08:57.027757 675 master.cpp:5369] Sending 1 offers to framework fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b-0000 (default)
> I0229 11:08:57.028586 675 master.cpp:5459] Sending 1 inverse offers to framework fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b-0000 (default)
> I0229 11:08:57.029039 675 master.cpp:5459] Sending 1 inverse offers to framework fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b-0000 (default)
> {code}
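One way to confirm the duplicate in a log like the one above is to count inverse-offer messages per framework. The sketch below is a minimal, hypothetical diagnostic helper. `countInverseOfferSends` is not part of Mesos, and its regular expression assumes the exact message wording in this log, not a stable Mesos log format.

```cpp
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical diagnostic helper (not part of Mesos): counts how many
// "Sending N inverse offers to framework <id>" messages each framework got.
// The pattern assumes the master log wording shown in this report.
std::map<std::string, int> countInverseOfferSends(const std::string& log)
{
  static const std::regex pattern(
      R"(Sending \d+ inverse offers to framework (\S+))");

  std::map<std::string, int> sends;
  std::istringstream lines(log);
  std::string line;

  while (std::getline(lines, line)) {
    std::smatch match;
    if (std::regex_search(line, match, pattern)) {
      // One increment per log message; the N in "Sending N" is ignored.
      ++sends[match[1].str()];
    }
  }

  return sends;
}
```

Fed the three master log lines above, this returns a count of 2 for framework `fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b-0000`, where the expected workflow would produce 1. The regular-offer line ("Sending 1 offers") does not match.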
> The expected workflow for this test is:
> - The framework receives offers from the master.
> - The agent's maintenance schedule is updated via the master's {{maintenance/schedule}} endpoint.
> - The current offer is rescinded.
> - A new offer is received from the master with {{unavailability}} set.
> - After the agent is scheduled for maintenance, an inverse offer is sent.
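The expected workflow can be made precise by treating it as an ordered trace of scheduler events and comparing the observed events against it. The sketch below assumes the order listed above. `SchedulerEvent`, `kExpectedWorkflow`, and both check functions are hypothetical names for illustration, not Mesos or gMock APIs.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical labels for the events the scheduler sees in
// PendingUnavailabilityTest (illustrative names, not Mesos types).
enum class SchedulerEvent
{
  OFFERS,                      // Regular offers without unavailability.
  RESCIND,                     // Original offers rescinded after the update.
  OFFERS_WITH_UNAVAILABILITY,  // New offers carrying the unavailability.
  INVERSE_OFFERS,              // Inverse offers for the scheduled agent.
};

// The expected workflow, assuming the order listed in the report.
const std::vector<SchedulerEvent> kExpectedWorkflow = {
  SchedulerEvent::OFFERS,
  SchedulerEvent::RESCIND,
  SchedulerEvent::OFFERS_WITH_UNAVAILABILITY,
  SchedulerEvent::INVERSE_OFFERS,
};

// Strict check: the observed trace must equal the expected workflow, so a
// duplicate INVERSE_OFFERS event makes the check fail.
bool matchesExpectedWorkflow(const std::vector<SchedulerEvent>& observed)
{
  return observed == kExpectedWorkflow;
}

// Lenient check, roughly what the current test verifies: at least one
// INVERSE_OFFERS event was observed, and extra ones go unnoticed.
bool sawInverseOffer(const std::vector<SchedulerEvent>& observed)
{
  return std::find(observed.begin(), observed.end(),
                   SchedulerEvent::INVERSE_OFFERS) != observed.end();
}
```

A trace that ends in two `INVERSE_OFFERS` events, as the log above suggests, passes the lenient check but fails the strict one.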
> For some reason, the logs show the master sending two inverse offers. The test still passes because it only checks that the first inverse offer is present. The issue can also be reproduced with a modified version of the original test:
> {code}
> // Test ensures that an offer will have an `unavailability` set if the
> // slave is scheduled to go down for maintenance, and that an inverse
> // offer is then sent for that slave.
> TEST_F(MasterMaintenanceTest, PendingUnavailabilityTest)
> {
>   Try<PID<Master>> master = StartMaster();
>   ASSERT_SOME(master);
>
>   MockExecutor exec(DEFAULT_EXECUTOR_ID);
>
>   Try<PID<Slave>> slave = StartSlave(&exec);
>   ASSERT_SOME(slave);
>
>   auto scheduler = std::make_shared<MockV1HTTPScheduler>();
>
>   EXPECT_CALL(*scheduler, heartbeat(_))
>     .WillRepeatedly(Return()); // Ignore heartbeats.
>
>   Future<Nothing> connected;
>   EXPECT_CALL(*scheduler, connected(_))
>     .WillOnce(FutureSatisfy(&connected))
>     .WillRepeatedly(Return()); // Ignore future invocations.
>
>   scheduler::TestV1Mesos mesos(master.get(), ContentType::PROTOBUF, scheduler);
>
>   AWAIT_READY(connected);
>
>   Future<Event::Subscribed> subscribed;
>   EXPECT_CALL(*scheduler, subscribed(_, _))
>     .WillOnce(FutureArg<1>(&subscribed));
>
>   Future<Event::Offers> normalOffers;
>   Future<Event::Offers> unavailabilityOffers;
>   Future<Event::Offers> inverseOffers;
>
>   // No `WillRepeatedly()` here, so gmock expects exactly three `offers`
>   // events; an extra one (e.g., a duplicate inverse offer) is reported
>   // as a failure.
>   EXPECT_CALL(*scheduler, offers(_, _))
>     .WillOnce(FutureArg<1>(&normalOffers))
>     .WillOnce(FutureArg<1>(&unavailabilityOffers))
>     .WillOnce(FutureArg<1>(&inverseOffers));
>
>   // The original offers should be rescinded when the unavailability is changed.
>   Future<Nothing> offerRescinded;
>   EXPECT_CALL(*scheduler, rescind(_, _))
>     .WillOnce(FutureSatisfy(&offerRescinded));
>
>   {
>     Call call;
>     call.set_type(Call::SUBSCRIBE);
>
>     Call::Subscribe* subscribe = call.mutable_subscribe();
>     subscribe->mutable_framework_info()->CopyFrom(DEFAULT_V1_FRAMEWORK_INFO);
>
>     mesos.send(call);
>   }
>
>   AWAIT_READY(subscribed);
>
>   v1::FrameworkID frameworkId(subscribed->framework_id());
>
>   AWAIT_READY(normalOffers);
>   EXPECT_NE(0, normalOffers->offers().size());
>
>   // Regular offers shouldn't have unavailability.
>   foreach (const v1::Offer& offer, normalOffers->offers()) {
>     EXPECT_FALSE(offer.has_unavailability());
>   }
>
>   // Schedule this slave for maintenance.
>   MachineID machine;
>   machine.set_hostname(maintenanceHostname);
>   machine.set_ip(stringify(slave.get().address.ip));
>
>   const Time start = Clock::now() + Seconds(60);
>   const Duration duration = Seconds(120);
>   const Unavailability unavailability = createUnavailability(start, duration);
>
>   // Post a valid schedule with one machine.
>   maintenance::Schedule schedule = createSchedule(
>       {createWindow({machine}, unavailability)});
>
>   // We have a few seconds between the first set of offers and the
>   // next allocation of offers. This should be enough time to perform
>   // a maintenance schedule update. This update will also trigger the
>   // rescinding of offers from the scheduled slave.
>   Future<Response> response = process::http::post(
>       master.get(),
>       "maintenance/schedule",
>       headers,
>       stringify(JSON::protobuf(schedule)));
>
>   AWAIT_EXPECT_RESPONSE_STATUS_EQ(OK().status, response);
>
>   // The original offers should be rescinded when the unavailability
>   // is changed.
>   AWAIT_READY(offerRescinded);
>
>   AWAIT_READY(unavailabilityOffers);
>   EXPECT_NE(0, unavailabilityOffers->offers().size());
>
>   // Make sure the new offers have the unavailability set.
>   foreach (const v1::Offer& offer, unavailabilityOffers->offers()) {
>     EXPECT_TRUE(offer.has_unavailability());
>     EXPECT_EQ(
>         unavailability.start().nanoseconds(),
>         offer.unavailability().start().nanoseconds());
>
>     EXPECT_EQ(
>         unavailability.duration().nanoseconds(),
>         offer.unavailability().duration().nanoseconds());
>   }
>
>   // We also expect an inverse offer for the slave to go under
>   // maintenance.
>   AWAIT_READY(inverseOffers);
>   EXPECT_NE(0, inverseOffers->inverse_offers().size());
>
>   EXPECT_CALL(exec, shutdown(_))
>     .Times(AtMost(1));
>
>   EXPECT_CALL(*scheduler, disconnected(_))
>     .Times(AtMost(1));
>
>   Shutdown(); // Shut down the master and agent before the test exits.
> }
> {code}
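The modified test presumably surfaces the duplicate through gMock's cardinality inference. An `EXPECT_CALL` with n `WillOnce()` clauses and no `WillRepeatedly()` expects exactly n calls, and an extra call is reported as over-saturation. The class below is a hypothetical, heavily simplified stand-in for that bookkeeping, written with the standard library only. It is not how gMock is implemented.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for an EXPECT_CALL built only from
// WillOnce() clauses: each call consumes one queued action, and a call with
// no action left is counted as over-saturated (gMock reports that as a failure).
class OnceExpectation
{
public:
  explicit OnceExpectation(std::vector<std::string> actions)
    : actions_(std::move(actions)) {}

  // Returns the name of the action consumed by this call, or an empty
  // string if every WillOnce() action has already been used.
  std::string call()
  {
    if (calls_ >= actions_.size()) {
      ++overSaturatedCalls_;
      ++calls_;
      return "";
    }
    return actions_[calls_++];
  }

  // True when the number of calls equals the number of WillOnce() actions.
  bool matchedExactly() const { return calls_ == actions_.size(); }

  std::size_t overSaturatedCalls() const { return overSaturatedCalls_; }

private:
  std::vector<std::string> actions_;
  std::size_t calls_ = 0;
  std::size_t overSaturatedCalls_ = 0;
};
```

With the three actions from the test (`normalOffers`, `unavailabilityOffers`, `inverseOffers`), a fourth `call()`, corresponding to a second inverse offer, returns an empty string and is counted as over-saturated.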
> Also, unrelated: this test should be cleaned up so it does not expect multiple offers, i.e., the {{numberOfOffers}} constant should be removed.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)