You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mesos.apache.org by ch...@apache.org on 2018/12/05 22:06:06 UTC

[mesos] branch master updated (31ef8be -> 5f4c7d6)

This is an automated email from the ASF dual-hosted git repository.

chhsiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git.


    from 31ef8be  Used POSIX.1-2001/pax tar format for distribution tarballs.
     new 79cbaed  Fixed handling for offer operation updates.
     new 9337334  Made agent state consistent with forwarded updates.
     new d5577c5  Added agent and resource provider IDs to operation status messages.
     new be52c98  Set agent and/or resource provider ID in operation status updates.
     new 5f4c7d6  Added MESOS-9293 to the 1.7.1 CHANGELOG.

The 5 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGELOG                                  |   1 +
 include/mesos/mesos.proto                  |  12 +++
 include/mesos/v1/mesos.proto               |  12 +++
 src/common/protobuf_utils.cpp              |   4 +-
 src/common/protobuf_utils.hpp              |   4 +-
 src/common/type_utils.cpp                  |  17 ++++
 src/internal/devolve.cpp                   |   8 +-
 src/internal/evolve.cpp                    |   8 +-
 src/master/master.cpp                      |  74 +++++++++++++----
 src/resource_provider/manager.cpp          |   1 +
 src/resource_provider/storage/provider.cpp |  48 ++++++-----
 src/slave/slave.cpp                        |  63 +++++++++-----
 src/tests/master_tests.cpp                 | 129 +++++++++++++++++++++++++++++
 src/tests/mesos.hpp                        |   7 +-
 14 files changed, 325 insertions(+), 63 deletions(-)


[mesos] 05/05: Added MESOS-9293 to the 1.7.1 CHANGELOG.

Posted by ch...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

chhsiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 5f4c7d696c7fb812f4926b1ae217957ba4b354eb
Author: Chun-Hung Hsiao <ch...@mesosphere.io>
AuthorDate: Wed Dec 5 13:10:13 2018 -0800

    Added MESOS-9293 to the 1.7.1 CHANGELOG.
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index a38d365..85bc05a 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -22,6 +22,7 @@ Release Notes - Mesos - Version 1.7.1 (WIP)
   * [MESOS-9279] - Docker Containerizer 'usage' call might be expensive if mount table is big.
   * [MESOS-9281] - SLRP gets a stale checkpoint after system crash.
   * [MESOS-9283] - Docker containerizer actor can get backlogged with large number of containers.
+  * [MESOS-9293] - If a framework looses operation information it cannot reconcile to acknowledge updates.
   * [MESOS-9295] - Nested container launch could fail if the agent upgrade with new cgroup subsystems.
   * [MESOS-9308] - URI disk profile adaptor could deadlock.
   * [MESOS-9317] - Some master endpoints do not handle failed authorization properly.


[mesos] 02/05: Made agent state consistent with forwarded updates.

Posted by ch...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

chhsiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 93373344dab0ba0966871ae1470b8ba95a48e320
Author: Benjamin Bannier <be...@mesosphere.io>
AuthorDate: Wed Dec 5 13:02:59 2018 -0800

    Made agent state consistent with forwarded updates.
    
    When the agent handles an `UpdateOperationStatusMessage` from a resource
    provider, it injects its own ID which is (at least conceptually) unknown
    to the resource provider before forwarding the message to the master,
    and also updates its own tracking for the operation.
    
    This patch makes sure that we first mutate the message before handing it
    on for updating the internal operation tracking, while previously we
    used the unmodified message. Always using the same message reduces error
    potential if in future changes we e.g., introduce agent operation status
    update managers.
    
    Review: https://reviews.apache.org/r/69458/
---
 src/slave/slave.cpp | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/src/slave/slave.cpp b/src/slave/slave.cpp
index 9a6a6ba..324bec7 100644
--- a/src/slave/slave.cpp
+++ b/src/slave/slave.cpp
@@ -7778,9 +7778,14 @@ void Slave::handleResourceProviderMessage(
     case ResourceProviderMessage::Type::UPDATE_OPERATION_STATUS: {
       CHECK_SOME(message->updateOperationStatus);
 
-      const UpdateOperationStatusMessage& update =
+      // The status update from the resource provider didn't provide
+      // the agent ID (because the resource provider doesn't know it),
+      // hence we inject it here.
+      UpdateOperationStatusMessage update =
         message->updateOperationStatus->update;
 
+      update.mutable_slave_id()->CopyFrom(info.id());
+
       const UUID& operationUUID = update.operation_uuid();
 
       Operation* operation = getOperation(operationUUID);
@@ -7841,14 +7846,7 @@ void Slave::handleResourceProviderMessage(
                  ? " for framework " + stringify(update.framework_id())
                  : " for an operator API call");
 
-          // The status update from the resource provider didn't
-          // provide the agent ID (because the resource provider
-          // doesn't know it), hence we inject it here.
-          UpdateOperationStatusMessage _update;
-          _update.CopyFrom(update);
-          _update.mutable_slave_id()->CopyFrom(info.id());
-
-          send(master.get(), _update);
+          send(master.get(), update);
           break;
         }
       }


[mesos] 03/05: Added agent and resource provider IDs to operation status messages.

Posted by ch...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

chhsiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit d5577c52587f9bad0bc98eba4b5cf0206022c42a
Author: Benjamin Bannier <be...@mesosphere.io>
AuthorDate: Wed Dec 5 13:03:06 2018 -0800

    Added agent and resource provider IDs to operation status messages.
    
    This patch adds agent and resource provider IDs to
    `UpdateOperationStatus` and `UpdateOperationStatusMessage`. With that
    frameworks are able to reconcile enough information after failover to
    construct operation acknowledgements.
    
    We will add code to populate these fields in a follow-up patch.
    
    Review: https://reviews.apache.org/r/69162/
---
 include/mesos/mesos.proto    | 12 ++++++++++++
 include/mesos/v1/mesos.proto | 12 ++++++++++++
 src/common/type_utils.cpp    | 17 +++++++++++++++++
 src/internal/devolve.cpp     |  8 +++++++-
 src/internal/evolve.cpp      |  8 +++++++-
 5 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/include/mesos/mesos.proto b/include/mesos/mesos.proto
index c822cc7..ef4d785 100644
--- a/include/mesos/mesos.proto
+++ b/include/mesos/mesos.proto
@@ -2431,6 +2431,18 @@ message OperationStatus {
   // include a `uuid`. The status is considered delivered once
   // it is acknowledged by the scheduler.
   optional UUID uuid = 5;
+
+  // If the operation affects resources from a local resource provider,
+  // both `slave_id` and `resource_provider_id` will be set.
+  //
+  // If the operation affects resources that belong to an external
+  // resource provider, only `resource_provider_id` will be set.
+  //
+  // In certain cases, e.g., invalid operations, neither `uuid`,
+  // `slave_id` nor `resource_provider_id` will be set, and the
+  // scheduler does not need to acknowledge this status update.
+  optional SlaveID slave_id = 6;
+  optional ResourceProviderID resource_provider_id = 7;
 }
 
 
diff --git a/include/mesos/v1/mesos.proto b/include/mesos/v1/mesos.proto
index 51c1bfd..4e8f7a1 100644
--- a/include/mesos/v1/mesos.proto
+++ b/include/mesos/v1/mesos.proto
@@ -2424,6 +2424,18 @@ message OperationStatus {
   // include a `uuid`. The status is considered delivered once
   // it is acknowledged by the scheduler.
   optional UUID uuid = 5;
+
+  // If the operation affects resources from a local resource provider,
+  // both `agent_id` and `resource_provider_id` will be set.
+  //
+  // If the operation affects resources that belong to an external
+  // resource provider, only `resource_provider_id` will be set.
+  //
+  // In certain cases, e.g., invalid operations, neither `uuid`,
+  // `slave_id` nor `resource_provider_id` will be set, and the
+  // scheduler does not need to acknowledge this status update.
+  optional AgentID agent_id = 6;
+  optional ResourceProviderID resource_provider_id = 7;
 }
 
 
diff --git a/src/common/type_utils.cpp b/src/common/type_utils.cpp
index ef13eae..df888ef 100644
--- a/src/common/type_utils.cpp
+++ b/src/common/type_utils.cpp
@@ -492,6 +492,23 @@ bool operator==(const OperationStatus& left, const OperationStatus& right)
     return false;
   }
 
+  if (left.has_slave_id() != right.has_slave_id()) {
+    return false;
+  }
+
+  if (left.has_slave_id() && left.slave_id() != right.slave_id()) {
+    return false;
+  }
+
+  if (left.has_resource_provider_id() != right.has_resource_provider_id()) {
+    return false;
+  }
+
+  if (left.has_resource_provider_id() &&
+      left.resource_provider_id() != right.resource_provider_id()) {
+    return false;
+  }
+
   return true;
 }
 
diff --git a/src/internal/devolve.cpp b/src/internal/devolve.cpp
index 491ed2a..e23ed3c 100644
--- a/src/internal/devolve.cpp
+++ b/src/internal/devolve.cpp
@@ -106,7 +106,13 @@ Offer devolve(const v1::Offer& offer)
 
 OperationStatus devolve(const v1::OperationStatus& status)
 {
-  return devolve<OperationStatus>(status);
+  OperationStatus _status = devolve<OperationStatus>(status);
+
+  if (status.has_agent_id()) {
+    *_status.mutable_slave_id() = devolve<SlaveID>(status.agent_id());
+  }
+
+  return _status;
 }
 
 
diff --git a/src/internal/evolve.cpp b/src/internal/evolve.cpp
index aa60efe..19c1559 100644
--- a/src/internal/evolve.cpp
+++ b/src/internal/evolve.cpp
@@ -160,7 +160,13 @@ v1::OfferID evolve(const OfferID& offerId)
 
 v1::OperationStatus evolve(const OperationStatus& status)
 {
-  return evolve<v1::OperationStatus>(status);
+  v1::OperationStatus _status = evolve<v1::OperationStatus>(status);
+
+  if (status.has_slave_id()) {
+    *_status.mutable_agent_id() = evolve<v1::AgentID>(status.slave_id());
+  }
+
+  return _status;
 }
 
 


[mesos] 04/05: Set agent and/or resource provider ID in operation status updates.

Posted by ch...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

chhsiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit be52c980598406a6f41c6e3f6323b02ecaccc450
Author: Benjamin Bannier <be...@mesosphere.io>
AuthorDate: Wed Dec 5 13:03:09 2018 -0800

    Set agent and/or resource provider ID in operation status updates.
    
    This patch sets agent and/or resource provider ID operation status
    update messages. This is not always possible, e.g., some operations
    might fail validation so that no corresponding IDs can be extracted.
    
    Since operations failing validation are currently directly rejected by
    the master without going through a status update manager, they are not
    retried either. If a master status update manager for operations is
    introduced at a later point it should be possible to forward
    acknowledgements for updates to the master's update manager.
    
    Review: https://reviews.apache.org/r/69163/
---
 src/common/protobuf_utils.cpp              |   4 +-
 src/common/protobuf_utils.hpp              |   4 +-
 src/master/master.cpp                      |  74 +++++++++++++----
 src/resource_provider/manager.cpp          |   1 +
 src/resource_provider/storage/provider.cpp |  48 ++++++-----
 src/slave/slave.cpp                        |  41 +++++++--
 src/tests/master_tests.cpp                 | 129 +++++++++++++++++++++++++++++
 src/tests/mesos.hpp                        |   7 +-
 8 files changed, 259 insertions(+), 49 deletions(-)

diff --git a/src/common/protobuf_utils.cpp b/src/common/protobuf_utils.cpp
index a5a4ace..583f0ab 100644
--- a/src/common/protobuf_utils.cpp
+++ b/src/common/protobuf_utils.cpp
@@ -444,7 +444,9 @@ OperationStatus createOperationStatus(
     const Option<OperationID>& operationId,
     const Option<string>& message,
     const Option<Resources>& convertedResources,
-    const Option<id::UUID>& uuid)
+    const Option<id::UUID>& uuid,
+    const Option<SlaveID>& slaveId,
+    const Option<ResourceProviderID>& resourceProviderId)
 {
   OperationStatus status;
   status.set_state(state);
diff --git a/src/common/protobuf_utils.hpp b/src/common/protobuf_utils.hpp
index a7ddca4..f529471 100644
--- a/src/common/protobuf_utils.hpp
+++ b/src/common/protobuf_utils.hpp
@@ -165,7 +165,9 @@ OperationStatus createOperationStatus(
     const Option<OperationID>& operationId = None(),
     const Option<std::string>& message = None(),
     const Option<Resources>& convertedResources = None(),
-    const Option<id::UUID>& statusUUID = None());
+    const Option<id::UUID>& statusUUID = None(),
+    const Option<SlaveID>& slaveId = None(),
+    const Option<ResourceProviderID>& resourceProviderId = None());
 
 
 Operation createOperation(
diff --git a/src/master/master.cpp b/src/master/master.cpp
index 3b3824a..ae5b240 100644
--- a/src/master/master.cpp
+++ b/src/master/master.cpp
@@ -2251,6 +2251,11 @@ void Master::drop(
     scheduler::Event update;
     update.set_type(scheduler::Event::UPDATE_OPERATION_STATUS);
 
+    // NOTE: We do not attempt to set the agent or resource provider IDs for
+    // dropped operations as we cannot guarantee to always know their values.
+    //
+    // TODO(bbannier): Set agent or resource provider ID if we know
+    // for certain that the operation was valid.
     *update.mutable_update_operation_status()->mutable_status() =
       protobuf::createOperationStatus(
           OperationState::OPERATION_ERROR,
@@ -9188,6 +9193,11 @@ scheduler::Response::ReconcileOperations Master::reconcileOperations(
       slaveId = operation.slave_id();
     }
 
+    Option<ResourceProviderID> resourceProviderId = None();
+    if (operation.has_resource_provider_id()) {
+      resourceProviderId = operation.resource_provider_id();
+    }
+
     Option<Operation*> frameworkOperation =
       framework->getOperation(operation.operation_id());
 
@@ -9205,38 +9215,62 @@ scheduler::Response::ReconcileOperations Master::reconcileOperations(
       *status = protobuf::createOperationStatus(
           OperationState::OPERATION_RECOVERING,
           operation.operation_id(),
-          "Reconciliation: Agent is recovered but has not re-registered");
+          "Reconciliation: Agent is recovered but has not re-registered",
+          None(),
+          None(),
+          slaveId,
+          resourceProviderId);
     } else if (slaveId.isSome() && slaves.registered.contains(slaveId.get())) {
       // (3) Operation is unknown, slave is registered: OPERATION_UNKNOWN.
       *status = protobuf::createOperationStatus(
           OperationState::OPERATION_UNKNOWN,
           operation.operation_id(),
-          "Reconciliation: Operation is unknown");
+          "Reconciliation: Operation is unknown",
+          None(),
+          None(),
+          slaveId,
+          resourceProviderId);
     } else if (slaveId.isSome() && slaves.unreachable.contains(slaveId.get())) {
       // (4) Operation is unknown, slave is unreachable: OPERATION_UNREACHABLE.
       *status = protobuf::createOperationStatus(
           OperationState::OPERATION_UNREACHABLE,
           operation.operation_id(),
-          "Reconciliation: Agent is unreachable");
+          "Reconciliation: Agent is unreachable",
+          None(),
+          None(),
+          slaveId,
+          resourceProviderId);
     } else if (slaveId.isSome() && slaves.gone.contains(slaveId.get())) {
       // (5) Operation is unknown, slave is gone: OPERATION_GONE_BY_OPERATOR.
       *status = protobuf::createOperationStatus(
           OperationState::OPERATION_GONE_BY_OPERATOR,
           operation.operation_id(),
-          "Reconciliation: Agent marked gone by operator");
+          "Reconciliation: Agent marked gone by operator",
+          None(),
+          None(),
+          slaveId,
+          resourceProviderId);
     } else if (slaveId.isSome()) {
       // (6) Operation is unknown, slave is unknown: OPERATION_UNKNOWN.
       *status = protobuf::createOperationStatus(
           OperationState::OPERATION_UNKNOWN,
           operation.operation_id(),
-          "Reconciliation: Both operation and agent are unknown");
+          "Reconciliation: Both operation and agent are unknown",
+          None(),
+          None(),
+          slaveId,
+          resourceProviderId);
     } else {
       // (7) Operation is unknown, slave is unknown: OPERATION_UNKNOWN.
       *status = protobuf::createOperationStatus(
           OperationState::OPERATION_UNKNOWN,
           operation.operation_id(),
           "Reconciliation: Operation is unknown and no 'agent_id' was"
-          " provided");
+          " provided",
+          None(),
+          None(),
+          slaveId,
+          resourceProviderId);
     }
   }
 
@@ -11309,18 +11343,22 @@ void Master::_apply(
       ? slave->resourceVersion.get()
       : slave->resourceProviders.get(resourceProviderId.get())->resourceVersion;
 
-    Operation* operation = new Operation(
-        protobuf::createOperation(
-            operationInfo,
-            protobuf::createOperationStatus(
-              OPERATION_PENDING,
-              operationInfo.has_id()
-                ? operationInfo.id()
-                : Option<OperationID>::none()),
-            framework != nullptr
-              ? framework->id()
-              : Option<FrameworkID>::none(),
-            slave->id));
+    Operation* operation = new Operation(protobuf::createOperation(
+        operationInfo,
+        protobuf::createOperationStatus(
+            OPERATION_PENDING,
+            operationInfo.has_id()
+              ? operationInfo.id()
+              : Option<OperationID>::none(),
+            None(),
+            None(),
+            None(),
+            slave->id,
+            resourceProviderId.isSome()
+              ? Some(resourceProviderId.get())
+              : Option<ResourceProviderID>::none()),
+        framework != nullptr ? framework->id() : Option<FrameworkID>::none(),
+        slave->id));
 
     addOperation(framework, slave, operation);
 
diff --git a/src/resource_provider/manager.cpp b/src/resource_provider/manager.cpp
index 6c81c43..103f545 100644
--- a/src/resource_provider/manager.cpp
+++ b/src/resource_provider/manager.cpp
@@ -576,6 +576,7 @@ void ResourceProviderManagerProcess::reconcileOperations(
     if (operation.has_resource_provider_id()) {
       if (!resourceProviders.subscribed.contains(
               operation.resource_provider_id())) {
+        // TODO(bbannier): We should send `OPERATION_UNREACHABLE` here.
         LOG(WARNING) << "Dropping operation reconciliation message with"
                      << " operation_uuid " << operation.operation_uuid()
                      << " because resource provider "
diff --git a/src/resource_provider/storage/provider.cpp b/src/resource_provider/storage/provider.cpp
index a22c82c..955471c 100644
--- a/src/resource_provider/storage/provider.cpp
+++ b/src/resource_provider/storage/provider.cpp
@@ -1576,7 +1576,12 @@ void StorageLocalResourceProviderProcess::applyOperation(
       protobuf::createOperationStatus(
           OPERATION_PENDING,
           operation.info().has_id()
-            ? operation.info().id() : Option<OperationID>::none()),
+            ? operation.info().id() : Option<OperationID>::none(),
+          None(),
+          None(),
+          None(),
+          slaveId,
+          info.id()),
       frameworkId,
       slaveId,
       protobuf::createUUID(uuid.get()));
@@ -3076,17 +3081,19 @@ void StorageLocalResourceProviderProcess::dropOperation(
 
   UpdateOperationStatusMessage update =
     protobuf::createUpdateOperationStatusMessage(
-       protobuf::createUUID(operationUuid),
-       protobuf::createOperationStatus(
-           OPERATION_DROPPED,
-           operation.isSome() && operation->has_id()
-             ? operation->id() : Option<OperationID>::none(),
-           message,
-           None(),
-           id::UUID::random()),
-       None(),
-       frameworkId,
-       slaveId);
+        protobuf::createUUID(operationUuid),
+        protobuf::createOperationStatus(
+            OPERATION_DROPPED,
+            operation.isSome() && operation->has_id()
+              ? operation->id() : Option<OperationID>::none(),
+            message,
+            None(),
+            id::UUID::random(),
+            slaveId,
+            info.id()),
+        None(),
+        frameworkId,
+        slaveId);
 
   auto die = [=](const string& message) {
     LOG(ERROR)
@@ -3322,14 +3329,15 @@ Try<Nothing> StorageLocalResourceProviderProcess::updateOperationStatus(
     error = conversions.error();
   }
 
-  operation.mutable_latest_status()->CopyFrom(
-      protobuf::createOperationStatus(
-          error.isNone() ? OPERATION_FINISHED : OPERATION_FAILED,
-          operation.info().has_id()
-            ? operation.info().id() : Option<OperationID>::none(),
-          error.isNone() ? Option<string>::none() : error->message,
-          error.isNone() ? convertedResources : Option<Resources>::none(),
-          id::UUID::random()));
+  operation.mutable_latest_status()->CopyFrom(protobuf::createOperationStatus(
+      error.isNone() ? OPERATION_FINISHED : OPERATION_FAILED,
+      operation.info().has_id()
+        ? operation.info().id() : Option<OperationID>::none(),
+      error.isNone() ? Option<string>::none() : error->message,
+      error.isNone() ? convertedResources : Option<Resources>::none(),
+      id::UUID::random(),
+      slaveId,
+      info.id()));
 
   operation.add_statuses()->CopyFrom(operation.latest_status());
 
diff --git a/src/slave/slave.cpp b/src/slave/slave.cpp
index 324bec7..e13b955 100644
--- a/src/slave/slave.cpp
+++ b/src/slave/slave.cpp
@@ -4348,13 +4348,20 @@ void Slave::applyOperation(const ApplyOperationMessage& message)
     return;
   }
 
-  Operation* operation = new Operation(
-      protobuf::createOperation(
-          message.operation_info(),
-          protobuf::createOperationStatus(OPERATION_PENDING, operationId),
-          frameworkId,
+  Operation* operation = new Operation(protobuf::createOperation(
+      message.operation_info(),
+      protobuf::createOperationStatus(
+          OPERATION_PENDING,
+          operationId,
+          None(),
+          None(),
+          None(),
           info.id(),
-          uuid));
+          resourceProviderId.isSome()
+            ? resourceProviderId.get() : Option<ResourceProviderID>::none()),
+      frameworkId,
+      info.id(),
+      uuid));
 
   addOperation(operation);
 
@@ -4385,7 +4392,15 @@ void Slave::applyOperation(const ApplyOperationMessage& message)
   UpdateOperationStatusMessage update =
     protobuf::createUpdateOperationStatusMessage(
         uuid,
-        protobuf::createOperationStatus(OPERATION_FINISHED, operationId),
+        protobuf::createOperationStatus(
+            OPERATION_FINISHED,
+            operationId,
+            None(),
+            None(),
+            None(),
+            info.id(),
+            resourceProviderId.isSome()
+              ? resourceProviderId.get() : Option<ResourceProviderID>::none()),
         None(),
         frameworkId,
         info.id());
@@ -4426,7 +4441,13 @@ void Slave::reconcileOperations(const ReconcileOperationsMessage& message)
       UpdateOperationStatusMessage update =
         protobuf::createUpdateOperationStatusMessage(
             operation.operation_uuid(),
-            protobuf::createOperationStatus(OPERATION_DROPPED),
+            protobuf::createOperationStatus(
+                OPERATION_DROPPED,
+                None(),
+                None(),
+                None(),
+                None(),
+                info.id()),
             None(),
             None(),
             info.id());
@@ -7785,6 +7806,10 @@ void Slave::handleResourceProviderMessage(
         message->updateOperationStatus->update;
 
       update.mutable_slave_id()->CopyFrom(info.id());
+      update.mutable_status()->mutable_slave_id()->CopyFrom(info.id());
+      if (update.has_latest_status()) {
+        update.mutable_latest_status()->mutable_slave_id()->CopyFrom(info.id());
+      }
 
       const UUID& operationUUID = update.operation_uuid();
 
diff --git a/src/tests/master_tests.cpp b/src/tests/master_tests.cpp
index ef2c001..3bff7a1 100644
--- a/src/tests/master_tests.cpp
+++ b/src/tests/master_tests.cpp
@@ -9221,6 +9221,135 @@ TEST_F(MasterTest, OperationUpdateDuringFailover)
 }
 
 
+// This test verifies that operation status updates contain the
+// agent ID and resource provider ID of originating providers.
+TEST_F(MasterTest, OperationUpdateResourceProvider)
+{
+  Clock::pause();
+
+  master::Flags masterFlags = CreateMasterFlags();
+
+  Try<Owned<cluster::Master>> master = StartMaster(masterFlags);
+  ASSERT_SOME(master);
+
+  slave::Flags slaveFlags = CreateSlaveFlags();
+
+  Future<UpdateSlaveMessage> updateSlaveMessage =
+    FUTURE_PROTOBUF(UpdateSlaveMessage(), _, _);
+
+  StandaloneMasterDetector detector(master.get()->pid);
+  Try<Owned<cluster::Slave>> slave = StartSlave(&detector, slaveFlags);
+  ASSERT_SOME(slave);
+
+  Clock::advance(slaveFlags.registration_backoff_factor);
+
+  AWAIT_READY(updateSlaveMessage);
+
+  // Register a resource provider with the agent.
+  mesos::v1::ResourceProviderInfo resourceProviderInfo;
+  resourceProviderInfo.set_type("org.apache.mesos.resource_provider.test");
+  resourceProviderInfo.set_name("test");
+
+  v1::Resources resourceProviderResources = v1::createDiskResource(
+      "200", "*", None(), None(), v1::createDiskSourceRaw(None(), "profile"));
+
+  v1::MockResourceProvider resourceProvider(
+      resourceProviderInfo, resourceProviderResources);
+
+  Owned<EndpointDetector> endpointDetector(
+      resource_provider::createEndpointDetector(slave.get()->pid));
+
+  updateSlaveMessage = FUTURE_PROTOBUF(UpdateSlaveMessage(), _, _);
+
+  resourceProvider.start(endpointDetector, ContentType::PROTOBUF);
+
+  AWAIT_READY(updateSlaveMessage);
+
+  const v1::AgentID agentId = evolve(updateSlaveMessage->slave_id());
+
+  ASSERT_TRUE(resourceProvider.info.has_id());
+  const v1::ResourceProviderID resourceProviderId = resourceProvider.info.id();
+
+  // Start a framework to operate on offers.
+  auto scheduler = std::make_shared<v1::MockHTTPScheduler>();
+
+  EXPECT_CALL(*scheduler, connected(_))
+    .WillOnce(v1::scheduler::SendSubscribe(v1::DEFAULT_FRAMEWORK_INFO));
+
+  v1::scheduler::TestMesos driver(
+      master.get()->pid, ContentType::PROTOBUF, scheduler);
+
+  Future<Event::Subscribed> subscribed;
+  EXPECT_CALL(*scheduler, subscribed(_, _))
+    .WillOnce(FutureArg<1>(&subscribed));
+
+  EXPECT_CALL(*scheduler, heartbeat(_))
+    .WillRepeatedly(Return()); // Ignore heartbeats.
+
+  Future<Event::Offers> offers;
+  EXPECT_CALL(*scheduler, offers(_, _))
+    .WillOnce(FutureArg<1>(&offers));
+
+  AWAIT_READY(subscribed);
+
+  const v1::FrameworkID& frameworkId = subscribed->framework_id();
+
+  Clock::settle();
+  Clock::advance(masterFlags.allocation_interval);
+
+  AWAIT_READY(offers);
+
+  Future<Event::UpdateOperationStatus> updateOperationStatus;
+  EXPECT_CALL(*scheduler, updateOperationStatus(_, _))
+    .WillOnce(FutureArg<1>(&updateOperationStatus));
+
+  ASSERT_FALSE(offers->offers().empty());
+  const v1::Offer& offer = offers->offers(0);
+
+  // Perform an operation against the resource provider resources.
+  Option<v1::Resource> resource;
+  foreach (const v1::Resource& resource_, offer.resources()) {
+    if (resource_.has_provider_id()) {
+      resource = resource_;
+      break;
+    }
+  }
+
+  ASSERT_SOME(resource);
+
+  {
+    Call call;
+    call.mutable_framework_id()->CopyFrom(frameworkId);
+    call.set_type(Call::ACCEPT);
+
+    Call::Accept* accept = call.mutable_accept();
+    accept->add_offer_ids()->CopyFrom(offer.id());
+
+    v1::Offer::Operation* operation = accept->add_operations();
+    operation->set_type(v1::Offer::Operation::CREATE_DISK);
+    operation->mutable_id()->set_value("create_disk");
+
+    v1::Offer::Operation::CreateDisk* createDisk =
+      operation->mutable_create_disk();
+    createDisk->mutable_source()->CopyFrom(resource.get());
+    createDisk->set_target_type(v1::Resource::DiskInfo::Source::MOUNT);
+
+    driver.send(call);
+  }
+
+  AWAIT_READY(updateOperationStatus);
+
+  const v1::OperationStatus& status = updateOperationStatus->status();
+  ASSERT_EQ("create_disk", status.operation_id().value());
+
+  ASSERT_TRUE(status.has_agent_id());
+  EXPECT_EQ(agentId, status.agent_id());
+
+  ASSERT_TRUE(status.has_resource_provider_id());
+  EXPECT_EQ(resourceProviderId, status.resource_provider_id());
+}
+
+
 // Tests that the master correctly drops an operation if the operation's 'id'
 // field is set and the operation affects resources not managed by a resource
 // provider.
diff --git a/src/tests/mesos.hpp b/src/tests/mesos.hpp
index c08e7e6..b57bbd4 100644
--- a/src/tests/mesos.hpp
+++ b/src/tests/mesos.hpp
@@ -3158,7 +3158,12 @@ public:
         break;
     }
 
-    update->mutable_latest_status()->CopyFrom(update->status());
+    if (update->has_status()) {
+      update->mutable_status()->mutable_resource_provider_id()->CopyFrom(
+          info.id());
+
+      update->mutable_latest_status()->CopyFrom(update->status());
+    }
 
     driver->send(call);
   }


[mesos] 01/05: Fixed handling for offer operation updates.

Posted by ch...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

chhsiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 79cbaede4f29a124c18cd508f76bba33b8ac0ddc
Author: Benjamin Bannier <be...@mesosphere.io>
AuthorDate: Wed Dec 5 13:01:30 2018 -0800

    Fixed handling for offer operation updates.
    
    The handling of offer operation updates introduced in `c946615ec6d`
    made use of an update's `latest_status` without making sure that any
    value was set. This could lead to situation where an uninitialized
    enum value was switched on which would have caused a fatal error at
    runtime.
    
    This patch replaces uses of `latest_status` with `state` which does
    contain the information we care about. We also adjust the error
    logging so we log the value that lead to the error, not some other
    value.
    
    Review: https://reviews.apache.org/r/69157/
---
 src/slave/slave.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/slave/slave.cpp b/src/slave/slave.cpp
index 76332f6..9a6a6ba 100644
--- a/src/slave/slave.cpp
+++ b/src/slave/slave.cpp
@@ -8013,7 +8013,7 @@ void Slave::updateOperation(
     return;
   }
 
-  switch (update.latest_status().state()) {
+  switch (operation->latest_status().state()) {
     // Terminal state, and the conversion is successful.
     case OPERATION_FINISHED: {
       apply(operation);
@@ -8034,8 +8034,8 @@ void Slave::updateOperation(
     case OPERATION_GONE_BY_OPERATOR:
     case OPERATION_RECOVERING:
     case OPERATION_UNKNOWN: {
-      LOG(FATAL) << "Unexpected operation state "
-                 << operation->latest_status().state();
+      LOG(FATAL)
+        << "Unexpected operation state " << operation->latest_status().state();
     }
   }
 }