You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mesos.apache.org by jo...@apache.org on 2016/02/27 03:35:08 UTC

mesos git commit: Changed scalar resources to use fixed-point internally.

Repository: mesos
Updated Branches:
  refs/heads/master dd0b78856 -> 02c4dbda3


Changed scalar resources to use fixed-point internally.

Scalar resource values are represented using floating point. As a
result, users could see unexpected results when accepting offers and
making reservations for fractional resources: values like "0.1" cannot
be precisely represented using standard floating point, and the resource
values returned to frameworks might contain an unpredictable amount of
roundoff error.

This commit adjusts the master to use fixed-point when doing internal
computations on scalar resource values. The fixed-point format only
supports three decimal digits of precision: that is, fractional resource
values like "0.001" will be supported, but "0.0001" will not be.

Review: https://reviews.apache.org/r/43635/


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/02c4dbda
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/02c4dbda
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/02c4dbda

Branch: refs/heads/master
Commit: 02c4dbda3d6f64fdd57d997664c6983cc5ae2a78
Parents: dd0b788
Author: Neil Conway <ne...@gmail.com>
Authored: Fri Feb 26 18:09:12 2016 -0800
Committer: Joris Van Remoortere <jo...@gmail.com>
Committed: Fri Feb 26 18:34:39 2016 -0800

----------------------------------------------------------------------
 docs/attributes-resources.md                |   4 +-
 docs/upgrades.md                            |   2 +
 include/mesos/mesos.proto                   |   8 ++
 include/mesos/v1/mesos.proto                |   8 ++
 src/common/resources.cpp                    |  23 +----
 src/common/values.cpp                       |  51 +++++++++--
 src/master/allocator/mesos/hierarchical.cpp |   4 +-
 src/tests/resources_tests.cpp               | 105 +++++++++++++++++++++--
 src/v1/resources.cpp                        |  23 +----
 src/v1/values.cpp                           |  51 +++++++++--
 10 files changed, 214 insertions(+), 65 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/02c4dbda/docs/attributes-resources.md
----------------------------------------------------------------------
diff --git a/docs/attributes-resources.md b/docs/attributes-resources.md
index 818da8a..866481f 100644
--- a/docs/attributes-resources.md
+++ b/docs/attributes-resources.md
@@ -36,7 +36,9 @@ Attributes are key-value pairs (where value is optional) that Mesos passes along
 
 ## Resources
 
-The Mesos system can manage 3 different *types* of resources: scalars, ranges, and sets.  These are used to represent the different resources that a Mesos slave has to offer.  For example, a scalar resource type could be used to represent the amount of memory on a slave.  Resources can be specified either with a JSON array or a semicolon-delimited string of key:value pairs.  If, after examining the examples below, you have questions about the format of the JSON, inspect the `Resource` protobuf message definition in `include/mesos/mesos.proto`.
+Mesos can manage three different *types* of resources: scalars, ranges, and sets.  These are used to represent the different resources that a Mesos slave has to offer.  For example, a scalar resource type could be used to represent the amount of memory on a slave. Scalar resources are represented using floating point numbers to allow fractional values to be specified (e.g., "1.5 CPUs"). Mesos only supports three decimal digits of precision for scalar resources (e.g., reserving "1.5123 CPUs" is considered equivalent to reserving "1.512 CPUs").
+
+Resources can be specified either with a JSON array or a semicolon-delimited string of key-value pairs.  If, after examining the examples below, you have questions about the format of the JSON, inspect the `Resource` protobuf message definition in `include/mesos/mesos.proto`.
 
 As JSON:
 

http://git-wip-us.apache.org/repos/asf/mesos/blob/02c4dbda/docs/upgrades.md
----------------------------------------------------------------------
diff --git a/docs/upgrades.md b/docs/upgrades.md
index 21faea8..0ea586a 100644
--- a/docs/upgrades.md
+++ b/docs/upgrades.md
@@ -8,6 +8,8 @@ This document serves as a guide for users who wish to upgrade an existing Mesos
 
 ## Upgrading from 0.27.x to 0.28.x ##
 
+* Mesos 0.28 only supports three decimal digits of precision for scalar resource values. For example, frameworks can reserve "0.001" CPUs but more fine-grained reservations (e.g., "0.0001" CPUs) are no longer supported (although they did not work reliably in prior versions of Mesos anyway). Internally, resource math is now done using a fixed-point format that supports three decimal digits of precision, and then converted to/from floating point for input and output, respectively. Frameworks that do their own resource math and manipulate fractional resources may observe differences in roundoff error and numerical precision.
+
 * Mesos 0.28 changes the definitions of two ACLs used for authorization. The objects of the `ReserveResources` and `CreateVolume` ACLs have been changed to `roles`. In both cases, principals can now be authorized to perform these operations for particular roles. This means that by default, a framework or operator can reserve resources/create volumes for any role. To restrict this behavior, [ACLs can be added](authorization.md) to the master which authorize principals to reserve resources/create volumes for specified roles only. Previously, frameworks could only reserve resources for their own role; this behavior can be preserved by configuring the `ReserveResources` ACLs such that the framework's principal is only authorized to reserve for the framework's role. **NOTE** This renders existing `ReserveResources` and `CreateVolume` ACL definitions obsolete; if you are authorizing these operations, your ACL definitions should be updated.
 
 ## Upgrading from 0.26.x to 0.27.x ##

http://git-wip-us.apache.org/repos/asf/mesos/blob/02c4dbda/include/mesos/mesos.proto
----------------------------------------------------------------------
diff --git a/include/mesos/mesos.proto b/include/mesos/mesos.proto
index 33f6b08..3d22ec3 100644
--- a/include/mesos/mesos.proto
+++ b/include/mesos/mesos.proto
@@ -510,6 +510,14 @@ message Value {
   }
 
   message Scalar {
+    // Scalar values are represented using floating point. To reduce
+    // the chance of unpredictable floating point behavior due to
+    // roundoff error, Mesos only supports three decimal digits of
+    // precision for scalar resource values. That is, floating point
+    // values are converted to a fixed point format that supports
+    // three decimal digits of precision, and then converted back to
+    // floating point on output. Any additional precision in scalar
+    // resource values is discarded (via rounding).
     required double value = 1;
   }
 

http://git-wip-us.apache.org/repos/asf/mesos/blob/02c4dbda/include/mesos/v1/mesos.proto
----------------------------------------------------------------------
diff --git a/include/mesos/v1/mesos.proto b/include/mesos/v1/mesos.proto
index 1b0e709..31960a5 100644
--- a/include/mesos/v1/mesos.proto
+++ b/include/mesos/v1/mesos.proto
@@ -507,6 +507,14 @@ message Value {
   }
 
   message Scalar {
+    // Scalar values are represented using floating point. To reduce
+    // the chance of unpredictable floating point behavior due to
+    // roundoff error, Mesos only supports three decimal digits of
+    // precision for scalar resource values. That is, floating point
+    // values are converted to a fixed point format that supports
+    // three decimal digits of precision, and then converted back to
+    // floating point on output. Any additional precision in scalar
+    // resource values is discarded (via rounding).
     required double value = 1;
   }
 

http://git-wip-us.apache.org/repos/asf/mesos/blob/02c4dbda/src/common/resources.cpp
----------------------------------------------------------------------
diff --git a/src/common/resources.cpp b/src/common/resources.cpp
index 529a1cd..4fa1e78 100644
--- a/src/common/resources.cpp
+++ b/src/common/resources.cpp
@@ -37,11 +37,6 @@
 #include <stout/protobuf.hpp>
 #include <stout/strings.hpp>
 
-// TODO(bernd-mesos): Remove this interim dependency in the course of
-// solving MESOS-3997.
-#include <master/constants.hpp>
-
-
 using std::map;
 using std::ostream;
 using std::set;
@@ -50,9 +45,6 @@ using std::vector;
 
 using google::protobuf::RepeatedPtrField;
 
-using mesos::internal::master::MIN_CPUS;
-
-
 namespace mesos {
 
 /////////////////////////////////////////////////
@@ -1094,17 +1086,10 @@ Try<Resources> Resources::apply(const Offer::Operation& operation) const
   // TODO(jieyu): Currently, we only check known resource types like
   // cpus, mem, disk, ports, etc. We should generalize this.
 
-  CHECK(result.mem() == mem() &&
-        result.disk() == disk() &&
-        result.ports() == ports());
-
-  // This comparison is an interim fix - see MESOS-3552. We are making it
-  // reasonably certain that almost equal values are correctly regarded as
-  // equal. Small, usually acceptable, differences occur due to numeric
-  // operations such as unparsing and then parsing a floating point number.
-  // TODO(bernd-mesos): Of course, they might also accumulate, so we need a
-  // better long-term fix. Apply one here when solving MESOS-3997.
-  CHECK_NEAR(result.cpus().getOrElse(0.0), cpus().getOrElse(0.0), MIN_CPUS);
+  CHECK(result.cpus() == cpus());
+  CHECK(result.mem() == mem());
+  CHECK(result.disk() == disk());
+  CHECK(result.ports() == ports());
 
   return result;
 }

http://git-wip-us.apache.org/repos/asf/mesos/blob/02c4dbda/src/common/values.cpp
----------------------------------------------------------------------
diff --git a/src/common/values.cpp b/src/common/values.cpp
index c64407b..9f3c0b9 100644
--- a/src/common/values.cpp
+++ b/src/common/values.cpp
@@ -17,6 +17,7 @@
 #include <stdint.h>
 
 #include <algorithm>
+#include <cmath>
 #include <initializer_list>
 #include <ostream>
 #include <string>
@@ -42,50 +43,82 @@ using std::vector;
 
 namespace mesos {
 
+// We manipulate scalar values by converting them from floating point to a
+// fixed point representation, doing a calculation, and then converting
+// the result back to floating point. We deliberately only preserve three
+// decimal digits of precision in the fixed point representation. This
+// ensures that client applications see predictable numerical behavior, at
+// the expense of sacrificing some precision.
+
+static long long convertToFixed(double floatValue)
+{
+  return std::llround(floatValue * 1000);
+}
+
+
+static double convertToFloating(long long fixedValue)
+{
+  // NOTE: We do the conversion from fixed point via integer division
+  // and then modulus, rather than a single floating point division.
+  // This ensures that we only apply floating point division to inputs
+  // in the range [0,999], which is easier to check for correctness.
+  double quotient = static_cast<double>(fixedValue / 1000);
+  double remainder = static_cast<double>(fixedValue % 1000) / 1000.0;
+
+  return quotient + remainder;
+}
+
+
 ostream& operator<<(ostream& stream, const Value::Scalar& scalar)
 {
-  return stream << scalar.value();
+  // We discard any additional precision from scalar resources before
+  // writing them to an ostream. This is redundant when the scalar is
+  // obtained from one of the operators below, but user-specified
+  // resource values might contain additional precision.
+  return stream << convertToFloating(convertToFixed(scalar.value()));
 }
 
 
 bool operator==(const Value::Scalar& left, const Value::Scalar& right)
 {
-  return left.value() == right.value();
+  return convertToFixed(left.value()) == convertToFixed(right.value());
 }
 
 
 bool operator<=(const Value::Scalar& left, const Value::Scalar& right)
 {
-  return left.value() <= right.value();
+  return convertToFixed(left.value()) <= convertToFixed(right.value());
 }
 
 
 Value::Scalar operator+(const Value::Scalar& left, const Value::Scalar& right)
 {
-  Value::Scalar result;
-  result.set_value(left.value() + right.value());
+  Value::Scalar result = left;
+  result += right;
   return result;
 }
 
 
 Value::Scalar operator-(const Value::Scalar& left, const Value::Scalar& right)
 {
-  Value::Scalar result;
-  result.set_value(left.value() - right.value());
+  Value::Scalar result = left;
+  result -= right;
   return result;
 }
 
 
 Value::Scalar& operator+=(Value::Scalar& left, const Value::Scalar& right)
 {
-  left.set_value(left.value() + right.value());
+  long long sum = convertToFixed(left.value()) + convertToFixed(right.value());
+  left.set_value(convertToFloating(sum));
   return left;
 }
 
 
 Value::Scalar& operator-=(Value::Scalar& left, const Value::Scalar& right)
 {
-  left.set_value(left.value() - right.value());
+  long long diff = convertToFixed(left.value()) - convertToFixed(right.value());
+  left.set_value(convertToFloating(diff));
   return left;
 }
 

http://git-wip-us.apache.org/repos/asf/mesos/blob/02c4dbda/src/master/allocator/mesos/hierarchical.cpp
----------------------------------------------------------------------
diff --git a/src/master/allocator/mesos/hierarchical.cpp b/src/master/allocator/mesos/hierarchical.cpp
index 5ef29f2..1b69fa8 100644
--- a/src/master/allocator/mesos/hierarchical.cpp
+++ b/src/master/allocator/mesos/hierarchical.cpp
@@ -883,9 +883,7 @@ void HierarchicalAllocatorProcess::recoverResources(
   // which it might not in the event that we dispatched Master::offer
   // before we received Allocator::removeSlave).
   if (slaves.contains(slaveId)) {
-    // NOTE: We cannot add the following CHECK due to the double
-    // precision errors. See MESOS-1187 for details.
-    // CHECK(slaves[slaveId].allocated.contains(resources));
+    CHECK(slaves[slaveId].allocated.contains(resources));
 
     slaves[slaveId].allocated -= resources;
 

http://git-wip-us.apache.org/repos/asf/mesos/blob/02c4dbda/src/tests/resources_tests.cpp
----------------------------------------------------------------------
diff --git a/src/tests/resources_tests.cpp b/src/tests/resources_tests.cpp
index 96864c3..7642740 100644
--- a/src/tests/resources_tests.cpp
+++ b/src/tests/resources_tests.cpp
@@ -846,6 +846,37 @@ TEST(ResourcesTest, PrintingExtendedAttributes)
 }
 
 
+TEST(ResourcesTest, PrintingScalarPrecision)
+{
+  Resource scalar;
+  scalar.set_name("cpus");
+  scalar.set_type(Value::SCALAR);
+  scalar.mutable_scalar()->set_value(1.234);
+
+  // Three decimal digits of precision are supported.
+  ostringstream stream;
+  stream << scalar;
+  EXPECT_EQ("cpus(*):1.234", stream.str());
+
+  // Additional precision is discarded via rounding.
+  scalar.mutable_scalar()->set_value(1.2345);
+  stream.str("");
+  stream << scalar;
+  EXPECT_EQ("cpus(*):1.235", stream.str());
+
+  scalar.mutable_scalar()->set_value(1.2344);
+  stream.str("");
+  stream << scalar;
+  EXPECT_EQ("cpus(*):1.234", stream.str());
+
+  // Trailing zeroes are not printed.
+  scalar.mutable_scalar()->set_value(1.1);
+  stream.str("");
+  stream << scalar;
+  EXPECT_EQ("cpus(*):1.1", stream.str());
+}
+
+
 TEST(ResourcesTest, InitializedIsEmpty)
 {
   Resources r;
@@ -1541,15 +1572,79 @@ TEST(ResourcesTest, Types)
 }
 
 
-// NOTE: This is disabled due to MESOS-1187.
-TEST(ResourcesTest, DISABLED_Precision)
+TEST(ResourcesTest, PrecisionSimple)
 {
-  Resources cpu = Resources::parse("cpus:0.1").get();
+  Resources cpu = Resources::parse("cpus:1.001").get();
+  EXPECT_EQ(1.001, cpu.cpus().get());
 
   Resources r1 = cpu + cpu + cpu - cpu - cpu;
-  Resources r2 = cpu;
 
-  EXPECT_EQ(r1, r2);
+  EXPECT_EQ(cpu, r1);
+  EXPECT_EQ(1.001, r1.cpus().get());
+
+  Resources zero = Resources::parse("cpus:0").get();
+
+  EXPECT_EQ(cpu, cpu - zero);
+  EXPECT_EQ(cpu, cpu + zero);
+}
+
+
+TEST(ResourcesTest, PrecisionManyOps)
+{
+  Resources start = Resources::parse("cpus:1.001").get();
+  Resources current = start;
+  Resources next;
+
+  for (int i = 0; i < 2500; i++) {
+    next = current + current + current - current - current;
+    EXPECT_EQ(1.001, next.cpus().get());
+    EXPECT_EQ(current, next);
+    EXPECT_EQ(start, next);
+    current = next;
+  }
+}
+
+
+TEST(ResourcesTest, PrecisionManyConsecutiveOps)
+{
+  Resources start = Resources::parse("cpus:1.001").get();
+  Resources increment = start;
+  Resources current = start;
+
+  for (int i = 0; i < 100000; i++) {
+    current += increment;
+  }
+
+  for (int i = 0; i < 100000; i++) {
+    current -= increment;
+  }
+
+  EXPECT_EQ(start, current);
+}
+
+
+TEST(ResourcesTest, PrecisionLost)
+{
+  Resources cpu = Resources::parse("cpus:1.5011").get();
+  EXPECT_EQ(1.501, cpu.cpus().get());
+
+  Resources r1 = cpu + cpu + cpu - cpu - cpu;
+
+  EXPECT_EQ(cpu, r1);
+  EXPECT_EQ(1.501, r1.cpus().get());
+}
+
+
+TEST(ResourcesTest, PrecisionRounding)
+{
+  // Round up (away from zero) at the half-way point.
+  Resources cpu = Resources::parse("cpus:1.5015").get();
+  EXPECT_EQ(1.502, cpu.cpus().get());
+
+  Resources r1 = cpu + cpu + cpu - cpu - cpu;
+
+  EXPECT_EQ(cpu, r1);
+  EXPECT_EQ(1.502, r1.cpus().get());
 }
 
 

http://git-wip-us.apache.org/repos/asf/mesos/blob/02c4dbda/src/v1/resources.cpp
----------------------------------------------------------------------
diff --git a/src/v1/resources.cpp b/src/v1/resources.cpp
index 49dc342..bca5231 100644
--- a/src/v1/resources.cpp
+++ b/src/v1/resources.cpp
@@ -38,11 +38,6 @@
 #include <stout/protobuf.hpp>
 #include <stout/strings.hpp>
 
-// TODO(bernd-mesos): Remove this interim dependency in the course of
-// solving MESOS-3997.
-#include <master/constants.hpp>
-
-
 using std::map;
 using std::ostream;
 using std::set;
@@ -51,9 +46,6 @@ using std::vector;
 
 using google::protobuf::RepeatedPtrField;
 
-using mesos::internal::master::MIN_CPUS;
-
-
 namespace mesos {
 namespace v1 {
 
@@ -1097,17 +1089,10 @@ Try<Resources> Resources::apply(const Offer::Operation& operation) const
   // TODO(jieyu): Currently, we only check known resource types like
   // cpus, mem, disk, ports, etc. We should generalize this.
 
-  CHECK(result.mem() == mem() &&
-        result.disk() == disk() &&
-        result.ports() == ports());
-
-  // This comparison is an interim fix - see MESOS-3552. We are making it
-  // reasonably certain that almost equal values are correctly regarded as
-  // equal. Small, usually acceptable, differences occur due to numeric
-  // operations such as unparsing and then parsing a floating point number.
-  // TODO(bernd-mesos): Of course, they might also accumulate, so we need a
-  // better long-term fix. Apply one here when solving MESOS-3997.
-  CHECK_NEAR(result.cpus().getOrElse(0.0), cpus().getOrElse(0.0), MIN_CPUS);
+  CHECK(result.cpus() == cpus());
+  CHECK(result.mem() == mem());
+  CHECK(result.disk() == disk());
+  CHECK(result.ports() == ports());
 
   return result;
 }

http://git-wip-us.apache.org/repos/asf/mesos/blob/02c4dbda/src/v1/values.cpp
----------------------------------------------------------------------
diff --git a/src/v1/values.cpp b/src/v1/values.cpp
index 58ea987..3e0f739 100644
--- a/src/v1/values.cpp
+++ b/src/v1/values.cpp
@@ -17,6 +17,7 @@
 #include <stdint.h>
 
 #include <algorithm>
+#include <cmath>
 #include <initializer_list>
 #include <ostream>
 #include <string>
@@ -43,50 +44,82 @@ using std::vector;
 namespace mesos {
 namespace v1 {
 
+// We manipulate scalar values by converting them from floating point to a
+// fixed point representation, doing a calculation, and then converting
+// the result back to floating point. We deliberately only preserve three
+// decimal digits of precision in the fixed point representation. This
+// ensures that client applications see predictable numerical behavior, at
+// the expense of sacrificing some precision.
+
+static long long convertToFixed(double floatValue)
+{
+  return std::llround(floatValue * 1000);
+}
+
+
+static double convertToFloating(long long fixedValue)
+{
+  // NOTE: We do the conversion from fixed point via integer division
+  // and then modulus, rather than a single floating point division.
+  // This ensures that we only apply floating point division to inputs
+  // in the range [0,999], which is easier to check for correctness.
+  double quotient = static_cast<double>(fixedValue / 1000);
+  double remainder = static_cast<double>(fixedValue % 1000) / 1000.0;
+
+  return quotient + remainder;
+}
+
+
 ostream& operator<<(ostream& stream, const Value::Scalar& scalar)
 {
-  return stream << scalar.value();
+  // We discard any additional precision from scalar resources before
+  // writing them to an ostream. This is redundant when the scalar is
+  // obtained from one of the operators below, but user-specified
+  // resource values might contain additional precision.
+  return stream << convertToFloating(convertToFixed(scalar.value()));
 }
 
 
 bool operator==(const Value::Scalar& left, const Value::Scalar& right)
 {
-  return left.value() == right.value();
+  return convertToFixed(left.value()) == convertToFixed(right.value());
 }
 
 
 bool operator<=(const Value::Scalar& left, const Value::Scalar& right)
 {
-  return left.value() <= right.value();
+  return convertToFixed(left.value()) <= convertToFixed(right.value());
 }
 
 
 Value::Scalar operator+(const Value::Scalar& left, const Value::Scalar& right)
 {
-  Value::Scalar result;
-  result.set_value(left.value() + right.value());
+  Value::Scalar result = left;
+  result += right;
   return result;
 }
 
 
 Value::Scalar operator-(const Value::Scalar& left, const Value::Scalar& right)
 {
-  Value::Scalar result;
-  result.set_value(left.value() - right.value());
+  Value::Scalar result = left;
+  result -= right;
   return result;
 }
 
 
 Value::Scalar& operator+=(Value::Scalar& left, const Value::Scalar& right)
 {
-  left.set_value(left.value() + right.value());
+  long long sum = convertToFixed(left.value()) + convertToFixed(right.value());
+  left.set_value(convertToFloating(sum));
   return left;
 }
 
 
 Value::Scalar& operator-=(Value::Scalar& left, const Value::Scalar& right)
 {
-  left.set_value(left.value() - right.value());
+  long long diff = convertToFixed(left.value()) - convertToFixed(right.value());
+  left.set_value(convertToFloating(diff));
   return left;
 }