You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by GitBox <gi...@apache.org> on 2021/05/21 01:05:47 UTC

[GitHub] [mesos] bmahler commented on pull request #386: Fixed a LevelDBStorage bug where positions would overflow.

bmahler commented on pull request #386:
URL: https://github.com/apache/mesos/pull/386#issuecomment-845580535


   Thanks for tackling this. @ipronin would be a good reviewer here but to start I think this commit needs more documentation flushed out to help reviewers (i.e. what is the strategy for fixing this? is it backwards compatible? what happens when it reads old overflowed keys? etc)
   
   Here is a private commit from @ipronin for documentation comparison, although note that it uses a different approach that is backwards incompatible and there's another commit with a rekey tool that needs to be run. For an upstream fix, we should try to figure out a backwards compatible way to fix it:
   
   ```
   BACKWARDS INCOMPATIBLE: Fixed replicated log key encoding.
   
   LevelDB keys used by the replicated log storage implementation are
   unsigned 64 bit integer log positions serialized as strings and padded
   with zeroes up to a fixed key size. The current encoding implementation
   is incorrect because it uses the %d formatter that expects a signed 32
   bit integer. It also limits the key size to 10 digits which can fit
   UINT32_MAX but isn't enough for UINT64_MAX.
   
   Because of this bug the possible key range is reduced, and the keys can
   become negative affecting their order. Key overflow can also result in
   the replica's METADATA record (position 0) being overwritten, which in
   turn may cause data loss.
   
   We can fix this by using proper formatting rules in the key encoding
   routine. Note that this change is not backwards compatible. The updated
   storage implementation can recover the state of the old DB because it
   doesn't rely on key encoding for it. But it is unable to read or delete
   any positions with incorrectly encoded keys from the DB. For an update
   the DB either needs to be wiped and reinitialized or migrated to the new
   key encoding. A tool for DB rekeying will be introduced in another
   patch.
   
   Upgrade option 1, perform a rolling upgrade:
   
   1. Stop old master
   2. Wipe DB directory
   3. Start new master with the fix, wait until the replica catches up
   4. Repeat for all masters
   
   Upgrade option 2, use the log re-key tool:
   
   1. Stop old master
   2. Run `mesos-log rekey` tool (see instructions in subsequent patch)
   3. Start new master with the fix
   4. Repeat for all masters
   
   diff --git a/src/log/leveldb.cpp b/src/log/leveldb.cpp
   index fa6088e..b8fc8e9 100644
   --- a/src/log/leveldb.cpp
   +++ b/src/log/leveldb.cpp
   @@ -100,7 +100,11 @@ static string encode(uint64_t position, bool adjust = true)
      // stream.WriteVarint64(position);
      // return s;
    
   -  Try<string> s = strings::format("%.*d", 10, position);
   +  // Pad to 20 characters to maintain key order with the bytewise
   +  // comparator while fitting UINT64_MAX (18,446,744,073,709,551,615).
   +  // Note that this is incompatible with the encoding used in version
   +  // 1.11 and older.
   +  Try<string> s = strings::format("%.*" PRIu64, 20, position);
      CHECK_SOME(s);
      return s.get();
    }
   diff --git a/src/tests/log_tests.cpp b/src/tests/log_tests.cpp
   index a8980e3..68d5f0d 100644
   --- a/src/tests/log_tests.cpp
   +++ b/src/tests/log_tests.cpp
   @@ -321,6 +321,51 @@ TYPED_TEST(LogStorageTest, TruncateWithManyHoles)
    }
    
    
   +// Regression test that verifies that log positions outside of the
   +// uint32_t range are properly encoded and do not cause data loss. See
   +// MESOS-10216 for details.
   +TYPED_TEST(LogStorageTest, IntPositionOverflow)
   +{
   +  Owned<TypeParam> storage(new TypeParam());
   +  Try<Storage::State> state = storage->restore(os::getcwd() + "/.log");
   +  ASSERT_SOME(state);
   +  EXPECT_EQ(Metadata::EMPTY, state->metadata.status());
   +  EXPECT_EQ(0u, state->metadata.promised());
   +  EXPECT_EQ(0u, state->begin);
   +  EXPECT_EQ(0u, state->end);
   +
   +  Metadata metadata;
   +  metadata.set_status(Metadata::VOTING);
   +  metadata.set_promised(1);
   +  ASSERT_SOME(storage->persist(metadata));
   +
   +  // Append positions that could cause uint32_t key overflow. This used
   +  // to be an issue because we string formatted the unsigned 64 bit
   +  // integer keys as signed 32 bit. See MESOS-10216 for details.
   +  const uint64_t first = UINT32_MAX - 5;
   +  const uint64_t last = first + 9;
   +  for (uint64_t i = first; i <= last; i++) {
   +    Action action;
   +    action.set_position(i);
   +    action.set_promised(1);
   +    action.set_performed(1);
   +    action.set_learned(true);
   +    action.set_type(Action::APPEND);
   +    action.mutable_append()->set_bytes(stringify(i));
   +    ASSERT_SOME(storage->persist(action));
   +  }
   +
   +  // Recover the storage and check if the metadata is intact.
   +  storage.reset(new TypeParam());
   +  state = storage->restore(os::getcwd() + "/.log");
   +  ASSERT_SOME(state);
   +  EXPECT_EQ(Metadata::VOTING, state->metadata.status());
   +  EXPECT_EQ(1u, state->metadata.promised());
   +  EXPECT_EQ(0u, state->begin); // No TRUNCATE
   +  EXPECT_EQ(last, state->end);
   +}
   +
   +
    class ReplicaTest : public TemporaryDirectoryTest
    {
    protected:
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org