You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by GitBox <gi...@apache.org> on 2021/05/15 23:12:36 UTC

[GitHub] [mesos] cf-natali opened a new pull request #386: Fixed a LevelDBStorage bug where positions would overflow.

cf-natali opened a new pull request #386:
URL: https://github.com/apache/mesos/pull/386


   Positions are encoded as decimal in ascii, however they were formatted
   as signed integer, which overflows for positions above INT32_MAX,
   causing corruption of the replicated log.
   Note that while unlikely to occur for the registry since it'd be very
   unlikely it'd reach billions of records, it could potentially happen for
   applications using the replicated log.
   
   Closes #10216.
   
   @asekretenko @qianzhangxa 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [mesos] bmahler commented on pull request #386: Fixed a LevelDBStorage bug where positions would overflow.

Posted by GitBox <gi...@apache.org>.
bmahler commented on pull request #386:
URL: https://github.com/apache/mesos/pull/386#issuecomment-845580535


   Thanks for tackling this. @ipronin would be a good reviewer here but to start I think this commit needs more documentation flushed out to help reviewers (i.e. what is the strategy for fixing this? is it backwards compatible? what happens when it reads old overflowed keys? etc)
   
   Here is a private commit from @ipronin for documentation comparison, although note that it uses a different approach that is backwards incompatible and there's another commit with a rekey tool that needs to be run. For an upstream fix, we should try to figure out a backwards compatible way to fix it:
   
   ```
   BACKWARDS INCOMPATIBLE: Fixed replicated log key encoding.
   
   LevelDB keys used by the replicated log storage implementation are
   unsigned 64 bit integer log positions serialized as strings and padded
   with zeroes up to a fixed key size. The current encoding implementation
   is incorrect because it uses the %d formatter that expects a signed 32
   bit integer. It also limits the key size to 10 digits which can fit
   UINT32_MAX but isn't enough for UINT64_MAX.
   
   Because of this bug the possible key range is reduced, and the keys can
   become negative affecting their order. Key overflow can also result in
   the replica's METADATA record (position 0) being overwritten, which in
   turn may cause data loss.
   
   We can fix this by using proper formatting rules in the key encoding
   routine. Note that this change is not backwards compatible. The updated
   storage implementation can recover the state of the old DB because it
   doesn't rely on key encoding for it. But it is unable to read or delete
   any positions with incorrectly encoded keys from the DB. For an update
   the DB either needs to be wiped and reinitialized or migrated to the new
   key encoding. A tool for DB rekeying will be introduced in another
   patch.
   
   Upgrade option 1, perform a rolling upgrade:
   
   1. Stop old master
   2. Wipe DB directory
   3. Start new master with the fix, wait until the replica catches up
   4. Repeat for all masters
   
   Upgrade option 2, use the log re-key tool:
   
   1. Stop old master
   2. Run `mesos-log rekey` tool (see instructions in subsequent patch)
   3. Start new master with the fix
   4. Repeat for all masters
   
   diff --git a/src/log/leveldb.cpp b/src/log/leveldb.cpp
   index fa6088e..b8fc8e9 100644
   --- a/src/log/leveldb.cpp
   +++ b/src/log/leveldb.cpp
   @@ -100,7 +100,11 @@ static string encode(uint64_t position, bool adjust = true)
      // stream.WriteVarint64(position);
      // return s;
    
   -  Try<string> s = strings::format("%.*d", 10, position);
   +  // Pad to 20 characters to maintain key order with the bytewise
   +  // comparator while fitting UINT64_MAX (18,446,744,073,709,551,615).
   +  // Note that this is incompatible with the encoding used in version
   +  // 1.11 and older.
   +  Try<string> s = strings::format("%.*" PRIu64, 20, position);
      CHECK_SOME(s);
      return s.get();
    }
   diff --git a/src/tests/log_tests.cpp b/src/tests/log_tests.cpp
   index a8980e3..68d5f0d 100644
   --- a/src/tests/log_tests.cpp
   +++ b/src/tests/log_tests.cpp
   @@ -321,6 +321,51 @@ TYPED_TEST(LogStorageTest, TruncateWithManyHoles)
    }
    
    
   +// Regression test that verifies that log positions outside of the
   +// uint32_t range are properly encoded and do not cause data loss. See
   +// MESOS-10216 for details.
   +TYPED_TEST(LogStorageTest, IntPositionOverflow)
   +{
   +  Owned<TypeParam> storage(new TypeParam());
   +  Try<Storage::State> state = storage->restore(os::getcwd() + "/.log");
   +  ASSERT_SOME(state);
   +  EXPECT_EQ(Metadata::EMPTY, state->metadata.status());
   +  EXPECT_EQ(0u, state->metadata.promised());
   +  EXPECT_EQ(0u, state->begin);
   +  EXPECT_EQ(0u, state->end);
   +
   +  Metadata metadata;
   +  metadata.set_status(Metadata::VOTING);
   +  metadata.set_promised(1);
   +  ASSERT_SOME(storage->persist(metadata));
   +
   +  // Append positions that could cause uint32_t key overflow. This used
   +  // to be an issue because we string formatted the unsigned 64 bit
   +  // integer keys as signed 32 bit. See MESOS-10216 for details.
   +  const uint64_t first = UINT32_MAX - 5;
   +  const uint64_t last = first + 9;
   +  for (uint64_t i = first; i <= last; i++) {
   +    Action action;
   +    action.set_position(i);
   +    action.set_promised(1);
   +    action.set_performed(1);
   +    action.set_learned(true);
   +    action.set_type(Action::APPEND);
   +    action.mutable_append()->set_bytes(stringify(i));
   +    ASSERT_SOME(storage->persist(action));
   +  }
   +
   +  // Recover the storage and check if the metadata is intact.
   +  storage.reset(new TypeParam());
   +  state = storage->restore(os::getcwd() + "/.log");
   +  ASSERT_SOME(state);
   +  EXPECT_EQ(Metadata::VOTING, state->metadata.status());
   +  EXPECT_EQ(1u, state->metadata.promised());
   +  EXPECT_EQ(0u, state->begin); // No TRUNCATE
   +  EXPECT_EQ(last, state->end);
   +}
   +
   +
    class ReplicaTest : public TemporaryDirectoryTest
    {
    protected:
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [mesos] asekretenko commented on pull request #386: Fixed a LevelDBStorage bug where positions would overflow.

Posted by GitBox <gi...@apache.org>.
asekretenko commented on pull request #386:
URL: https://github.com/apache/mesos/pull/386#issuecomment-852899178


   P.S. Looks like those who implemented backward-incompatible fixes in their private forks will either need one more backward-incompatible upgrade with clearing the log, or to patch out this fix. Given that the tests should work with alternative approaches to fixing `encode()`, this should not be a big deal in either case.
   
   It might even be the case that upgrading from a straightforward 20-digit encoding to the encoding introduced by this patch is _functionally_ equivalent to simply dropping the log on a master: all the previously recorded 20-digit keys will become unreachable after an upgrade.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [mesos] asekretenko merged pull request #386: Fixed a LevelDBStorage bug where positions would overflow.

Posted by GitBox <gi...@apache.org>.
asekretenko merged pull request #386:
URL: https://github.com/apache/mesos/pull/386


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [mesos] asekretenko commented on pull request #386: Fixed a LevelDBStorage bug where positions would overflow.

Posted by GitBox <gi...@apache.org>.
asekretenko commented on pull request #386:
URL: https://github.com/apache/mesos/pull/386#issuecomment-852885997


   @cf-natali Great job, and special thanks for the tests!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [mesos] cf-natali commented on pull request #386: Fixed a LevelDBStorage bug where positions would overflow.

Posted by GitBox <gi...@apache.org>.
cf-natali commented on pull request #386:
URL: https://github.com/apache/mesos/pull/386#issuecomment-849169050


   > @cf-natali @bmahler It strikes me that no backward conversion (from the string key to an integer position) is ever used by the replicated log code.... Do we have a good understanding what the actual contract for `encode()` should look like?
   > 
   > From what I can tell by quickly looking at the code, the main requirement is that `encode()` results, when ordered lexicographically, should be in the order of positions. This means that, **_if I haven't missed something_**, for example, prefixing integers larger than `9'999'999'999` by a single letter `A` will result in the correct order of keys:
   > 
   > ```
   > 0000000000
   > .......
   > 9999999998
   > 9999999999
   > A00000010000000000
   > A00000010000000001
   > ```
   
   Nice!
   Indeed, that's also my understanding, as far as I can tell the only properties needed for `encode()` are that it is:
   - bijective
   - monotonic - the ordering is checked using `leveldb::BytewiseComparator`, which does a lexicographic comparison, see https://github.com/google/leveldb/blob/f57513a1d6c99636fc5b710150d0b93713af4e43/util/comparator.cc#L28 and https://github.com/google/leveldb/blob/5d94ad4d95c09d3ac203ddaf9922e55e730706a8/include/leveldb/slice.h#L100
   
   So yes, it looks like this could work - I'll have a play.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [mesos] cf-natali edited a comment on pull request #386: Fixed a LevelDBStorage bug where positions would overflow.

Posted by GitBox <gi...@apache.org>.
cf-natali edited a comment on pull request #386:
URL: https://github.com/apache/mesos/pull/386#issuecomment-849169050


   > @cf-natali @bmahler It strikes me that no backward conversion (from the string key to an integer position) is ever used by the replicated log code.... Do we have a good understanding what the actual contract for `encode()` should look like?
   > 
   > From what I can tell by quickly looking at the code, the main requirement is that `encode()` results, when ordered lexicographically, should be in the order of positions. This means that, **_if I haven't missed something_**, for example, prefixing integers larger than `9'999'999'999` by a single letter `A` will result in the correct order of keys:
   > 
   > ```
   > 0000000000
   > .......
   > 9999999998
   > 9999999999
   > A00000010000000000
   > A00000010000000001
   > ```
   
   Nice!
   Indeed, that's also my understanding, as far as I can tell the only properties needed for `encode()` are that it is:
   - ~bijective~ injective
   - monotonic - the ordering is checked using `leveldb::BytewiseComparator`, which does a lexicographic comparison, see https://github.com/google/leveldb/blob/f57513a1d6c99636fc5b710150d0b93713af4e43/util/comparator.cc#L28 and https://github.com/google/leveldb/blob/5d94ad4d95c09d3ac203ddaf9922e55e730706a8/include/leveldb/slice.h#L100
   
   So yes, it looks like this could work - I'll have a play.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [mesos] cf-natali commented on pull request #386: Fixed a LevelDBStorage bug where positions would overflow.

Posted by GitBox <gi...@apache.org>.
cf-natali commented on pull request #386:
URL: https://github.com/apache/mesos/pull/386#issuecomment-843419263


   @bmahler , since you seem to still review some MRs :).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [mesos] cf-natali commented on pull request #386: Fixed a LevelDBStorage bug where positions would overflow.

Posted by GitBox <gi...@apache.org>.
cf-natali commented on pull request #386:
URL: https://github.com/apache/mesos/pull/386#issuecomment-846176591


   Hey, @bmahler , thanks for looking at this.
   
   Yes this commit is more a call for discussion.
   This change is backward compatible, however will only increase the maximum record position from `INT32_MAX` to `9'999'999'999`, which was the implicit limit when the encoding was chosen.
   The reason I went for this simple change are that:
   - it's simple
   - it's backward compatible
   - it's not clear how many users are affected by this overflow problem: I don't think Mesos itself should be affected since it'd be quite hard to reach `INT32_MAX` records just with the registry. The original bug report open by @ipronin is quite terse so I'm not sure in which context this was observed - I assume maybe because of a framework used at Twitter which uses the replicated log which managed to overflow it?
   
   Trying to extend the maximum position to `UINT64_MAX` is indeed more tricky, especially in a backward compatible way.
   
   In particular, I don't think it's possible to recover in an overflow happened if the leveldb segments have been compacted. And if overflow occurred I think it's pretty much all bets off so the only reasonable thing to do is to start clean - repairing would be quite tricky
   
   So I can see at least two options:
   1. Merge this change, which increases the range by a factor of over 4X, while being simple and backward compatible.
   2. Extend the range to `UINT64_MAX` using an approach similar to the one from @ipronin 's patch. However requiring users to use a dedicated tool to update the existing log is IMO not a reasonable approach in general. A compromise would be to do the change described in 1 (this merge) for existing logs, but start using the new 20-chars wide format supporting up to `UINT64_MAX` for new logs.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [mesos] cf-natali commented on pull request #386: Fixed a LevelDBStorage bug where positions would overflow.

Posted by GitBox <gi...@apache.org>.
cf-natali commented on pull request #386:
URL: https://github.com/apache/mesos/pull/386#issuecomment-851043018


   @asekretenko Updated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [mesos] asekretenko commented on pull request #386: Fixed a LevelDBStorage bug where positions would overflow.

Posted by GitBox <gi...@apache.org>.
asekretenko commented on pull request #386:
URL: https://github.com/apache/mesos/pull/386#issuecomment-849077997


   Hmmm... what the approach in this patch definitely helps against, is the replicated log corruption when the position wraps.
   Resulting state will not be recoverable without erasing the replicated log; however, this is cleaner than the effects of the position wrap.  I'm not sure whether the 4X gain of possible position count makes a lot of difference for affected users or not.
   
   @cf-natali @bmahler It strikes me that no backward conversion (from the string key to an integer position) is ever used by the replicated log code.... Do we have a good understanding what the actual contract for `encode()` should look like?
   
   From what I can tell by quickly looking at the code, the main requirement is that `encode()` results, when ordered lexicographically, should be in the order of positions. This means that, **_if I haven't missed something_**, for example, prefixing integers larger than `9'999'999'999` by a single letter `A` will result in the correct order of keys:
   ```
   0000000000
   .......
   9999999998
   9999999999
   A00000010000000000
   A00000010000000001
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [mesos] asekretenko edited a comment on pull request #386: Fixed a LevelDBStorage bug where positions would overflow.

Posted by GitBox <gi...@apache.org>.
asekretenko edited a comment on pull request #386:
URL: https://github.com/apache/mesos/pull/386#issuecomment-852899178


   P.S. Looks like those who implemented backward-incompatible fixes in their private forks will either need one more backward-incompatible upgrade with clearing the log, or to patch out this fix. Given that the tests in this PR should work with alternative approaches to fixing `encode()` as well, this should not be a big deal in either case.
   
   It might even be the case that upgrading from a straightforward 20-digit encoding to the encoding introduced by this patch is _functionally_ equivalent to simply dropping the log on a master: all the previously recorded 20-digit keys will become unreachable after an upgrade.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org