You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@kudu.apache.org by ad...@apache.org on 2017/01/12 08:23:27 UTC

[3/8] kudu git commit: KUDU-1600 (part 1): bump to CFile version 2

KUDU-1600 (part 1): bump to CFile version 2

This is preparatory yak-shaving work for KUDU-1600, an effort to prevent
double-compression of inherently-compressed encodings such as
bitshuffle. In order to enable or disable compression on a per-block
basis, we need to change the header of the compressed blocks to indicate
whether that block was compressed. Doing so, however, is obviously an
incompatible change that earlier versions of Kudu would be confused by.

This is not itself a blocker: we haven't guaranteed yet that we will
always support downgrade between successive versions of Kudu. However,
if someone _does_ downgrade, we would prefer not to crash in confusing
ways or silently return corrupt data. Instead, we would like to give an
error message that clearly traces back to an incompatible file format.

At first glance, it would seem that the route to make such an
incompatible change would be to bump the 'major_version' field present
in the CFileHeaderPB. However, it turns out, despite this field existing
since the very first ever commit to Kudu, we never implemented any code
that actually _reads_ the field. So, bumping the major version would not
actually prevent an incompatible (earlier) Kudu server from reading a
file that it can't understand. Woops!

So, in order to make the new CFiles incompatible with the old, the only
solution is to actually bump the CFile magic string itself. This commit
changes it from 'kuducfil' to 'kuducfl2'. An incompatible server trying
to read such a file will now result in a "bad magic" error message. It's
still not 100% obvious to the average user, but it's better than a weird
error about mismatched compression lengths or a crash.

While bumping the magic string solves the immediate issue at hand, this
is also a good opportunity to fix the structural problem and add a
better mechanism to make such changes in the future.

Rather than implement the original plan of major/minor version, this
patch takes a "feature flags" approach modeled after the 'ext' family of
file systems. The advantages of feature flags over a linear 'version
number' scheme are:

- simplifies non-linear backporting of features to minor releases. For
  example, if Kudu 2.0 adds feature X, and 2.1 adds feature Y, this
  makes it possible to backport Y without X to Kudu 1.8.
- allows granular enablement of new features. For example, imagine that
  we introduce an experimental feature (eg an optimization enabled by a
  flag) in Kudu 1.3, and a user upgrades to 1.3 without enabling that
  flag. We would then avoid writing the flag into any cfiles, and we
  would allow compatible downgrade back to 1.2.
- allows distinguishing behavior between incompatible and compatible
  changes

I manually validated that, if I wrote data with a new server, and tried
to read it with an old one, it gave me an error with "bad magic" as
expected.

Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3
Reviewed-on: http://gerrit.cloudera.org:8080/5678
Reviewed-by: Alexey Serbin <as...@cloudera.com>
Tested-by: Kudu Jenkins


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/f82cf691
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/f82cf691
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/f82cf691

Branch: refs/heads/master
Commit: f82cf6918c00dff6aecad5a6b50836b7eb5db508
Parents: 4114ba9
Author: Todd Lipcon <to...@apache.org>
Authored: Tue Jan 10 22:43:01 2017 -0800
Committer: Todd Lipcon <to...@apache.org>
Committed: Thu Jan 12 06:15:30 2017 +0000

----------------------------------------------------------------------
 docs/design-docs/cfile.md        | 13 ++++++++++--
 src/kudu/cfile/cfile.proto       | 40 +++++++++++++++++++++++++++++++----
 src/kudu/cfile/cfile_reader.cc   | 26 ++++++++++++++++++-----
 src/kudu/cfile/cfile_reader.h    |  2 ++
 src/kudu/cfile/cfile_writer.cc   | 10 ++++-----
 src/kudu/cfile/cfile_writer.h    |  7 +++---
 src/kudu/tools/kudu-tool-test.cc |  7 +++---
 7 files changed, 81 insertions(+), 24 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/f82cf691/docs/design-docs/cfile.md
----------------------------------------------------------------------
diff --git a/docs/design-docs/cfile.md b/docs/design-docs/cfile.md
index 4606d3b..90a065d 100644
--- a/docs/design-docs/cfile.md
+++ b/docs/design-docs/cfile.md
@@ -27,7 +27,7 @@ EOF
 Header
 ------
 
-<magic>: the string 'kuducfil'
+<magic>: see below
 <header length>: 32-bit unsigned integer length delimiter
 <header>: CFileHeaderPB protobuf
 
@@ -36,10 +36,19 @@ Footer
 ------
 
 <footer>: CFileFooterPB protobuf
-<magic>: the string 'kuducfil'
+<magic>: see below
 <footer length> (length of protobuf)
+
+
+Magic strings
+-------------
+CFile v1: the string 'kuducfil'
+CFile v2: the string 'kuducfl2'
 ```
 
+The two versions are identical except where specifically noted in the source
+code or this document.
+
 ==============================
 
 Data blocks:

http://git-wip-us.apache.org/repos/asf/kudu/blob/f82cf691/src/kudu/cfile/cfile.proto
----------------------------------------------------------------------
diff --git a/src/kudu/cfile/cfile.proto b/src/kudu/cfile/cfile.proto
index f59a030..3215510 100644
--- a/src/kudu/cfile/cfile.proto
+++ b/src/kudu/cfile/cfile.proto
@@ -31,9 +31,12 @@ message FileMetadataPairPB {
 }
 
 message CFileHeaderPB {
-  required int32 major_version = 1;
-  required int32 minor_version = 2;
-
+  // These fields were used in Kudu <= 1.2, for the original header cfile format
+  // corresponding to the 'kuducfil' magic string.
+  // The new format ('kuducfl2' magic) no longer uses them, instead using the
+  // "feature flags" in the CFileFooterPB.
+  // required int32 major_version = 1;
+  // required int32 minor_version = 2;
   repeated FileMetadataPairPB metadata = 3;
 }
 
@@ -72,11 +75,40 @@ message CFileFooterPB {
 
   repeated FileMetadataPairPB metadata = 7;
 
-  optional bool is_type_nullable = 8 [default=false]; // TODO use enum with encoding?
+  optional bool is_type_nullable = 8 [default=false];
 
   // Block pointer for dictionary block if the cfile is dictionary encoded.
   // Only for dictionary encoding.
   optional BlockPointerPB dict_block_ptr = 9;
+
+  // Bitsets of features used by this cfile.
+  //
+  // We use a uint32 bitset instead of a repeated enum in order to save memory since
+  // these footers are kept around in memory for every open CFile. If we ever run out
+  // of bits, we can use the last bit of either list to enable a second bitset.
+  //
+  // We designate each feature as either compatible or incompatible.
+  //
+  // INCOMPATIBLE
+  // ---------------
+  // An incompatible feature is one which must be supported by a reader
+  // in order to properly read the file. If a reader sees an unknown bit set in this list,
+  // it should refuse to read the file.
+  //
+  // Example: an incompatible change to the format of one of the block encodings.
+  // If a reader isn't aware of the flag, it would read incorrect data.
+  //
+  // COMPATIBLE
+  // -----------
+  // A compatible feature is one which may be safely ignored by a reader without
+  // incorrect results. A reader may proceed but may not be taking advantage of
+  // all available index structures, etc.
+  //
+  // An example case here would be something like an additional index structure
+  // or summary statistic which an aware reader could potentially use, but an
+  // old reader could safely ignore.
+  optional uint32 incompatible_features = 10;
+  optional uint32 compatible_features = 11;
 }
 
 

http://git-wip-us.apache.org/repos/asf/kudu/blob/f82cf691/src/kudu/cfile/cfile_reader.cc
----------------------------------------------------------------------
diff --git a/src/kudu/cfile/cfile_reader.cc b/src/kudu/cfile/cfile_reader.cc
index facf474..6205397 100644
--- a/src/kudu/cfile/cfile_reader.cc
+++ b/src/kudu/cfile/cfile_reader.cc
@@ -26,7 +26,7 @@
 #include "kudu/cfile/block_handle.h"
 #include "kudu/cfile/block_pointer.h"
 #include "kudu/cfile/cfile.pb.h"
-#include "kudu/cfile/cfile_writer.h" // for kMagicString
+#include "kudu/cfile/cfile_writer.h"
 #include "kudu/cfile/index_block.h"
 #include "kudu/cfile/index_btree.h"
 #include "kudu/gutil/gscoped_ptr.h"
@@ -62,20 +62,29 @@ static const size_t kMagicAndLengthSize = 12;
 static const size_t kMaxHeaderFooterPBSize = 64*1024;
 
 static Status ParseMagicAndLength(const Slice &data,
+                                  uint8_t* cfile_version,
                                   uint32_t *parsed_len) {
   if (data.size() != kMagicAndLengthSize) {
     return Status::Corruption("Bad size data");
   }
 
-  if (memcmp(kMagicString, data.data(), strlen(kMagicString)) != 0) {
+  uint8_t version;
+  if (memcmp(kMagicStringV1, data.data(), kMagicLength) == 0) {
+    version = 1;
+  } else if (memcmp(kMagicStringV2, data.data(), kMagicLength) == 0) {
+    version = 2;
+  } else {
     return Status::Corruption("bad magic");
   }
 
-  *parsed_len = DecodeFixed32(data.data() + strlen(kMagicString));
-  if (*parsed_len <= 0 || *parsed_len > kMaxHeaderFooterPBSize) {
+  uint32_t len = DecodeFixed32(data.data() + kMagicLength);
+  if (len > kMaxHeaderFooterPBSize) {
     return Status::Corruption("invalid data size");
   }
 
+  *cfile_version = version;
+  *parsed_len = len;
+
   return Status::OK();
 }
 
@@ -125,7 +134,7 @@ Status CFileReader::ReadMagicAndLength(uint64_t offset, uint32_t *len) {
   RETURN_NOT_OK(block_->Read(offset, kMagicAndLengthSize,
                              &slice, scratch));
 
-  return ParseMagicAndLength(slice, len);
+  return ParseMagicAndLength(slice, &cfile_version_, len);
 }
 
 Status CFileReader::InitOnce() {
@@ -136,6 +145,13 @@ Status CFileReader::InitOnce() {
 
   RETURN_NOT_OK(ReadAndParseFooter());
 
+  if (PREDICT_FALSE(footer_->incompatible_features() != 0)) {
+    // Currently we do not support any incompatible features.
+    return Status::NotSupported(Substitute(
+        "cfile uses features from an incompatible version: $0",
+        footer_->incompatible_features()));
+  }
+
   type_info_ = GetTypeInfo(footer_->data_type());
 
   RETURN_NOT_OK(TypeEncodingInfo::Get(type_info_,

http://git-wip-us.apache.org/repos/asf/kudu/blob/f82cf691/src/kudu/cfile/cfile_reader.h
----------------------------------------------------------------------
diff --git a/src/kudu/cfile/cfile_reader.h b/src/kudu/cfile/cfile_reader.h
index f11b4e2..d5ac795 100644
--- a/src/kudu/cfile/cfile_reader.h
+++ b/src/kudu/cfile/cfile_reader.h
@@ -193,6 +193,8 @@ class CFileReader {
   const gscoped_ptr<fs::ReadableBlock> block_;
   const uint64_t file_size_;
 
+  uint8_t cfile_version_;
+
   gscoped_ptr<CFileHeaderPB> header_;
   gscoped_ptr<CFileFooterPB> footer_;
 

http://git-wip-us.apache.org/repos/asf/kudu/blob/f82cf691/src/kudu/cfile/cfile_writer.cc
----------------------------------------------------------------------
diff --git a/src/kudu/cfile/cfile_writer.cc b/src/kudu/cfile/cfile_writer.cc
index 4e53df0..7f98ab4 100644
--- a/src/kudu/cfile/cfile_writer.cc
+++ b/src/kudu/cfile/cfile_writer.cc
@@ -68,7 +68,9 @@ TAG_FLAG(cfile_do_on_finish, experimental);
 namespace kudu {
 namespace cfile {
 
-const char kMagicString[] = "kuducfil";
+const char kMagicStringV1[] = "kuducfil";
+const char kMagicStringV2[] = "kuducfl2";
+const int kMagicLength = 8;
 
 static const size_t kMinBlockSize = 512;
 
@@ -157,15 +159,13 @@ Status CFileWriter::Start() {
   }
 
   CFileHeaderPB header;
-  header.set_major_version(kCFileMajorVersion);
-  header.set_minor_version(kCFileMinorVersion);
   FlushMetadataToPB(header.mutable_metadata());
 
   uint32_t pb_size = header.ByteSize();
 
   faststring buf;
   // First the magic.
-  buf.append(kMagicString);
+  buf.append(kMagicStringV2);
   // Then Length-prefixed header.
   PutFixed32(&buf, pb_size);
   pb_util::AppendToString(header, &buf);
@@ -236,7 +236,7 @@ Status CFileWriter::FinishAndReleaseBlock(ScopedWritableBlockCloser* closer) {
   faststring footer_str;
   pb_util::SerializeToString(footer, &footer_str);
 
-  footer_str.append(kMagicString);
+  footer_str.append(kMagicStringV2);
   PutFixed32(&footer_str, footer.GetCachedSize());
 
   RETURN_NOT_OK(block_->Append(footer_str));

http://git-wip-us.apache.org/repos/asf/kudu/blob/f82cf691/src/kudu/cfile/cfile_writer.h
----------------------------------------------------------------------
diff --git a/src/kudu/cfile/cfile_writer.h b/src/kudu/cfile/cfile_writer.h
index db55286..e619c2a 100644
--- a/src/kudu/cfile/cfile_writer.h
+++ b/src/kudu/cfile/cfile_writer.h
@@ -51,10 +51,9 @@ class BTreeInfoPB;
 class IndexTreeBuilder;
 
 // Magic used in header/footer
-extern const char kMagicString[];
-
-const int kCFileMajorVersion = 1;
-const int kCFileMinorVersion = 0;
+extern const char kMagicStringV1[];
+extern const char kMagicStringV2[];
+extern const int kMagicLength;
 
 class NullBitmapBuilder {
  public:

http://git-wip-us.apache.org/repos/asf/kudu/blob/f82cf691/src/kudu/tools/kudu-tool-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tools/kudu-tool-test.cc b/src/kudu/tools/kudu-tool-test.cc
index 55672f0..9fd85ee 100644
--- a/src/kudu/tools/kudu-tool-test.cc
+++ b/src/kudu/tools/kudu-tool-test.cc
@@ -527,7 +527,7 @@ TEST_F(ToolTest, TestFsDumpCFile) {
     SCOPED_TRACE(stdout);
     ASSERT_GE(stdout.size(), 4);
     ASSERT_EQ(stdout[0], "Header:");
-    ASSERT_EQ(stdout[3], "Footer:");
+    ASSERT_EQ(stdout[1], "Footer:");
   }
   {
     NO_FATALS(RunActionStdoutLines(Substitute(
@@ -543,7 +543,7 @@ TEST_F(ToolTest, TestFsDumpCFile) {
     SCOPED_TRACE(stdout);
     ASSERT_GT(stdout.size(), kNumEntries);
     ASSERT_EQ(stdout[0], "Header:");
-    ASSERT_EQ(stdout[3], "Footer:");
+    ASSERT_EQ(stdout[1], "Footer:");
   }
 }
 
@@ -790,8 +790,7 @@ TEST_F(ToolTest, TestLocalReplicaOps) {
     ASSERT_STR_CONTAINS(stdout, "bloom_block {");
     ASSERT_STR_MATCHES(stdout, "id: .*");
     ASSERT_STR_CONTAINS(stdout, "undo_deltas {");
-    ASSERT_STR_MATCHES(stdout, "CFile Header: "
-                        "major_version: .* minor_version: .*");
+    ASSERT_STR_MATCHES(stdout, "CFile Header: ");
     ASSERT_STR_MATCHES(stdout, "Delta stats:.*");
     ASSERT_STR_MATCHES(stdout, "ts range=.*");
     ASSERT_STR_MATCHES(stdout, "update_counts_by_col_id=.*");