Posted to commits@kudu.apache.org by to...@apache.org on 2017/01/23 23:05:51 UTC

[2/2] kudu git commit: KUDU-1836. Enable compression of DeltaFiles

KUDU-1836. Enable compression of DeltaFiles

This adds a new experimental flag for this setting, and changes the
default to be LZ4. LZ4 is quite fast and seems to do a decent job of
compression in real-life scenarios.

I gathered a couple of numbers from a ~10GB tablet exported from a use
case at Cloudera which has a lot of UPSERTs. In particular, this workload
has a lot of cases where rows get upserted but the changed value is no
different from the previous contents of the row (so multiple deltas in a
row are basically dupes and highly compressible). This is obviously
close to a best case, but it's also not a contrived use case (this is a
real app):

Codec       Total size   Ratio
            of deltas
------------------------------
NONE        10458MB
LZO         413MB        (25x)
GZIP        296MB        (35x)

The above numbers come from running the deltafile through 'lzop' and
'gzip', rather than using CFile compression, which is limited to a
smaller block size. So the results with CFile compression will not be
quite as good. However, they're still likely to be 10x or better, which
is substantial.
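
For reference, the ratios in the table are just the uncompressed size
divided by the compressed size, rounded to the nearest integer. A
minimal sketch of that arithmetic follows; the function name
CompressionRatio is made up for illustration and is not part of Kudu.

```cpp
#include <cmath>

// Illustrative helper (hypothetical name, not Kudu code): ratio of
// uncompressed to compressed size, rounded to the nearest integer.
long CompressionRatio(double uncompressed_mb, double compressed_mb) {
  return std::lround(uncompressed_mb / compressed_mb);
}
```

With the sizes from the table, CompressionRatio(10458, 413) gives 25 and
CompressionRatio(10458, 296) gives 35.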

Change-Id: I754b31c63ef6c5d7b4ffbcbb0ad8982f9978ca83
Reviewed-on: http://gerrit.cloudera.org:8080/5737
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <dr...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/ef57bda2
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/ef57bda2
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/ef57bda2

Branch: refs/heads/master
Commit: ef57bda2c55154ca44c40e00602e9e3de891fa85
Parents: 45b7dba
Author: Todd Lipcon <to...@apache.org>
Authored: Wed Jan 18 18:23:52 2017 -0800
Committer: Todd Lipcon <to...@apache.org>
Committed: Mon Jan 23 22:46:37 2017 +0000

----------------------------------------------------------------------
 src/kudu/tablet/deltafile.cc | 7 +++++++
 1 file changed, 7 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/ef57bda2/src/kudu/tablet/deltafile.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/deltafile.cc b/src/kudu/tablet/deltafile.cc
index 1664133..3975f4c 100644
--- a/src/kudu/tablet/deltafile.cc
+++ b/src/kudu/tablet/deltafile.cc
@@ -32,6 +32,7 @@
 #include "kudu/tablet/mutation.h"
 #include "kudu/tablet/mvcc.h"
 #include "kudu/util/coding-inl.h"
+#include "kudu/util/compression/compression_codec.h"
 #include "kudu/util/flag_tags.h"
 #include "kudu/util/hexdump.h"
 #include "kudu/util/pb_util.h"
@@ -43,6 +44,10 @@ DEFINE_int32(deltafile_default_block_size, 32*1024,
              "on a per-table basis.");
 TAG_FLAG(deltafile_default_block_size, experimental);
 
+DEFINE_string(deltafile_default_compression_codec, "lz4",
+              "The compression codec used when writing deltafiles.");
+TAG_FLAG(deltafile_default_compression_codec, experimental);
+
 using std::shared_ptr;
 using std::unique_ptr;
 
@@ -74,6 +79,8 @@ DeltaFileWriter::DeltaFileWriter(gscoped_ptr<WritableBlock> block)
   opts.write_validx = true;
   opts.storage_attributes.cfile_block_size = FLAGS_deltafile_default_block_size;
   opts.storage_attributes.encoding = PLAIN_ENCODING;
+  opts.storage_attributes.compression = GetCompressionCodecType(
+      FLAGS_deltafile_default_compression_codec);
   // No optimization for deltafiles because a deltafile index key must decode into a DeltaKey
   opts.optimize_index_keys = false;
   writer_.reset(new cfile::CFileWriter(opts, GetTypeInfo(BINARY), false, std::move(block)));
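
The hunk above passes the string value of the new flag to
GetCompressionCodecType() to obtain a codec type. As a rough
illustration of that kind of name-to-enum lookup, here is a
self-contained sketch. It is not Kudu's implementation: SketchCodec,
ParseCodecName, the set of accepted names, the case-insensitive
matching, and the bool/out-parameter error handling are all assumptions
made for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical stand-in for a codec enum; not Kudu's actual definition.
enum class SketchCodec { kNone, kSnappy, kLz4, kZlib };

// Looks up a codec by name, ignoring case. On success, stores the codec
// in *out and returns true. Returns false (leaving *out untouched) if
// the name is not recognized.
bool ParseCodecName(std::string name, SketchCodec* out) {
  // Normalize to lower case so "LZ4" and "lz4" are treated the same.
  std::transform(name.begin(), name.end(), name.begin(),
                 [](unsigned char c) {
                   return static_cast<char>(std::tolower(c));
                 });
  if (name == "none")   { *out = SketchCodec::kNone;   return true; }
  if (name == "snappy") { *out = SketchCodec::kSnappy; return true; }
  if (name == "lz4")    { *out = SketchCodec::kLz4;    return true; }
  if (name == "zlib")   { *out = SketchCodec::kZlib;   return true; }
  return false;
}
```

In this sketch, ParseCodecName("lz4", &codec) returns true and sets
codec to SketchCodec::kLz4, while an unrecognized name returns false so
the caller can decide how to report the bad flag value.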