Posted to commits@kudu.apache.org by to...@apache.org on 2017/01/23 23:05:51 UTC
[2/2] kudu git commit: KUDU-1836. Enable compression of DeltaFiles
KUDU-1836. Enable compression of DeltaFiles
This adds a new experimental flag for this setting, and changes the
default to be LZ4. LZ4 is quite fast and seems to do a decent job of
compression in real-life scenarios.
I gathered a couple of numbers from a ~10GB tablet exported from a use case
at Cloudera which has a lot of UPSERTs. In particular, this workload has
a lot of cases where rows get upserted but the changed value is no
different from the previous contents of the row (so multiple deltas in a
row are basically dupes and highly compressible). This is obviously
close to a best case, but it's also not a contrived use case (this is a
real app):
Codec   Total size of deltas   Ratio
------------------------------------
NONE    10458MB
LZO     413MB                  (25x)
GZIP    296MB                  (35x)
The above numbers come from running the deltafile through 'lzop' and
'gzip', rather than using CFile compression, which is limited to a
smaller block size. So, the results with CFile compression will not be
quite as good. However, they're still likely to be 10x or better, which
is substantial.
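As a quick sanity check on the ratios quoted above, the arithmetic can be sketched as below. CompressionRatio is a hypothetical helper written for this illustration only; it is not part of Kudu.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Kudu function): the ratio of the
// uncompressed size to the compressed size, both in the same unit.
double CompressionRatio(int64_t uncompressed_mb, int64_t compressed_mb) {
  assert(compressed_mb > 0);  // a zero-sized output would make the ratio undefined
  return static_cast<double>(uncompressed_mb) /
         static_cast<double>(compressed_mb);
}
```

With the sizes from the table, 10458 / 413 is about 25.3 and 10458 / 296 is about 35.3, which matches the rounded 25x and 35x figures.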
Change-Id: I754b31c63ef6c5d7b4ffbcbb0ad8982f9978ca83
Reviewed-on: http://gerrit.cloudera.org:8080/5737
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <dr...@apache.org>
Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/ef57bda2
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/ef57bda2
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/ef57bda2
Branch: refs/heads/master
Commit: ef57bda2c55154ca44c40e00602e9e3de891fa85
Parents: 45b7dba
Author: Todd Lipcon <to...@apache.org>
Authored: Wed Jan 18 18:23:52 2017 -0800
Committer: Todd Lipcon <to...@apache.org>
Committed: Mon Jan 23 22:46:37 2017 +0000
----------------------------------------------------------------------
src/kudu/tablet/deltafile.cc | 7 +++++++
1 file changed, 7 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/kudu/blob/ef57bda2/src/kudu/tablet/deltafile.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/deltafile.cc b/src/kudu/tablet/deltafile.cc
index 1664133..3975f4c 100644
--- a/src/kudu/tablet/deltafile.cc
+++ b/src/kudu/tablet/deltafile.cc
@@ -32,6 +32,7 @@
#include "kudu/tablet/mutation.h"
#include "kudu/tablet/mvcc.h"
#include "kudu/util/coding-inl.h"
+#include "kudu/util/compression/compression_codec.h"
#include "kudu/util/flag_tags.h"
#include "kudu/util/hexdump.h"
#include "kudu/util/pb_util.h"
@@ -43,6 +44,10 @@ DEFINE_int32(deltafile_default_block_size, 32*1024,
"on a per-table basis.");
TAG_FLAG(deltafile_default_block_size, experimental);
+DEFINE_string(deltafile_default_compression_codec, "lz4",
+ "The compression codec used when writing deltafiles.");
+TAG_FLAG(deltafile_default_compression_codec, experimental);
+
using std::shared_ptr;
using std::unique_ptr;
@@ -74,6 +79,8 @@ DeltaFileWriter::DeltaFileWriter(gscoped_ptr<WritableBlock> block)
opts.write_validx = true;
opts.storage_attributes.cfile_block_size = FLAGS_deltafile_default_block_size;
opts.storage_attributes.encoding = PLAIN_ENCODING;
+ opts.storage_attributes.compression = GetCompressionCodecType(
+ FLAGS_deltafile_default_compression_codec);
// No optimization for deltafiles because a deltafile index key must decode into a DeltaKey
opts.optimize_index_keys = false;
writer_.reset(new cfile::CFileWriter(opts, GetTypeInfo(BINARY), false, std::move(block)));
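The patch passes the flag's string value through GetCompressionCodecType to obtain a codec type. The following is a minimal standalone sketch of that kind of name-to-enum lookup. SketchCodec and ParseCodecName are hypothetical names invented for this illustration, and the set of accepted names and the handling of unknown names are assumptions; this is not Kudu's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical codec enum for illustration; not Kudu's CompressionType.
enum class SketchCodec { kNone, kSnappy, kLz4, kZlib };

// Hypothetical parser: maps a codec name (case-insensitive) to SketchCodec.
// Returns false and leaves *out untouched if the name is not recognized.
bool ParseCodecName(std::string name, SketchCodec* out) {
  assert(out != nullptr);
  // Normalize to lowercase so "LZ4" and "lz4" are treated the same.
  std::transform(name.begin(), name.end(), name.begin(),
                 [](unsigned char c) {
                   return static_cast<char>(std::tolower(c));
                 });
  if (name == "none")   { *out = SketchCodec::kNone;   return true; }
  if (name == "snappy") { *out = SketchCodec::kSnappy; return true; }
  if (name == "lz4")    { *out = SketchCodec::kLz4;    return true; }
  if (name == "zlib")   { *out = SketchCodec::kZlib;   return true; }
  return false;
}
```

Returning a bool rather than silently falling back keeps the handling of a mistyped flag value explicit in the caller; how the real function treats unknown names is not shown in this diff.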