You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2017/01/19 00:49:48 UTC

[kudu-CR] WIP: KUDU-1835 (part 2): enable WAL compression

Todd Lipcon has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5736

Change subject: WIP: KUDU-1835 (part 2): enable WAL compression
......................................................................

WIP: KUDU-1835 (part 2): enable WAL compression

Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
---
M src/kudu/consensus/CMakeLists.txt
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log.proto
M src/kudu/consensus/log_util.cc
M src/kudu/consensus/log_util.h
M src/kudu/util/compression/compression_codec.cc
M src/kudu/util/compression/compression_codec.h
8 files changed, 193 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/5736/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5736
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1835 (part 2): enable WAL compression

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1835 (part 2): enable WAL compression
......................................................................


Patch Set 8:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/5736/8/src/kudu/consensus/log.cc
File src/kudu/consensus/log.cc:

PS8, Line 340: c
I forget, we don't capitalize these?


http://gerrit.cloudera.org:8080/#/c/5736/8/src/kudu/consensus/log.proto
File src/kudu/consensus/log.proto:

PS8, Line 55: / Log format major/minor version. These were written by Kudu 1.2 and
            :   // earlier, and marked as required in those versions, but unfortunately
            :   // they were never verified on read. So, in order to make the logs written
            :   // by newer versions give a reasonable error if an old version tries to
            :   // read them, we no longer write these fields.
            :   optional uint32 DEPRECATED_major_version = 1;
            :   optional uint32 DEPRECATED_minor_version = 2;
so you're replacing these with feature flags, right? should we do the same for cfiles?


http://gerrit.cloudera.org:8080/#/c/5736/8/src/kudu/consensus/log_reader.cc
File src/kudu/consensus/log_reader.cc:

PS8, Line 262: kEntryHeaderSizeV2
what if this is reading an older log segment?


http://gerrit.cloudera.org:8080/#/c/5736/8/src/kudu/consensus/log_util.cc
File src/kudu/consensus/log_util.cc:

PS8, Line 429: verison
typo


http://gerrit.cloudera.org:8080/#/c/5736/8/src/kudu/consensus/log_util.h
File src/kudu/consensus/log_util.h:

PS8, Line 50: Ssee
typo


PS8, Line 305: Older versions of Kudu used a different log entry header format.
Older than what?


http://gerrit.cloudera.org:8080/#/c/5736/8/src/kudu/util/compression/compression_codec.h
File src/kudu/util/compression/compression_codec.h:

PS8, Line 59: CompressionType
docs


-- 
To view, visit http://gerrit.cloudera.org:8080/5736
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-1835 (part 2): enable WAL compression

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5736

to look at the new patch set (#5).

Change subject: WIP: KUDU-1835 (part 2): enable WAL compression
......................................................................

WIP: KUDU-1835 (part 2): enable WAL compression

To test the efficiency, I loaded some data into a local tablet server
using kudu-ts and the influxdb benchmark as a data generator. I think
restarted the server with different configurations, and disabled log
segment prealloocation in order to see the _actual_ length of the WALs
instead of the preallocated length.

Compression     Size
NONE            311M
LZ4              98M - 3.17x
SNAPPY           91M - 3.41x
ZLIB             59M - 5.27x

Based on some results in the JIRA I expect more significant improvements
in some other workloads, since kudu-ts already does its own sort of
"dictionary encoding" by normalizing out the tagset IDs.

WIP items:
- need to make sure that this is compatible when set to NONE and
  incompatible when enabled
- may want to use a common dictionary at the top of the file in order to
  improve performance for small log appends
- need to update log-test and add other coverage

Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
---
M src/kudu/consensus/CMakeLists.txt
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log.proto
M src/kudu/consensus/log_anchor_registry.h
M src/kudu/consensus/log_reader.h
M src/kudu/consensus/log_util.cc
M src/kudu/consensus/log_util.h
M src/kudu/integration-tests/disk_reservation-itest.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/util/compression/compression_codec.cc
M src/kudu/util/compression/compression_codec.h
13 files changed, 248 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/5736/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5736
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1835 (part 2): enable WAL compression

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.

Change subject: KUDU-1835 (part 2): enable WAL compression
......................................................................


KUDU-1835 (part 2): enable WAL compression

This enables compression of the entries in the WAL.

To test the efficiency, I loaded some data into a local tablet server
using kudu-ts and the influxdb benchmark as a data generator. I think
restarted the server with different configurations, and disabled log
segment prealloocation in order to see the _actual_ length of the WALs
instead of the preallocated length.

Compression     Size
NONE            311M
LZ4              98M - 3.17x
SNAPPY           91M - 3.41x
ZLIB             59M - 5.27x

Based on some results in the JIRA I expect more significant improvements
in some other workloads, since kudu-ts already does its own sort of
"dictionary encoding" by normalizing out the tagset IDs.

In order to enable this feature, an incompatible change was made in the
WAL format. Each entry header now contains an extra field indicating the
uncompressed length.

Given that, it's desirable that old servers be prohibited from opening
new-format WALs. Ideally, we could have bumped the major version field
in the WAL header. However, the old reader code never actually checked
the version fields on read. So, this takes the approach of not setting
the versions anymore, and making them deprecated. Since they were
'required' proto fields in old versions, this will make the reader fail
to parse the header with an error that the version field is missing.
This is preferable to failing later with a Corruption error about an
entry header CRC mismatch.

To solve this problem for future versions, I also added an
'incompatible_features' flag set. See commit f82cf6918c00dff6aec for
more information about this approach.

Note that it might have been possible to do this in a more
backward-compatible way by making the default codec be "NONE" and having
the new version write old-format headers when compression is disabled.
I initially took this approach, but eventually abandoned it as the code
was more complex. Additionally, we already have a different data format
incompatible change coming in Kudu 1.3 (the above-referenced
f82cf6918c00dff6aec). So adding a second one doesn't add additional
burden on users.

I tested the compatibility manually:
- started a Kudu 1.2 server, wrote some data
- upgraded to a version with this patch
- downgraded back to a version without this patch, and saw that the
  tablets failed to start, with a message that the major/minor version
  fields were missing
- upgraded back to this patch and verified the tablets started

Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
Reviewed-on: http://gerrit.cloudera.org:8080/5736
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <dr...@apache.org>
---
M src/kudu/consensus/CMakeLists.txt
M src/kudu/consensus/log-test-base.h
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log.proto
M src/kudu/consensus/log_anchor_registry.h
M src/kudu/consensus/log_reader.cc
M src/kudu/consensus/log_reader.h
M src/kudu/consensus/log_util.cc
M src/kudu/consensus/log_util.h
M src/kudu/integration-tests/disk_reservation-itest.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/util/compression/compression_codec.cc
M src/kudu/util/compression/compression_codec.h
15 files changed, 274 insertions(+), 106 deletions(-)

Approvals:
  David Ribeiro Alves: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5736
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1835 (part 2): enable WAL compression

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1835 (part 2): enable WAL compression
......................................................................


Patch Set 9: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5736/9/src/kudu/consensus/log-test.cc
File src/kudu/consensus/log-test.cc:

PS9, Line 151: 
extra lines


-- 
To view, visit http://gerrit.cloudera.org:8080/5736
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1835 (part 2): enable WAL compression

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1835 (part 2): enable WAL compression
......................................................................


Patch Set 9:

just a tiny nit, feel free to ignore or fix and keep the +2

-- 
To view, visit http://gerrit.cloudera.org:8080/5736
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1835 (part 2): enable WAL compression

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5736

to look at the new patch set (#7).

Change subject: KUDU-1835 (part 2): enable WAL compression
......................................................................

KUDU-1835 (part 2): enable WAL compression

This enables compression of the entries in the WAL.

To test the efficiency, I loaded some data into a local tablet server
using kudu-ts and the influxdb benchmark as a data generator. I think
restarted the server with different configurations, and disabled log
segment prealloocation in order to see the _actual_ length of the WALs
instead of the preallocated length.

Compression     Size
NONE            311M
LZ4              98M - 3.17x
SNAPPY           91M - 3.41x
ZLIB             59M - 5.27x

Based on some results in the JIRA I expect more significant improvements
in some other workloads, since kudu-ts already does its own sort of
"dictionary encoding" by normalizing out the tagset IDs.

In order to enable this feature, an incompatible change was made in the
WAL format. Each entry header now contains an extra field indicating the
uncompressed length.

Given that, it's desirable that old servers be prohibited from opening
new-format WALs. Ideally, we could have bumped the major version field
in the WAL header. However, the old reader code never actually checked
the version fields on read. So, this takes the approach of not setting
the versions anymore, and making them deprecated. Since they were
'required' proto fields in old versions, this will make the reader fail
to parse the header with an error that the version field is missing.
This is preferable to failing later with a Corruption error about an
entry header CRC mismatch.

To solve this problem for future versions, I also added an
'incompatible_features' flag set. See commit f82cf6918c00dff6aec for
more information about this approach.

Note that it migth have been possible to do this in a more
backward-compatible way by making the default codec be "NONE" and having
the new version write old-format headers when compression is disabled.
Initially took this approach, but eventually abandoned it as the code
was more complex. Additionally, we already have a different data format
incompatible change coming in Kudu 1.3 (the above-referenced
f82cf6918c00dff6aec). So adding a second one doesn't add additional
burden on users.

I tested the compatibility manually:
- started a Kudu 1.2 server, wrote some data
- upgraded to a version with this patch
- downgraded back to a version without this patch, and saw that the
  tablets failed to start, with a message that the major/minor version
  fields were missing
- upgraded back to this patch and verified the tablets started

Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
---
M src/kudu/consensus/CMakeLists.txt
M src/kudu/consensus/log-test-base.h
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log.proto
M src/kudu/consensus/log_anchor_registry.h
M src/kudu/consensus/log_reader.cc
M src/kudu/consensus/log_reader.h
M src/kudu/consensus/log_util.cc
M src/kudu/consensus/log_util.h
M src/kudu/integration-tests/disk_reservation-itest.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/util/compression/compression_codec.cc
M src/kudu/util/compression/compression_codec.h
15 files changed, 270 insertions(+), 106 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/5736/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5736
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] WIP: KUDU-1835 (part 2): enable WAL compression

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5736

to look at the new patch set (#4).

Change subject: WIP: KUDU-1835 (part 2): enable WAL compression
......................................................................

WIP: KUDU-1835 (part 2): enable WAL compression

To test the efficiency, I loaded some data into a local tablet server
using kudu-ts and the influxdb benchmark as a data generator. I think
restarted the server with different configurations, and disabled log
segment prealloocation in order to see the _actual_ length of the WALs
instead of the preallocated length.

Compression     Size
NONE            311M
LZ4              98M - 3.17x
SNAPPY           91M - 3.41x
ZLIB             59M - 5.27x

Based on some results in the JIRA I expect more significant improvements
in some other workloads, since kudu-ts already does its own sort of
"dictionary encoding" by normalizing out the tagset IDs.

WIP items:
- need to make sure that this is compatible when set to NONE and
  incompatible when enabled
- may want to use a common dictionary at the top of the file in order to
  improve performance for small log appends
- need to update log-test and add other coverage

Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
---
M src/kudu/consensus/CMakeLists.txt
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log.proto
M src/kudu/consensus/log_util.cc
M src/kudu/consensus/log_util.h
M src/kudu/util/compression/compression_codec.cc
M src/kudu/util/compression/compression_codec.h
8 files changed, 190 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/5736/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5736
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1835 (part 2): enable WAL compression

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1835 (part 2): enable WAL compression
......................................................................


Patch Set 8:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/5736/8/src/kudu/consensus/log.cc
File src/kudu/consensus/log.cc:

PS8, Line 340: c
> I forget, we don't capitalize these?
yea we've been moving towards not capitalizing them, so they don't look funny when concatenated (eg "Unable to do foo: Unable to do bar: Problem with baz" isn't correct capitalization)


http://gerrit.cloudera.org:8080/#/c/5736/8/src/kudu/consensus/log.proto
File src/kudu/consensus/log.proto:

PS8, Line 55: / Log format major/minor version. These were written by Kudu 1.2 and
            :   // earlier, and marked as required in those versions, but unfortunately
            :   // they were never verified on read. So, in order to make the logs written
            :   // by newer versions give a reasonable error if an old version tries to
            :   // read them, we no longer write these fields.
            :   optional uint32 DEPRECATED_major_version = 1;
            :   optional uint32 DEPRECATED_minor_version = 2;
> so you're replacing these with feature flags, right? should we do the same 
we already did while you were on vacation, see f82cf6918c00dff6aecad5a6b50836b7eb5db508


Line 70:   repeated FeatureFlag incompatible_features = 10;
I discovered today that this doesn't work properly, will edit. (see KUDU-1850)


http://gerrit.cloudera.org:8080/#/c/5736/8/src/kudu/consensus/log_reader.cc
File src/kudu/consensus/log_reader.cc:

PS8, Line 262: kEntryHeaderSizeV2
> what if this is reading an older log segment?
fixed


http://gerrit.cloudera.org:8080/#/c/5736/8/src/kudu/consensus/log_util.cc
File src/kudu/consensus/log_util.cc:

PS8, Line 429: verison
> typo
Done


http://gerrit.cloudera.org:8080/#/c/5736/8/src/kudu/consensus/log_util.h
File src/kudu/consensus/log_util.h:

PS8, Line 50: Ssee
> typo
Done


PS8, Line 305: Older versions of Kudu used a different log entry header format.
> Older than what?
Done


http://gerrit.cloudera.org:8080/#/c/5736/8/src/kudu/util/compression/compression_codec.h
File src/kudu/util/compression/compression_codec.h:

PS8, Line 59: CompressionType
> docs
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5736
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-1835 (part 2): enable WAL compression

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5736

to look at the new patch set (#3).

Change subject: WIP: KUDU-1835 (part 2): enable WAL compression
......................................................................

WIP: KUDU-1835 (part 2): enable WAL compression

Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
---
M src/kudu/consensus/CMakeLists.txt
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log.proto
M src/kudu/consensus/log_util.cc
M src/kudu/consensus/log_util.h
M src/kudu/util/compression/compression_codec.cc
M src/kudu/util/compression/compression_codec.h
8 files changed, 193 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/5736/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5736
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] KUDU-1835 (part 2): enable WAL compression

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5736

to look at the new patch set (#9).

Change subject: KUDU-1835 (part 2): enable WAL compression
......................................................................

KUDU-1835 (part 2): enable WAL compression

This enables compression of the entries in the WAL.

To test the efficiency, I loaded some data into a local tablet server
using kudu-ts and the influxdb benchmark as a data generator. I think
restarted the server with different configurations, and disabled log
segment prealloocation in order to see the _actual_ length of the WALs
instead of the preallocated length.

Compression     Size
NONE            311M
LZ4              98M - 3.17x
SNAPPY           91M - 3.41x
ZLIB             59M - 5.27x

Based on some results in the JIRA I expect more significant improvements
in some other workloads, since kudu-ts already does its own sort of
"dictionary encoding" by normalizing out the tagset IDs.

In order to enable this feature, an incompatible change was made in the
WAL format. Each entry header now contains an extra field indicating the
uncompressed length.

Given that, it's desirable that old servers be prohibited from opening
new-format WALs. Ideally, we could have bumped the major version field
in the WAL header. However, the old reader code never actually checked
the version fields on read. So, this takes the approach of not setting
the versions anymore, and making them deprecated. Since they were
'required' proto fields in old versions, this will make the reader fail
to parse the header with an error that the version field is missing.
This is preferable to failing later with a Corruption error about an
entry header CRC mismatch.

To solve this problem for future versions, I also added an
'incompatible_features' flag set. See commit f82cf6918c00dff6aec for
more information about this approach.

Note that it might have been possible to do this in a more
backward-compatible way by making the default codec be "NONE" and having
the new version write old-format headers when compression is disabled.
I initially took this approach, but eventually abandoned it as the code
was more complex. Additionally, we already have a different data format
incompatible change coming in Kudu 1.3 (the above-referenced
f82cf6918c00dff6aec). So adding a second one doesn't add additional
burden on users.

I tested the compatibility manually:
- started a Kudu 1.2 server, wrote some data
- upgraded to a version with this patch
- downgraded back to a version without this patch, and saw that the
  tablets failed to start, with a message that the major/minor version
  fields were missing
- upgraded back to this patch and verified the tablets started

Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
---
M src/kudu/consensus/CMakeLists.txt
M src/kudu/consensus/log-test-base.h
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log.proto
M src/kudu/consensus/log_anchor_registry.h
M src/kudu/consensus/log_reader.cc
M src/kudu/consensus/log_reader.h
M src/kudu/consensus/log_util.cc
M src/kudu/consensus/log_util.h
M src/kudu/integration-tests/disk_reservation-itest.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/util/compression/compression_codec.cc
M src/kudu/util/compression/compression_codec.h
15 files changed, 274 insertions(+), 106 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/5736/9
-- 
To view, visit http://gerrit.cloudera.org:8080/5736
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1835 (part 2): enable WAL compression

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5736

to look at the new patch set (#8).

Change subject: KUDU-1835 (part 2): enable WAL compression
......................................................................

KUDU-1835 (part 2): enable WAL compression

This enables compression of the entries in the WAL.

To test the efficiency, I loaded some data into a local tablet server
using kudu-ts and the influxdb benchmark as a data generator. I think
restarted the server with different configurations, and disabled log
segment prealloocation in order to see the _actual_ length of the WALs
instead of the preallocated length.

Compression     Size
NONE            311M
LZ4              98M - 3.17x
SNAPPY           91M - 3.41x
ZLIB             59M - 5.27x

Based on some results in the JIRA I expect more significant improvements
in some other workloads, since kudu-ts already does its own sort of
"dictionary encoding" by normalizing out the tagset IDs.

In order to enable this feature, an incompatible change was made in the
WAL format. Each entry header now contains an extra field indicating the
uncompressed length.

Given that, it's desirable that old servers be prohibited from opening
new-format WALs. Ideally, we could have bumped the major version field
in the WAL header. However, the old reader code never actually checked
the version fields on read. So, this takes the approach of not setting
the versions anymore, and making them deprecated. Since they were
'required' proto fields in old versions, this will make the reader fail
to parse the header with an error that the version field is missing.
This is preferable to failing later with a Corruption error about an
entry header CRC mismatch.

To solve this problem for future versions, I also added an
'incompatible_features' flag set. See commit f82cf6918c00dff6aec for
more information about this approach.

Note that it migth have been possible to do this in a more
backward-compatible way by making the default codec be "NONE" and having
the new version write old-format headers when compression is disabled.
Initially took this approach, but eventually abandoned it as the code
was more complex. Additionally, we already have a different data format
incompatible change coming in Kudu 1.3 (the above-referenced
f82cf6918c00dff6aec). So adding a second one doesn't add additional
burden on users.

I tested the compatibility manually:
- started a Kudu 1.2 server, wrote some data
- upgraded to a version with this patch
- downgraded back to a version without this patch, and saw that the
  tablets failed to start, with a message that the major/minor version
  fields were missing
- upgraded back to this patch and verified the tablets started

Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
---
M src/kudu/consensus/CMakeLists.txt
M src/kudu/consensus/log-test-base.h
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log.proto
M src/kudu/consensus/log_anchor_registry.h
M src/kudu/consensus/log_reader.cc
M src/kudu/consensus/log_reader.h
M src/kudu/consensus/log_util.cc
M src/kudu/consensus/log_util.h
M src/kudu/integration-tests/disk_reservation-itest.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/util/compression/compression_codec.cc
M src/kudu/util/compression/compression_codec.h
15 files changed, 270 insertions(+), 106 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/5736/8
-- 
To view, visit http://gerrit.cloudera.org:8080/5736
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] WIP: KUDU-1835 (part 2): enable WAL compression

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5736

to look at the new patch set (#6).

Change subject: WIP: KUDU-1835 (part 2): enable WAL compression
......................................................................

WIP: KUDU-1835 (part 2): enable WAL compression

To test the efficiency, I loaded some data into a local tablet server
using kudu-ts and the influxdb benchmark as a data generator. I think
restarted the server with different configurations, and disabled log
segment prealloocation in order to see the _actual_ length of the WALs
instead of the preallocated length.

Compression     Size
NONE            311M
LZ4              98M - 3.17x
SNAPPY           91M - 3.41x
ZLIB             59M - 5.27x

Based on some results in the JIRA I expect more significant improvements
in some other workloads, since kudu-ts already does its own sort of
"dictionary encoding" by normalizing out the tagset IDs.

WIP items:
- need to make sure that this is compatible when set to NONE and
  incompatible when enabled
- may want to use a common dictionary at the top of the file in order to
  improve performance for small log appends
- need to update log-test and add other coverage

Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
---
M src/kudu/consensus/CMakeLists.txt
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log.proto
M src/kudu/consensus/log_anchor_registry.h
M src/kudu/consensus/log_reader.h
M src/kudu/consensus/log_util.cc
M src/kudu/consensus/log_util.h
M src/kudu/integration-tests/disk_reservation-itest.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/util/compression/compression_codec.cc
M src/kudu/util/compression/compression_codec.h
13 files changed, 248 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/5736/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5736
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot