You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org> on 2017/06/22 22:29:37 UTC
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Dimitris Tsirogiannis has uploaded a new change for review.
http://gerrit.cloudera.org:8080/7268
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
IMPALA-5500: Reduce catalog update topic size
Problem: IMPALA-4029 introduced the use of the flatbuffers serialization
libary for storing file and block metadata. That change reduced the
effectiveness of the Thrift compaction protocol (when
--compact_catalog_topic is used), thereby causing a 2X increase in
catalog update topic size when the compact protocol is used.
Fix: Snappy compress the catalog topic updates before sent to the
statestore when --compact_catalog_topic is set to true.
Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Results:
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/compress.h
M be/src/util/decompress.h
A tests/custom_cluster/test_compact_catalog_updates.py
7 files changed, 168 insertions(+), 16 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/1
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 2:
(7 comments)
http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:
Line 330: if (!status.ok()) {
> take out of if/else since it's common to both cases
Done
Line 367: uint8_t* compressed_buffer_ptr = &compressed_buffer[0];
> comptessed_buffer.data()
Done
Line 370: // Base64 encode the compressed catalog object and store it in the topic item.
> Is base64-encoding really necessary? Why don't the raw bytes work?
Hm, not really. It works without it, I only used it because the topic items use strings for keys and values. Removed it.
http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:
Line 200: /// Serializes 'catalog_object' and compresses it using Snappy compression.
> remove second "compression"
Done
http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:
Line 1417: Status ImpalaServer::DecompressAndDeserializeCatalogObject(const TTopicItem& item,
> Why not separate decompression from deserialization? Seem like separate con
Done
http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/util/compress.h
File be/src/util/compress.h:
Line 103: SnappyCompressor(MemPool* mem_pool = NULL, bool reuse_buffer = false);
> Why these changes? Shouldn't these always be created using CreateCompressor
oops forgot about these. I was playing around with directly creating the compressor. reverted.
http://gerrit.cloudera.org:8080/#/c/7268/2/tests/custom_cluster/test_compact_catalog_updates.py
File tests/custom_cluster/test_compact_catalog_updates.py:
Line 45: "select count(*) from functional.alltypes")
> Check result? Maybe also use another query on a parquet table.
Added the result check. Out of curiosity, what's so special about the parquet table in this case?
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#3).
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
IMPALA-5500: Reduce catalog update topic size
Problem: IMPALA-4029 introduced the use of the flatbuffers serialization
libary for storing file and block metadata. That change reduced the
effectiveness of the Thrift compaction protocol (when
--compact_catalog_topic is used), thereby causing a 2X increase in
catalog update topic size when the compact protocol is used.
Fix: Snappy compress the catalog topic updates before sent to the
statestore when --compact_catalog_topic is set to true.
Results: ~4X reduction in catalog update topic size
Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/compress.h
M be/src/util/decompress.h
A tests/custom_cluster/test_compact_catalog_updates.py
7 files changed, 141 insertions(+), 10 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/3
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 7: Code-Review+1
I'm not sure how i missed that
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 5:
(4 comments)
http://gerrit.cloudera.org:8080/#/c/7268/5/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:
Line 327: vector<uint8_t> serialized_object;
Unused variable?
Line 358: RETURN_IF_ERROR(Codec::CreateCompressor(nullptr, false, THdfsCompression::SNAPPY,
Why not LZ4 to be consistent with what we do for exchanges? I don't have a problem with Snappy if we have some reason to use it but my understanding was that LZ4 was generally superior.
Line 368: item_value->assign(reinterpret_cast<const char*>(compressed_data.data()),
Why do we need this extra copy here? It seems like we should serialize to a string, then compress directly into the output string. That should be possible if we Serialize() into a temporary string then either compress it or move() it into the output string.
http://gerrit.cloudera.org:8080/#/c/7268/5/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:
PS5, Line 1293: reinterpret_cast<const uint8_t*>(
Is this cast needed? vector<uint8_t>.data() should return uint8_t*.
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 9: Code-Review+2
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 10: Verified+1
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 2:
(7 comments)
http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:
Line 330: if (!status.ok()) {
take out of if/else since it's common to both cases
Line 367: uint8_t* compressed_buffer_ptr = &compressed_buffer[0];
comptessed_buffer.data()
Line 370: // Base64 encode the compressed catalog object and store it in the topic item.
Is base64-encoding really necessary? Why don't the raw bytes work?
http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:
Line 200: /// Serializes 'catalog_object' and compresses it using Snappy compression.
remove second "compression"
http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:
Line 1417: Status ImpalaServer::DecompressAndDeserializeCatalogObject(const TTopicItem& item,
Why not separate decompression from deserialization? Seem like separate concerns.
Then the code would be something like:
buffer = raw buffer
if (compaction is on) buffer = decompress();
DeserializeThriftMsg()
...
http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/util/compress.h
File be/src/util/compress.h:
Line 103: SnappyCompressor(MemPool* mem_pool = NULL, bool reuse_buffer = false);
Why these changes? Shouldn't these always be created using CreateCompressor()?
http://gerrit.cloudera.org:8080/#/c/7268/2/tests/custom_cluster/test_compact_catalog_updates.py
File tests/custom_cluster/test_compact_catalog_updates.py:
Line 45: "select count(*) from functional.alltypes")
Check result? Maybe also use another query on a parquet table.
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#7).
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
IMPALA-5500: Reduce catalog update topic size
Problem: IMPALA-4029 introduced the use of the flatbuffers serialization
libary for storing file and block metadata. That change reduced the
effectiveness of the Thrift compaction protocol (when
--compact_catalog_topic is used), thereby causing a 2X increase in
catalog update topic size when the compact protocol is used.
Fix: LZ4 compress the catalog topic updates before sent to the
statestore when --compact_catalog_topic is set to true.
Results: ~4X reduction in catalog update topic size
Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-util.cc
M be/src/catalog/catalog-util.h
M be/src/service/impala-server.cc
A tests/custom_cluster/test_compact_catalog_updates.py
5 files changed, 135 insertions(+), 5 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/7
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 7:
> The code changes look good. We don't seem to have any test coverage
> for this flag though - I grepped for it and don't see it referenced
> in any tests. Can we add a custom cluster test that starts up
> Impala with this flag and runs a query?
>
> Although I'm actually wondering if this should just be the
> default/only mode. It seems like compression and decompression are
> probably less of a bottleneck than the network.
But I added a custom test for in this patch. Do you mean an additional test?
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#6).
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
IMPALA-5500: Reduce catalog update topic size
Problem: IMPALA-4029 introduced the use of the flatbuffers serialization
libary for storing file and block metadata. That change reduced the
effectiveness of the Thrift compaction protocol (when
--compact_catalog_topic is used), thereby causing a 2X increase in
catalog update topic size when the compact protocol is used.
Fix: Snappy compress the catalog topic updates before sent to the
statestore when --compact_catalog_topic is set to true.
Results: ~4X reduction in catalog update topic size
Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-util.cc
M be/src/catalog/catalog-util.h
M be/src/service/impala-server.cc
A tests/custom_cluster/test_compact_catalog_updates.py
5 files changed, 135 insertions(+), 5 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/6
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 5:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/7268/5/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:
Line 358: RETURN_IF_ERROR(Codec::CreateCompressor(nullptr, false, THdfsCompression::SNAPPY,
> LZ4 decompressor requires a pre-allocated output. The problem in this case
I see - we're depending on MaxOutputLen() on the decompression path. Can't we include the uncompressed size along with the compressed data then? Seems like the right thing to do and a decent performance win.
Otherwise we should explain why we're using Snappy here because it's subtle.
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 8: Code-Review+2
(1 comment)
http://gerrit.cloudera.org:8080/#/c/7268/8/fe/src/main/java/org/apache/impala/analysis/TableSampleClause.java
File fe/src/main/java/org/apache/impala/analysis/TableSampleClause.java:
Line 21: import com.google.common.base.Preconditions;
These changes are in GVO right now:
http://gerrit.cloudera.org:8080/7315
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 4:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/7268/4/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:
Line 328: Status status = thrift_serializer_.Serialize(&catalog_object, &serialized_object);
> Let's try to avoid the extra copy for the !FLAGS_compact_catalog_topic case
Done
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 5:
(5 comments)
http://gerrit.cloudera.org:8080/#/c/7268/5/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:
Line 327: vector<uint8_t> serialized_object;
> Unused variable?
Done
Line 334: // Compress the catalog object
> remove (same as code)
Done
Line 358: RETURN_IF_ERROR(Codec::CreateCompressor(nullptr, false, THdfsCompression::SNAPPY,
> I see - we're depending on MaxOutputLen() on the decompression path. Can't
I am hesitant to do that. We need essentially to populate the TopicItem with an additional field that is only catalog specific. Also, we already get a 4x compression of catalog topic updates, so I am wondering if this is really worth it. Not super familiar with LZ4's performance/compression ratio. Is it significantly better?
Line 367: compressed_data.resize(result_len);
> remove
Done
http://gerrit.cloudera.org:8080/#/c/7268/5/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:
PS5, Line 1293: reinterpret_cast<const uint8_t*>(
> Is this cast needed? vector<uint8_t>.data() should return uint8_t*.
Done
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 5:
(3 comments)
http://gerrit.cloudera.org:8080/#/c/7268/5/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:
Line 358: RETURN_IF_ERROR(Codec::CreateCompressor(nullptr, false, THdfsCompression::SNAPPY,
> I'm in favor of LZ4.
Done
Line 368: item_value->assign(reinterpret_cast<const char*>(compressed_data.data()),
> Why do we need this extra copy here? It seems like we should serialize to a
Done
http://gerrit.cloudera.org:8080/#/c/7268/5/be/src/service/impala-server.h
File be/src/service/impala-server.h:
Line 556: Status DecompressCatalogObject(const TTopicItem& item,
> It's a little weird to have the Compress/Decompress functions in different
Done
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 10:
Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/801/
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 7:
(2 comments)
http://gerrit.cloudera.org:8080/#/c/7268/7/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:
Line 335: pending_topic_updates_.pop_back();
I think we could incorrectly end up popping two pending topic updates here. One way to fix it is to check status.ok() in L331
http://gerrit.cloudera.org:8080/#/c/7268/7/be/src/catalog/catalog-util.cc
File be/src/catalog/catalog-util.cc:
Line 202: ReadWriteUtil::PutInt(output_buffer_ptr, (uint32_t)catalog_object->size());
static_cast instead of C-style cast
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 5:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/7268/5/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:
Line 358: RETURN_IF_ERROR(Codec::CreateCompressor(nullptr, false, THdfsCompression::SNAPPY,
> I am hesitant to do that. We need essentially to populate the TopicItem wit
I see your point.
https://github.com/lz4/lz4 reports that it's maybe 50% faster. According to https://www.percona.com/blog/2016/04/13/evaluating-database-compression-methods-update/ maybe 30-40%
We could alternatively just write the size at the start or end of the buffer on both ends with ReadWriteUtil::PutVLong() or something like that. That's what snappy is doing internally I think.
This may be overengineering it so I'm fine if you just want to explain the motivation for snappy.
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#4).
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
IMPALA-5500: Reduce catalog update topic size
Problem: IMPALA-4029 introduced the use of the flatbuffers serialization
libary for storing file and block metadata. That change reduced the
effectiveness of the Thrift compaction protocol (when
--compact_catalog_topic is used), thereby causing a 2X increase in
catalog update topic size when the compact protocol is used.
Fix: Snappy compress the catalog topic updates before sent to the
statestore when --compact_catalog_topic is set to true.
Results: ~4X reduction in catalog update topic size
Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
A tests/custom_cluster/test_compact_catalog_updates.py
5 files changed, 143 insertions(+), 6 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/4
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 5:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/7268/5/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:
Line 358: RETURN_IF_ERROR(Codec::CreateCompressor(nullptr, false, THdfsCompression::SNAPPY,
> Why not LZ4 to be consistent with what we do for exchanges? I don't have a
LZ4 decompressor requires a pre-allocated output. The problem in this case is that we don't know the initial (pre-compressed) size of the catalog object.
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 7:
(2 comments)
http://gerrit.cloudera.org:8080/#/c/7268/7/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:
Line 335: pending_topic_updates_.pop_back();
> I think we could incorrectly end up popping two pending topic updates here.
Good catch. Done
http://gerrit.cloudera.org:8080/#/c/7268/7/be/src/catalog/catalog-util.cc
File be/src/catalog/catalog-util.cc:
Line 202: ReadWriteUtil::PutInt(output_buffer_ptr, (uint32_t)catalog_object->size());
> static_cast instead of C-style cast
Done
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 7:
The code changes look good. We don't seem to have any test coverage for this flag though - I grepped for it and don't see it referenced in any tests. Can we add a custom cluster test that starts up Impala with this flag and runs a query?
Although I'm actually wondering if this should just be the default/only mode. It seems like compression and decompression are probably less of a bottleneck than the network.
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#5).
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
IMPALA-5500: Reduce catalog update topic size
Problem: IMPALA-4029 introduced the use of the flatbuffers serialization
libary for storing file and block metadata. That change reduced the
effectiveness of the Thrift compaction protocol (when
--compact_catalog_topic is used), thereby causing a 2X increase in
catalog update topic size when the compact protocol is used.
Fix: Snappy compress the catalog topic updates before sent to the
statestore when --compact_catalog_topic is set to true.
Results: ~4X reduction in catalog update topic size
Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
A tests/custom_cluster/test_compact_catalog_updates.py
5 files changed, 138 insertions(+), 5 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/5
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 3:
(6 comments)
http://gerrit.cloudera.org:8080/#/c/7268/3/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:
Line 331: status = SerializeAndCompressCatalogObject(catalog_object, item);
> Thinking we should have a similar separation of serialization and compressi
Done
Line 366: compressed_buffer.resize(result_len);
> Remove since we're copying into item.value anyway
Done
Line 367: item.value = string(reinterpret_cast<char*>(compressed_buffer_ptr),
> use item.value.assign()
Done
http://gerrit.cloudera.org:8080/#/c/7268/3/be/src/service/impala-server.h
File be/src/service/impala-server.h:
Line 557: std::vector<uint8_t>& data_buffer) WARN_UNUSED_RESULT;
> we use pointers for out params, i.e. use std::vector<uint8_t>*
Done
http://gerrit.cloudera.org:8080/#/c/7268/3/be/src/util/compress.h
File be/src/util/compress.h:
Line 113
> revert
Done
http://gerrit.cloudera.org:8080/#/c/7268/3/be/src/util/decompress.h
File be/src/util/decompress.h:
Line 86:
> revert
Done
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#2).
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
IMPALA-5500: Reduce catalog update topic size
Problem: IMPALA-4029 introduced the use of the flatbuffers serialization
libary for storing file and block metadata. That change reduced the
effectiveness of the Thrift compaction protocol (when
--compact_catalog_topic is used), thereby causing a 2X increase in
catalog update topic size when the compact protocol is used.
Fix: Snappy compress the catalog topic updates before sent to the
statestore when --compact_catalog_topic is set to true.
Results: ~4X reduction in catalog update topic size
Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/compress.h
M be/src/util/decompress.h
A tests/custom_cluster/test_compact_catalog_updates.py
7 files changed, 168 insertions(+), 16 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/2
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 5:
(3 comments)
http://gerrit.cloudera.org:8080/#/c/7268/5/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:
Line 334: // Compress the catalog object
remove (same as code)
Line 367: compressed_data.resize(result_len);
remove
http://gerrit.cloudera.org:8080/#/c/7268/5/be/src/service/impala-server.h
File be/src/service/impala-server.h:
Line 556: Status DecompressCatalogObject(const TTopicItem& item,
It's a little weird to have the Compress/Decompress functions in different classes and specific to catalog objects. How about we
introduce static functions in SnappyCompressor like this:
static Status Compress(string*);
static Status Decompress(string*);
Both compress/decompress in place using temporaries as necessary.
Before you make the change, let me ask Tim what he thinks.
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
IMPALA-5500: Reduce catalog update topic size
Problem: IMPALA-4029 introduced the use of the flatbuffers serialization
libary for storing file and block metadata. That change reduced the
effectiveness of the Thrift compaction protocol (when
--compact_catalog_topic is used), thereby causing a 2X increase in
catalog update topic size when the compact protocol is used.
Fix: LZ4 compress the catalog topic updates before sent to the
statestore when --compact_catalog_topic is set to true.
Results: ~4X reduction in catalog update topic size
Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Reviewed-on: http://gerrit.cloudera.org:8080/7268
Reviewed-by: Dimitris Tsirogiannis <dt...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-util.cc
M be/src/catalog/catalog-util.h
M be/src/service/impala-server.cc
A tests/custom_cluster/test_compact_catalog_updates.py
5 files changed, 136 insertions(+), 5 deletions(-)
Approvals:
Impala Public Jenkins: Verified
Dimitris Tsirogiannis: Looks good to me, approved
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 5:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/7268/5/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:
Line 358: RETURN_IF_ERROR(Codec::CreateCompressor(nullptr, false, THdfsCompression::SNAPPY,
> I see your point.
I'm in favor of LZ4.
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 4:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/7268/4/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:
Line 328: Status status = thrift_serializer_.Serialize(&catalog_object, &serialized_object);
Let's try to avoid the extra copy for the !FLAGS_compact_catalog_topic case. I think you could pass in &item.value to Serialize like before, and then change CompressCatalogObject() to take a const uint8_t* and a length as input.
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 10: Code-Review+2
Rebase, keep Alex's +2
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/7268
to look at the new patch set (#8).
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
IMPALA-5500: Reduce catalog update topic size
Problem: IMPALA-4029 introduced the use of the flatbuffers serialization
libary for storing file and block metadata. That change reduced the
effectiveness of the Thrift compaction protocol (when
--compact_catalog_topic is used), thereby causing a 2X increase in
catalog update topic size when the compact protocol is used.
Fix: LZ4 compress the catalog topic updates before sent to the
statestore when --compact_catalog_topic is set to true.
Results: ~4X reduction in catalog update topic size
Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-util.cc
M be/src/catalog/catalog-util.h
M be/src/service/impala-server.cc
M fe/src/main/java/org/apache/impala/analysis/TableSampleClause.java
M fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
A tests/custom_cluster/test_compact_catalog_updates.py
8 files changed, 139 insertions(+), 8 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/8
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
Patch Set 3:
(7 comments)
http://gerrit.cloudera.org:8080/#/c/7268/3/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:
Line 331: status = SerializeAndCompressCatalogObject(catalog_object, item);
Thinking we should have a similar separation of serialization and compression like we do for the decompression.
Line 366: compressed_buffer.resize(result_len);
Remove since we're copying into item.value anyway
Line 367: item.value = string(reinterpret_cast<char*>(compressed_buffer_ptr),
use item.value.assign()
http://gerrit.cloudera.org:8080/#/c/7268/3/be/src/service/impala-server.h
File be/src/service/impala-server.h:
Line 557: std::vector<uint8_t>& data_buffer) WARN_UNUSED_RESULT;
we use pointers for out params, i.e. use std::vector<uint8_t>*
http://gerrit.cloudera.org:8080/#/c/7268/3/be/src/util/compress.h
File be/src/util/compress.h:
Line 113
revert
http://gerrit.cloudera.org:8080/#/c/7268/3/be/src/util/decompress.h
File be/src/util/decompress.h:
Line 86:
revert
http://gerrit.cloudera.org:8080/#/c/7268/2/tests/custom_cluster/test_compact_catalog_updates.py
File tests/custom_cluster/test_compact_catalog_updates.py:
Line 45: assert result.data[0] == "7300"
> Added the result check. Out of curiosity, what's so special about the parqu
Nothing special about Parquet, just another table.
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/7268
to look at the new patch set (#9).
Change subject: IMPALA-5500: Reduce catalog update topic size
......................................................................
IMPALA-5500: Reduce catalog update topic size
Problem: IMPALA-4029 introduced the use of the flatbuffers serialization
libary for storing file and block metadata. That change reduced the
effectiveness of the Thrift compaction protocol (when
--compact_catalog_topic is used), thereby causing a 2X increase in
catalog update topic size when the compact protocol is used.
Fix: LZ4 compress the catalog topic updates before sent to the
statestore when --compact_catalog_topic is set to true.
Results: ~4X reduction in catalog update topic size
Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-util.cc
M be/src/catalog/catalog-util.h
M be/src/service/impala-server.cc
A tests/custom_cluster/test_compact_catalog_updates.py
5 files changed, 136 insertions(+), 5 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/9
--
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>