You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Vuk Ercegovac (Code Review)" <ge...@cloudera.org> on 2018/02/08 03:04:17 UTC

[Impala-ASF-CR] IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams

Vuk Ercegovac has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9251


Change subject: IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams
......................................................................

IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams

The request sent from coordinator to backends, TExecQueryFInstancesParams,
grows with the number of Partitions and ScanRanges needed
for a query. For a synthetic dataset of 250K partitions, each
with one HDFS file (one block per file), each backend is sent 57 MB.
As the number of backends grows, total network transfer
per query grows linearly. For the example, roughly 90% of
the space is taken by the DescriptorTable.

This change uses LZ4 to compress serialized TDescriptorTables.
The coordinator compresses the TDescriptorTable that its
passed via QueryCtx per query, and each backend decompresses
it when it receives its request from the coordinator.
With compression, the total size sent is reduced from 57 MB to 8.3 MB.

For the example, compression at the coordinator adds on an extra ~0.5s
(out of ~2.8s) and an extra ~0.8 s at each backend. For a small example
(10 partitions), ~0.5ms is added (out of ~12ms) at the coordinator
and ~1ms is added per backend.

Context:
This change is one of a series of changes aimed at reducing
plan (and metadata) size. Additional steps to reduce the size
include:
1) switch the Thrift protocol to use Compact instead of Binary
   (for above example, saves 50%. however, the network-perf-benchmark
    degrades substantially so needs more investigation)

2) factor out format information that's repeated per partition
   (for above example, saves ~15%)

3) simplify partition key expressions
   (expect similar savings as (2))

Testing:
- existing code paths for end-to-end tests cover this path.
- TODO: performance tests. change is in the critical path so may be
  unacceptable as a default.

Change-Id: I195c59efc73e5fd4c310ccfc96b480d2209bde09
---
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/query-state.cc
M common/thrift/Descriptors.thrift
M common/thrift/ImpalaInternalService.thrift
7 files changed, 125 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/9251/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I195c59efc73e5fd4c310ccfc96b480d2209bde09
Gerrit-Change-Number: 9251
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/9251 )

Change subject: IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams
......................................................................


Patch Set 1:

The hammer here is to compress all RPCs (perhaps for certain methods, or certain size thresholds). Have we considered that? (It looks like toolchain/thrift-0.9.0-p11/include/thrift/transport/TZlibTransport.h exists. We're only doing KRPC in some code paths.) It may not be pragmatically reasonable, but compressing field-by-field is pretty painful.


-- 
To view, visit http://gerrit.cloudera.org:8080/9251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I195c59efc73e5fd4c310ccfc96b480d2209bde09
Gerrit-Change-Number: 9251
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Feb 2018 05:05:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has abandoned this change. ( http://gerrit.cloudera.org:8080/9251 )

Change subject: IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams
......................................................................


Abandoned

Appears stale, reopen if you disagree
-- 
To view, visit http://gerrit.cloudera.org:8080/9251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I195c59efc73e5fd4c310ccfc96b480d2209bde09
Gerrit-Change-Number: 9251
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has abandoned this change. ( http://gerrit.cloudera.org:8080/9251 )

Change subject: IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams
......................................................................


Abandoned

We can re-open a new review if we decide to take this task back up.
-- 
To view, visit http://gerrit.cloudera.org:8080/9251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I195c59efc73e5fd4c310ccfc96b480d2209bde09
Gerrit-Change-Number: 9251
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Reviewer: Michael Ho <mi...@gmail.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9251 )

Change subject: IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams
......................................................................


Patch Set 1:

We may consider reviving it as we look into IMPALA-7467. Let's keep it opened for now.


-- 
To view, visit http://gerrit.cloudera.org:8080/9251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I195c59efc73e5fd4c310ccfc96b480d2209bde09
Gerrit-Change-Number: 9251
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Thu, 06 Dec 2018 18:54:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams

Posted by "Vuk Ercegovac (Code Review)" <ge...@cloudera.org>.
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/9251 )

Change subject: IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams
......................................................................


Patch Set 1:

yes, that's what the Compact protocol in the change description refers to (as one of the options). that should have been the least invasive yet showed some substantial slow downs in one of the micro benchmarks. needs to be investigated more. I saw the Zlib protocol but opted not to use that (I wanted choice for compression and was unclear how to do this per method without reorganizing the service). I had a prototype for choosing to compress or not based on sizes so I can add that back if needed. Though I am not a fan of adding yet more flags, this one should probably be behind one.

fwiw, we do such field-level compression for row-batches.

other question I have here is how this will live with krpc and eventual transition to protos.


-- 
To view, visit http://gerrit.cloudera.org:8080/9251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I195c59efc73e5fd4c310ccfc96b480d2209bde09
Gerrit-Change-Number: 9251
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Feb 2018 05:37:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9251 )

Change subject: IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams
......................................................................


Patch Set 1:

I think it's better if we keep it abandoned until someone adopts it and moves it forward though, it gets confusing with all these open code reviews that aren't meant to be reviewed.


-- 
To view, visit http://gerrit.cloudera.org:8080/9251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I195c59efc73e5fd4c310ccfc96b480d2209bde09
Gerrit-Change-Number: 9251
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Thu, 06 Dec 2018 19:10:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/9251 )

Change subject: IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams
......................................................................


Patch Set 1:

(1 comment)

Your answer re: compressing RPCs makes sense.

http://gerrit.cloudera.org:8080/#/c/9251/1/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

http://gerrit.cloudera.org:8080/#/c/9251/1/be/src/runtime/descriptors.cc@595
PS1, Line 595: Status DescriptorTbl::CompressThrift(const TDescriptorTable& thrift_tbl,
Do you think we should re-use CompressCatalogObject/DecompressCatalogObject? It looks like there we used a string, with the size at the beginning, rather than a Thrift struct for the compressed item.



-- 
To view, visit http://gerrit.cloudera.org:8080/9251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I195c59efc73e5fd4c310ccfc96b480d2209bde09
Gerrit-Change-Number: 9251
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Feb 2018 16:31:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has restored this change. ( http://gerrit.cloudera.org:8080/9251 )

Change subject: IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams
......................................................................


Restored

Relevant for IMPALA-7467 as we may still keep a lot of the Plan related structures from Frontend in Thrift.
-- 
To view, visit http://gerrit.cloudera.org:8080/9251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: restore
Gerrit-Change-Id: I195c59efc73e5fd4c310ccfc96b480d2209bde09
Gerrit-Change-Number: 9251
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>