You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org> on 2019/11/21 23:44:11 UTC

[Impala-ASF-CR] IMPALA-9181: Serialize TQueryCtx once per query

Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14777


Change subject: IMPALA-9181: Serialize TQueryCtx once per query
......................................................................

IMPALA-9181: Serialize TQueryCtx once per query

When issuing Exec() rpcs to backends, we currently serialize the
TQueryCtx once per backend. This is inefficient as the TQueryCtx is
the same for all backends and really only needs to be serialized once.

This patch serializes the TQueryCtx in the coordinator and then passes
it to each BackendState when calling Exec().

Testing:
- Passed full run of existing tests.
- Single node perf run showed no significant change.

Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/test-env.cc
M be/src/service/control-service.cc
M be/src/service/control-service.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
13 files changed, 133 insertions(+), 108 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/14777/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Gerrit-Change-Number: 14777
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-9181: Serialize TQueryCtx once per query

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/14777 )

Change subject: IMPALA-9181: Serialize TQueryCtx once per query
......................................................................


Patch Set 3: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14777/3/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/14777/3/be/src/runtime/coordinator-backend-state.cc@215
PS3, Line 215:   unique_ptr<kudu::faststring> sidecar_buf = make_unique<kudu::faststring>();
             :   sidecar_buf->assign_copy(serialized_buf, serialized_len);
             :   unique_ptr<RpcSidecar> rpc_sidecar = RpcSidecar::FromFaststring(move(sidecar_buf));
Not this change but this can in theory be converted to sidecar using Slice too. That said, we may better hold off from doing so now until we convert ExecQueryFInstance() RPC to asynchronous as the memory management is slightly different in that case.

May be a TODO for now is sufficient.


http://gerrit.cloudera.org:8080/#/c/14777/3/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/14777/3/be/src/service/control-service.cc@120
PS3, Line 120: Status GetSidecar(int sidecar_idx, RpcContext* rpc_context, T* thrift_obj) {
static



-- 
To view, visit http://gerrit.cloudera.org:8080/14777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Gerrit-Change-Number: 14777
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Wed, 27 Nov 2019 00:09:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9181: Serialize TQueryCtx once per query

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/14777 )

Change subject: IMPALA-9181: Serialize TQueryCtx once per query
......................................................................


Patch Set 4: Code-Review+2

(2 comments)

carrying forward

http://gerrit.cloudera.org:8080/#/c/14777/3/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/14777/3/be/src/runtime/coordinator-backend-state.cc@215
PS3, Line 215:   // TODO: eliminate the extra copy here by using a Slice
             :   unique_ptr<kudu::faststring> sidecar_buf = make_unique<kudu::faststring>();
             :   sidecar_buf->assign_copy(serialized_buf, serialized_len);
> Not this change but this can in theory be converted to sidecar using Slice 
Done


http://gerrit.cloudera.org:8080/#/c/14777/3/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/14777/3/be/src/service/control-service.cc@120
PS3, Line 120: static Status GetSidecar(int sidecar_idx, RpcContext* rpc_context, T* thrift_obj) {
> static
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Gerrit-Change-Number: 14777
Gerrit-PatchSet: 4
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Dec 2019 18:58:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9181: Serialize TQueryCtx once per query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14777 )

Change subject: IMPALA-9181: Serialize TQueryCtx once per query
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5309/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/14777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Gerrit-Change-Number: 14777
Gerrit-PatchSet: 5
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Dec 2019 18:59:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9181: Serialize TQueryCtx once per query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14777 )

Change subject: IMPALA-9181: Serialize TQueryCtx once per query
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5150/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Gerrit-Change-Number: 14777
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Nov 2019 22:42:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9181: Serialize TQueryCtx once per query

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14777 )

Change subject: IMPALA-9181: Serialize TQueryCtx once per query
......................................................................


Patch Set 3: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/14777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Gerrit-Change-Number: 14777
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Nov 2019 22:36:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9181: Serialize TQueryCtx once per query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14777 )

Change subject: IMPALA-9181: Serialize TQueryCtx once per query
......................................................................

IMPALA-9181: Serialize TQueryCtx once per query

When issuing Exec() rpcs to backends, we currently serialize the
TQueryCtx once per backend. This is inefficient as the TQueryCtx is
the same for all backends and really only needs to be serialized once.

Serializing the TQueryCtx can be expensive as it contains both the
full text of the original query and the descriptor table, which can be
quite large. In a synthetic dataset I tested with, scanning a table
with 100k partitions leads to a descriptor table size of ~20MB.

This patch serializes the TQueryCtx in the coordinator and then passes
it to each BackendState when calling Exec().

Followup work might consider if we really need all of the info in the
TQueryCtx to be distributed to all backends.

Testing:
- Passed full run of existing tests.
- Single node perf run showed no significant change.

Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Reviewed-on: http://gerrit.cloudera.org:8080/14777
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/test-env.cc
M be/src/service/control-service.cc
M be/src/service/control-service.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
13 files changed, 139 insertions(+), 108 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/14777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Gerrit-Change-Number: 14777
Gerrit-PatchSet: 6
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] 1;10;0cIMPALA-9181: Serialize TQueryCtx once per query

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/14777 )

Change subject: 1;10;0cIMPALA-9181: Serialize TQueryCtx once per query
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/14777/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14777/1//COMMIT_MSG@10
PS1, Line 10: TQueryCtx
> why is serializing this per fragment expensive, is it typically a very larg
Done


http://gerrit.cloudera.org:8080/#/c/14777/1/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/14777/1/be/src/runtime/coordinator-backend-state.h@85
PS1, Line 85: const kudu::Slice& serialized_query_ctx
> nit: pass via constructor instead? since that is where query_ctx is passed 
That makes the memory management more complicated, since we don't want the serialized_query_ctx memory to be used except for when Exec() is running.

The only thing we use the query_ctx from the constructor for is to get the TQueryOptions, so we could just pass that in instead of the TQueryCtx if you think that makes it clearer what's going on.


http://gerrit.cloudera.org:8080/#/c/14777/1/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/14777/1/be/src/runtime/query-exec-mgr.cc@50
PS1, Line 50:  bool dummy;
            :   QueryState* qs =
            :       GetOrCreateQueryState(query_ctx, request->per_backend_mem_limit(), &dummy);
> not your code, but is it possible for the QueryState to already exist, or c
Yes, its possible for it to already exist if this is the coordinator, as the coordinator calls CreateQueryState() in Coordinator::Exec().


http://gerrit.cloudera.org:8080/#/c/14777/1/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/14777/1/be/src/service/control-service.cc@118
PS1, Line 118: // 'thrift_obj'.
> nit: add some method code comments
Done


http://gerrit.cloudera.org:8080/#/c/14777/1/be/src/service/control-service.cc@135
PS1, Line 135: e sidecars. The QueryState will mak
> probably could do this if you make query_ctx a unique_ptr and just transfer
Yeah, I think I would prefer to leave this as a followup task rather than making this patch more complicated.


http://gerrit.cloudera.org:8080/#/c/14777/1/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/14777/1/common/thrift/ImpalaInternalService.thrift@658
PS1, Line 658: er than fully coverti
> could you add some more docs to this? it's not that clear to me what this e
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Gerrit-Change-Number: 14777
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Nov 2019 22:00:00 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9181: Serialize TQueryCtx once per query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14777 )

Change subject: IMPALA-9181: Serialize TQueryCtx once per query
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5151/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Gerrit-Change-Number: 14777
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Nov 2019 22:49:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9181: Serialize TQueryCtx once per query

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14777 )

Change subject: IMPALA-9181: Serialize TQueryCtx once per query
......................................................................


Patch Set 1:

(6 comments)

first pass

http://gerrit.cloudera.org:8080/#/c/14777/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14777/1//COMMIT_MSG@10
PS1, Line 10: TQueryCtx
why is serializing this per fragment expensive, is it typically a very large object? does each fragment need all the info in the TQueryCtx?


http://gerrit.cloudera.org:8080/#/c/14777/1/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/14777/1/be/src/runtime/coordinator-backend-state.h@85
PS1, Line 85: const kudu::Slice& serialized_query_ctx
nit: pass via constructor instead? since that is where query_ctx is passed as well.


http://gerrit.cloudera.org:8080/#/c/14777/1/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/14777/1/be/src/runtime/query-exec-mgr.cc@50
PS1, Line 50:  bool dummy;
            :   QueryState* qs =
            :       GetOrCreateQueryState(query_ctx, request->per_backend_mem_limit(), &dummy);
not your code, but is it possible for the QueryState to already exist, or can we use CreateQueryState instead?


http://gerrit.cloudera.org:8080/#/c/14777/1/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/14777/1/be/src/service/control-service.cc@118
PS1, Line 118: Status GetSidecar(int sidecar_idx, RpcContext* rpc_context, T* thrift_obj) {
nit: add some method code comments


http://gerrit.cloudera.org:8080/#/c/14777/1/be/src/service/control-service.cc@135
PS1, Line 135: TODO: can we avoid this extra copy?
probably could do this if you make query_ctx a unique_ptr and just transfer ownership from this method --> StartQuery --> GetOrCreateQueryState --> QueryState c'tor. It might be a bit of a pain to re-factor this all though, so its up to you.


http://gerrit.cloudera.org:8080/#/c/14777/1/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/14777/1/common/thrift/ImpalaInternalService.thrift@658
PS1, Line 658: TExecPlanFragmentInfo
could you add some more docs to this? it's not that clear to me what this encapsulates / why adding this info as a sidecar to ExecQueryFInstances is necessary.



-- 
To view, visit http://gerrit.cloudera.org:8080/14777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Gerrit-Change-Number: 14777
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Nov 2019 18:32:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9181: Serialize TQueryCtx once per query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14777 )

Change subject: IMPALA-9181: Serialize TQueryCtx once per query
......................................................................


Patch Set 5: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/14777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Gerrit-Change-Number: 14777
Gerrit-PatchSet: 5
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Dec 2019 23:26:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9181: Serialize TQueryCtx once per query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14777 )

Change subject: IMPALA-9181: Serialize TQueryCtx once per query
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5107/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Gerrit-Change-Number: 14777
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Nov 2019 00:28:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9181: Serialize TQueryCtx once per query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14777 )

Change subject: IMPALA-9181: Serialize TQueryCtx once per query
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5185/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Gerrit-Change-Number: 14777
Gerrit-PatchSet: 4
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Dec 2019 19:27:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] 1;10;0cIMPALA-9181: Serialize TQueryCtx once per query

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Sahil Takiar, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14777

to look at the new patch set (#2).

Change subject: 1;10;0cIMPALA-9181: Serialize TQueryCtx once per query
......................................................................

1;10;0cIMPALA-9181: Serialize TQueryCtx once per query

When issuing Exec() rpcs to backends, we currently serialize the
TQueryCtx once per backend. This is inefficient as the TQueryCtx is
the same for all backends and really only needs to be serialized once.

Serializing the TQueryCtx can be expensive as it contains both the
full text of the original query and the descriptor table, which can be
quite large. In a synthetic dataset I tested with, scanning a table
with 100k partitions leads to a descriptor table size of ~20MB.

This patch serializes the TQueryCtx in the coordinator and then passes
it to each BackendState when calling Exec().

Followup work might consider if we really need all of the info in the
TQueryCtx to be distributed to all backends.

Testing:
- Passed full run of existing tests.
- Single node perf run showed no significant change.

Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/test-env.cc
M be/src/service/control-service.cc
M be/src/service/control-service.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
13 files changed, 138 insertions(+), 108 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/14777/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Gerrit-Change-Number: 14777
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>

[Impala-ASF-CR] IMPALA-9181: Serialize TQueryCtx once per query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14777 )

Change subject: IMPALA-9181: Serialize TQueryCtx once per query
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/14777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Gerrit-Change-Number: 14777
Gerrit-PatchSet: 5
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Dec 2019 18:59:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9181: Serialize TQueryCtx once per query

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Sahil Takiar, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14777

to look at the new patch set (#4).

Change subject: IMPALA-9181: Serialize TQueryCtx once per query
......................................................................

IMPALA-9181: Serialize TQueryCtx once per query

When issuing Exec() rpcs to backends, we currently serialize the
TQueryCtx once per backend. This is inefficient as the TQueryCtx is
the same for all backends and really only needs to be serialized once.

Serializing the TQueryCtx can be expensive as it contains both the
full text of the original query and the descriptor table, which can be
quite large. In a synthetic dataset I tested with, scanning a table
with 100k partitions leads to a descriptor table size of ~20MB.

This patch serializes the TQueryCtx in the coordinator and then passes
it to each BackendState when calling Exec().

Followup work might consider if we really need all of the info in the
TQueryCtx to be distributed to all backends.

Testing:
- Passed full run of existing tests.
- Single node perf run showed no significant change.

Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/test-env.cc
M be/src/service/control-service.cc
M be/src/service/control-service.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
13 files changed, 139 insertions(+), 108 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/14777/4
-- 
To view, visit http://gerrit.cloudera.org:8080/14777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Gerrit-Change-Number: 14777
Gerrit-PatchSet: 4
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-9181: Serialize TQueryCtx once per query

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14777 )

Change subject: IMPALA-9181: Serialize TQueryCtx once per query
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14777/1/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/14777/1/be/src/runtime/coordinator-backend-state.h@85
PS1, Line 85: const kudu::Slice& serialized_query_ctx
> That makes the memory management more complicated, since we don't want the 
No, I think it should be fine as is.



-- 
To view, visit http://gerrit.cloudera.org:8080/14777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Gerrit-Change-Number: 14777
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Nov 2019 22:36:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9181: Serialize TQueryCtx once per query

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Sahil Takiar, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14777

to look at the new patch set (#3).

Change subject: IMPALA-9181: Serialize TQueryCtx once per query
......................................................................

IMPALA-9181: Serialize TQueryCtx once per query

When issuing Exec() rpcs to backends, we currently serialize the
TQueryCtx once per backend. This is inefficient as the TQueryCtx is
the same for all backends and really only needs to be serialized once.

Serializing the TQueryCtx can be expensive as it contains both the
full text of the original query and the descriptor table, which can be
quite large. In a synthetic dataset I tested with, scanning a table
with 100k partitions leads to a descriptor table size of ~20MB.

This patch serializes the TQueryCtx in the coordinator and then passes
it to each BackendState when calling Exec().

Followup work might consider if we really need all of the info in the
TQueryCtx to be distributed to all backends.

Testing:
- Passed full run of existing tests.
- Single node perf run showed no significant change.

Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/test-env.cc
M be/src/service/control-service.cc
M be/src/service/control-service.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
13 files changed, 138 insertions(+), 108 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/14777/3
-- 
To view, visit http://gerrit.cloudera.org:8080/14777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Gerrit-Change-Number: 14777
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>