Posted to reviews@impala.apache.org by "Michael Ho (Code Review)" <ge...@cloudera.org> on 2018/07/03 00:58:21 UTC

[Impala-ASF-CR] IMPALA-5486: Port ReportExecStatus() RPC to use KRPC

Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10855 )


Change subject: IMPALA-5486: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-5486: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation
for fixing IMPALA-2990, as we can take advantage of TCP connection
multiplexing in KRPC to avoid overwhelming the coordinator
with too many connections by reducing the number of TCP connections
to one for each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
The majority of this patch is mechanical conversion of some Thrift
structures to Protobuf. Note that the runtime profile is still retained
as a Thrift structure as Impala clients will still fetch query profiles in
Thrift. This avoids duplicated serialization implementation in both
Thrift and Protobuf for the runtime profile. The Thrift runtime
profiles are serialized and sent as a sidecar in ReportExecStatus() RPC.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/data-stream-test.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/uid-util.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
52 files changed, 942 insertions(+), 555 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/1
-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 12:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/608/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 12
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 06 Sep 2018 18:23:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/136/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 01 Aug 2018 19:37:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 20:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1149/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 20
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 25 Oct 2018 03:00:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 3:

Still working on some targeted BE tests for some error cases, but please feel free to go ahead and do another pass.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 25 Jul 2018 00:01:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 16:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10855/16/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/10855/16/be/src/rpc/rpc-mgr.h@146
PS16, Line 146: service_name
admittedly it's clever that you were able to hack the username in order to get separate connections, but I'm finding it somewhat strange to follow when looking at the code.

Here, it's not really clear why you need to pass the service name if you're already passing the service class type -- the service name is already available there by calling P::static_service_name(), so the parameter seems redundant.

Given that, as more of these services get converted over to KRPC, we probably just want a 'data plane' and 'control plane', maybe it makes more sense to add a new enum type like 'CommunicationPlane' with options kDataPlane and kControlPlane? Internally, you could still implement this by hacking the username for now, but at least it makes it clearer what the purpose of this parameter is.
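
A rough sketch of the enum-based alternative suggested above. Only the names CommunicationPlane, kDataPlane and kControlPlane come from the comment; ConnectionUserFor() and the "/data" and "/control" suffixes are hypothetical and are not part of Impala's RpcMgr or of KRPC.

```cpp
#include <string>

// Hypothetical enum naming the two traffic classes from the comment above.
enum class CommunicationPlane { kDataPlane, kControlPlane };

// Hypothetical helper: derives a per-plane user name so that connections
// keyed on (address, user) end up separate for each plane.
inline std::string ConnectionUserFor(const std::string& base_user,
                                     CommunicationPlane plane) {
  switch (plane) {
    case CommunicationPlane::kDataPlane:
      return base_user + "/data";
    case CommunicationPlane::kControlPlane:
      return base_user + "/control";
  }
  return base_user;  // Not reached for valid enum values.
}
```

The caller then states intent with an enum value, and the user-name trick stays an implementation detail behind one function.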


http://gerrit.cloudera.org:8080/#/c/10855/16/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/16/be/src/runtime/coordinator-backend-state.cc@506
PS16, Line 506:   {
why is this extra indentation block here? doesn't seem like there is any RAII or lock acquisition that needs this scope


http://gerrit.cloudera.org:8080/#/c/10855/16/be/src/runtime/fragment-instance-state.h
File be/src/runtime/fragment-instance-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/16/be/src/runtime/fragment-instance-state.h@107
PS16, Line 107: const
nit: const non-reference parameters don't make much sense
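
To illustrate the nit in isolation: top-level const on a by-value parameter is not part of the function type, so it changes nothing for callers. The alias names below are made up for this sketch.

```cpp
#include <type_traits>

// Top-level const on a by-value parameter is dropped when the function type
// is formed, so both aliases below name the same type.
using WithConst = void(const int);
using WithoutConst = void(int);

static_assert(std::is_same<WithConst, WithoutConst>::value,
              "top-level const is not part of the signature");
```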


http://gerrit.cloudera.org:8080/#/c/10855/12/common/protobuf/common.proto
File common/protobuf/common.proto:

http://gerrit.cloudera.org:8080/#/c/10855/12/common/protobuf/common.proto@33
PS12, Line 33: fixed
> Agreed that they don't tend to be small. Does using fixed64 make the encodi
yea, not that it really matters for this use case, but figured it's more instructive
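
For background on the fixed vs. varint point (the quoted context above is truncated, so this is only general protobuf background): varint encoding spends 7 payload bits per byte, so a 64-bit value with its high bits set needs up to 10 bytes, while a fixed64 field always takes 8. A small sketch that computes the varint size; VarintSize and kFixed64Size are illustrative names, not protobuf APIs, and the field tag is not counted.

```cpp
#include <cstddef>
#include <cstdint>

// Number of bytes a base-128 varint needs for 'value'
// (7 payload bits per byte).
inline std::size_t VarintSize(std::uint64_t value) {
  std::size_t bytes = 1;
  while (value >= 0x80) {
    value >>= 7;
    ++bytes;
  }
  return bytes;
}

// A fixed64 field always uses 8 payload bytes, whatever the value.
constexpr std::size_t kFixed64Size = 8;
```

With this helper, VarintSize(127) is 1, VarintSize(1ULL << 56) is 9 and VarintSize(UINT64_MAX) is 10, so values that are rarely small do not benefit from varint encoding.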




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 16
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 10 Oct 2018 23:18:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 11:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/exec/hdfs-parquet-table-writer.h
File be/src/exec/hdfs-parquet-table-writer.h:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/exec/hdfs-parquet-table-writer.h@199
PS10, Line 199:   ParquetInsertStatsPB parquet_insert_stats_;
nit: should #include the appropriate .pb.h here ("include-what-you-use")


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.h@60
PS10, Line 60: Coordinator* coord
I think this can probably change back to being const if you take the suggestion below. Also please add a doc comment indicating that the Coordinator must remain valid until the BackendState object is destructed.


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.h@176
PS10, Line 176: last_report_ti
nit: I think the term "sequence number" is more usual here -- "version" to me sounds like each update is always a "replace the previous update", whereas "sequence" indicates that there should not be gaps


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.h@220
PS10, Line 220:   /// Reference to the owning coordinator's 'dml_exec_state_'.
This "back pointer" still seems error-prone to me. I think the object lifetimes and relationships will be clearer if you pass this in as an out-argument to every call to ApplyExecStatusReport(), so that the struct becomes single-owner without "escaping" pointers.
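
A minimal sketch of the out-argument shape described above, under the assumption that the per-backend object only needs the DML state while applying a report. DmlState, BackendStateSketch and ApplyReport are hypothetical stand-ins, not the classes in the patch.

```cpp
#include <cstdint>

// Hypothetical stand-in for the coordinator-owned DML state.
struct DmlState {
  std::int64_t rows_modified = 0;
};

// Hypothetical stand-in for a per-backend state object. It keeps no pointer
// to DmlState; the caller lends it for the duration of each call.
class BackendStateSketch {
 public:
  // Applies one report and accumulates its row count into '*dml_state'.
  void ApplyReport(std::int64_t rows_in_report, DmlState* dml_state) {
    ++reports_applied_;
    dml_state->rows_modified += rows_in_report;
  }

  int reports_applied() const { return reports_applied_; }

 private:
  int reports_applied_ = 0;
};
```

Because no pointer escapes into the object, the lifetime question reduces to "the argument must be valid for the call", which the call site makes obvious.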


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.cc@265
PS10, Line 265: }
              : 
isn't it possible (though unlikely) that you have some old report sitting in the service queue long enough that you could get some late delivery with a sequence number that is < last_report_version, not just equal to?
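
A sketch of a receiver-side check that covers the late-delivery case raised above, assuming sequence numbers start at 1 and only strictly newer reports should be applied. ReportFilter and ShouldApply are hypothetical names, not code from the patch.

```cpp
#include <cstdint>

// Hypothetical receiver-side filter. A report is applied only if its
// sequence number is strictly newer than the last one applied, which covers
// both exact duplicates and late deliveries of older reports.
class ReportFilter {
 public:
  bool ShouldApply(std::int64_t seq_no) {
    if (seq_no <= last_applied_seq_no_) return false;
    last_applied_seq_no_ = seq_no;
    return true;
  }

 private:
  std::int64_t last_applied_seq_no_ = 0;
};
```

Feeding it 1, 3, 3, 2, 4 applies 1, 3 and 4 and skips the duplicate 3 and the late 2.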


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.cc@267
PS10, Line 267: inline bool Coordinator::BackendState::IsDone() const {
I think a VLOG_QUERY about the skipped RPC is probably useful


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.cc@294
PS10, Line 294:     instance_stats->Update(
nit: why not:

 const Status& instance_status = instance_exec_status.status();

it's a bit more conventional for making a copy (if necessary) and will also just take a reference if safe to do so.
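
A small sketch of why binding to a const reference works in both cases. Holder, status_ref(), status_copy() and SameObject() are made up for illustration and use std::string in place of Impala's Status.

```cpp
#include <string>

// Hypothetical holder with one accessor returning a const reference and
// another returning by value.
struct Holder {
  std::string status = "OK";
  const std::string& status_ref() const { return status; }
  std::string status_copy() const { return status; }
};

// Returns true if 'a' and 'b' are the same object.
inline bool SameObject(const std::string& a, const std::string& b) {
  return &a == &b;
}
```

Given a Holder h, "const std::string& r = h.status_ref();" aliases h.status without copying, while "const std::string& c = h.status_copy();" binds to a temporary whose lifetime is extended to that of c. The declaration reads the same either way.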


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/query-state.cc@287
PS10, Line 287: 
if I understand the threading correctly, there are two entry points for sending a report: one is periodic during the query, and the other is on query completion ("done"). Once the "done" report is sent, any other reports will be ignored.

Does this mean that it's possible we can race as follows?

- periodic report constructs a request including calling GetUnreportedErrors
- perhaps gets a ServiceUnavailable and goes to sleep
- the query finishes and triggers a 'done' report, with no errors (because they got consumed by the above already)
- the first report retries and its report gets ignored because the fragment is already "done"

thus we'd be losing the reported errors. Perhaps this is a known/existing bug, but maybe it's worth documenting the behavior in a TODO here if not a JIRA?

On a related note, I'm afraid the following race might cause a problem:

- exec thread reporting "done" allocates seqno 10
- makes a request, gets "unavailable" or whatever (or just goes to sleep for a few millis)
- profile reporting thread allocates seqno 11 and sends a request, which is accepted by the server
-- the server side increases the sequence number to 11
- the exec thread sends the "I'm done" message with seqno 10
- the server now ignores the "done" request and the query hangs

Does that seem like a possibility to you? Maybe we could trigger such a race in a test by setting the service pool size for this endpoint to be smaller than the number of fragments and increasing the report frequency to once every millisecond or something? Given that the effect of a bug in this area is serious (hung queries) we should be pretty careful about testing I think.
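
The second race can be replayed deterministically in a few lines, assuming a receiver that drops any report whose sequence number is not newer than the last accepted one. Report and DoneObserved are hypothetical names for this sketch, which models only the ordering, not threads or RPCs.

```cpp
#include <cstdint>
#include <vector>

// One report as seen by the receiver in this sketch.
struct Report {
  std::int64_t seq_no;
  bool done;
};

// Hypothetical receiver that drops any report whose sequence number is not
// newer than the last accepted one. Returns true if a 'done' report was
// accepted for the given arrival order.
inline bool DoneObserved(const std::vector<Report>& arrival_order) {
  std::int64_t last_seq_no = 0;
  bool done_seen = false;
  for (const Report& r : arrival_order) {
    if (r.seq_no <= last_seq_no) continue;  // Treated as stale and ignored.
    last_seq_no = r.seq_no;
    done_seen = done_seen || r.done;
  }
  return done_seen;
}
```

If the "done" report (seqno 10) arrives before the periodic one (seqno 11), it is observed. If the two arrive in the opposite order, the "done" report is dropped, which corresponds to the hang described above.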


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/query-state.cc@375
PS10, Line 375:         MonoDelta::FromMilliseconds(FLAGS_backend_client_rpc_timeout_ms));
should we have a failure injection point on the RPC itself? I only saw failure injection on the serialization of the profile


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/query-state.cc@379
PS10, Line 379: (resp.status());
should we backoff?
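
One possible shape for a backoff, as a sketch only: a capped exponential delay between retries. BackoffDelayMs and its parameters are hypothetical and do not correspond to an existing Impala flag or helper; it assumes base_ms > 0 and a max_ms small enough that doubling cannot overflow.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: delay before retry number 'attempt' (0-based),
// doubling from 'base_ms' and capped at 'max_ms'.
inline std::int64_t BackoffDelayMs(int attempt, std::int64_t base_ms,
                                   std::int64_t max_ms) {
  std::int64_t delay = base_ms;
  for (int i = 0; i < attempt && delay < max_ms; ++i) delay *= 2;
  return std::min(delay, max_ms);
}
```

With base_ms = 100 and max_ms = 1000, the delays for attempts 0 through 4 are 100, 200, 400, 800 and 1000.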


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/runtime-state.cc@202
PS10, Line 202:     (*new_errors)[v.first] = v.second;
the method doc says that new_errors is cleared, but it's actually written into without a prior clear
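
A sketch of a helper that matches a "new_errors is cleared" contract, assuming the pending errors are meant to be consumed by the call. ErrorLog and TakeUnreportedErrors are hypothetical names, and the map type is simplified to int keys and string values.

```cpp
#include <map>
#include <string>

using ErrorLog = std::map<int, std::string>;

// Hypothetical helper that honors a "new_errors is cleared" contract: the
// output map is emptied first, then filled from 'pending', which is drained.
inline void TakeUnreportedErrors(ErrorLog* pending, ErrorLog* new_errors) {
  new_errors->clear();
  for (const auto& v : *pending) (*new_errors)[v.first] = v.second;
  pending->clear();
}
```

Without the initial clear(), entries left in the output map by a previous call would be reported again alongside the new ones.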


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/service/control-service.cc
File be/src/service/control-service.cc:

PS10: 
This is a general krpc-in-Impala question: I can't find where you set up authorization for the KRPC services so that only the impala service user can connect to them. Is there a risk of an unauthorized (but authenticated) client principal connecting to this service?


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/util/uid-util.h
File be/src/util/uid-util.h:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/util/uid-util.h@79
PS10, Line 79:   TUniqueId result;
worth DCHECKs here that the fields are set by calling uid_pb.IsInitialized()?


http://gerrit.cloudera.org:8080/#/c/10855/10/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/10855/10/bin/impala-config.sh@562
PS10, Line 562: export HBASE_CONF_DIR="$IMPALA_FE_DIR/src/test/resources"
why's this necessary? Can we change cmake to invoke it from the full path instead?


http://gerrit.cloudera.org:8080/#/c/10855/10/tests/custom_cluster/test_rpc_exception.py
File tests/custom_cluster/test_rpc_exception.py:

http://gerrit.cloudera.org:8080/#/c/10855/10/tests/custom_cluster/test_rpc_exception.py@97
PS10, Line 97:   @CustomClusterTestSuite.with_args("--status_report_interval=1 \
can we change this flag to be in millis instead of seconds? Or do we advertise its existence to users so it would be a breaking change to rename it? once a second isn't much of a stress




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 11
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 22 Aug 2018 00:15:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, Todd Lipcon, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10855

to look at the new patch set (#8).

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation
for fixing IMPALA-2990, as we can take advantage of TCP connection
multiplexing in KRPC to avoid overwhelming the coordinator
with too many connections by reducing the number of TCP connections
to one for each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/catalog/catalog-util.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/kudu/rpc/rpc_context.cc
M be/src/kudu/rpc/rpc_context.h
M be/src/rpc/jni-thrift-util.h
M be/src/rpc/thrift-util-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/uid-util.h
M bin/impala-config.sh
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
M tests/custom_cluster/test_rpc_exception.py
M tests/failure/test_failpoints.py
62 files changed, 1,096 insertions(+), 666 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/8

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10855/7/tests/failure/test_failpoints.py
File tests/failure/test_failpoints.py:

http://gerrit.cloudera.org:8080/#/c/10855/7/tests/failure/test_failpoints.py@200
PS7, Line 200: \
flake8: E502 the backslash is redundant between brackets


http://gerrit.cloudera.org:8080/#/c/10855/7/tests/failure/test_failpoints.py@202
PS7, Line 202: :
flake8: E231 missing whitespace after ':'




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 08 Aug 2018 17:30:22 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 3:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/44/ : Initial code review checks failed. See linked job for details on the failure.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 25 Jul 2018 00:33:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 15:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/runtime/coordinator-backend-state.cc@275
PS14, Line 275:   // Hold the exec_summary's lock to avoid exposing it half-way through
> Can we document the lock order in coordinator.h (which already references E
Done


http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/runtime/fragment-instance-state.h
File be/src/runtime/fragment-instance-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/runtime/fragment-instance-state.h@177
PS14, Line 177: Monotonically 
> Typo: Monotonically
Done


http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/runtime/fragment-instance-state.h@179
PS14, Line 179:   int64_t report_seq_no_ = 0;
> I imagine you already thought about this and concluded that overflows were 
Done


http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/testutil/in-process-servers.cc
File be/src/testutil/in-process-servers.cc:

http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/testutil/in-process-servers.cc@51
PS14, Line 51:  FLAGS_krpc_port = 
> I guess we can't set this to 0 to automatically choose an ephemeral port? W
Some tests were failing without this. I don't recall which one. Apparently, something is sourcing FLAGS_krpc_port so we have to set it to be consistent.


http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/testutil/in-process-servers.cc@51
PS14, Line 51: ;
> Extra semicolon
Done


http://gerrit.cloudera.org:8080/#/c/10855/13/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/10855/13/common/protobuf/control_service.proto@50
PS13, Line 50: }
> Yeah, I think the rename makes sense to do, though not a big deal, and othe
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 15
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 10 Oct 2018 08:11:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10855/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10855/4//COMMIT_MSG@16
PS4, Line 16: This patch also introduces a new service pool for all query execution
            : control related RPCs in the future so that control commands from
            : coordinators aren't blocked by long-running DataStream services' RPCs.
> do we want to consider the ability to put these on separate TCP connections
+1 for this.

I guess another way to do this would be to slightly modify how ConnectionId::Equals() works, which today matches the remote_ Sockaddr and the hostname_, and user credentials only:
https://github.com/apache/kudu/blob/master/src/kudu/rpc/connection_id.cc#L70-L74

We can change it to include another field to match based on the type of RPC (call it 'proxy_hash_' or something). This can be done by making sure that different Proxy objects get different conn_id_ fields by initializing their ConnectionIds with these unique 'proxy_hash_' fields.
https://github.com/apache/kudu/blob/master/src/kudu/rpc/proxy.cc#L68

This way FindConnection() would only find TCP connections meant to be used by that Proxy():
https://github.com/apache/kudu/blob/master/src/kudu/rpc/reactor.cc#L489
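
A simplified sketch of the idea, not the Kudu class itself: an equality check that also compares an extra per-proxy field. ConnectionKeySketch and its string-typed members are hypothetical stand-ins; only the notion of a 'proxy_hash' field comes from the comment above.

```cpp
#include <cstddef>
#include <string>

// Simplified, hypothetical stand-in for a connection key. This is not the
// Kudu ConnectionId; it only mirrors the fields named in the comment above.
struct ConnectionKeySketch {
  std::string remote_addr;  // Stands in for the remote Sockaddr.
  std::string hostname;
  std::string user;         // Stands in for the user credentials.
  std::size_t proxy_hash;   // Proposed extra field, one value per proxy kind.

  bool Equals(const ConnectionKeySketch& other) const {
    return remote_addr == other.remote_addr && hostname == other.hostname &&
           user == other.user && proxy_hash == other.proxy_hash;
  }
};
```

Two keys that differ only in proxy_hash no longer compare equal, so a lookup keyed on them would not hand a control-plane proxy a connection opened for the data plane. A real change would also need the hash function for the key to include the new field.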


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@72
PS4, Line 72:   // TODO: implement something more efficient here, we're currently
            :   // acquiring/releasing the map lock and doing a map lookup for
            :   // every report (assign each query a local int32_t id and use that to index into a
            :   // vector of ClientRequestStates, w/o lookup or locking?)
> seems like an easy fix here is to use a RWLock since we expect that queries
I'd done an analysis on using a R/W lock for the ClientRequestState map earlier and found that it could starve writers pretty badly, which means that new queries could be starved at admission, badly affecting user experience (see IMPALA-4456).

In any case, I've sharded the map as part of the above JIRA and this comment predates my patch. So I don't think this is as big an issue anymore, and even less so after IMPALA-4063. I'd say we could get rid of this TODO.
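
For readers unfamiliar with the sharding approach mentioned above, a minimal sketch follows. It is not Impala's actual map; 'ShardedMap' and its methods are hypothetical, and the shard count is an arbitrary assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sharded map: each shard has its own lock, so operations on
// keys that hash to different shards do not contend with each other.
template <typename K, typename V, size_t kNumShards = 4>
class ShardedMap {
 public:
  void Put(const K& key, std::shared_ptr<V> value) {
    Shard& s = ShardFor(key);
    std::lock_guard<std::mutex> l(s.lock);
    s.map[key] = std::move(value);
  }

  // Returns nullptr if 'key' is absent.
  std::shared_ptr<V> Get(const K& key) {
    Shard& s = ShardFor(key);
    std::lock_guard<std::mutex> l(s.lock);
    auto it = s.map.find(key);
    if (it == s.map.end()) return nullptr;
    return it->second;
  }

  // Returns true if an entry was removed.
  bool Erase(const K& key) {
    Shard& s = ShardFor(key);
    std::lock_guard<std::mutex> l(s.lock);
    return s.map.erase(key) > 0;
  }

 private:
  struct Shard {
    std::mutex lock;
    std::unordered_map<K, std::shared_ptr<V>> map;  // guarded by 'lock'
  };

  Shard& ShardFor(const K& key) {
    return shards_[std::hash<K>()(key) % kNumShards];
  }

  std::array<Shard, kNumShards> shards_;
};
```

Each lookup still takes a lock, but only the lock of one shard, which is why contention on a single map-wide lock becomes less of a concern.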



-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 25 Jul 2018 22:08:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 4:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/coordinator.h@31
PS4, Line 31: #include "kudu/util/slice.h"
Not needed?


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/exec-env.h
File be/src/runtime/exec-env.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/exec-env.h@136
PS4, Line 136: DataStreamService* data_svc() const { return data_svc_.get(); }
Not used, we can remove this.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/query-state.cc@290
PS2, Line 290:     query_ctx().coord_address.hostname, &proxy);
> We do retry when the server is too busy. There is currently no timeout for 
I think I'm missing something here. Where is the code that does the retry?

Also, if there's no timeout, wouldn't this hang if the coordinator node goes offline, since we wouldn't get a "Connection timed out" error?
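
To illustrate the concern, here is a minimal sketch of a retry loop that retries only on a "server busy" result and gives every attempt a timeout. It does not reflect the code in this patch; 'RpcResult' and 'CallWithRetry' are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

enum class RpcResult { kOk, kServerBusy, kTimedOut };

// Hypothetical helper: 'rpc' performs one attempt and is expected to give up
// after 'timeout'. Only a "server busy" result triggers another attempt, and
// the number of attempts is bounded by 'max_attempts'.
RpcResult CallWithRetry(
    const std::function<RpcResult(std::chrono::milliseconds)>& rpc,
    std::chrono::milliseconds timeout, int max_attempts) {
  RpcResult result = RpcResult::kTimedOut;
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    result = rpc(timeout);
    if (result != RpcResult::kServerBusy) break;
  }
  return result;
}
```

Without the per-attempt timeout, an attempt against an unreachable peer could block indefinitely, which is the hang described above.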


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/client-request-state.h@32
PS4, Line 32: #include "gen-cpp/control_service.pb.h"
Not needed. Forward declaring ReportExecStatusRequestPB should suffice.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/client-request-state.h@35
PS4, Line 35: #include "gen-cpp/RuntimeProfile_types.h"
Not needed. Forward declaring TRuntimeProfileTree should suffice.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.h
File be/src/service/control-service.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.h@23
PS4, Line 23: #include "common/status.h"
            : #include "runtime/mem-tracker.h"
Not required. You can forward declare 'Status', 'MemTracker' and 'MetricGroup' instead, and include "runtime/mem-tracker.h" in the .cc file.

I just checked and the same applies to data-stream-service.h
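
A single-file sketch of the forward-declaration pattern being suggested, with hypothetical stand-in classes ('FakeMemTracker', 'FakeService') rather than the real Impala types:

```cpp
#include <cassert>
#include <cstdint>

// --- What the header would contain ---
// A forward declaration is enough because the header only stores a pointer.
class FakeMemTracker;

class FakeService {
 public:
  explicit FakeService(FakeMemTracker* tracker) : tracker_(tracker) {}

  // Declared here, defined where FakeMemTracker is a complete type.
  int64_t Reserved() const;

 private:
  FakeMemTracker* tracker_;  // pointer to an incomplete type is allowed
};

// --- What the .cc file would see after including the full header ---
class FakeMemTracker {
 public:
  explicit FakeMemTracker(int64_t bytes) : bytes_(bytes) {}
  int64_t bytes() const { return bytes_; }

 private:
  int64_t bytes_;
};

int64_t FakeService::Reserved() const { return tracker_->bytes(); }
```

Only code that dereferences the pointer needs the full definition, so files that merely include the service header no longer depend on the tracker's header.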


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.h@34
PS4, Line 34: class ImpalaServer;
Not required.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.h@53
PS4, Line 53:    /// Reference to the singleton ImpalaServer object. Not owned.
            :    ImpalaServer* impala_server_ = nullptr;
This is probably not needed. You can get the ImpalaServer using:
ExecEnv::GetInstance()->impala_server()


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@60
PS4, Line 60: num_svc_threads
Was there any noticeable slowdown on stress workloads since we now have only a limited number of threads that can process reports in parallel, vs. before where we could have an unbounded number of virtual threads with Thrift?

I'm guessing not, since parallelism would be bounded by the number of physical cores, but just thought I'd check.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@71
PS4, Line 71: 
Shouldn't we add:
FAULT_INJECTION_RPC_DELAY(RPC_REPORTEXECSTATUS); ?


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@111
PS4, Line 111: dummy
nit: empty


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/util/error-util-internal.h
File be/src/util/error-util-internal.h:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/util/error-util-internal.h@26
PS2, Line 26: /// Factor out the following structures from 'error-util.h' to prevent circular dependency
> There is circular dependency between control_service.pb.h and some files in
Just curious but what was the circular dependency? Is there a way to get rid of it using forward declares so that this new file isn't needed?


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/error-util.cc
File be/src/util/error-util.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/error-util.cc@144
PS4, Line 144: for (auto iter : entry.messages()) *stream << iter << "\n";
Formatting. Also, 'iter' isn't an iterator, it's the element itself, so we'd prefer to call it 'msg' or something similar.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/error-util.cc@170
PS4, Line 170: for (auto iter : entry.messages()) target->add_messages(iter);
Formatting, we should avoid single line for loops.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/error-util.cc@194
PS4, Line 194: if (target.messages_size() == 0) target.add_messages(e.msg());
Formatting
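
A small sketch of the loop style requested in the comments above. 'Entry' is a hypothetical stand-in for a protobuf message with a repeated string field, and 'PrintMessages' is an illustrative name, not a function from the patch.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a protobuf entry with a repeated string field.
struct Entry {
  std::vector<std::string> msgs;
  const std::vector<std::string>& messages() const { return msgs; }
};

// Writes each message on its own line.
void PrintMessages(const Entry& entry, std::ostream* stream) {
  // 'msg' is the element itself, not an iterator; taking it by const
  // reference avoids copying each string.
  for (const std::string& msg : entry.messages()) {
    *stream << msg << "\n";
  }
}
```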


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/runtime-profile.h
File be/src/util/runtime-profile.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/runtime-profile.h@28
PS4, Line 28: #include "kudu/util/faststring.h"
Forward declare 'faststring'



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 26 Jul 2018 05:09:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 18: Code-Review+1

Carry +1


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 18
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Sun, 14 Oct 2018 02:21:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 4:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/45/ : Initial code review checks failed. See linked job for details on the failure.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 25 Jul 2018 00:54:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 13:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/714/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 13
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 19 Sep 2018 20:05:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Anonymous Coward #431 has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10855/12/common/protobuf/CMakeLists.txt
File common/protobuf/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/10855/12/common/protobuf/CMakeLists.txt@47
PS12, Line 47: KRPC_GENERATE(CONTROL_SVC_PROTO_SRCS CONTROL_SVC_PROTO_HDRS
You could factor out the replication of the protobuf generation by putting in a loop:

cmake_minimum_required(VERSION 2.6)

set(PROTOBUF_OUTPUT_DIR ${CMAKE_SOURCE_DIR}/be/generated-sources/gen-cpp/)
add_custom_target(proto-deps DEPENDS token_proto rpc_header_proto data_stream_svc_proto)

foreach(pb_src COMMON ROW_BATCH CONTROL_SERVICE_PROTO)
  string(TOLOWER "${pb_src}_proto"  "${pb_src}_PROTO_TGT")
  string(TOLOWER "${pb_src}.proto" "${pb_src}_PROTO_SRC")
  PROTOBUF_GENERATE_CPP("${pb_src}_PROTO_SRCS" "${pb_src}_PROTO_HDRS" "${pb_src}_PROTO_TGTS"
    SOURCE_ROOT ${CMAKE_CURRENT_SOURCE_DIR}
    BINARY_ROOT ${PROTOBUF_OUTPUT_DIR}
    PROTO_FILES "${${pb_src}_PROTO_SRC}")
  add_custom_target("${${pb_src}_PROTO_TGT}" DEPENDS "${${pb_src}_PROTO_TGTS}")
  set("${pb_src}_PROTO_SRCS" "${${pb_src}_PROTO_SRCS}" PARENT_SCOPE)
  add_dependencies(proto-deps "${${pb_src}_PROTO_TGT}")
endforeach(pb_src)


KRPC_GENERATE(DATA_STREAM_SVC_PROTO_SRCS DATA_STREAM_SVC_PROTO_HDRS
  DATA_STREAM_SVC_PROTO_TGTS
  SOURCE_ROOT ${CMAKE_CURRENT_SOURCE_DIR}
  BINARY_ROOT ${PROTOBUF_OUTPUT_DIR}
  PROTO_FILES data_stream_service.proto)
add_custom_target(data_stream_svc_proto DEPENDS ${DATA_STREAM_SVC_PROTO_TGTS})
set(DATA_STREAM_SVC_PROTO_SRCS ${DATA_STREAM_SVC_PROTO_SRCS} PARENT_SCOPE)

KRPC_GENERATE(RPC_TEST_PROTO_SRCS RPC_TEST_PROTO_HDRS
  RPC_TEST_PROTO_TGTS
  SOURCE_ROOT ${CMAKE_CURRENT_SOURCE_DIR}
  BINARY_ROOT ${PROTOBUF_OUTPUT_DIR}
  PROTO_FILES rpc_test.proto)
add_custom_target(rpc_test_proto_tgt DEPENDS ${RPC_TEST_PROTO_TGTS})
set(RPC_TEST_PROTO_SRCS ${RPC_TEST_PROTO_SRCS} PARENT_SCOPE)



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 12
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward #431
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 07 Sep 2018 22:14:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.cc@265
PS10, Line 265:       DCHECK(!instance_stats->done_ ||
              :           report_version == instance_stats->last_report_version_);
> isn't it possible (though unlikely) that you have some old report sitting i
Yup. That's actually a DCHECK which Sailesh ran into when testing IMPALA-4063. Removed in patch 11.


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/query-state.cc@287
PS10, Line 287: ReportVersion
> if I understand the threading correctly, there are two entry points for sen
To answer the question about your second race, I think it's impossible today because we explicitly turn off the reporting thread before sending the final profile (https://github.com/apache/impala/blob/master/be/src/runtime/fragment-instance-state.cc#L479-L496)

The first race is also kind of impossible today because we wait for the report thread to exit before sending the final report but yes, it's possible that the RPC in the report thread fails somehow and we just drop those error statuses on the floor.



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 22 Aug 2018 01:58:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10855

to look at the new patch set (#4).

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation for fixing
IMPALA-2990, as we can take advantage of TCP connection
multiplexing in KRPC to avoid overwhelming the coordinator
with too many connections by reducing the number of TCP connections
to one for each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/catalog/catalog-util.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/rpc/jni-thrift-util.h
M be/src/rpc/thrift-util-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/data-stream-service.cc
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/uid-util.h
M bin/impala-config.sh
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
58 files changed, 1,019 insertions(+), 616 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/4
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 22:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3402/ DRY_RUN=false


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 22
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 01 Nov 2018 17:17:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 9:

(1 comment)

Just responding to the comment. Will review the new revision later tonight.

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@262
PS8, Line 262:         ProtoToQueryId(instance_exec_status.fragment_instance_id()));
> The reason for doing it here is to handle the case in which we can have 0 t
OK. I'm still not 100% convinced that the sidecars even matter in the first place (they just eliminate a copy of a few MB, which probably takes on the order of tens of microseconds or less), but the solution you proposed seems to make sense if we take sidecar usage as a requirement.



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 09 Aug 2018 00:19:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, Todd Lipcon, Impala Public Jenkins, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10855

to look at the new patch set (#11).

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation for fixing
IMPALA-2990, as we can take advantage of TCP connection
multiplexing in KRPC to avoid overwhelming the coordinator
with too many connections by reducing the number of TCP connections
to one for each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.

This patch also fixes IMPALA-7241, which may lead to duplicated
DML stats being applied. The fix adds a monotonically
increasing version number to fragment instances' reports. The
coordinator will ignore any report whose version is smaller than
or equal to the version in the last report.

Testing done: core build. Added some targeted test cases for profile
serialization failure and RPC retry.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/catalog/catalog-util.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/rpc/jni-thrift-util.h
M be/src/rpc/thrift-util-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/uid-util.h
M bin/impala-config.sh
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
M tests/custom_cluster/test_rpc_exception.py
M tests/failure/test_failpoints.py
60 files changed, 1,106 insertions(+), 670 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/11
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 11
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10855/9/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/9/be/src/runtime/coordinator-backend-state.cc@52
PS9, Line 52:  DCHECK_NOTNULL(coord);
This should have been moved to line 48.



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 09 Aug 2018 01:49:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 12:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/10855/12/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/12/be/src/runtime/coordinator-backend-state.cc@497
PS12, Line 497:   last_report_seq_no_ = exec_status.report_seq_no();
> Can you add a CHECK here that it isn't decreasing? It's verified above, but
Done. Used a DCHECK instead for documenting the invariant.
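
A minimal sketch of the sequence-number check discussed here, with a plain assert standing in for the DCHECK. 'ReportTracker' and 'Accept' are hypothetical names; only the idea of ignoring reports at or below the last seen sequence number comes from the change description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical coordinator-side state for one backend's status reports.
class ReportTracker {
 public:
  // Returns true if the report should be applied, false if it is a
  // duplicate (e.g. a retried RPC) or older than the last applied report.
  bool Accept(int64_t report_seq_no) {
    if (report_seq_no <= last_report_seq_no_) return false;
    // Documents the invariant: the recorded sequence number never decreases.
    assert(report_seq_no > last_report_seq_no_);
    last_report_seq_no_ = report_seq_no;
    ++num_applied_;
    return true;
  }

  int num_applied() const { return num_applied_; }

 private:
  int64_t last_report_seq_no_ = 0;
  int num_applied_ = 0;
};
```

Because a retried report carries the same sequence number as the original, its stats are applied at most once.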


http://gerrit.cloudera.org:8080/#/c/10855/12/be/src/runtime/coordinator-backend-state.cc@508
PS12, Line 508:   lock_guard<SpinLock> l1(exec_summary->lock);
> is this critical section extending farther than it needs to? should it end 
Fixed.
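
For illustration, one common way to shorten a critical section is to confine the lock guard to a nested scope and do the remaining work on a local copy. The sketch below uses std::mutex and hypothetical names ('Summary', 'AppendAndSum'); it is not the code from this patch.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

struct Summary {
  std::mutex lock;
  std::vector<int> rows;  // guarded by 'lock'
};

// Appends 'value' under the lock, then computes the sum outside of it.
int AppendAndSum(Summary* s, int value) {
  std::vector<int> snapshot;
  {
    std::lock_guard<std::mutex> l(s->lock);
    s->rows.push_back(value);
    snapshot = s->rows;
  }  // lock released here, before the loop below runs
  int sum = 0;
  for (int v : snapshot) {
    sum += v;
  }
  return sum;
}
```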


http://gerrit.cloudera.org:8080/#/c/10855/12/be/src/runtime/fragment-instance-state.h
File be/src/runtime/fragment-instance-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/12/be/src/runtime/fragment-instance-state.h@128
PS12, Line 128:   int32_t ReportSeqNo() {
> nit: maybe rename to AdvanceReportSeqNo or NextReportSeqNo or something so 
Done


http://gerrit.cloudera.org:8080/#/c/10855/12/be/src/runtime/fragment-instance-state.h@178
PS12, Line 178:   DFAKE_MUTEX(report_seq_no_lock_);
> I think this fake mutex should be held around the whole function that gener
Good point. Moved to SendReport() instead.
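
For readers unfamiliar with the idea behind a fake mutex, here is a rough sketch. It is not the DFAKE_MUTEX implementation; FakeMutex and ScopedFakeLock are hypothetical names. The point is that the object never blocks: it only detects, in debug builds, that two threads entered a region that is assumed to be single-threaded.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical fake mutex: it never blocks, it only records whether the
// guarded region is currently occupied.
class FakeMutex {
 public:
  // Returns true if the region was free, and marks it as entered.
  bool TryEnter() { return !held_.exchange(true); }
  void Leave() { held_.store(false); }

 private:
  std::atomic<bool> held_{false};
};

// Holds the fake mutex for a whole scope, for example the whole function that
// generates and sends a report.
class ScopedFakeLock {
 public:
  explicit ScopedFakeLock(FakeMutex* m) : m_(m) {
    const bool entered = m_->TryEnter();
    assert(entered && "concurrent entry into a region assumed single-threaded");
    (void)entered;  // Avoids an unused-variable warning under NDEBUG.
  }
  ~ScopedFakeLock() { m_->Leave(); }
  ScopedFakeLock(const ScopedFakeLock&) = delete;
  ScopedFakeLock& operator=(const ScopedFakeLock&) = delete;

 private:
  FakeMutex* m_;
};
```

Holding the scoped object around the entire report-generating function, rather than only around the counter increment, widens the window in which a concurrent caller would be detected.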


http://gerrit.cloudera.org:8080/#/c/10855/12/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/12/be/src/runtime/query-state.cc@306
PS12, Line 306: REPORT_EXEC_STATUS
> hmm, I dont see this one used. did I miss something?
Nice catch. I meant to rename this to REPORT_EXEC_STATUS_PROFILE.


http://gerrit.cloudera.org:8080/#/c/10855/12/common/protobuf/CMakeLists.txt
File common/protobuf/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/10855/12/common/protobuf/CMakeLists.txt@47
PS12, Line 47: KRPC_GENERATE(CONTROL_SVC_PROTO_SRCS CONTROL_SVC_PROTO_HDRS
> You could factor out the replication of the protobuf generation by putting 
Good idea. Adopted something similar to your suggested code.


http://gerrit.cloudera.org:8080/#/c/10855/12/common/protobuf/common.proto
File common/protobuf/common.proto:

http://gerrit.cloudera.org:8080/#/c/10855/12/common/protobuf/common.proto@33
PS12, Line 33: int64
> nit: i think fixed64 is more appropriate for these fields since they dont t
Agreed that they don't tend to be small. Does using fixed64 make the encoding slightly faster?
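
On the size side of this question: protobuf encodes int64 as a base-128 varint with 7 payload bits per byte, while fixed64 always occupies 8 bytes. The sketch below only computes encoded sizes and measures nothing about encoding speed; VarintSize and kFixed64Size are illustrative names, not protobuf APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bytes a base-128 varint needs for 'value' (7 payload bits per byte).
inline size_t VarintSize(uint64_t value) {
  size_t bytes = 1;
  while (value >= 0x80) {
    value >>= 7;
    ++bytes;
  }
  return bytes;
}

// A fixed64 field always uses 8 bytes on the wire, regardless of the value.
const size_t kFixed64Size = 8;
```

Under this encoding a value below 2^56 fits in at most 8 varint bytes, a value of 2^56 or more needs 9 or 10, and a negative int64 (sign-extended to 64 bits) always needs 10. So fixed64 is the smaller choice only for fields whose values are usually very large, such as random 64-bit identifiers.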


http://gerrit.cloudera.org:8080/#/c/10855/12/tests/custom_cluster/test_rpc_timeout.py
File tests/custom_cluster/test_rpc_timeout.py:

http://gerrit.cloudera.org:8080/#/c/10855/12/tests/custom_cluster/test_rpc_timeout.py@42
PS12, Line 42:  
> flake8: E251 unexpected spaces around keyword / parameter equals
Done


http://gerrit.cloudera.org:8080/#/c/10855/12/tests/custom_cluster/test_rpc_timeout.py@42
PS12, Line 42:  
> flake8: E251 unexpected spaces around keyword / parameter equals
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 12
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 19 Sep 2018 19:01:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 4:

(30 comments)

http://gerrit.cloudera.org:8080/#/c/10855/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10855/4//COMMIT_MSG@16
PS4, Line 16: This patch also introduces a new service pool for all query execution
            : control related RPCs in the future so that control commands from
            : coordinators aren't blocked by long-running DataStream services' RPCs.
> yep, that's essentially what I meant. Thanks for explaining it better than 
Thanks for the suggestion. I think it's fine to do it as a follow-up or as parallel work on the Kudu side. This doesn't necessarily block the merging of this patch IMHO, as ReportExecStatus() can already be late today for various reasons such as an overloaded coordinator.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/coordinator.h@31
PS4, Line 31: #include "kudu/util/slice.h"
> Not needed?
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/dml-exec-state.cc
File be/src/runtime/dml-exec-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/dml-exec-state.cc@404
PS4, Line 404: void DmlExecState::ToProto(InsertExecStatusPB* dml_status) {
> do you want to clear dml_status first just in case? Similarly for other ToP
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/dml-exec-state.cc@455
PS4, Line 455:    if (!kudu_stats->has_num_row_errors()) {
             :       kudu_stats->set_num_row_errors(0);
             :     }
> I don't think this is necessary- it's not illegal to access an "unset" fiel
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/exec-env.h
File be/src/runtime/exec-env.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/exec-env.h@136
PS4, Line 136: DataStreamService* data_svc() const { return data_svc_.get(); }
> Not used, we can remove this.
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@268
PS4, Line 268: rpc_controller.AddOutboundSidecar(move(sidecar), &sidecar_idx).ok
> should this be a CHECK_OK?
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@273
PS4, Line 273: "final"
> need a space here
Oops. Done.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@274
PS4, Line 274: ERROR
> should this be a DFATAL? do you expect this to ever actually happen?
I don't think this is expected to happen very often but it seems more robust to not crash Impala because of failure to serialize the query profile. The query can still run to completion without the profile.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@283
PS4, Line 283:     req.clear_error_log();
> isnt the error log already clear because this is a new instance?
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@284
PS4, Line 284:     state->GetUnreportedErrors(req.mutable_error_log());
> it's a shame that this mutates the state, making retries a bit more difficu
The new patch creates a single instance of the RPC parameter and reuses it on retry. My understanding is that the RPC layer shouldn't mutate the input RPC request parameter.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@289
PS4, Line 289: rpc_mgr->GetProxy
> clang-tidy failure: Missing status check.
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@293
PS4, Line 293:   kudu::Status rpc_status = proxy->ReportExecStatus(req, &resp, &rpc_controller);
> do you want any timeout on this RPC?
Yes. Switched back to the existing code's behavior of using FLAGS_backend_client_rpc_timeout_ms. This is not ideal, but I guess it's simplest to just preserve the existing behavior and fix IMPALA-2990 in a separate patch.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@305
PS4, Line 305:     // We can retry the RPC if the payload hasn't been sent yet.
> are these reports idempotent? seems like it shoudl be easy to make them ide
Yes, I believe they are idempotent. I was worried about the connection being reset in the middle of transmission so that only a partial payload was sent. In that case, I think the server side will also drop that connection, so it should be fine I suppose. Comments removed.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@306
PS4, Line 306: RpcMgr::IsServerTooBusy(rpc_controller
> I'm a bit fuzzy on the context of this code, but if we get TOO_BUSY, it see
Restored the old behavior of retrying at most 2 times. It will still fail due to IMPALA-2990, which we will work on fixing later (e.g. the coordinator cancelling the query if a certain fragment hasn't sent a report for an extended period of time).
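
A bounded retry that reuses one request object can be sketched as follows. This is an illustration under assumptions, not the patch's code: RpcResult, SendWithRetry and the 'send' callable are hypothetical, and the sketch retries on any failure, whereas the real code may retry only on specific errors and would also need to reset per-attempt RPC state.

```cpp
#include <cassert>

// Hypothetical outcome of one RPC attempt.
enum class RpcResult { kOk, kFailed };

// Invokes 'send' with the same 'request' up to 'max_attempts' times and stops
// at the first success. The request is built once by the caller and passed by
// const reference, so every attempt transmits identical contents.
template <typename Request, typename SendFn>
RpcResult SendWithRetry(const Request& request, int max_attempts, SendFn send) {
  RpcResult result = RpcResult::kFailed;
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    result = send(request);
    if (result == RpcResult::kOk) {
      break;
    }
  }
  return result;
}
```

Building the request once matters when filling it in has side effects, as with collecting unreported errors: a retry then resends the same report instead of an emptier one.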


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/client-request-state.h@32
PS4, Line 32: #include "gen-cpp/control_service.pb.h"
> Not needed. Forward declaring ReportExecStatusRequestPB should suffice.
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/client-request-state.h@35
PS4, Line 35: #include "gen-cpp/RuntimeProfile_types.h"
> Not needed. Forward declaring TRuntimeProfileTree should suffice.
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.h
File be/src/service/control-service.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.h@23
PS4, Line 23: #include "common/status.h"
            : #include "runtime/mem-tracker.h"
> Not required. You can forward declare 'Status', 'MemTracker' and 'MetricGro
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.h@34
PS4, Line 34: class ImpalaServer;
> Not required.
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.h@53
PS4, Line 53:    /// Reference to the singleton ImpalaServer object. Not owned.
            :    ImpalaServer* impala_server_ = nullptr;
> This is probably not needed. You can get the ImpalaServer using:
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@60
PS4, Line 60: num_svc_threads
> Was there any noticeable slowdown on stress workloads since we now have onl
No serious perf measurements yet, but yes, there is no point in pushing beyond the number of cores.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@71
PS4, Line 71: 
> Shouldn't we add:
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@72
PS4, Line 72:   // TODO: implement something more efficient here, we're currently
            :   // acquiring/releasing the map lock and doing a map lookup for
            :   // every report (assign each query a local int32_t id and use that to index into a
            :   // vector of ClientRequestStates, w/o lookup or locking?)
> Sounds reasonable to me.
TODO removed.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@87
PS4, Line 87:     mem_tracker_->Release(rpc_context->GetTransferSize());
            :     rpc_context->RespondSuccess();
> is everything in this function exception-safe? eg I thought Thrift could oc
As far as I know, yes. DeserializeThriftMsg() has a try-catch clause inside to catch exceptions.
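
The pattern referred to here, catching exceptions at the deserialization boundary and turning them into an error value, might look roughly like this. DecodeNoThrow is a hypothetical helper for illustration and is not how DeserializeThriftMsg() is actually written.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: runs 'decode' and converts any std::exception it throws
// into an error message, so nothing propagates into the RPC handler.
template <typename DecodeFn>
bool DecodeNoThrow(DecodeFn decode, std::string* error) {
  try {
    decode();
    return true;
  } catch (const std::exception& e) {
    *error = e.what();
    return false;
  }
}
```

With the exception contained, the handler can still release any tracked memory and respond to the RPC on the failure path.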


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@104
PS4, Line 104:     if (LIKELY(sidecar_status.ok())) {
> do you want to warn or even CHECK on this?
I switched to using a CHECK() for any operations with sidecar as that's unexpected but I kept the error handling code for thrift deserialization failure.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@111
PS4, Line 111: dummy
> nit: empty
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/container-util.h
File be/src/util/container-util.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/container-util.h@147
PS4, Line 147: void MergeMapValues(const google::protobuf::Map<K, V>& src,
> If you templatized this on the map type instead of the K and V types, you c
Good point. Done.
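
Templatizing on the map type rather than on K and V could look like the sketch below. It assumes the helper merges by adding each source value into the destination entry with the same key; the actual code in container-util.h may differ.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Adds every value in 'src' to the entry with the same key in 'dst', creating
// the entry if needed. Works for any map-like type whose elements expose
// first/second and which supports operator[].
template <typename MapType>
void MergeMapValues(const MapType& src, MapType* dst) {
  for (const auto& entry : src) {
    (*dst)[entry.first] += entry.second;
  }
}
```

A single template then serves std::map, std::unordered_map and other containers with the same interface, instead of one overload per container.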


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/error-util.cc
File be/src/util/error-util.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/error-util.cc@144
PS4, Line 144: for (auto iter : entry.messages()) *stream << iter << "\n";
> Formatting. Also, 'iter' isn't an iterator, it's the element itself, so we'
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/error-util.cc@170
PS4, Line 170: for (auto iter : entry.messages()) target->add_messages(iter);
> Formatting, we should avoid single line for loops.
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/error-util.cc@194
PS4, Line 194: if (target.messages_size() == 0) target.add_messages(e.msg());
> Formatting
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/runtime-profile.h
File be/src/util/runtime-profile.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/runtime-profile.h@28
PS4, Line 28: #include "kudu/util/faststring.h"
> Forward declare 'faststring'
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 01 Aug 2018 19:05:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 16:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1016/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 16
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 10 Oct 2018 22:37:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 17:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10855/17/be/src/rpc/rpc-mgr.inline.h
File be/src/rpc/rpc-mgr.inline.h:

http://gerrit.cloudera.org:8080/#/c/10855/17/be/src/rpc/rpc-mgr.inline.h@35
PS17, Line 35: template <typename S, typename P>
hrm, it still feels like we are going through a lot of gymnastics to avoid making a small change to KRPC (adding a service class member in connection id). Do you have a particular resistance to making that change? We can do it on the Impala side first and then port it over to Kudu if that's easier than splitting up the patches.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 17
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 12 Oct 2018 18:19:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 19: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 19
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 22 Oct 2018 15:20:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/444/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 11
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 22 Aug 2018 00:44:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has removed Sailesh Mukil from this change.  ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Removed reviewer Sailesh Mukil.

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 13
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, Todd Lipcon, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10855

to look at the new patch set (#6).

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation
for fixing IMPALA-2990, as we can take advantage of TCP connection
multiplexing in KRPC to avoid overwhelming the coordinator
with too many connections by reducing the number of TCP connections
to one for each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/catalog/catalog-util.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/kudu/rpc/rpc_context.cc
M be/src/kudu/rpc/rpc_context.h
M be/src/rpc/jni-thrift-util.h
M be/src/rpc/thrift-util-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/uid-util.h
M bin/impala-config.sh
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
M tests/custom_cluster/test_rpc_exception.py
M tests/failure/test_failpoints.py
62 files changed, 1,096 insertions(+), 666 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/6

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10855/6/tests/failure/test_failpoints.py
File tests/failure/test_failpoints.py:

http://gerrit.cloudera.org:8080/#/c/10855/6/tests/failure/test_failpoints.py@200
PS6, Line 200: \
flake8: E502 the backslash is redundant between brackets


http://gerrit.cloudera.org:8080/#/c/10855/6/tests/failure/test_failpoints.py@202
PS6, Line 202: :
flake8: E231 missing whitespace after ':'




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 08 Aug 2018 17:28:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc@373
PS8, Line 373:         MonoDelta::FromMilliseconds(FLAGS_backend_client_rpc_timeout_ms));
> We have some logging already below at line 385.
Actually, I misinterpreted your comment. Will add a LOG(WARNING).




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 09 Aug 2018 00:49:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 2:

(37 comments)

Did a first pass. Will have another look after these are addressed.

Also, a rebase might be required, it seems like this is based off a commit which has aged a bit.

http://gerrit.cloudera.org:8080/#/c/10855/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10855/2//COMMIT_MSG@17
PS2, Line 17: they
nit: that


http://gerrit.cloudera.org:8080/#/c/10855/2//COMMIT_MSG@19
PS2, Line 19: convertion
nit: conversion


http://gerrit.cloudera.org:8080/#/c/10855/2//COMMIT_MSG@25
PS2, Line 25: 
Not to be pedantic, but a lot of Thrift structs are being removed as part of this patch, so I think it would help if you outline them and their new Protobuf counterparts in the commit message, so that it's easier to reference in the future in case of bugs around the new code that use these protobuf structs.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/exec/hdfs-parquet-table-writer.cc@1183
PS2, Line 1183:     (*parquet_insert_stats_.mutable_per_column_size())[col_name] +=
              :         col_writer->total_compressed_size();
nit: Should we have a level of indirection here? It's not very obvious what this line is doing at first glance.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/rpc/thrift-util.h
File be/src/rpc/thrift-util.h:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/rpc/thrift-util.h@93
PS2, Line 93: Serialize
Should we explicitly qualify the names of these functions to avoid confusion?
Eg: SerializeToVector(), SerializeToString(), SerializeToFaststring(), etc.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/rpc/thrift-util.h@103
PS2, Line 103: uint8_t* buffer;
             :     uint32_t len;
             :     mem_buffer_->getBuffer(&buffer, &len);
             :     result->assign_copy(buffer, len);
I think you can avoid the copy here if you do:

uint32_t len = mem_buffer_->available_read();
result->resize(len);
uint8_t* buffer = result->data();

// 'len' wouldn't have changed from the first line to here.
mem_buffer_->getBuffer(&buffer, &len);


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.h
File be/src/runtime/dml-exec-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.h@35
PS2, Line 35: class TInsertResult;
            : class TFinalizeParams;
            : class TUpdateCatalogRequest;
            : class RuntimeProfile;
            : class HdfsTableDescriptor;
nit: Not your change, but the ordering is not alphabetical.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.h@126
PS2, Line 126: std::map<std::string, std::string>
I'm wondering if it just makes sense to use google::protobuf::Map here instead of std::map, so that we don't have to pay the cost of serializing/deserializing from a payload to the map, and can just std::move() from one to the other.

Eg: Kudu does this in certain places:
https://github.com/apache/kudu/blob/d1d7572b364e06320e7afab8724242508709625d/src/kudu/rpc/service_if.h#L47-L48


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc
File be/src/runtime/dml-exec-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc@83
PS2, Line 83: for (auto i = parquet_stats.per_column_size().begin();
            :            i != parquet_stats.per_column_size().end(); ++i) {
Range based for-loop with const ref.

Range based for loops evaluate end() only once, whereas iterator loops like this evaluate end() on every iteration.

Range based for loops are good when you need to iterate over every element in a collection and when you're not modifying the size of that collection.
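
To make the comparison concrete, here is a small sketch of the two loop forms over a std::map. The function names are illustrative only, and std::map stands in for the container type used in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Iterator form: per_column_size.end() is called on every iteration.
inline int64_t TotalSizeIteratorLoop(
    const std::map<std::string, int64_t>& per_column_size) {
  int64_t total = 0;
  for (auto it = per_column_size.begin(); it != per_column_size.end(); ++it) {
    total += it->second;
  }
  return total;
}

// Range-based form: begin() and end() are each evaluated once, and the const
// reference avoids copying each key/value pair.
inline int64_t TotalSizeRangeFor(
    const std::map<std::string, int64_t>& per_column_size) {
  int64_t total = 0;
  for (const auto& entry : per_column_size) {
    total += entry.second;
  }
  return total;
}
```

Both functions return the same total. The range-based form is shorter and leaves no room for mistakes in the loop condition, at the cost of not exposing the iterator when one is needed, for example to erase elements.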


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc@95
PS2, Line 95: auto
Let's try to stay explicit about data types where we can. Also take it as a reference:

const PartitionStatusMap&


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc@95
PS2, Line 95: new_partition_status
nit: new_per_partition_status_map


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc@96
PS2, Line 96: (auto iter = new_partition_status.begin(); iter != new_partition_status.end();
            :        ++iter)
Why not use a range based for loop? It's much more readable. Also, reference to avoid iterator copy.

for (auto const& partition_status : new_per_partition_status_map) {
  ...
}


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc@404
PS2, Line 404: for (auto iter = files_to_move_.begin(); iter != files_to_move_.end(); ++iter) {
Use a range based for loop, and take each element as const ref.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc@407
PS2, Line 407:   for (auto iter = per_partition_status_.begin();
Same here


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc@453
PS2, Line 453: KuduDmlStatsPB* kudu_stats
This can be a reference, no reason to use a pointer here.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/query-state.cc@237
PS2, Line 237: rpc_params
Wondering if we should rename this to 'exec_rpc_params()'. Else it's confusing as to what RPC this is referring to.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/query-state.cc@260
PS2, Line 260: faststring
Can you comment in the code, on why you used 'faststring' vs. 'Slice'?


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/query-state.cc@290
PS2, Line 290: kudu::Status rpc_status = proxy->ReportExecStatus(req, &resp, &rpc_controller);
Why don't we retry on failures? That's a behavioral change from our current implementation.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/runtime-state.h@30
PS2, Line 30: #include "util/error-util-internal.h"
Is this needed here?


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/runtime-state.h@228
PS2, Line 228: ReportExecStatusRequestPB* exec_status
Instead of tying an RPC specific member to this function, why not get it as an ErrorLogMap (same as before), and have a new error-util which is called at the caller side, that translates an ErrorLogMap to a google::protobuf::Map<int32_t, ErrorLogEntryPB> ?


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/runtime-state.cc@209
PS2, Line 209: for (auto iter = error_log_.begin(); iter != error_log_.end(); ++iter) {
Range based for-loop with const ref.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc@37
PS2, Line 37: queue_limit_msg
I think we use all caps for static const members


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc@39
PS2, Line 39: 50MB
How did you arrive at this number?


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc@39
PS2, Line 39: DEFINE_string(control_service_queue_mem_limit, "50MB", queue_limit_msg.c_str());
            : DEFINE_int32(control_service_num_svc_threads, 0, "Number of threads for processing "
            :     "control services' RPCs. If left at default value 0, it will be set to number of "
            :     "CPU cores");
Please mention these new flags in the commit message.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc@71
PS2, Line 71:   // Release the memory against the control service's memory tracker.
            :   mem_tracker_->Release(rpc_context->GetTransferSize());
Shouldn't we release towards the end; i.e. after UpdateBackendExecStatus() returns?
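
A minimal sketch of one way to get that ordering, assuming a scope guard is acceptable here. FakeMemTracker, ScopedRelease and HandleReport are hypothetical stand-ins and do not reflect Impala's actual MemTracker or ControlService interfaces.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a memory tracker; not Impala's MemTracker API.
class FakeMemTracker {
 public:
  void Consume(int64_t bytes) { consumed_ += bytes; }
  void Release(int64_t bytes) { consumed_ -= bytes; }
  int64_t consumption() const { return consumed_; }

 private:
  int64_t consumed_ = 0;
};

// Releases 'bytes' from 'tracker' when the guard goes out of scope.
class ScopedRelease {
 public:
  ScopedRelease(FakeMemTracker* tracker, int64_t bytes)
    : tracker_(tracker), bytes_(bytes) {}
  ~ScopedRelease() { tracker_->Release(bytes_); }
  ScopedRelease(const ScopedRelease&) = delete;
  ScopedRelease& operator=(const ScopedRelease&) = delete;

 private:
  FakeMemTracker* tracker_;
  int64_t bytes_;
};

// Hypothetical handler. Returns the consumption observed while the payload is
// still being processed, to show that the memory remains accounted for.
int64_t HandleReport(FakeMemTracker* tracker, int64_t transfer_size) {
  ScopedRelease release(tracker, transfer_size);
  // ... process the payload here (e.g. update the backend exec status) ...
  return tracker->consumption();
}
```

With the guard, the release happens on every return path of the handler, including early error returns, and only after the payload has been processed.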


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc@96
PS2, Line 96: deserializes
nit: deserialize


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc@97
PS2, Line 97: so there may not be any
"... so an empty thrift profile is valid."


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc@99
PS2, Line 99:   if (LIKELY(instance_exec_status.has_thrift_profile_sidecar_idx())) {
We should mention in a comment that the RuntimeProfile is a Thrift serialized sidecar. Else it might confuse readers to see protobuf and Thrift used side by side.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc@100
PS2, Line 100: kudu::Slice thrift_profile_slice;
Any reason we serialize as a kudu::faststring and deserialize as a Slice? Is there any perf impact because of this?


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc@112
PS2, Line 112: dummy_profile
empty_profile


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc@117
PS2, Line 117: 
Can we call RpcContext::DiscardTransfer() at this point to free up the memory used by the sidecar? Since we've already deserialized it to 'thrift_profile'.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/util/error-util-internal.h
File be/src/util/error-util-internal.h:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/util/error-util-internal.h@26
PS2, Line 26: 
Why is this new file necessary? Also, it seems to be included by other files too, so why is it named 'internal'?


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/util/error-util-internal.h@28
PS2, Line 28: std::map<TErrorCode::type, ErrorLogEntryPB>
Consider making this a google::protobuf::Map<> type.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/util/error-util.cc
File be/src/util/error-util.cc:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/util/error-util.cc@144
PS2, Line 144: for (auto iter = entry.messages().begin(); iter != entry.messages().end(); ++iter) {
range based for-loop with const ref.


http://gerrit.cloudera.org:8080/#/c/10855/2/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/10855/2/bin/bootstrap_toolchain.py@432
PS2, Line 432: 5.0.1-asserts-p1
Please add the reason for this bump to the commit message.


http://gerrit.cloudera.org:8080/#/c/10855/2/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/10855/2/bin/impala-config.sh@71
PS2, Line 71: export IMPALA_TOOLCHAIN_BUILD_ID=146-f2d5380be6
Please mention the reasons behind all these version bumps in the commit message.


http://gerrit.cloudera.org:8080/#/c/10855/2/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/10855/2/common/protobuf/control_service.proto@156
PS2, Line 156: instance_exec_statu
nit: instance_exec_status



-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Jul 2018 21:24:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, Todd Lipcon, Impala Public Jenkins, Dan Hecht, Michal Ostrowski, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10855

to look at the new patch set (#13).

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation for fixing
IMPALA-2990, as we can take advantage of TCP connection multiplexing
in KRPC to avoid overwhelming the coordinator with too many
connections by reducing the number of TCP connections to one for
each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.

This patch also fixes IMPALA-7241, which may lead to duplicated
DML stats being applied. The fix adds a monotonically increasing
version number to fragment instances' reports. The coordinator
will ignore any report whose version is smaller than or equal to
the version in the last report.
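
The version check described above can be sketched as follows. This is only an illustration under stated assumptions: InstanceReportState, ApplyReport and the rows_modified counter are hypothetical names, and report versions are assumed to start at 1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-fragment-instance bookkeeping on the coordinator side.
struct InstanceReportState {
  int64_t last_applied_version = 0;  // assumption: report versions start at 1
  int64_t rows_modified = 0;         // example DML stat accumulated from reports
};

// Applies a report only if its version is newer than the last one applied.
// Returns true if the report was applied, false if it was ignored as a
// duplicate or stale report.
bool ApplyReport(InstanceReportState* state, int64_t version, int64_t rows_delta) {
  if (version <= state->last_applied_version) return false;
  state->last_applied_version = version;
  state->rows_modified += rows_delta;
  return true;
}
```

A retried RPC that carries an already-applied version is dropped, so its stats are counted only once.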

Testing done:
1. Exhaustive build.
2. Added some targeted test cases for profile serialization failure and RPC retries/timeout.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/catalog/catalog-util.cc
M be/src/common/global-flags.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/rpc/CMakeLists.txt
M be/src/rpc/jni-thrift-util.h
M be/src/rpc/thrift-util-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/uid-util.h
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
M tests/custom_cluster/test_rpc_timeout.py
60 files changed, 1,212 insertions(+), 758 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/13
-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 13
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, Todd Lipcon, Impala Public Jenkins, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10855

to look at the new patch set (#12).

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation for fixing
IMPALA-2990, as we can take advantage of TCP connection multiplexing
in KRPC to avoid overwhelming the coordinator with too many
connections by reducing the number of TCP connections to one for
each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.

This patch also fixes IMPALA-7241, which may lead to duplicated
DML stats being applied. The fix adds a monotonically increasing
version number to fragment instances' reports. The coordinator
will ignore any report whose version is smaller than or equal to
the version in the last report.

Testing done:
1. Exhaustive build.
2. Added some targeted test cases for profile serialization failure and RPC retries/timeout.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/catalog/catalog-util.cc
M be/src/common/global-flags.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/rpc/jni-thrift-util.h
M be/src/rpc/thrift-util-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/uid-util.h
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
M tests/custom_cluster/test_rpc_timeout.py
59 files changed, 1,155 insertions(+), 695 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/12
-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 12
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Thomas Marshall, Todd Lipcon, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, Michal Ostrowski, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10855

to look at the new patch set (#19).

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation for fixing
IMPALA-2990, as we can take advantage of TCP connection multiplexing
in KRPC to avoid overwhelming the coordinator with too many
connections by reducing the number of TCP connections to one for
each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
To avoid unnecessary delays due to sharing the network connections
between DataStream service and Control service, this change added the
service name as part of the user credentials for the ConnectionId
so each service will use a separate connection.
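
The effect of keying connections on the service name can be sketched as follows. This is a simplified illustration, not the real kudu::rpc::ConnectionId: ConnKey, ConnectionCache and the "impala/<service>" credential strings are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>

// Hypothetical simplified connection key; not the real kudu::rpc::ConnectionId.
struct ConnKey {
  std::string remote;  // host:port of the remote server
  std::string user;    // user credentials, assumed to embed the service name
  bool operator<(const ConnKey& o) const {
    return std::tie(remote, user) < std::tie(o.remote, o.user);
  }
};

class ConnectionCache {
 public:
  // Returns the id of the connection for 'key', creating one on first use.
  int GetOrCreate(const ConnKey& key) {
    auto it = conns_.find(key);
    if (it != conns_.end()) return it->second;
    int id = next_id_++;
    conns_.emplace(key, id);
    return id;
  }
  std::size_t size() const { return conns_.size(); }

 private:
  std::map<ConnKey, int> conns_;
  int next_id_ = 0;
};
```

Because the credentials differ per service, two services talking to the same remote host get distinct keys and therefore distinct connections, while repeated calls from one service reuse the same connection.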

The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.

This patch also fixes IMPALA-7241, which may lead to duplicated
DML stats being applied. The fix adds a monotonically increasing
version number to fragment instances' reports. The coordinator
will ignore any report whose version is smaller than or equal to
the version in the last report.

Testing done:
1. Exhaustive build.
2. Added some targeted test cases for profile serialization failure
   and RPC retries/timeout.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/catalog/catalog-util.cc
M be/src/common/global-flags.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/rpc/CMakeLists.txt
M be/src/rpc/jni-thrift-util.h
M be/src/rpc/rpc-mgr-kerberized-test.cc
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr-test.h
M be/src/rpc/rpc-mgr.h
M be/src/rpc/thrift-util-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/uid-util.h
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
M tests/custom_cluster/test_rpc_timeout.py
65 files changed, 1,298 insertions(+), 769 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/19
-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 19
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 22: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 22
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 01 Nov 2018 21:12:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10855/9/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/9/be/src/runtime/coordinator-backend-state.h@33
PS9, Line 33: #include "runtime/coordina
> Not needed.
Done


http://gerrit.cloudera.org:8080/#/c/10855/9/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/9/be/src/runtime/coordinator-backend-state.cc@52
PS9, Line 52: 
> This should have been moved to line 48.
Done


http://gerrit.cloudera.org:8080/#/c/10855/9/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/10855/9/common/protobuf/control_service.proto@137
PS9, Line 137: 5
> Skipped 5.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 09 Aug 2018 02:46:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Thomas Marshall, Todd Lipcon, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, Michal Ostrowski, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10855

to look at the new patch set (#20).

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation for fixing
IMPALA-2990, as we can take advantage of TCP connection multiplexing
in KRPC to avoid overwhelming the coordinator with too many
connections by reducing the number of TCP connections to one for
each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
To avoid unnecessary delays due to sharing the network connections
between DataStream service and Control service, this change added the
service name as part of the user credentials for the ConnectionId
so each service will use a separate connection.

The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.

This patch also fixes IMPALA-7241, which may lead to duplicated
DML stats being applied. The fix adds a monotonically increasing
version number to fragment instances' reports. The coordinator
will ignore any report whose version is smaller than or equal to
the version in the last report.

Testing done:
1. Exhaustive build.
2. Added some targeted test cases for profile serialization failure
   and RPC retries/timeout.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/catalog/catalog-util.cc
M be/src/common/global-flags.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/rpc/CMakeLists.txt
M be/src/rpc/jni-thrift-util.h
M be/src/rpc/rpc-mgr-kerberized-test.cc
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr-test.h
M be/src/rpc/rpc-mgr.h
M be/src/rpc/thrift-util-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/uid-util.h
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
M tests/custom_cluster/test_rpc_timeout.py
65 files changed, 1,299 insertions(+), 769 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/20
-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 20
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Thomas Marshall, Todd Lipcon, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, Michal Ostrowski, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10855

to look at the new patch set (#21).

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation for fixing
IMPALA-2990, as we can take advantage of TCP connection multiplexing
in KRPC to avoid overwhelming the coordinator with too many
connections by reducing the number of TCP connections to one for
each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
To avoid unnecessary delays due to sharing the network connections
between DataStream service and Control service, this change added the
service name as part of the user credentials for the ConnectionId
so each service will use a separate connection.
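
As an illustration of the paragraph above, here is a minimal sketch of how folding the service name into a connection's identity yields one connection per service. ConnectionKey, ConnectionCache, GetOrCreate and MakeKey are hypothetical names invented for this sketch, and the "user/service" credential format is an assumption; none of this is KRPC's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>

// Hypothetical stand-in for a connection identity: the remote address plus a
// credentials string. This is not KRPC's ConnectionId class.
struct ConnectionKey {
  std::string remote_address;
  std::string credentials;

  bool operator<(const ConnectionKey& other) const {
    return std::tie(remote_address, credentials) <
        std::tie(other.remote_address, other.credentials);
  }
};

// Hypothetical cache that hands out one connection (an integer id here) per key.
class ConnectionCache {
 public:
  int GetOrCreate(const ConnectionKey& key) {
    auto it = connections_.find(key);
    if (it != connections_.end()) return it->second;
    const int id = next_id_++;
    connections_.emplace(key, id);
    return id;
  }

  std::size_t size() const { return connections_.size(); }

 private:
  std::map<ConnectionKey, int> connections_;
  int next_id_ = 0;
};

// Appending the service name to the credentials makes the keys differ per
// service, so two services talking to the same host no longer share a connection.
ConnectionKey MakeKey(const std::string& host, const std::string& user,
    const std::string& service_name) {
  return ConnectionKey{host, user + "/" + service_name};
}
```

With keys built this way, two proxies for the same host and service map to the same cached connection, while proxies for different services on that host get distinct ones.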

The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.

This patch also fixes IMPALA-7241, which may lead to duplicated
DML stats being applied. The fix adds a monotonically
increasing version number to fragment instances' reports. The
coordinator ignores any report whose version is smaller than or
equal to the version in the last report.
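
The version check described above can be sketched as follows. This is only an illustration under stated assumptions: InstanceReportState and ApplyReport are hypothetical names, report versions are assumed to start at 1, and a single rows-modified counter stands in for the real DML stats.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical coordinator-side bookkeeping for one fragment instance.
class InstanceReportState {
 public:
  // Applies the report and returns true only if its version is newer than the
  // last applied one. Stale or duplicated reports are ignored, so a retried
  // RPC cannot add the same stats twice.
  bool ApplyReport(std::int64_t report_version, std::int64_t rows_modified) {
    if (report_version <= last_applied_version_) return false;
    last_applied_version_ = report_version;
    total_rows_modified_ += rows_modified;
    return true;
  }

  std::int64_t total_rows_modified() const { return total_rows_modified_; }

 private:
  std::int64_t last_applied_version_ = 0;  // Assumes versions start at 1.
  std::int64_t total_rows_modified_ = 0;
};
```

A report resent after a lost response carries the same version as before, so the second delivery is dropped instead of being counted again.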

Testing done:
1. Exhaustive build.
2. Added some targeted test cases for profile serialization failure
   and RPC retries/timeout.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/catalog/catalog-util.cc
M be/src/common/global-flags.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/rpc/CMakeLists.txt
M be/src/rpc/jni-thrift-util.h
M be/src/rpc/rpc-mgr-kerberized-test.cc
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr-test.h
M be/src/rpc/rpc-mgr.h
M be/src/rpc/thrift-util-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/uid-util.h
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
M tests/custom_cluster/test_rpc_timeout.py
65 files changed, 1,299 insertions(+), 769 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/21

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 21
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/10855/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10855/4//COMMIT_MSG@16
PS4, Line 16: This patch also introduces a new service pool for all query execution
            : control related RPCs in the future so that control commands from
            : coordinators aren't blocked by long-running DataStream services' RPCs.
> +1 for this.
yep, that's essentially what I meant. Thanks for explaining it better than I did :) My thought of "connection class" is a different name for your "proxy hash".

Should we consider this a prerequisite for merging this, or a nice-to-have follow-up? I'm not clear how latency-sensitive ReportExecStatus is. If a call takes 10 seconds to arrive is that a problem?


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/query-state.cc@290
PS2, Line 290:     query_ctx().coord_address.hostname, &proxy);
> I think I'm missing something here. Where is the code that does the retry?
Semi-related: perhaps we should be setting the TCP_USER_TIMEOUT socket option to try to improve the hang behavior when a node goes offline? It seems this may allow us to get more aggressive connection dropping (and concomitant call failure) when a node's network connectivity is lost. See 'man 7 tcp' for a description.
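
For reference, a minimal sketch of setting the socket option mentioned above on Linux. SetTcpUserTimeout is a hypothetical helper written for this note, not part of Impala or KRPC; it assumes the caller already owns a TCP socket file descriptor and that the timeout is given in milliseconds, as described in 'man 7 tcp'.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Sets TCP_USER_TIMEOUT on an existing TCP socket: the maximum time, in
// milliseconds, that transmitted data may stay unacknowledged before the
// kernel closes the connection. Returns true on success.
bool SetTcpUserTimeout(int fd, unsigned int timeout_ms) {
#ifdef TCP_USER_TIMEOUT
  return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &timeout_ms,
             sizeof(timeout_ms)) == 0;
#else
  (void)fd;
  (void)timeout_ms;
  return false;  // The option is not available on this platform.
#endif
}
```

The helper reports failure rather than aborting, so a caller can fall back to its existing behavior when the option is unsupported.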


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@283
PS4, Line 283:     req.clear_error_log();
Isn't the error log already clear because this is a new instance?


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@284
PS4, Line 284:     state->GetUnreportedErrors(req.mutable_error_log());
it's a shame that this mutates the state, making retries a bit more difficult


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@305
PS4, Line 305:     // We can retry the RPC if the payload hasn't been sent yet.
Are these reports idempotent? It seems like it should be easy to make them idempotent by including a sequence number or something if they aren't, but idempotency seems a prereq for retrying on connection reset.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@306
PS4, Line 306: RpcMgr::IsServerTooBusy(rpc_controller
I'm a bit fuzzy on the context of this code, but if we get TOO_BUSY, it seems like we should retry either inline for a final report, or just on the next interval for a profile report? (likely with some other small change to avoid clearing the error log in the case that the RPC fails)


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@72
PS4, Line 72:   // TODO: implement something more efficient here, we're currently
            :   // acquiring/releasing the map lock and doing a map lookup for
            :   // every report (assign each query a local int32_t id and use that to index into a
            :   // vector of ClientRequestStates, w/o lookup or locking?)
> I'd done an analysis on using a R/W lock for the ClientRequestState map ear
Sounds reasonable to me.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 26 Jul 2018 16:37:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 9:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/260/ : Initial code review checks failed. See linked job for details on the failure.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 09 Aug 2018 01:03:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 2:

(36 comments)

http://gerrit.cloudera.org:8080/#/c/10855/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10855/2//COMMIT_MSG@17
PS2, Line 17: they
> nit: that
Done


http://gerrit.cloudera.org:8080/#/c/10855/2//COMMIT_MSG@19
PS2, Line 19: convertion
> nit: conversion
Done


http://gerrit.cloudera.org:8080/#/c/10855/2//COMMIT_MSG@25
PS2, Line 25: 
> Not to be pedantic, but a lot of Thrift structs are being removed as part o
Done


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/exec/hdfs-parquet-table-writer.cc@1183
PS2, Line 1183:     (*parquet_insert_stats_.mutable_per_column_size())[col_name] +=
              :         col_writer->total_compressed_size();
> nit: Should we have a level of indirection here? It's not very obvious what
Done


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/rpc/thrift-util.h
File be/src/rpc/thrift-util.h:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/rpc/thrift-util.h@93
PS2, Line 93: Serialize
> Should we explicitly qualify the names of these functions to avoid confusio
Done


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/rpc/thrift-util.h@103
PS2, Line 103: uint8_t* buffer;
             :     uint32_t len;
             :     mem_buffer_->getBuffer(&buffer, &len);
             :     result->assign_copy(buffer, len);
> I think you can avoid the copy here if you do:
I believe faststring manages its own buffer internally so there is no easy way to share the internal buffer inside mem_buffer_ with the faststring.

If I understand it correctly, mem_buffer_->getBuffer() shouldn't do any data copying so there should be only one copy into the faststring.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.h
File be/src/runtime/dml-exec-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.h@35
PS2, Line 35: class TInsertResult;
            : class TFinalizeParams;
            : class TUpdateCatalogRequest;
            : class RuntimeProfile;
            : class HdfsTableDescriptor;
> nit: Not your change, but the ordering is not alphabetical.
Done


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.h@126
PS2, Line 126: std::map<std::string, std::string>
> I'm wondering if it just makes sense to use google::protobuf::Map here inst
The comment explicitly says "Uses ordered map so that iteration order is deterministic." so I am sticking to map for now. According to the specification of protobuf, google::protobuf::Map acts more like an unordered_map (https://developers.google.com/protocol-buffers/docs/proto#maps-features)

  - Wire format ordering and map iteration ordering of map values is undefined, so you cannot rely on your map items being in a particular order.
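
To illustrate the ordering guarantee relied on here: std::map iterates in ascending key order regardless of insertion order. A minimal sketch follows; KeysInIterationOrder is a hypothetical helper written for this note, and the partition-style keys used to exercise it are made up.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Returns the keys of 'm' in iteration order. For std::map this is ascending
// key order, no matter in which order the entries were inserted.
std::vector<std::string> KeysInIterationOrder(
    const std::map<std::string, std::string>& m) {
  std::vector<std::string> keys;
  for (const std::pair<const std::string, std::string>& entry : m) {
    keys.push_back(entry.first);
  }
  return keys;
}
```

Inserting "year=2018", "year=2016" and "year=2017" in that order still yields the keys sorted as 2016, 2017, 2018, which is the determinism the code comment asks for.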


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc
File be/src/runtime/dml-exec-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc@83
PS2, Line 83: for (auto i = parquet_stats.per_column_size().begin();
            :            i != parquet_stats.per_column_size().end(); ++i) {
> Range based for-loop with const ref.
Done
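
A sketch of the requested rewrite, assuming for illustration that the per-column sizes live in a std::map<std::string, int64_t>. The patch iterates over a protobuf map, whose element type differs, and both function names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Iterator-style loop, in the shape of the code flagged in the review.
std::int64_t SumWithIterators(
    const std::map<std::string, std::int64_t>& per_column_size) {
  std::int64_t total = 0;
  for (auto i = per_column_size.begin(); i != per_column_size.end(); ++i) {
    total += i->second;
  }
  return total;
}

// Equivalent range-based loop that binds each entry by const reference.
// The key type must be spelled 'const std::string'; with a non-const key the
// reference would bind to a temporary copy of every element.
std::int64_t SumWithRangeFor(
    const std::map<std::string, std::int64_t>& per_column_size) {
  std::int64_t total = 0;
  for (const std::pair<const std::string, std::int64_t>& entry : per_column_size) {
    total += entry.second;
  }
  return total;
}
```

Both functions compute the same total; the second form is shorter and makes the element type explicit.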


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc@95
PS2, Line 95: auto
> Let's try to stay explicit about data types where we can. Also take it as a
Done


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc@95
PS2, Line 95: new_partition_status
> nit: new_per_partition_status_map
Done


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc@96
PS2, Line 96: (auto iter = new_partition_status.begin(); iter != new_partition_status.end();
            :        ++iter)
> Why not use a range based for loop? It's much more readable. Also, referenc
Done


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc@404
PS2, Line 404: for (auto iter = files_to_move_.begin(); iter != files_to_move_.end(); ++iter) {
> Range based for loop, and take iterator as const ref.
Done


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc@407
PS2, Line 407:   for (auto iter = per_partition_status_.begin();
> Same here
Done


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc@453
PS2, Line 453: KuduDmlStatsPB* kudu_stats
> This can be a reference, no reason to use a pointer here.
kudu_stats is actually being modified here so using a pointer seems to fit the coding convention.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/query-state.cc@237
PS2, Line 237: rpc_params
> Wondering if we should rename this to 'exec_rpc_params()'. Else it's confus
Done


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/query-state.cc@260
PS2, Line 260: faststring
> Can you comment in the code, on why you used 'faststring' vs. 'Slice'?
Done


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/query-state.cc@290
PS2, Line 290: kudu::Status rpc_status = proxy->ReportExecStatus(req, &resp, &rpc_controller);
> Why don't we retry on failures? That's a behavioral change from our current
We do retry when the server is too busy. There is currently no timeout for this RPC so we don't have the same class of issue where the RPC times out.

Do you think it may be better to revert to the behavior of having a timeout and then retrying?


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/runtime-state.h@30
PS2, Line 30: #include "util/error-util-internal.h"
> Is this needed here?
Needed for ErrorLogMap below.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/runtime-state.h@228
PS2, Line 228: ReportExecStatusRequestPB* exec_status
> Instead of tying an RPC specific member to this function, why not get it as
Done


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/runtime-state.cc@209
PS2, Line 209: for (auto iter = error_log_.begin(); iter != error_log_.end(); ++iter) {
> Range based for-loop with const ref.
Done


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc@37
PS2, Line 37: queue_limit_msg
> I think we use all caps for static const members
Done


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc@39
PS2, Line 39: 50MB
> How did you arrive at this number?
This assumes about 1KB per runtime profile per query (after IMPALA-4036 is fixed) on a cluster of 500 nodes running 100 concurrent queries.
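
The arithmetic behind that estimate, written out as a small sketch. The constant and function names are hypothetical, and 1KB is taken as 1024 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Inputs taken from the estimate above.
constexpr std::int64_t kBytesPerProfile = 1024;  // About 1KB per profile per query.
constexpr std::int64_t kNumNodes = 500;
constexpr std::int64_t kConcurrentQueries = 100;

// Bytes in flight if every node reports one profile for every running query.
constexpr std::int64_t EstimatedReportBytes() {
  return kBytesPerProfile * kNumNodes * kConcurrentQueries;
}

static_assert(EstimatedReportBytes() == 51200000, "1KB * 500 * 100");
```

That comes to 51,200,000 bytes, or about 48.8MiB, which the comment rounds to 50MB.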


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc@71
PS2, Line 71:   // Release the memory against the control service's memory tracker.
            :   mem_tracker_->Release(rpc_context->GetTransferSize());
> Shouldn't we release towards the end; i.e. after UpdateBackendExecStatus() 
Good point. I was thinking of transferring it to another MemTracker but this seems like unnecessary complication.
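
One way to make the release happen only after the report has been processed is a scoped guard. The sketch below rests on assumptions: SimpleMemTracker, ScopedRelease and HandleReport are hypothetical names, not Impala's MemTracker API or the handler in this patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, single-threaded stand-in for a memory tracker.
class SimpleMemTracker {
 public:
  void Consume(std::int64_t bytes) { consumption_ += bytes; }
  void Release(std::int64_t bytes) { consumption_ -= bytes; }
  std::int64_t consumption() const { return consumption_; }

 private:
  std::int64_t consumption_ = 0;
};

// Releases 'bytes' from 'tracker' when the guard goes out of scope.
class ScopedRelease {
 public:
  ScopedRelease(SimpleMemTracker* tracker, std::int64_t bytes)
    : tracker_(tracker), bytes_(bytes) {}
  ~ScopedRelease() { tracker_->Release(bytes_); }
  ScopedRelease(const ScopedRelease&) = delete;
  ScopedRelease& operator=(const ScopedRelease&) = delete;

 private:
  SimpleMemTracker* tracker_;
  std::int64_t bytes_;
};

// Hypothetical handler: the payload stays accounted for while the report is
// processed and is released on every exit path from the function.
std::int64_t HandleReport(SimpleMemTracker* tracker, std::int64_t transfer_size) {
  ScopedRelease release(tracker, transfer_size);
  // Processing of the report would happen here. The value returned is the
  // consumption observed while the payload is still tracked.
  return tracker->consumption();
}
```

Because the release runs in the destructor, early returns on error paths cannot skip it.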


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc@96
PS2, Line 96: deserializes
> nit: deserialize
Done


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc@97
PS2, Line 97: so there may not be any
> "... so an empty thrift profile is valid."
Done


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc@99
PS2, Line 99:   if (LIKELY(instance_exec_status.has_thrift_profile_sidecar_idx())) {
> We should mention in a comment that the RuntimeProfile is a Thrift serializ
Done


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc@100
PS2, Line 100: kudu::Slice thrift_profile_slice;
> Any reason we serialize as a kudu::faststring and deserialize as a Slice? I
This avoids doing an extra copy into the faststring. The payload is already in the sidecar so there is no point in copying it out.

For the send side, we need to use a faststring to transfer ownership of the buffer to the RPC layer. This makes handling of RPC timeout easier due to KUDU-2011.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc@112
PS2, Line 112: dummy_profile
> empty_profile
Done


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc@117
PS2, Line 117: 
> Can we call RpcContext::DiscardTransfer() at this point to free up the memo
It's possible but it makes it slightly more tricky to update the MemTracker. Not sure if it's worth it.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/util/error-util-internal.h
File be/src/util/error-util-internal.h:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/util/error-util-internal.h@26
PS2, Line 26: 
> Why is this new file necessary? Also, it seems to be included by other file
There is a circular dependency between control_service.pb.h and some files in the kudu directory, which is a prerequisite for the protobuf generation code.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/util/error-util-internal.h@28
PS2, Line 28: std::map<TErrorCode::type, ErrorLogEntryPB>
> Consider making this a google::protobuf::Map<> type.
I suppose it's okay to not maintain a deterministic order when iterating through an ErrorLogMap. This will cause non-deterministic order for the output of different errors from PrintErrorMap(). I looked through the callers in RuntimeState() and I suppose it's acceptable. I'm not sure whether it breaks any compatibility, though.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/util/error-util.cc
File be/src/util/error-util.cc:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/util/error-util.cc@144
PS2, Line 144: for (auto iter = entry.messages().begin(); iter != entry.messages().end(); ++iter) {
> range based for-loop with const ref.
Done


http://gerrit.cloudera.org:8080/#/c/10855/2/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/10855/2/bin/bootstrap_toolchain.py@432
PS2, Line 432: 5.0.1-asserts-p1
> Please add the reason for this bump to the commit message.
Not needed after rebase.


http://gerrit.cloudera.org:8080/#/c/10855/2/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/10855/2/bin/impala-config.sh@71
PS2, Line 71: export IMPALA_TOOLCHAIN_BUILD_ID=146-f2d5380be6
> Please mention the reasons behind all these version bumps in the commit mes
Not needed in new change.


http://gerrit.cloudera.org:8080/#/c/10855/2/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/10855/2/common/protobuf/control_service.proto@156
PS2, Line 156: instance_exec_statu
> nit: instance_exec_status
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 24 Jul 2018 23:58:23 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Thomas Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 14: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10855/13/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/10855/13/common/protobuf/control_service.proto@50
PS13, Line 50:   optional KuduDmlStatsPB kudu_stats = 3;
> Looking at (https://gerrit.cloudera.org/#/c/4985/1/common/thrift/ImpalaInte
Yeah, I think the rename makes sense to do, though it's not a big deal. Otherwise it's not much use keeping a TODO that we don't know the meaning of.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 14
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 08 Oct 2018 20:51:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 16:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10855/16/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/10855/16/be/src/rpc/rpc-mgr.h@146
PS16, Line 146: service_name
> admittedly it's clever that you were able to hack the username in order to 
I didn't realize the existence of static_service_name(). Thanks for pointing that out. I added a new template parameter S for this as the static_service_name or service_name doesn't seem available from the proxy.

I kind of implemented your suggestion for enum in this patch by defining GetProxy() in DataStreamService and ControlService respectively.


http://gerrit.cloudera.org:8080/#/c/10855/16/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/16/be/src/runtime/coordinator-backend-state.cc@506
PS16, Line 506:   {
> why is this extra indentation block here? doesn't seem like there is any RA
Fixed. Forgot to revert the change after undoing the move for exec_summary_->lock


http://gerrit.cloudera.org:8080/#/c/10855/16/be/src/runtime/fragment-instance-state.h
File be/src/runtime/fragment-instance-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/16/be/src/runtime/fragment-instance-state.h@107
PS16, Line 107: const
> nit: const non-reference parameters don't make much sense
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 16
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 12 Oct 2018 06:12:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/979/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 14
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 08 Oct 2018 06:48:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10855/13/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/13/be/src/runtime/coordinator-backend-state.cc@523
PS13, Line 523:           << " node_id=" << node_id << " instance_id=" << PrintId(exec_params_.instance_id)
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/10855/13/be/src/runtime/coordinator-backend-state.cc@530
PS13, Line 530:       if (rows_counter != nullptr) instance_stats.__set_cardinality(rows_counter->value());
line too long (91 > 90)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 13
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 19 Sep 2018 23:15:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/query-state.cc@263
PS2, Line 263: serialize_status.ok()
Need a test case for serialization failure. Maybe a global debug action would be useful?


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/query-state.cc@303
PS2, Line 303: !RpcMgr::IsServerTooBusy(rpc_controller))
Need a test case for this.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/service/control-service.cc@107
PS2, Line 107: (UNLIKELY(!deserialize_status.ok())
Need a test case for deserialization error.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Jul 2018 22:30:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/242/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 08 Aug 2018 18:15:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Thomas Marshall, Todd Lipcon, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, Michal Ostrowski, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10855

to look at the new patch set (#18).

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation
for fixing IMPALA-2990, as we can take advantage of TCP connection
multiplexing in KRPC to avoid overwhelming the coordinator
with too many connections by reducing the number of TCP connections
to one for each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
To avoid unnecessary delays due to sharing the network connections
between DataStream service and Control service, this change added the
service name as part of the user credentials for the ConnectionId
so each service will use a separate connection.

The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.

This patch also fixes IMPALA-7241, which may lead to duplicated
DML stats being applied. The fix adds a monotonically
increasing version number to fragment instances' reports. The
coordinator will ignore any report whose version is smaller than or
equal to the version in the last report.
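
The versioning rule in the paragraph above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
code in the patch: InstanceReportState, ApplyReport() and the field
names are hypothetical, and a single row counter stands in for the DML
stats.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical coordinator-side bookkeeping for one fragment instance.
struct InstanceReportState {
  int64_t last_applied_version = 0;  // Assumption: 0 means "no report applied yet".
  int64_t rows_inserted = 0;         // Stand-in for the accumulated DML stats.
};

// Applies a report only if its version is strictly newer than the last one
// applied. Returns true if the report was applied, false if it was ignored
// as a duplicate or stale report (e.g. a retried RPC).
inline bool ApplyReport(InstanceReportState* state, int64_t version,
                        int64_t delta_rows) {
  assert(state != nullptr);
  if (version <= state->last_applied_version) return false;
  state->last_applied_version = version;
  state->rows_inserted += delta_rows;
  return true;
}
```

With this rule, a report that is resent after a timeout carries the same
version as the original, so its stats are not counted twice.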

Testing done:
1. Exhaustive build.
2. Added some targeted test cases for profile serialization failure
   and RPC retries/timeout.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/catalog/catalog-util.cc
M be/src/common/global-flags.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/rpc/CMakeLists.txt
M be/src/rpc/jni-thrift-util.h
M be/src/rpc/rpc-mgr-kerberized-test.cc
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr-test.h
M be/src/rpc/rpc-mgr.h
M be/src/rpc/thrift-util-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/uid-util.h
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
M tests/custom_cluster/test_rpc_timeout.py
65 files changed, 1,298 insertions(+), 769 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/18

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 18
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Thomas Marshall, Todd Lipcon, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, Michal Ostrowski, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10855

to look at the new patch set (#15).

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation
for fixing IMPALA-2990, as we can take advantage of TCP connection
multiplexing in KRPC to avoid overwhelming the coordinator
with too many connections by reducing the number of TCP connections
to one for each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
To avoid unnecessary delays due to sharing the network connections
between DataStream service and Control service, this change added the
service name as part of the user credentials for the ConnectionId
so each service will use a separate connection.

The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.

This patch also fixes IMPALA-7241, which may lead to duplicated
DML stats being applied. The fix adds a monotonically
increasing version number to fragment instances' reports. The
coordinator will ignore any report whose version is smaller than or
equal to the version in the last report.

Testing done:
1. Exhaustive build.
2. Added some targeted test cases for profile serialization failure and RPC retries/timeout.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/catalog/catalog-util.cc
M be/src/common/global-flags.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/rpc/CMakeLists.txt
M be/src/rpc/jni-thrift-util.h
M be/src/rpc/rpc-mgr-kerberized-test.cc
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr-test.h
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/rpc/thrift-util-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/uid-util.h
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
M tests/custom_cluster/test_rpc_timeout.py
66 files changed, 1,287 insertions(+), 787 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/15

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 15
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10855/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10855/4//COMMIT_MSG@16
PS4, Line 16: This patch also introduces a new service pool for all query execution
            : control related RPCs in the future so that control commands from
            : coordinators aren't blocked by long-running DataStream services' RPCs.
> Thanks for the suggestion. I think it's fine to do it as a follow-up or par
Actually, after IMPALA-7585, we can achieve this by just setting a different username in the UserCredentials for different services. No change in Kudu code is needed.
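
A rough sketch of why a different username yields a separate connection,
assuming the connection cache is keyed on the remote address plus the
credentials. ConnKey and ConnCache are hypothetical stand-ins and not
the KRPC classes; the address and service names are only examples.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>

// Hypothetical connection identity: remote address plus the username carried
// in the user credentials.
struct ConnKey {
  std::string remote_addr;
  std::string username;
  bool operator<(const ConnKey& o) const {
    return std::tie(remote_addr, username) < std::tie(o.remote_addr, o.username);
  }
};

// Toy connection cache: returns the id of the connection for 'key', creating
// a new one the first time the key is seen.
class ConnCache {
 public:
  int GetOrCreate(const ConnKey& key) {
    assert(!key.remote_addr.empty());
    auto it = conns_.find(key);
    if (it != conns_.end()) return it->second;
    const int id = next_id_++;
    conns_.emplace(key, id);
    return id;
  }
  std::size_t size() const { return conns_.size(); }

 private:
  std::map<ConnKey, int> conns_;
  int next_id_ = 0;
};
```

Two services talking to the same remote address then map to two cache
entries, so a control message does not queue behind a large row batch on
a shared connection.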


http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/runtime/fragment-instance-state.h
File be/src/runtime/fragment-instance-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/runtime/fragment-instance-state.h@179
PS14, Line 179:   /// Event sequence tracking the completion of various stages of this fragment instance.
> I imagine you already thought about this and concluded that overflows were 
Good idea. With 64-bit values and a 1 millisecond reporting interval, it will take about 292471208 years for it to overflow. With 32-bit values, it takes about 24 days. I believe no legitimate query will run for 24 days, but I guess using 64-bit provides peace of mind.
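
For reference, the arithmetic behind these figures can be reproduced
with a small sketch. It assumes signed counters incremented once per
reporting interval and 365-day years; the function names are made up for
illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Days until a counter with the given maximum value overflows when it is
// incremented once every 'interval_ms' milliseconds.
inline double DaysToOverflow(double max_value, double interval_ms) {
  assert(max_value > 0 && interval_ms > 0);
  const double kMsPerDay = 1000.0 * 60 * 60 * 24;
  return max_value * interval_ms / kMsPerDay;
}

// Signed 32-bit counter: the result is in days.
inline double DaysToOverflow32(double interval_ms) {
  return DaysToOverflow(std::numeric_limits<int32_t>::max(), interval_ms);
}

// Signed 64-bit counter: the result is in 365-day years.
inline double YearsToOverflow64(double interval_ms) {
  return DaysToOverflow(
             static_cast<double>(std::numeric_limits<int64_t>::max()),
             interval_ms) /
         365.0;
}
```

With a 1 ms interval, DaysToOverflow32(1.0) comes out at just under 25
days and YearsToOverflow64(1.0) at roughly 292 million years.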




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 09 Oct 2018 17:55:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 21: Code-Review+2

Carry +2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 21
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 01 Nov 2018 16:56:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 18: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10855/18/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/18/be/src/runtime/query-state.cc@363
PS18, Line 363: ERROR
I'd use DFATAL so that it would at least crash in debug builds. (DFATAL turns into ERROR in release builds)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 18
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 17 Oct 2018 19:42:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 4:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/10855/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10855/4//COMMIT_MSG@16
PS4, Line 16: This patch also introduces a new service pool for all query execution
            : control related RPCs in the future so that control commands from
            : coordinators aren't blocked by long-running DataStream services' RPCs.
do we want to consider the ability to put these on separate TCP connections altogether? If they're latency-critical it might be worth doing it so that a control-plane message can jump ahead of a 300MB row batch message in the case of big batches.

I think doing so wouldn't be too tough -- just need to change the connection lookup code in KRPC to add another field to the equality check beyond just remote address (eg an opaque "connection class" field or somesuch)


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/dml-exec-state.cc
File be/src/runtime/dml-exec-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/dml-exec-state.cc@404
PS4, Line 404: void DmlExecState::ToProto(InsertExecStatusPB* dml_status) {
do you want to clear dml_status first just in case? Similarly for other ToProto functions


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/dml-exec-state.cc@455
PS4, Line 455:    if (!kudu_stats->has_num_row_errors()) {
             :       kudu_stats->set_num_row_errors(0);
             :     }
I don't think this is necessary: it's not illegal to access an "unset" field. It just returns the default value. In the case that the default value is unspecified, it's what you'd expect (i.e. 0 for ints).


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@268
PS4, Line 268: rpc_controller.AddOutboundSidecar(move(sidecar), &sidecar_idx).ok
should this be a CHECK_OK?


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@273
PS4, Line 273: "final"
need a space here


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@274
PS4, Line 274: ERROR
should this be a DFATAL? do you expect this to ever actually happen?


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@293
PS4, Line 293:   kudu::Status rpc_status = proxy->ReportExecStatus(req, &resp, &rpc_controller);
do you want any timeout on this RPC?


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@72
PS4, Line 72:   // TODO: implement something more efficient here, we're currently
            :   // acquiring/releasing the map lock and doing a map lookup for
            :   // every report (assign each query a local int32_t id and use that to index into a
            :   // vector of ClientRequestStates, w/o lookup or locking?)
seems like an easy fix here is to use a RWLock since we expect that queries start and stop much less frequently than status is reported, right?
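
A minimal sketch of the reader/writer lock idea, using C++17
std::shared_mutex rather than any particular Impala or Kudu lock class.
QueryRegistry and RequestState are hypothetical names standing in for
the map of ClientRequestStates.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>
#include <utility>

struct RequestState {};  // Hypothetical stand-in for per-query state.

class QueryRegistry {
 public:
  // Queries start and stop rarely: these paths take the lock exclusively.
  void Register(int64_t query_id, std::shared_ptr<RequestState> state) {
    assert(state != nullptr);
    std::unique_lock<std::shared_mutex> l(lock_);
    map_[query_id] = std::move(state);
  }
  void Unregister(int64_t query_id) {
    std::unique_lock<std::shared_mutex> l(lock_);
    map_.erase(query_id);
  }

  // Status reports are frequent: lookups only take the lock in shared mode,
  // so concurrent reports do not serialize on each other.
  std::shared_ptr<RequestState> Lookup(int64_t query_id) const {
    std::shared_lock<std::shared_mutex> l(lock_);
    auto it = map_.find(query_id);
    if (it == map_.end()) return nullptr;
    return it->second;
  }

 private:
  mutable std::shared_mutex lock_;
  std::unordered_map<int64_t, std::shared_ptr<RequestState>> map_;
};
```

Returning a shared_ptr keeps the state alive for the caller even if the
query is unregistered right after the lookup.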


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@87
PS4, Line 87:     mem_tracker_->Release(rpc_context->GetTransferSize());
            :     rpc_context->RespondSuccess();
is everything in this function exception-safe? eg I thought Thrift could occasionally throw exceptions, but maybe your wrappers are already covering for that. If not, you may want to use something like a SCOPED_CLEANUP to ensure that the RPC is responded to and the memory is released.
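
A rough sketch of the scope-guard idea. ScopedCleanup below is a
self-contained stand-in written for illustration, not the actual
SCOPED_CLEANUP macro, and HandleReport() with its two out-parameters is
a hypothetical handler in which responding to the RPC and releasing
memory are modelled as simple assignments.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Minimal scope guard: runs the callable when the guard goes out of scope,
// whether the scope is left by a normal return or by an exception.
class ScopedCleanup {
 public:
  explicit ScopedCleanup(std::function<void()> fn) : fn_(std::move(fn)) {
    assert(fn_ != nullptr);
  }
  ~ScopedCleanup() { fn_(); }
  ScopedCleanup(const ScopedCleanup&) = delete;
  ScopedCleanup& operator=(const ScopedCleanup&) = delete;

 private:
  std::function<void()> fn_;
};

// Hypothetical handler: 'bytes_tracked' stands in for the memory tracker and
// 'responded' for responding to the RPC.
inline void HandleReport(bool throw_midway, bool* responded,
                         long* bytes_tracked) {
  ScopedCleanup cleanup([&]() {
    *bytes_tracked = 0;  // Release the memory tracked for the payload.
    *responded = true;   // Respond to the RPC.
  });
  if (throw_midway) throw std::runtime_error("deserialization failed");
  // Normal processing would go here; the cleanup runs on every exit path.
}
```

The cleanup also removes the need to repeat the release-and-respond code
at each return point.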


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@104
PS4, Line 104:     if (LIKELY(sidecar_status.ok())) {
do you want to warn or even CHECK on this?


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/container-util.h
File be/src/util/container-util.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/container-util.h@147
PS4, Line 147: void MergeMapValues(const google::protobuf::Map<K, V>& src,
If you templatized this on the map type instead of the K and V types, you could probably just use 'auto' below, and then avoid having to include the protobuf headers in this generic util file. It would also likely become generic enough to use with any associative container.
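
A sketch of what that could look like, assuming the helper merges by
summing the values that share a key. Only the function name is taken
from the quoted line; the body is an illustration, not the code in
container-util.h.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Merges 'src' into 'dst' by summing the values that share a key. Templated
// on the map type, so it works with any associative container that iterates
// over key/value pairs and provides operator[]; no protobuf header is needed
// here.
template <typename MapType>
void MergeMapValues(const MapType& src, MapType* dst) {
  assert(dst != nullptr);
  for (const auto& entry : src) {
    // operator[] value-initializes a missing entry (0 for ints) before the add.
    (*dst)[entry.first] += entry.second;
  }
}
```

The same template can then be instantiated with std::map,
std::unordered_map or a protobuf map at the call site.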




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 25 Jul 2018 01:57:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 19:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1121/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 19
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Sat, 20 Oct 2018 02:26:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 8:

(22 comments)

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/exec/data-sink.h
File be/src/exec/data-sink.h:

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/exec/data-sink.h@100
PS5, Line 100:   static const char* ROOT_PARTITION_KEY;
> can this be a const char* const ROOT_PARTITION_KEY instead? usually static 
Done


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/exec/hdfs-table-writer.h
File be/src/exec/hdfs-table-writer.h:

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/exec/hdfs-table-writer.h@91
PS5, Line 91: const InsertS
> style: is this meant to be returning a mutable reference or should this be 
Done


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/coordinator-backend-state.cc@254
PS5, Line 254:   // If this backend completed previously, don't apply the update.
             :   if (IsDone()) return false;
             :   for (const FragmentInstanceExecStatusPB& instance_
> you can use a C++11 style loop with protobuf repeated elements too
Done


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.h@276
PS5, Line 276: 
> "... to construct a status report ..."
Done


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.h@277
PS5, Line 277: 
> "If 'fis' is not 'nullptr', the runtime profile ..."
Done


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.cc@233
PS5, Line 233:   return state == BackendExecState::FINISHED
             :       || state == BackendExecState::CANCELLED
             :       || state == BackendExecState::E
> I thought I saw some utility code for this elsewhere
Good point about adding this as a utility function; we don't have one right now. The utility function may eventually become unnecessary once other RPCs such as ExecFInstance() are converted to KRPC too.


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.cc@243
PS5, Line 243:   DCHECK(!IsTerminalState(backend_exec_state_))
             :       << " Current State: " << BackendExecStateToString(backend_exec_state_)
             :       << " | Current Status: " << query_status_.Get
> and here
Done


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.cc@256
PS5, Line 256: 
> Shouldn't we print the fragment instance ID instead?
Done


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.cc@295
PS5, Line 295: tus serialize_status = serializer->
> The profile is a thrift structure serialized to a string ...
Done


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.cc@306
PS5, Line 306:  Only send updates to insert status if
> We could probably consider moving 'profile_str' to the 'sidecar_buf' to avo
I guess it's a tradeoff because moving the 'profile_str' means having to redo the serialization of the profile when retrying the RPC. I'm a bit worried about the overhead of doing so once we merge the profiles of multiple fragment instances together. Timeouts are more likely to occur once the coordinator is under stress.


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.cc@304
PS5, Line 304:     }
             : 
             :     // Only send updates to insert status if fragment is finished, the coordinator waits
             :     // until query execution is done to use them anyhow.
             :     RuntimeState* state = fis->runtime_state();
             :     if (done) {
             :       state->dml_exec_state()->ToProto(instance_status->mutable_insert_exec_status());
             :     }
             : 
             :     // Send new errors to coordinator
             :     state->GetUnreportedErrors(instance_status->mutable_error_log());
             :   }
             : }
             : 
> Can't we just do this once before the loop starts?
Not really. The ownership of the buffer is transferred to the OutboundCall, so we need to re-arm the sidecar on every retry.
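
A rough sketch of the retry pattern, using hypothetical stand-ins
(FakeController, SendWithRetry) rather than the KRPC classes. The
serialized profile is kept by the caller, and a fresh sidecar is built
from it for every attempt because the previous one was handed over to
the outgoing call.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a per-attempt RPC controller that takes ownership
// of the sidecar buffer.
struct FakeController {
  std::unique_ptr<std::string> sidecar;
  void AddSidecar(std::unique_ptr<std::string> s) { sidecar = std::move(s); }
};

// Tries to send 'profile' up to 'max_attempts' times. 'send' returns true on
// success. Returns the attempt number that succeeded, or -1 if all failed.
inline int SendWithRetry(
    const std::string& profile, int max_attempts,
    const std::function<bool(const FakeController&)>& send) {
  assert(max_attempts > 0);
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    FakeController controller;  // A fresh controller for every attempt.
    // Re-arm the sidecar: the previous attempt's buffer is no longer ours.
    controller.AddSidecar(std::make_unique<std::string>(profile));
    if (send(controller)) return attempt;
  }
  return -1;
}
```

Keeping the serialized string around costs a copy per attempt, but
avoids re-serializing the profile on every retry.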


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/runtime-state.cc@200
PS5, Line 200: g_lock_);
> this is weird here because there is no 'exec_status' in this function
Typo. Meant 'new_errors'.


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/service/control-service.cc@43
PS5, Line 43: control service'
> shouldn't this be "control service's" since there is only one control servi
Done


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/service/control-service.cc@89
PS5, Line 89: }
> hmm, why is this repeated instead of just optional, then?
That's a direct translation from the existing structure definition. I believe it was done for multi-threading (or IMPALA-4063?) when there could be more than one instance of a given fragment running, but the conversion for multi-threading seems to be half done at this point, so yes, this could have been an optional instead.

With IMPALA-4063, which consolidates all fragment instances' statuses into a single RPC, we may actually have more than one entry in instance_exec_status, so I will be keeping it as repeated for now.

That said, after some thought I believe this DCHECK is actually invalid. I will remove it and properly handle the case in which request->instance_exec_status_size() == 0. In fact, Sailesh added a test case for this in this patch: https://gerrit.cloudera.org/#/c/10813/


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/service/control-service.cc@117
PS5, Line 117: 
             : 
             : 
             : 
> since both of the return points of this function have this same code, I thi
Done. We actually have something like that for DataStreamService.


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/util/error-util.cc
File be/src/util/error-util.cc:

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/util/error-util.cc@144
PS5, Line 144: auto
> auto&
Done


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/util/error-util.cc@172
PS5, Line 172: auto
> auto&
Done


http://gerrit.cloudera.org:8080/#/c/10855/5/common/protobuf/common.proto
File common/protobuf/common.proto:

http://gerrit.cloudera.org:8080/#/c/10855/5/common/protobuf/common.proto@43
PS5, Line 43: 
> do these need to be kept in sync with common/thrift/Metrics.thrift? Worth l
Actually, they aren't needed in this patch anymore as we decided to keep RuntimeProfile in Thrift. They were originally ported when converting TRuntimeProfile to protobuf.


http://gerrit.cloudera.org:8080/#/c/10855/5/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/10855/5/common/protobuf/control_service.proto@153
PS5, Line 153: 
> can you document whether this should be sent incrementally as you go (poten
Yes, this is not idempotent. In fact, I noticed that this may lead to inaccurate DML stats when the RPC is retried due to a timeout, and I believe this is already a bug today.

The new patch moves 'insert_exec_status' and 'error_log' into FragmentInstanceExecStatusPB instead. Since we avoid applying any updates for a fragment instance once its final report has been received (see https://github.com/apache/impala/blob/master/be/src/runtime/coordinator-backend-state.cc#L258), we should be safe from this problem in the new code.


http://gerrit.cloudera.org:8080/#/c/10855/5/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/10855/5/common/thrift/ImpalaInternalService.thrift@380
PS5, Line 380:   6: optional Types.TNetworkAddress coord_address
> unrelated to this patch but it always bugs me to see changes to thrift fiel
Yes, unfortunately, our compatibility story is really poor. At some point,  we should commit to backward compatibility for Thrift / Protobuf structures.


http://gerrit.cloudera.org:8080/#/c/10855/6/tests/failure/test_failpoints.py
File tests/failure/test_failpoints.py:

http://gerrit.cloudera.org:8080/#/c/10855/6/tests/failure/test_failpoints.py@200
PS6, Line 200: 
> flake8: E502 the backslash is redundant between brackets
Done


http://gerrit.cloudera.org:8080/#/c/10855/6/tests/failure/test_failpoints.py@202
PS6, Line 202: 
> flake8: E231 missing whitespace after ':'
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 08 Aug 2018 17:39:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/kudu/rpc/rpc_context.h
File be/src/kudu/rpc/rpc_context.h:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/kudu/rpc/rpc_context.h@166
PS8, Line 166: Status GetInboundSidecar(int idx, Slice* slice) const;
Changes done on Kudu side already: https://github.com/apache/kudu/commit/37d0c35bcf43aef92fdde82c629f732f1466d8ae



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 08 Aug 2018 18:14:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has removed Dan Hecht from this change.  ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Removed reviewer Dan Hecht.
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 13
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, Todd Lipcon, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10855

to look at the new patch set (#10).

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation for fixing
IMPALA-2990, as we can take advantage of TCP connection multiplexing
in KRPC to avoid overwhelming the coordinator with too many
connections by reducing the number of TCP connections to one for
each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.

This patch also fixes IMPALA-7241, which may lead to duplicated
DML stats being applied. The fix adds a monotonically increasing
version number to fragment instances' reports. The coordinator
will ignore any report whose version is smaller than or equal to
the version in the last report.

Testing done: core build. Added some targeted test cases for profile
serialization failure and RPC retry.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/catalog/catalog-util.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/rpc/jni-thrift-util.h
M be/src/rpc/thrift-util-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/uid-util.h
M bin/impala-config.sh
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
M tests/custom_cluster/test_rpc_exception.py
M tests/failure/test_failpoints.py
60 files changed, 1,110 insertions(+), 673 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/10
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 3:

Build Started https://jenkins.impala.io/job/gerrit-code-review-checks/44/ 

Running initial code review checks. This is experimental - please report any issues to tarmstrong@cloudera.com or on this JIRA: IMPALA-7317


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 24 Jul 2018 23:58:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/264/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 09 Aug 2018 03:20:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 14: Code-Review+1

(6 comments)

I wasn't able to do a particularly deep review but I don't think that should hold things up. Looks great at a high level and I'm excited to get it in. Had a few minor suggestions but nothing critical.

http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/runtime/coordinator-backend-state.cc@275
PS14, Line 275:   unique_lock<mutex> lock(lock_);
Can we document the lock order in coordinator.h (which already references ExecSummary::lock)?


http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/runtime/fragment-instance-state.h
File be/src/runtime/fragment-instance-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/runtime/fragment-instance-state.h@177
PS14, Line 177: Monontonically
Typo: Monotonically


http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/runtime/fragment-instance-state.h@179
PS14, Line 179:   int32_t report_seq_no_ = 0;
I imagine you already thought about this and concluded that overflows were not possible, but I'm wondering why not just make it an int64_t anyway so that it's super-obvious that it's not possible.
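
For a rough sense of scale, the arithmetic can be sketched as below. The 5-second reporting interval is an assumption made purely for illustration (it is not taken from this thread), and `YearsToOverflow()` is a hypothetical helper.

```cpp
#include <cstdint>
#include <limits>

// Assumed reporting interval, for illustration only.
constexpr int64_t kAssumedReportIntervalSec = 5;
constexpr int64_t kSecPerYear = 365LL * 24 * 60 * 60;

// Whole years until a counter with the given maximum value overflows when it
// is incremented once per report at the assumed interval.
constexpr int64_t YearsToOverflow(int64_t max_value) {
  return max_value / (kSecPerYear / kAssumedReportIntervalSec);
}

// An int32_t counter lasts about 340 years under this assumption; an int64_t
// counter lasts more than a trillion years.
static_assert(YearsToOverflow(std::numeric_limits<int32_t>::max()) == 340,
              "int32_t overflow estimate");
```

Under that assumption neither width overflows in practice, so the wider type mostly buys readers the ability to skip the calculation.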


http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/runtime/fragment-instance-state.cc@396
PS14, Line 396:   DFAKE_SCOPED_LOCK(report_status_lock_);
This is kind of cool.


http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/testutil/in-process-servers.cc
File be/src/testutil/in-process-servers.cc:

http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/testutil/in-process-servers.cc@51
PS14, Line 51: ;
Extra semicolon


http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/testutil/in-process-servers.cc@51
PS14, Line 51:  FLAGS_krpc_port = 
I guess we can't set this to 0 to automatically choose an ephemeral port? We had some issues in the past with flaky tests because of port conflicts.



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 14
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 08 Oct 2018 19:35:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10855/18/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/18/be/src/runtime/query-state.cc@363
PS18, Line 363: ERROR
> I'd use DFATAL so that it would at least crash in debug builds. (DFATAL tur
Done



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 18
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Sat, 20 Oct 2018 01:52:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 18:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/10855/17//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10855/17//COMMIT_MSG@40
PS17, Line 40: 2. Added some targeted test cases for profile serialization failure
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/10855/17/be/src/rpc/rpc-mgr.inline.h
File be/src/rpc/rpc-mgr.inline.h:

http://gerrit.cloudera.org:8080/#/c/10855/17/be/src/rpc/rpc-mgr.inline.h@35
PS17, Line 35:     std::unique_ptr<P>* proxy) {
> hrm, it still feels like we are going through a lot of gymnastics to avoid 
OK. Reverted to sharing the connection for now in this patch. The Kudu patch (https://gerrit.cloudera.org/#/c/11681/) is already out for review.


http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/rpc/thrift-util-test.cc
File be/src/rpc/thrift-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/rpc/thrift-util-test.cc@58
PS14, Line 58: 
> Should we add a test for SerializeToString as well?
Done


http://gerrit.cloudera.org:8080/#/c/10855/17/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/17/be/src/runtime/query-state.cc@356
PS17, Line 356: 
> should we handle errors like these and log it instead?
Yes, while Todd previously commented that this should be a CHECK(), I think it's better to not crash Impala on a non-fatal issue. Admittedly, the system is most likely in a bad state if this fails but not being able to send the profile shouldn't be fatal either.


http://gerrit.cloudera.org:8080/#/c/10855/16/be/src/service/control-service.h
File be/src/service/control-service.h:

http://gerrit.cloudera.org:8080/#/c/10855/16/be/src/service/control-service.h@52
PS16, Line 52: e*
> nit: with
Done


http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/testutil/in-process-servers.cc
File be/src/testutil/in-process-servers.cc:

http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/testutil/in-process-servers.cc@51
PS14, Line 51:  // Thrift server c
> can you document the reason here? it was not clear to me as to why only thi
Done


http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/util/runtime-profile.cc@251
PS14, Line 251: if (UNLIKELY(nodes.size()) == 0) return;
> does this only happen when thrift de-serialization fails?
Or serialization failure on the executor side.


http://gerrit.cloudera.org:8080/#/c/10855/14/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/10855/14/common/protobuf/control_service.proto@27
PS14, Line 27: message ParquetDmlStatsPB {
> nit: maybe retain the comment in thrift file
Done



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 18
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Sun, 14 Oct 2018 02:21:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/query-state.cc@287
PS10, Line 287: 
> To answer the question about your second race, I think it's impossible toda
Ah, I missed that we join on the reporter thread first.

Given that, does the report sequence number generation need to be atomic? It seems you expect only a single thread to be reporting at once, in which case non-atomic int would be fine.

On a similar note, the interaction between 'done' and the sequence numbers is a bit strange to me. It seems like, if we trust that our sequence number generation is free of the above race, then we don't need the special handling where "done" always wins out over the sequence-number checks, right? I.e., we already guarantee that the "done" message has a higher sequence number than any prior report?
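
A minimal sketch of the sequence-number check being discussed, assuming reports are numbered from 1 and generated by a single thread per fragment instance. `ReportFilterSketch` and `ShouldApply()` are hypothetical names, not the ones used in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical coordinator-side record of the last report applied for one
// fragment instance.
class ReportFilterSketch {
 public:
  // Returns true if the report with sequence number 'seq_no' should be
  // applied. Reports with a sequence number <= the last applied one are stale
  // or duplicated (e.g. an RPC retry) and are ignored.
  bool ShouldApply(int64_t seq_no) {
    if (seq_no <= last_seq_no_) return false;
    last_seq_no_ = seq_no;
    return true;
  }

 private:
  int64_t last_seq_no_ = 0;  // assumes executors number their reports from 1
};
```

If the final report always carries the highest sequence number, this check alone orders it after every earlier report, which is the point of the question above.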

I also wonder whether it is feasible to add an assertion for non-concurrency of report-sending by using a DFAKE_MUTEX inside the ReportExecStatusAux function. Then if someone accidentally breaks these assumptions we'll get a clear failure in debug builds instead of some more rare hard-to-debug stuck query.
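
The kind of non-concurrency assertion suggested here can be illustrated with the sketch below. It is a hypothetical stand-in written for this example, not the implementation behind DFAKE_MUTEX; the class names are invented.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical "fake mutex": it never blocks, it only detects overlapping
// entry into a region that is assumed to be single-threaded.
class CollisionDetectorSketch {
 public:
  // Returns true if the caller entered the region alone, false if another
  // caller is already inside.
  bool TryEnter() { return !busy_.exchange(true, std::memory_order_acquire); }
  void Leave() { busy_.store(false, std::memory_order_release); }

 private:
  std::atomic<bool> busy_{false};
};

// RAII guard: asserts (in debug builds only) that nobody else is in the region.
class ScopedCollisionCheck {
 public:
  explicit ScopedCollisionCheck(CollisionDetectorSketch* d) : d_(d) {
    entered_ = d_->TryEnter();
    assert(entered_ && "concurrent entry into single-threaded region");
  }
  ~ScopedCollisionCheck() {
    // Only release the flag if this guard was the one that set it.
    if (entered_) d_->Leave();
  }

 private:
  CollisionDetectorSketch* d_;
  bool entered_;
};
```

A function that must not run concurrently would construct a `ScopedCollisionCheck` on entry; with assertions compiled out the guard costs one atomic exchange and never blocks.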



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 11
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 22 Aug 2018 18:47:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10855/19/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/19/be/src/runtime/query-state.cc@363
PS19, Line 363:         LOG(DFATAL) << FromKuduStatus(sidecar_status, "Failed to add sidecar").GetDetail();
line too long (91 > 90)



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 19
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Sat, 20 Oct 2018 12:12:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation for fixing
IMPALA-2990, as we can take advantage of TCP connection multiplexing
in KRPC to avoid overwhelming the coordinator with too many
connections by reducing the number of TCP connections to one for
each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
To avoid unnecessary delays due to sharing the network connections
between DataStream service and Control service, this change added the
service name as part of the user credentials for the ConnectionId
so each service will use a separate connection.

The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.

This patch also fixes IMPALA-7241, which may lead to duplicated
DML stats being applied. The fix adds a monotonically increasing
version number to fragment instances' reports. The coordinator
will ignore any report whose version is smaller than or equal to
the version in the last report.

Testing done:
1. Exhaustive build.
2. Added some targeted test cases for profile serialization failure
   and RPC retries/timeout.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Reviewed-on: http://gerrit.cloudera.org:8080/10855
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/catalog/catalog-util.cc
M be/src/common/global-flags.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/rpc/CMakeLists.txt
M be/src/rpc/jni-thrift-util.h
M be/src/rpc/rpc-mgr-kerberized-test.cc
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr-test.h
M be/src/rpc/rpc-mgr.h
M be/src/rpc/thrift-util-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/uid-util.h
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
M tests/custom_cluster/test_rpc_timeout.py
65 files changed, 1,299 insertions(+), 769 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 23
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Thomas Marshall, Todd Lipcon, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, Michal Ostrowski, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10855

to look at the new patch set (#16).

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation for fixing
IMPALA-2990, as we can take advantage of TCP connection multiplexing
in KRPC to avoid overwhelming the coordinator with too many
connections by reducing the number of TCP connections to one for
each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
To avoid unnecessary delays due to sharing the network connections
between DataStream service and Control service, this change added the
service name as part of the user credentials for the ConnectionId
so each service will use a separate connection.

The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.

This patch also fixes IMPALA-7241, which may lead to duplicated
DML stats being applied. The fix adds a monotonically increasing
version number to fragment instances' reports. The coordinator
will ignore any report whose version is smaller than or equal to
the version in the last report.

Testing done:
1. Exhaustive build.
2. Added some targeted test cases for profile serialization failure and RPC retries/timeout.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/catalog/catalog-util.cc
M be/src/common/global-flags.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/rpc/CMakeLists.txt
M be/src/rpc/jni-thrift-util.h
M be/src/rpc/rpc-mgr-kerberized-test.cc
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr-test.h
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/rpc/thrift-util-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/uid-util.h
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
M tests/custom_cluster/test_rpc_timeout.py
66 files changed, 1,287 insertions(+), 787 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/16
-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 16
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 8:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/exec/data-sink.cc
File be/src/exec/data-sink.cc:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/exec/data-sink.cc@49
PS8, Line 49: const char* DataSink::ROOT_PARTITION_KEY = "";
> this should be const char* const -- ie it's a const pointer to const charac
True. Same comment probably applies to other classes too (e.g. https://github.com/apache/impala/blob/master/be/src/exec/hash-table.h#L196)


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/exec/hdfs-table-writer.h
File be/src/exec/hdfs-table-writer.h:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/exec/hdfs-table-writer.h@91
PS8, Line 91:   const InsertStatsPB& stats() { return stats_; }
> probably should be a const function as well?
Done


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.h@211
PS8, Line 211:   Coordinator* coord_; /// Coordinator object that owns this BackendState
> this went from const to non-const. does this mutate the coordinator that ow
Indirectly. We are updating the coord_->dml_exec_state_(). The locking is done inside dml_exec_state_ itself so it should be thread safe.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@50
PS8, Line 50:   : coord_(coord),
> nit: you could just do coord_(DCHECK_NOTNULL(coord)) here since DCHECK_NOTNU
Done


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@262
PS8, Line 262:     lock.unlock();
> I'm nervous about this optimization. Do we really have data that deserializ
The reason for doing it here is to handle the case in which we can have 0 to # fragment instances (after IMPALA-4063) entries in "backend_exec_status.instance_exec_status". We can have 0 entries if, say, we failed to start a thread. So, while we can do it in the RPC handler, it's a bit awkward to handle the zero entry case.

The new patch moves the profile sidecar idx into backend_exec_status from backend_exec_status.instance_exec_status. That allows us to keep the deserialization in the ControlService::ReportExecStatus().

In the patch for IMPALA-4063, we will just replace TRuntimeProfileTree with a list<TRuntimeProfileTree> instead.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@310
PS8, Line 310: coord_
> I feel that allowing for a non-const pointer to the Coordinator object to t
Done


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@317
PS8, Line 317: MergeErrorMaps(instance_exec_status.error_log(), &error_log_);
> This is less severe, but I guess that this is already wrong too. We potenti
The new report has a monotonically increasing version number. This is a variant of IMPALA-7241.
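
A minimal sketch of the versioning idea, assuming each fragment instance stamps its reports with a version starting at 1. The names InstanceReport, InstanceState and rows_inserted are hypothetical stand-ins for illustration, not the types used in the patch:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one fragment instance's status report.
struct InstanceReport {
  int64_t version;        // assumed to increase by at least 1 with every new report
  int64_t rows_inserted;  // non-idempotent delta that the receiver sums up
};

// Hypothetical stand-in for the coordinator-side state of one fragment instance.
class InstanceState {
 public:
  // Applies 'report' unless its version is smaller than or equal to the last
  // applied version (e.g. a retried RPC). Returns true if the report was applied.
  bool Apply(const InstanceReport& report) {
    if (report.version <= last_version_) return false;
    last_version_ = report.version;
    total_rows_ += report.rows_inserted;
    return true;
  }

  int64_t total_rows() const { return total_rows_; }

 private:
  int64_t last_version_ = 0;  // 0 means no report has been applied yet
  int64_t total_rows_ = 0;
};
```

With this check, a report that is delivered twice because of an RPC retry is summed only once.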


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@516
PS8, Line 516: lock_guard<SpinLock> l1(exec_summary->lock);
> Are we violating any lock ordering here?
Yes, I also looked and didn't see anything.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc@318
PS8, Line 318: bool done
> trying to understand the purpose of this parameter. I don't think this shou
I agree in general. Let me add a TODO.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc@331
PS8, Line 331:   ThriftSerializer serializer(true);
> Why construct this here and pass it into ConstructReport? It seems Construc
The serialization output is owned by ThriftSerializer. My mistake to use a string below. I was using (buffer, len) earlier to avoid the extra copying. Also added comments about the lifetime of the buffer.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc@359
PS8, Line 359:       CHECK(sidecar_status.ok())
             :           << FromKuduStatus(sidecar_status, "Failed to add sidecar").GetDetail();
> is there any chance that the profile would extend past max RPC size? I gues
Added a CHECK for it in ConstructReport().


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc@373
PS8, Line 373:     if (i < 2) SleepForMs(FLAGS_report_status_retry_interval_ms);
> worth a LOG(WARNING) here?
We have some logging already below at line 385.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/service/control-service.cc@76
PS8, Line 76: 
> I feel that we can still keep the deserialization here and just check for t
The complication comes from the fact that the sidecar index is stashed inside instance_exec_status. The new patch has moved that index to ReportExecStatusRequestPB, which also makes it easier for IMPALA-4063 when there are multiple profiles. This allows us to handle the case where instance_exec_status_size() == 0 here.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/util/uid-util.h
File be/src/util/uid-util.h:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/util/uid-util.h@61
PS8, Line 61:   unique_id_pb->set_lo(t_unique_id.lo);
> worth a DCHECK that t_unique_id.__isset_.lo and hi?
Both lo and hi are required fields in TUniqueId. Updated UniqueIdPB to reflect that too. Don't recall why those fields were set to optional when it was ported to protobuf.


http://gerrit.cloudera.org:8080/#/c/10855/8/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/10855/8/common/protobuf/control_service.proto@140
PS8, Line 140: This is sent only on completion of a fragment instance.
> can you be more explicit and say that this is sent only when 'done' is true
Done


http://gerrit.cloudera.org:8080/#/c/10855/8/common/protobuf/control_service.proto@145
PS8, Line 145:   // Map of TErrorCode to ErrorLogEntryPB; New errors that have not been reported to
             :   // the coordinator by this fragment instance. Not idempotent. The done flag helps
             :   // prevent applying the same update twice.
             :   map<int32, ErrorLogEntryPB> error_log = 7;
> so this is only sent when 'done' is true? if not, then the done flag doesn'
That's true. Guess we do need a version number of some sort as part of FragmentInstanceExecStatusPB



-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 09 Aug 2018 00:14:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10855

to look at the new patch set (#3).

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

This change converts ReportExecStatus() RPC from thrift
based RPC to KRPC. This is done as part of the preparation
for fixing IMPALA-2990 as we can take advantage of TCP connection
multiplexing in KRPC to avoid overwhelming the coordinator
with too many connections by reducing the number of TCP connections
to one for each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/catalog/catalog-util.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/rpc/jni-thrift-util.h
M be/src/rpc/thrift-util-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/data-stream-test.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/data-stream-service.cc
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/uid-util.h
M bin/impala-config.sh
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
59 files changed, 1,017 insertions(+), 617 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/3
-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 5:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/exec/data-sink.h
File be/src/exec/data-sink.h:

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/exec/data-sink.h@100
PS5, Line 100:   static const string ROOT_PARTITION_KEY;
can this be a const char* const ROOT_PARTITION_KEY instead? usually static non-PODs are discouraged -- https://google.github.io/styleguide/cppguide.html#Static_and_Global_Variables
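
A minimal sketch of the suggested declaration, using a hypothetical class Sink in place of DataSink. The constant is a const pointer to const characters, so it needs no constructor at static-initialization time:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical class standing in for DataSink.
class Sink {
 public:
  // Neither the pointer nor the characters it points to can be modified.
  static const char* const ROOT_PARTITION_KEY;
};

// Definition in the .cc file; a string literal has static storage duration.
const char* const Sink::ROOT_PARTITION_KEY = "";
```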


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/exec/hdfs-table-writer.h
File be/src/exec/hdfs-table-writer.h:

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/exec/hdfs-table-writer.h@91
PS5, Line 91: InsertStatsPB
style: is this meant to be returning a mutable reference or should this be const? if mutable maybe it would be better to return a pointer to align with the style guide? (I see that you just translated the existing code but maybe worth fixing while you are here)


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/coordinator-backend-state.cc@254
PS5, Line 254:   for (int i = 0; i < backend_exec_status.instance_exec_status_size(); ++i) {
             :     const FragmentInstanceExecStatusPB& instance_exec_status =
             :         backend_exec_status.instance_exec_status(i);
you can use a C++11 style loop with protobuf repeated elements too


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@274
PS4, Line 274: is is
> I don't think this is expected to happen very often but it seems more robus
right, I was suggesting DFATAL which crashes impala only in DEBUG builds, but otherwise just emits an ERROR-level log. That way if this starts to get broken we'd get a more obvious failure in debug test runs


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@284
PS4, Line 284:   ThriftSerializer serializer(true);
> The new patch creates a single instance of the RPC parameter and reuse it o
yep, that's right. The parameter is serialized out of the PB when you call the function on the proxy, and you're free to do what you want with the protobuf after that.


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.cc@233
PS5, Line 233:   UniqueIdPB* query_id_pb = report->mutable_query_id();
             :   query_id_pb->set_lo(query_id().lo);
             :   query_id_pb->set_hi(query_id().hi);
I thought I saw some utility code for this elsewhere


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.cc@243
PS5, Line 243:     UniqueIdPB* finstance_id_pb = instance_status->mutable_fragment_instance_id();
             :     finstance_id_pb->set_lo(fis->instance_id().lo);
             :     finstance_id_pb->set_hi(fis->instance_id().hi);
and here


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/runtime-state.cc@200
PS5, Line 200: exec_status
this is weird here because there is no 'exec_status' in this function


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@60
PS4, Line 60:  ControlService
> No serious measure on perf yet but yes, there is no point in pushing beyond
It seems like Coordinator::BackendState::ApplyExecStatusReport takes a per-query lock. So if you have one query with lots and lots of fragments reporting, threads can stack up waiting on that lock because you're effectively limited to one core worth of throughput doing the merging. Reports for some other concurrent query which might be small (few fragments) can get starved because the big query is tying up all the service threads.

There are some fancier options here like queueing the reports instead of acquiring the lock, etc, but maybe in the interim before trying anything fancy it makes sense to have a limit here that is larger than the number of cores?


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@82
PS4, Line 82:     Status::Expected(err).ToProto(response->mutable_status());
            :     // Release the memory against the control service's memory tracker.
would it be useful to also print rpc_context->requestor_string in this error so we know which impalad sent the report?


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/service/control-service.cc@43
PS5, Line 43: control services
shouldn't this be "control service's" since there is only one control service?


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/service/control-service.cc@89
PS5, Line 89:   DCHECK_EQ(request->instance_exec_status_size(), 1);
hmm, why is this repeated instead of just optional, then?

Also for robustness I think it's usually safer to check conditions on RPC arguments with an if statement and respond with a failure in the case of an incorrect one. Or, just use a CHECK, because in a release build, if this check doesn't pass, you'll likely end up reading or scribbling over some random memory below on line 91 and it'll just be harder to understand what the problem is.
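
A rough sketch of the if-statement variant. Request, Response and HandleReport are hypothetical stand-ins here, not the real protobuf or KRPC types:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the request and response messages.
struct Request {
  std::vector<int> instance_exec_status;
};
struct Response {
  bool ok = true;
  std::string error;
};

// Validates the argument in all build types and responds with a failure
// instead of indexing into a possibly empty list.
void HandleReport(const Request& request, Response* response) {
  if (request.instance_exec_status.size() != 1) {
    response->ok = false;
    response->error = "expected exactly one instance status, got " +
        std::to_string(request.instance_exec_status.size());
    return;
  }
  // Safe: the single element is known to exist at this point.
  int status = request.instance_exec_status[0];
  (void)status;
}
```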


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/service/control-service.cc@117
PS5, Line 117:   status.ToProto(response->mutable_status());
             :   // Release the memory against the control service's memory tracker.
             :   mem_tracker_->Release(rpc_context->GetTransferSize());
             :   rpc_context->RespondSuccess();
since both of the return points of this function have this same code, I think it makes sense to refactor out the body of the function to something like ReportExecStatusInternal or HandleReportExecStatus, which would return a Status. That way if you add a later return point to this function there's less risk of a bug like forgetting to release the memory from the memtracker (which would be tricky to debug if it happened)
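
A sketch of that refactoring, under the assumption that the handler only needs to release memory and respond once. Status and MemTracker below are simplified hypothetical stand-ins, not the Impala classes:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Simplified, hypothetical stand-ins for illustration only.
struct Status {
  bool ok;
  std::string msg;
};
struct MemTracker {
  int64_t consumed = 0;
  void Consume(int64_t bytes) { consumed += bytes; }
  void Release(int64_t bytes) { consumed -= bytes; }
};

// Does the real work and may return early from as many places as needed.
Status HandleReportInternal(int num_instances) {
  if (num_instances != 1) return Status{false, "unexpected number of instances"};
  return Status{true, ""};
}

// Thin wrapper: the only place that releases the memory, so a new early
// return in the internal function cannot skip the release.
Status HandleReport(int num_instances, int64_t transfer_size, MemTracker* tracker) {
  Status status = HandleReportInternal(num_instances);
  tracker->Release(transfer_size);
  return status;
}
```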


http://gerrit.cloudera.org:8080/#/c/10855/5/common/protobuf/common.proto
File common/protobuf/common.proto:

http://gerrit.cloudera.org:8080/#/c/10855/5/common/protobuf/common.proto@43
PS5, Line 43: // The kind of value that a metric represents.
do these need to be kept in sync with common/thrift/Metrics.thrift? Worth leaving a note and probably a pointer to any code which is required to translate between them, if we aren't just relying on equal enum int values


http://gerrit.cloudera.org:8080/#/c/10855/5/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/10855/5/common/protobuf/control_service.proto@153
PS5, Line 153:   optional InsertExecStatusPB insert_exec_status = 5;
can you document whether this should be sent incrementally as you go (potentially multiple times) or whether it should only be sent once when the insert finishes? It seems to me that the handling of it is not idempotent (eg counters are summed rather than replaced).


http://gerrit.cloudera.org:8080/#/c/10855/5/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/10855/5/common/thrift/ImpalaInternalService.thrift@380
PS5, Line 380:   7: optional Types.TNetworkAddress coord_krpc_address
unrelated to this patch but it always bugs me to see changes to thrift field IDs, since it breaks rolling upgrades in very hard-to-understand ways. I know we don't support rolling upgrade but I'd rather see an error message about a missing field than have it attempt to deserialize one field using the data of another



-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 06 Aug 2018 17:36:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 5:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.h@276
PS5, Line 276: to construct a fragment instance's status report
"... to construct a status report ..."

Since if 'fis' is nullptr, then we send a generic report with only the status of the query.


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.h@277
PS5, Line 277: The runtime profile 
"If 'fis' is not 'nullptr', the runtime profile ..."


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.cc@256
PS5, Line 256: query ID $1
Shouldn't we print the fragment instance ID instead?


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.cc@295
PS5, Line 295: The profile is serialized in Thrift
The profile is a thrift structure serialized to a string ...


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.cc@306
PS5, Line 306: sidecar_buf->assign_copy(profile_str);
We could probably consider moving 'profile_str' to the 'sidecar_buf' to avoid the copy, but that requires changes in Kudu code, so we can defer for later.


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.cc@304
PS5, Line 304:     if (!profile_str.empty()) {
             :       unique_ptr<kudu::faststring> sidecar_buf = make_unique<kudu::faststring>();
             :       sidecar_buf->assign_copy(profile_str);
             :       unique_ptr<RpcSidecar> sidecar = RpcSidecar::FromFaststring(move(sidecar_buf));
             : 
             :       int sidecar_idx;
             :       kudu::Status sidecar_status =
             :           rpc_controller.AddOutboundSidecar(move(sidecar), &sidecar_idx);
             :       CHECK(sidecar_status.ok())
             :           << FromKuduStatus(sidecar_status, "Failed to add sidecar").GetDetail();
             : 
             :       DCHECK_EQ(report.instance_exec_status_size(), 1);
             :       report.mutable_instance_exec_status(0)->set_thrift_profile_sidecar_idx(sidecar_idx);
             :     }
Can't we just do this once before the loop starts?


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/util/error-util.cc
File be/src/util/error-util.cc:

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/util/error-util.cc@144
PS5, Line 144: auto
auto&


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/util/error-util.cc@172
PS5, Line 172: auto
auto&



-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Sun, 05 Aug 2018 21:34:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 20: Code-Review+2

Carry +2


-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 20
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 25 Oct 2018 02:12:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/241/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 08 Aug 2018 18:01:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 19: Code-Review+1

Carry +1


-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 19
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Sat, 20 Oct 2018 01:52:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 4:

Build Started https://jenkins.impala.io/job/gerrit-code-review-checks/45/ 

Running initial code review checks. This is experimental - please report any issues to tarmstrong@cloudera.com or on this JIRA: IMPALA-7317


-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 25 Jul 2018 00:16:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@289
PS4, Line 289: rpc_mgr->GetProxy
clang-tidy failure: Missing status check.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 25 Jul 2018 00:43:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation
for fixing IMPALA-2990, as we can take advantage of TCP connection
multiplexing in KRPC to avoid overwhelming the coordinator
with too many connections by reducing the number of TCP connections
to one for each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
The majority of this patch is mechanical conversion of some Thrift
structures to Protobuf. Note that the runtime profile is still retained
as a Thrift structure as Impala clients will still fetch the query profile in
Thrift. This avoids duplicating the serialization implementation in both
Thrift and Protobuf for the runtime profile. The Thrift runtime
profiles are serialized and sent as a sidecar in the ReportExecStatus() RPC.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/data-stream-test.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/uid-util.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
52 files changed, 942 insertions(+), 555 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 12:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.cc@294
PS10, Line 294:     DCHECK(!instance_stats->done_);
> The ctor was marked explicit so not sure it's allowed:
Oh, I missed that this was converted from a PB to a non-PB status.
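For readers following the 'explicit' point, here is a minimal sketch of what an explicit converting constructor allows and forbids. StatusSketch and StatusPBSketch are hypothetical stand-ins, not Impala's Status or StatusPB:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a protobuf status message.
struct StatusPBSketch {
  int code = 0;
};

// Hypothetical stand-in for the in-memory status class.
class StatusSketch {
 public:
  // 'explicit' forbids silent conversion from the protobuf type.
  explicit StatusSketch(const StatusPBSketch& pb) : code_(pb.code) {}
  bool ok() const { return code_ == 0; }

 private:
  int code_;
};

// Direct initialization such as 'StatusSketch s(pb);' is allowed ...
static_assert(
    std::is_constructible<StatusSketch, const StatusPBSketch&>::value,
    "direct initialization from the PB type should compile");
// ... but implicit conversion such as 'StatusSketch s = pb;' is not.
static_assert(
    !std::is_convertible<const StatusPBSketch&, StatusSketch>::value,
    "implicit conversion from the PB type should not compile");
```

So passing the protobuf message where the non-protobuf status is expected needs the conversion spelled out at the call site.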


http://gerrit.cloudera.org:8080/#/c/10855/12/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/12/be/src/runtime/coordinator-backend-state.cc@497
PS12, Line 497:   last_report_seq_no_ = exec_status.report_seq_no();
Can you add a CHECK here that it isn't decreasing? It's verified above, but seems good to verify it here just to be safe
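A minimal sketch of the kind of check being asked for. It assumes a hypothetical SeqNoStateSketch class rather than the real Coordinator::BackendState, and uses plain assert() where Impala code would use a CHECK/DCHECK macro:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the coordinator-side per-backend state.
class SeqNoStateSketch {
 public:
  // Returns false if the report is stale or a duplicate and was ignored.
  bool ApplyReport(int64_t report_seq_no) {
    // Early filter: drop reports that do not advance the sequence.
    if (report_seq_no <= last_report_seq_no_) return false;
    // ... the contents of the report would be applied here ...
    // Defensive check right at the assignment: the stored sequence number
    // must never move backwards, even if the code above is refactored.
    assert(report_seq_no > last_report_seq_no_);
    last_report_seq_no_ = report_seq_no;
    return true;
  }

  int64_t last_report_seq_no() const { return last_report_seq_no_; }

 private:
  int64_t last_report_seq_no_ = 0;
};
```

The second check is redundant today, which is the point: it documents the invariant where the value is written.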


http://gerrit.cloudera.org:8080/#/c/10855/12/be/src/runtime/coordinator-backend-state.cc@508
PS12, Line 508:   lock_guard<SpinLock> l1(exec_summary->lock);
is this critical section extending farther than it needs to? should it end around line 534?


http://gerrit.cloudera.org:8080/#/c/10855/12/be/src/runtime/fragment-instance-state.h
File be/src/runtime/fragment-instance-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/12/be/src/runtime/fragment-instance-state.h@128
PS12, Line 128:   int32_t ReportSeqNo() {
nit: maybe rename to AdvanceReportSeqNo or NextReportSeqNo or something so it's clear that it has a side effect?


http://gerrit.cloudera.org:8080/#/c/10855/12/be/src/runtime/fragment-instance-state.h@178
PS12, Line 178:   DFAKE_MUTEX(report_seq_no_lock_);
I think this fake mutex should be held around the whole function that generates a report, not just the advancing of the sequence number. As is, the fake mutex won't throw in the case of a race:

T1: generate seq num
T2: generate seq num
T2: generate and send report
T1: generate and send report
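The interleaving above can be replayed deterministically in a single thread. The sketch below uses a hypothetical FakeMutexSketch (an assumption for illustration, not the real DFAKE_MUTEX implementation) that only records whether two callers overlapped inside the guarded region:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical debug-only guard: it never blocks, it only reports whether
// somebody else was already inside the guarded region.
class FakeMutexSketch {
 public:
  // Returns true if the region was free, false if a collision was detected.
  bool Enter() { return !held_.exchange(true); }
  void Leave() { held_.store(false); }

 private:
  std::atomic<bool> held_{false};
};

// Guard held only while the sequence number is advanced. Returns true if the
// guard noticed the T1/T2 race.
inline bool NarrowScopeDetectsRace() {
  FakeMutexSketch guard;
  bool detected = false;
  int seq = 0;
  detected |= !guard.Enter();  // T1: generate seq num
  int t1_seq = ++seq;
  guard.Leave();
  detected |= !guard.Enter();  // T2: generate seq num
  int t2_seq = ++seq;
  guard.Leave();
  // T2 then sends report t2_seq before T1 sends t1_seq: out of order, and
  // nothing is guarding that part.
  (void)t1_seq;
  (void)t2_seq;
  return detected;
}

// Guard held across the whole report generation. Returns true if the guard
// noticed the overlap.
inline bool WideScopeDetectsRace() {
  FakeMutexSketch guard;
  bool detected = false;
  detected |= !guard.Enter();  // T1 starts generating its report
  detected |= !guard.Enter();  // T2 starts before T1 has finished
  guard.Leave();
  return detected;
}
```

With the narrow scope the two sections never overlap, so the guard stays silent even though the reports go out in the wrong order. With the wide scope the second entry is flagged.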


http://gerrit.cloudera.org:8080/#/c/10855/12/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/12/be/src/runtime/query-state.cc@306
PS12, Line 306: REPORT_EXEC_STATUS
hmm, I don't see this one used. did I miss something?


http://gerrit.cloudera.org:8080/#/c/10855/12/common/protobuf/common.proto
File common/protobuf/common.proto:

http://gerrit.cloudera.org:8080/#/c/10855/12/common/protobuf/common.proto@33
PS12, Line 33: int64
nit: i think fixed64 is more appropriate for these fields since they don't tend to be small, right?
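The size argument can be checked with a few lines of arithmetic. This sketch computes the encoded size of a value as a protobuf base-128 varint (7 payload bits per byte) and compares it with the 8 bytes a fixed64 value always takes. The field tag is not counted, and the helper name is illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes needed to encode 'value' as a base-128 varint: 7 payload bits per
// byte, and at least one byte even for zero.
inline std::size_t VarintSize(uint64_t value) {
  std::size_t bytes = 1;
  while (value >= 0x80) {
    value >>= 7;
    ++bytes;
  }
  return bytes;
}

// A fixed64 field always uses 8 bytes for its value.
constexpr std::size_t kFixed64Size = 8;
```

Any value of 2^56 or more needs 9 or 10 varint bytes, so for ids whose bits are effectively random (about 99.6% of uniformly distributed 64-bit values fall in that range) fixed64 is the smaller encoding. For small counters the varint wins.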



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 12
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 07 Sep 2018 18:10:53 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 8:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/exec/data-sink.cc
File be/src/exec/data-sink.cc:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/exec/data-sink.cc@49
PS8, Line 49: const char* DataSink::ROOT_PARTITION_KEY = "";
this should be const char* const -- ie it's a const pointer to const characters. As is, the pointer is non-const and could be reassigned
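A small sketch of the suggested declaration, using a hypothetical DataSinkSketch struct in place of the real DataSink class:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the class under review.
struct DataSinkSketch {
  // Const pointer to const characters: neither the pointer nor the
  // characters it points at can be modified.
  static const char* const ROOT_PARTITION_KEY;
};

const char* const DataSinkSketch::ROOT_PARTITION_KEY = "";

// Because the pointer itself is const, a reassignment such as
//   DataSinkSketch::ROOT_PARTITION_KEY = "other";
// no longer compiles.
static_assert(
    std::is_const<decltype(DataSinkSketch::ROOT_PARTITION_KEY)>::value,
    "the pointer itself should be const");
```

With only 'const char*', the same static_assert would fail, since the first const applies to the characters and not to the pointer.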


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/exec/hdfs-table-writer.h
File be/src/exec/hdfs-table-writer.h:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/exec/hdfs-table-writer.h@91
PS8, Line 91:   const InsertStatsPB& stats() { return stats_; }
probably should be a const function as well?
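To illustrate why the const qualifier matters, here is a sketch with hypothetical InsertStatsSketch and TableWriterSketch types (not the real InsertStatsPB or HdfsTableWriter):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the stats message.
struct InsertStatsSketch {
  int64_t bytes_written = 0;
};

// Hypothetical stand-in for the table writer.
class TableWriterSketch {
 public:
  void AddBytes(int64_t n) { stats_.bytes_written += n; }

  // The trailing 'const' lets callers that only hold a const reference or
  // pointer to the writer read the stats.
  const InsertStatsSketch& stats() const { return stats_; }

 private:
  InsertStatsSketch stats_;
};

// Takes the writer by const reference; this would not compile if stats()
// were a non-const member function.
inline int64_t BytesWritten(const TableWriterSketch& writer) {
  return writer.stats().bytes_written;
}
```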


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.h@211
PS8, Line 211:   Coordinator* coord_; /// Coordinator object that owns this BackendState
this went from const to non-const. does this mutate the coordinator that owns it?


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@50
PS8, Line 50:   : coord_(coord),
nit: you could just do coord_(DCHECK_NOTNULL(coord)) here since DCHECK_NOTNULL passes through its parameter
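The pass-through pattern looks roughly like this. CheckNotNullSketch is a hypothetical helper written for this illustration (it is not the glog macro), and the class names are stand-ins:

```cpp
#include <cassert>

// Hypothetical helper mimicking the pass-through behaviour: check the
// pointer, then hand it straight back to the caller.
template <typename T>
T* CheckNotNullSketch(T* ptr) {
  assert(ptr != nullptr);
  return ptr;
}

struct CoordinatorSketch {};

class BackendStateWithCoordSketch {
 public:
  // The check happens inside the initializer list, so no separate statement
  // is needed in the constructor body.
  explicit BackendStateWithCoordSketch(const CoordinatorSketch* coord)
    : coord_(CheckNotNullSketch(coord)) {}

  const CoordinatorSketch* coord() const { return coord_; }

 private:
  const CoordinatorSketch* const coord_;
};
```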


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@262
PS8, Line 262:     lock.unlock();
I'm nervous about this optimization. Do we really have data that deserialization is slow enough to be worth it? What was the motivation to move the deserialization here instead of where it was before?

 It seems like, while you unlock the lock here, it's possible that the query could change state so that IsDone() is true after re-acquiring it, for example.
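The concern can be made concrete with a small sketch. It assumes a hypothetical ReportTargetSketch class, uses std::stoi as a stand-in for the expensive deserialization, and takes a test hook that runs inside the unlocked window so the state change can be reproduced without threads:

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>

class ReportTargetSketch {
 public:
  void MarkDone() {
    std::lock_guard<std::mutex> l(lock_);
    done_ = true;
  }

  // 'while_unlocked' is a hypothetical test hook that runs while the lock is
  // released, simulating another thread changing the state.
  bool ApplyReport(const std::string& serialized,
                   const std::function<void()>& while_unlocked = nullptr) {
    std::unique_lock<std::mutex> l(lock_);
    if (done_) return false;
    l.unlock();
    // Stand-in for expensive deserialization done outside the lock.
    int rows = std::stoi(serialized);
    if (while_unlocked) while_unlocked();
    l.lock();
    // State observed before unlocking may be stale: check it again.
    if (done_) return false;
    total_rows_ += rows;
    return true;
  }

  // Unsynchronized read; acceptable only in this single-threaded sketch.
  int total_rows() const { return total_rows_; }

 private:
  std::mutex lock_;
  bool done_ = false;
  int total_rows_ = 0;
};
```

Without the second 'done_' check, a report would still be applied after the target was marked done during the unlocked window.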


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc@318
PS8, Line 318: bool done
trying to understand the purpose of this parameter. I don't think this should be changed in this patch since it has nothing to do with krpc vs thrift, but shouldn't the 'done'-ness be able to be inferred from a combination of backend_exec_state_, and fis->current_state_? ie when a query or fragment fails, those states are transitioned before ReportExecStatus is called, and then we can just check that state to determine whether it's the last report or not?

Anyway, something for later.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc@331
PS8, Line 331:   ThriftSerializer serializer(true);
Why construct this here and pass it into ConstructReport? It seems ConstructReport is only called here, and this serializer is only used by ConstructReport, so it could just be constructed in there.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc@359
PS8, Line 359:       CHECK(sidecar_status.ok())
             :           << FromKuduStatus(sidecar_status, "Failed to add sidecar").GetDetail();
is there any chance that the profile would extend past max RPC size? I guess not since you configure the max RPC size to multiple GBs?


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc@373
PS8, Line 373:     if (i < 2) SleepForMs(FLAGS_report_status_retry_interval_ms);
worth a LOG(WARNING) here?
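A sketch of a retry loop that logs each failed attempt. SendWithRetry is a hypothetical helper, the warning goes to a std::ostream where Impala code would use LOG(WARNING), and the attempt count and interval are parameters rather than the actual flag values:

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <ostream>
#include <sstream>
#include <thread>

// Calls 'rpc' until it returns true or 'max_attempts' is reached. Each
// failure is logged so that dropped reports leave a trace.
inline bool SendWithRetry(const std::function<bool()>& rpc, int max_attempts,
                          std::chrono::milliseconds retry_interval,
                          std::ostream& log) {
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (rpc()) return true;
    log << "WARNING: report attempt " << attempt << " of " << max_attempts
        << " failed\n";
    // No point in sleeping after the final attempt.
    if (attempt < max_attempts) std::this_thread::sleep_for(retry_interval);
  }
  return false;
}
```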


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/util/uid-util.h
File be/src/util/uid-util.h:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/util/uid-util.h@61
PS8, Line 61:   unique_id_pb->set_lo(t_unique_id.lo);
worth a DCHECK that t_unique_id.__isset_.lo and hi?


http://gerrit.cloudera.org:8080/#/c/10855/8/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/10855/8/common/protobuf/control_service.proto@140
PS8, Line 140: This is sent only on completion of a fragment instance.
can you be more explicit and say that this is sent only when 'done' is true?


http://gerrit.cloudera.org:8080/#/c/10855/8/common/protobuf/control_service.proto@145
PS8, Line 145:   // Map of TErrorCode to ErrorLogEntryPB; New errors that have not been reported to
             :   // the coordinator by this fragment instance. Not idempotent. The done flag helps
             :   // prevent applying the same update twice.
             :   map<int32, ErrorLogEntryPB> error_log = 7;
so this is only sent when 'done' is true? if not, then the done flag doesn't help avoid missing or duplicated errors, right? ie if these are sent for non-done fragments, the sender side may retry 3 times and give up and drop some errors on the floor. Or, it may retry three times, and the server side may multiply-apply the updates?
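One way to make a non-idempotent update safe against retries is to tag each report with a sequence number and have the receiver skip numbers it has already applied. The sketch below shows that idea with a hypothetical ErrorLogSketch class and a simple code-to-count map instead of ErrorLogEntryPB:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical coordinator-side accumulator: error code -> occurrence count.
class ErrorLogSketch {
 public:
  // Merges 'new_errors' unless a report with this sequence number (or a
  // later one) was already applied. Returns true if the update was applied.
  bool Merge(int64_t report_seq_no, const std::map<int32_t, int>& new_errors) {
    if (report_seq_no <= last_applied_seq_no_) return false;
    for (const auto& entry : new_errors) counts_[entry.first] += entry.second;
    last_applied_seq_no_ = report_seq_no;
    return true;
  }

  int Count(int32_t code) const {
    auto it = counts_.find(code);
    return it == counts_.end() ? 0 : it->second;
  }

 private:
  int64_t last_applied_seq_no_ = 0;
  std::map<int32_t, int> counts_;
};
```

This only addresses the multiply-applied case. Errors dropped when the sender gives up are a separate problem: the sender would have to keep unacknowledged errors and include them in a later report.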



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 08 Aug 2018 19:11:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10855/15/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/10855/15/common/protobuf/control_service.proto@113
PS15, Line 113: int32
int64



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 15
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 10 Oct 2018 20:48:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 12:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/exec/hdfs-parquet-table-writer.h
File be/src/exec/hdfs-parquet-table-writer.h:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/exec/hdfs-parquet-table-writer.h@199
PS10, Line 199: 
> nit: should #include the appropriate .pb.h here ("include-what-you-use")
Done


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.h@60
PS10, Line 60: const Coordinator&
> I think this can probably change back to being const if you take the sugges
Done


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.h@176
PS10, Line 176: last_report_ti
> nit: I think the term "sequence number" is more usual here -- "version" to 
Done


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.h@220
PS10, Line 220:   /// Backend exec params, owned by the QuerySchedule and has query lifetime.
> This "back pointer" still seems error-prone to me. I think the object lifet
Done


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.cc@267
PS10, Line 267:   return num_remaining_instances_ == 0 || !status_.ok();
> I think a VLOG_QUERY about the skipped RPC is probably useful
Done


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.cc@294
PS10, Line 294:     DCHECK(!instance_stats->done_);
> nit: why not:
The ctor was marked explicit so not sure it's allowed:
  "explicit Status(const StatusPB& status);"


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/query-state.cc@287
PS10, Line 287: atus = report
> Ah, I missed that we join on the reporter thread first.
Good idea about using DFAKE_MUTEX(). Also switched to using a non-atomic.

Also simplified the logic in Coordinator::BackendState::ApplyExecStatusReport() as we can rely purely on the sequence number as you suggested.


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/query-state.cc@375
PS10, Line 375:     ReportExecStatusResponsePB resp;
> should we have a failure injection point on the RPC itself? I only saw fail
Please find the tests in test_rpc_timeout.py which:

1. inject delays in the RPC handler to induce timeout
2. run with a very short service queue to emulate a busy server.


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/query-state.cc@379
PS10, Line 379: reak;
> should we backoff?
I will refrain from changing the logic here too much. There will be a follow up patch after IMPALA-4063 which will change the retry logic. TODO added.


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/runtime-state.cc@202
PS10, Line 202:   }
> the method doc says that new_errors is cleared, but it's actually written i
This was lost after refactoring this function. Fixed now.


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/service/control-service.cc
File be/src/service/control-service.cc:

PS10: 
> This is a general krpc-in-Impala question: I can't find where you set up au
Very good point. This is definitely a bug and it's now fixed in this commit here (https://github.com/apache/impala/commit/5c541b960491ba91533712144599fb3b6d99521d)


http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/util/uid-util.h
File be/src/util/uid-util.h:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/util/uid-util.h@79
PS10, Line 79:   DCHECK(uid_pb.IsInitialized());
> worth DCHECKs here that the fields are set by calling uid_pb.IsInitialized(
Done


http://gerrit.cloudera.org:8080/#/c/10855/10/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/10855/10/bin/impala-config.sh@562
PS10, Line 562: export HBASE_CONF_DIR="$IMPALA_FE_DIR/src/test/resources"
> why's this necessary? Can we change cmake to invoke it from the full path i
FindProtobuf should have set PROTOBUF_PROTOC_EXECUTABLE. Not sure why I needed to set it before.


http://gerrit.cloudera.org:8080/#/c/10855/10/tests/custom_cluster/test_rpc_exception.py
File tests/custom_cluster/test_rpc_exception.py:

http://gerrit.cloudera.org:8080/#/c/10855/10/tests/custom_cluster/test_rpc_exception.py@97
PS10, Line 97: 
> can we change this flag to be in millis instead of seconds? Or do we advert
I don't think this flag is documented as far as I understand.  We can deprecate this old flag and rename it to include '_ms' suffix.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 12
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 06 Sep 2018 17:48:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 23:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10855/23/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/23/be/src/runtime/query-state.cc@353
PS23, Line 353:       unique_ptr<kudu::faststring> sidecar_buf = make_unique<kudu::faststring>();
              :       sidecar_buf->assign_copy(profile_buf, profile_len);
              :       unique_ptr<RpcSidecar> sidecar = RpcSidecar::FromFaststring(move(sidecar_buf));
Can't we use a slice here instead ?
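The trade-off behind the question is copy versus borrow. The sketch below uses a hypothetical SliceSketch struct (pointer plus length) to show it. It makes no claim about which sidecar constructors KRPC actually provides:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical non-owning view: a pointer and a length, nothing more.
struct SliceSketch {
  const char* data;
  std::size_t size;
};

// Copies the bytes: the result may outlive 'buf', at the cost of an
// allocation and a memcpy.
inline std::string CopyPayload(const char* buf, std::size_t len) {
  return std::string(buf, len);
}

// Borrows the bytes: no copy, but 'buf' must stay alive and unmodified until
// the consumer has finished with the view.
inline SliceSketch BorrowPayload(const char* buf, std::size_t len) {
  return SliceSketch{buf, len};
}
```

Borrowing is only safe if the serialized profile buffer is guaranteed to outlive the RPC, including any retries that resend it.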



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 23
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Sat, 02 Nov 2019 00:19:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 14:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/10855/13/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/13/be/src/runtime/coordinator-backend-state.h@152
PS13, Line 152: /// Updates 'this' with exec_status and the fragment intance's thrift prof
> comment is out of date
Done


http://gerrit.cloudera.org:8080/#/c/10855/13/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/13/be/src/runtime/coordinator-backend-state.cc@523
PS13, Line 523:       if (rows_counter != nullptr) {
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/10855/13/be/src/runtime/coordinator-backend-state.cc@530
PS13, Line 530:     }
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/10855/13/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/13/be/src/service/control-service.cc@42
PS13, Line 42: QUEUE_LIMIT_MSG
> Looks like a negative value here gives no limit? Maybe worth mentioning.
Thanks for pointing that out. The handling of negative value is fixed in new PS.

ParseUtil::ParseMemSpec() already distinguishes the input value between absolute value and percentage. If the input is percentage, it will compute the memory limit based on the reference value passed to ParseMemSpec().
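As a rough illustration of the absolute-versus-percentage distinction, here is a deliberately simplified parser. ParseLimitSketch is hypothetical and is not ParseUtil::ParseMemSpec(): it only accepts plain byte counts and "N%" forms, and rejecting negative input is an assumption made for the sketch:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>

// Accepts "<n>" (bytes) or "<n>%" (percentage of 'reference_bytes').
// Returns false on malformed, negative, or over-100% input.
inline bool ParseLimitSketch(const std::string& spec, int64_t reference_bytes,
                             int64_t* bytes) {
  if (spec.empty()) return false;
  const bool is_percent = spec.back() == '%';
  const std::string digits =
      is_percent ? spec.substr(0, spec.size() - 1) : spec;
  if (digits.empty()) return false;
  char* end = nullptr;
  const long long value = std::strtoll(digits.c_str(), &end, 10);
  // Reject trailing garbage and negative values.
  if (*end != '\0' || value < 0) return false;
  if (is_percent) {
    if (value > 100) return false;
    // Percentages are resolved against the caller-supplied reference value.
    *bytes = static_cast<int64_t>(reference_bytes * (value / 100.0));
    return true;
  }
  *bytes = value;
  return true;
}
```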


http://gerrit.cloudera.org:8080/#/c/10855/13/be/src/service/control-service.cc@46
PS13, Line 46: if left at default value 0
> or negative
Wording fixed.


http://gerrit.cloudera.org:8080/#/c/10855/13/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/10855/13/common/protobuf/control_service.proto@50
PS13, Line 50:   optional KuduDmlStatsPB kudu_stats = 3;
> is this, and elsewhere, referring to anything other than renaming, eg. to D
Looking at (https://gerrit.cloudera.org/#/c/4985/1/common/thrift/ImpalaInternalService.thrift@430), it seems to suggest more than just renaming the struct. It also involves some sort of refactoring.

That said, if we don't really know the meaning of this TODO statement, we can drop it too. What's your take ?


http://gerrit.cloudera.org:8080/#/c/10855/13/common/protobuf/control_service.proto@107
PS13, Line 107: 
              :   FIRST_BATCH_PRODUCED
> is this true if an error is encountered?
Copied and pasted from Thrift Implementation. Fixed.


http://gerrit.cloudera.org:8080/#/c/10855/13/common/protobuf/control_service.proto@154
PS13, Line 154: individual fra
> I guess this and elsewhere was copied from the original thrift definition. 
I am not 100% sure about the historical context but I believe the rationale behind all the RPC versioning is that we can have backward compatibility between different versions of Impala (See also https://gerrit.cloudera.org/#/c/6535/4/common/thrift/ImpalaInternalService.thrift@698)

That said, I don't think we ever implemented any true support for mixed versions of Impala (which is very useful for rolling upgrade). Maybe it's fair to drop the version string for now until we commit to supporting mixed versions of Impala. In which case, we can take whatever snapshot of the structs then as V1. Before that, having the version string seems like some cruft that we don't need at the moment.

Also, I don't quite understand the rationale behind the required fields in some of the existing Thrift structs as that seems to make it hard to change the definition of the struct later. So, for now, I am marking all fields to be optional (and later versions of protobuf actually get rid of required/optional so everything is optional anyway).



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 14
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 08 Oct 2018 06:14:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1006/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 15
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 10 Oct 2018 08:46:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 17:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1031/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 17
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 12 Oct 2018 06:46:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10855/9/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/9/be/src/runtime/coordinator-backend-state.h@33
PS9, Line 33: #include "common/atomic.h"
Not needed.


http://gerrit.cloudera.org:8080/#/c/10855/9/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/10855/9/common/protobuf/control_service.proto@137
PS9, Line 137: 6
Skipped 5.



-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 09 Aug 2018 00:22:23 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, Todd Lipcon, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10855

to look at the new patch set (#7).

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation for fixing
IMPALA-2990, as we can take advantage of TCP connection multiplexing
in KRPC to avoid overwhelming the coordinator with too many
connections by reducing the number of TCP connections to one for
each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/catalog/catalog-util.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/kudu/rpc/rpc_context.cc
M be/src/kudu/rpc/rpc_context.h
M be/src/rpc/jni-thrift-util.h
M be/src/rpc/thrift-util-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/uid-util.h
M bin/impala-config.sh
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
M tests/custom_cluster/test_rpc_exception.py
M tests/failure/test_failpoints.py
62 files changed, 1,095 insertions(+), 666 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/7
-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 22: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 22
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 01 Nov 2018 17:17:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, Todd Lipcon, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10855

to look at the new patch set (#5).

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation for fixing
IMPALA-2990, as we can take advantage of TCP connection multiplexing
in KRPC to avoid overwhelming the coordinator with too many
connections by reducing the number of TCP connections to one for
each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/catalog/catalog-util.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/rpc/jni-thrift-util.h
M be/src/rpc/thrift-util-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/uid-util.h
M bin/impala-config.sh
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
58 files changed, 1,050 insertions(+), 634 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/5
-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/244/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 08 Aug 2018 18:25:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, Todd Lipcon, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10855

to look at the new patch set (#9).

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation for fixing
IMPALA-2990, as we can take advantage of TCP connection multiplexing
in KRPC to avoid overwhelming the coordinator with too many
connections by reducing the number of TCP connections to one for
each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.

This patch also fixes IMPALA-7241, which may lead to duplicated
DML stats being applied. The fix adds a monotonically increasing
version number to fragment instances' reports. The coordinator
will ignore any report whose version is smaller than or equal to
the version in the last report.
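
The versioning rule described above can be sketched roughly as follows. This is only an illustration, not the code in the patch: the names InstanceReport, ReportTracker, report_seq_no and rows_modified are hypothetical, and the sketch assumes each report carries a delta of DML stats, so that applying the same report twice would double count.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for one fragment instance's status report.
struct InstanceReport {
  int instance_id;        // which fragment instance sent the report
  int64_t report_seq_no;  // monotonically increasing per instance
  int64_t rows_modified;  // assumed DML stats delta carried by this report
};

// Hypothetical coordinator-side accumulator.
class ReportTracker {
 public:
  // Applies the report and returns true. Returns false without applying it
  // if its sequence number is smaller than or equal to the last one applied
  // for the same instance (a stale or retried report).
  bool Apply(const InstanceReport& report) {
    auto it = last_seq_no_.find(report.instance_id);
    if (it != last_seq_no_.end() && report.report_seq_no <= it->second) {
      return false;
    }
    last_seq_no_[report.instance_id] = report.report_seq_no;
    total_rows_modified_ += report.rows_modified;
    return true;
  }

  int64_t total_rows_modified() const { return total_rows_modified_; }

 private:
  // Last applied sequence number, keyed by fragment instance.
  std::unordered_map<int, int64_t> last_seq_no_;
  int64_t total_rows_modified_ = 0;
};
```

With this rule a retried RPC that delivers the same report a second time is dropped, so the stats it carries are only counted once.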

Testing done: core build. Added some targeted test cases for profile
serialization failure and RPC retry.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/catalog/catalog-util.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/rpc/jni-thrift-util.h
M be/src/rpc/thrift-util-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/uid-util.h
M bin/impala-config.sh
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
M tests/custom_cluster/test_rpc_exception.py
M tests/failure/test_failpoints.py
60 files changed, 1,110 insertions(+), 673 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/9
-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 12:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10855/12/tests/custom_cluster/test_rpc_timeout.py
File tests/custom_cluster/test_rpc_timeout.py:

http://gerrit.cloudera.org:8080/#/c/10855/12/tests/custom_cluster/test_rpc_timeout.py@42
PS12, Line 42:  
flake8: E251 unexpected spaces around keyword / parameter equals


http://gerrit.cloudera.org:8080/#/c/10855/12/tests/custom_cluster/test_rpc_timeout.py@42
PS12, Line 42:  
flake8: E251 unexpected spaces around keyword / parameter equals



-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 12
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 06 Sep 2018 18:26:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@317
PS8, Line 317: MergeErrorMaps(instance_exec_status.error_log(), &error_log_);
> This is less severe, but I guess that this is already wrong too. We potenti
I think if we want to solve this idempotency issue generally, we would probably just want to add sequence numbers to the report and ignore any reports with lower sequence numbers than we've already seen. I hesitated to suggest this because in theory this patch is just trying to port rather than also fix these other issues, but if we want to go there, that's the route I'd suggest.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/service/control-service.cc@76
PS8, Line 76: 
> I feel that we can still keep the deserialization here and just check for t
+1



-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 08 Aug 2018 23:12:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 17:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/10855/17//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10855/17//COMMIT_MSG@40
PS17, Line 40: 2. Added some targeted test cases for profile serialization failure and RPC retries/timeout.
nit: long line


http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/rpc/thrift-util-test.cc
File be/src/rpc/thrift-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/rpc/thrift-util-test.cc@58
PS14, Line 58: 
Should we add a test for SerializeToString as well?


http://gerrit.cloudera.org:8080/#/c/10855/17/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/17/be/src/runtime/query-state.cc@356
PS17, Line 356: CHECK
Should we handle errors like these and log them instead?


http://gerrit.cloudera.org:8080/#/c/10855/16/be/src/service/control-service.h
File be/src/service/control-service.h:

http://gerrit.cloudera.org:8080/#/c/10855/16/be/src/service/control-service.h@52
PS16, Line 52: of
nit: with


http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/testutil/in-process-servers.cc
File be/src/testutil/in-process-servers.cc:

http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/testutil/in-process-servers.cc@51
PS14, Line 51:  FLAGS_krpc_port = 
> Some tests were failing without this. I don't recall which one. Apparently,
Can you document the reason here? It was not clear to me why only this one is specifically assigned here.


http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/10855/14/be/src/util/runtime-profile.cc@251
PS14, Line 251: if (UNLIKELY(nodes.size()) == 0) return;
Does this only happen when Thrift deserialization fails?


http://gerrit.cloudera.org:8080/#/c/10855/14/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/10855/14/common/protobuf/control_service.proto@27
PS14, Line 27: message ParquetDmlStatsPB {
nit: maybe retain the comment from the Thrift file
"// For each column, the on disk byte size"



-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 17
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 12 Oct 2018 19:34:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 21:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1247/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 21
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 01 Nov 2018 17:26:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Thomas Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 13:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10855/13/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/13/be/src/runtime/coordinator-backend-state.h@152
PS13, Line 152: /// Updates 'this' with exec_status, the fragment instances' TExecStats in
comment is out of date


http://gerrit.cloudera.org:8080/#/c/10855/13/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/13/be/src/service/control-service.cc@42
PS13, Line 42: QUEUE_LIMIT_MSG
Looks like a negative value here gives no limit? Maybe worth mentioning.

It also looks like MEM_UNITS_HELP_MSG says that this can be a '%', but then you ignore 'is_percent' below, so e.g. '50%' would be interpreted as 50 bytes.
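
To illustrate the concern, here is a minimal sketch of a parser that honors a trailing '%' instead of silently treating the number as bytes. ResolveQueueLimitBytes and its parameters are hypothetical and are not Impala's actual parsing utility. The sketch assumes a percentage is resolved against some total byte budget supplied by the caller, and it does not handle unit suffixes such as 'MB'.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper. Parses 'spec' as either a plain byte count ("50") or
// a percentage of 'total_bytes' ("50%"). On success stores the resolved
// number of bytes in '*out' and returns true. Returns false on malformed
// input, negative values, or percentages above 100.
bool ResolveQueueLimitBytes(const std::string& spec, int64_t total_bytes,
                            int64_t* out) {
  if (spec.empty()) return false;
  const bool is_percent = spec.back() == '%';
  const std::string digits =
      is_percent ? spec.substr(0, spec.size() - 1) : spec;
  if (digits.empty()) return false;

  char* end = nullptr;
  const long long value = std::strtoll(digits.c_str(), &end, 10);
  // Reject trailing garbage and negative numbers.
  if (*end != '\0' || value < 0) return false;

  if (is_percent) {
    if (value > 100) return false;
    // Split the multiplication to avoid overflow for large totals.
    *out = total_bytes / 100 * value + (total_bytes % 100) * value / 100;
  } else {
    *out = value;
  }
  return true;
}
```

With this shape, "50%" against a budget of 1000 bytes resolves to 500 bytes, whereas a parser that ignores the percent flag would resolve it to 50.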


http://gerrit.cloudera.org:8080/#/c/10855/13/be/src/service/control-service.cc@46
PS13, Line 46: If left at default value 0
or negative


http://gerrit.cloudera.org:8080/#/c/10855/13/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/10855/13/common/protobuf/control_service.proto@50
PS13, Line 50: // TODO: Refactor to reflect usage by other DML statements.
Is this, and elsewhere, referring to anything other than renaming, e.g. to DMLStatsPB? If not, is it worth just resolving this now while you're here?


http://gerrit.cloudera.org:8080/#/c/10855/13/common/protobuf/control_service.proto@107
PS13, Line 107: one state will only strictly be reached after all
              : // the previous states
is this true if an error is encountered?


http://gerrit.cloudera.org:8080/#/c/10855/13/common/protobuf/control_service.proto@154
PS13, Line 154: required in V1
I guess this and elsewhere was copied from the original thrift definition. Do they still make sense, given that this is V1 of the protobuf implementation?



-- 
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 13
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 02 Oct 2018 21:04:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation for fixing
IMPALA-2990, as we can take advantage of TCP connection multiplexing
in KRPC to avoid overwhelming the coordinator with too many
connections by reducing the number of TCP connections to one for
each executor.

This patch also introduces a new service pool for all query execution
control related RPCs in the future so that control commands from
coordinators aren't blocked by long-running DataStream services' RPCs.
To avoid unnecessary delays due to sharing the network connections
between DataStream service and Control service, this change added the
service name as part of the user credentials for the ConnectionId
so each service will use a separate connection.
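
The effect of folding the service name into the credentials can be sketched with a simplified connection cache. This is an assumption-laden illustration rather than KRPC's real data structures: ConnectionId, ConnectionPool and MakeConnectionId below are hypothetical, and the only property modeled is that two ids with different credentials never share a connection.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <tuple>

// Hypothetical, simplified connection identity: remote address plus
// user credentials.
struct ConnectionId {
  std::string remote_address;
  std::string user_credentials;

  bool operator<(const ConnectionId& other) const {
    return std::tie(remote_address, user_credentials) <
           std::tie(other.remote_address, other.user_credentials);
  }
};

// Hypothetical cache that hands out one connection handle per ConnectionId.
class ConnectionPool {
 public:
  // Returns the handle for 'id', creating a new one on first use.
  int GetOrCreate(const ConnectionId& id) {
    auto it = connections_.find(id);
    if (it != connections_.end()) return it->second;
    const int handle = next_handle_++;
    connections_.emplace(id, handle);
    return handle;
  }

  std::size_t num_connections() const { return connections_.size(); }

 private:
  std::map<ConnectionId, int> connections_;
  int next_handle_ = 0;
};

// Embeds the service name in the credentials so that each service talking
// to the same remote host gets its own ConnectionId.
inline ConnectionId MakeConnectionId(const std::string& remote_address,
                                     const std::string& user,
                                     const std::string& service_name) {
  return ConnectionId{remote_address, user + "/" + service_name};
}
```

In this sketch, requests for two different service names on the same host yield two distinct handles, while repeated requests for the same service reuse one, so multiplexing still applies within a service.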

The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.

This patch also fixes IMPALA-7241, which may lead to duplicated
DML stats being applied. The fix adds a monotonically increasing
version number to fragment instances' reports. The coordinator
will ignore any report whose version is smaller than or equal to
the version in the last report.

Testing done:
1. Exhaustive build.
2. Added some targeted test cases for profile serialization failure and RPC retries/timeout.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/catalog/catalog-util.cc
M be/src/common/global-flags.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/rpc/CMakeLists.txt
M be/src/rpc/jni-thrift-util.h
M be/src/rpc/rpc-mgr-kerberized-test.cc
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr-test.h
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/rpc/thrift-util-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/uid-util.h
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
M tests/custom_cluster/test_rpc_timeout.py
66 files changed, 1,269 insertions(+), 771 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/17

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 17
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10855/19/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/19/be/src/runtime/query-state.cc@363
PS19, Line 363:         LOG(DFATAL) << FromKuduStatus(sidecar_status, "Failed to add sidecar").GetDetail();
> line too long (91 > 90)
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 19
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 25 Oct 2018 02:12:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................

IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

This change converts the ReportExecStatus() RPC from a Thrift-based
RPC to KRPC. This is done as part of the preparation for fixing
IMPALA-2990: KRPC multiplexes RPCs over a TCP connection, which
reduces the number of TCP connections to the coordinator to one per
executor and avoids overwhelming it with too many connections.

This patch also introduces a new service pool, intended to serve all
future query-execution control RPCs, so that control commands from
coordinators aren't blocked by long-running DataStream service RPCs.
The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.
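
As a rough illustration of the sidecar idea, the sketch below models an
RPC frame as a small structured header plus a list of opaque byte blobs.
It is a toy model and not KRPC's API: RpcFrame, AttachProfile and
GetProfile are hypothetical names, and the string stands in for an
already Thrift-serialized runtime profile.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of an RPC frame: a structured header field plus raw sidecars.
struct RpcFrame {
  int profile_sidecar_idx = -1;       // -1 means no profile is attached
  std::vector<std::string> sidecars;  // opaque bytes, carried as-is
};

// Attaches pre-serialized bytes as a sidecar and records its index in the
// header, so the payload is never re-encoded in the header's format.
void AttachProfile(const std::string& serialized_profile, RpcFrame* frame) {
  assert(frame != nullptr);
  frame->sidecars.push_back(serialized_profile);
  frame->profile_sidecar_idx = static_cast<int>(frame->sidecars.size()) - 1;
}

// Copies the attached profile bytes into 'out'. Returns false if no profile
// is attached or the recorded index is out of range.
bool GetProfile(const RpcFrame& frame, std::string* out) {
  assert(out != nullptr);
  const int idx = frame.profile_sidecar_idx;
  if (idx < 0 || idx >= static_cast<int>(frame.sidecars.size())) return false;
  *out = frame.sidecars[idx];
  return true;
}
```

The point of the design is that only one serialization format (Thrift)
has to exist for the profile, while the rest of the request can move to
Protobuf.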

This patch also fixes IMPALA-7241, which may lead to duplicated
DML stats being applied. The fix adds a monotonically increasing
version number to fragment instances' reports. The coordinator
ignores any report whose version is smaller than or equal to the
version in the last report.

Testing done:
1. Exhaustive build.
2. Added some targeted test cases for profile serialization failure and RPC retries/timeout.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/catalog/catalog-util.cc
M be/src/common/global-flags.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/rpc/CMakeLists.txt
M be/src/rpc/jni-thrift-util.h
M be/src/rpc/thrift-util-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A be/src/service/control-service.cc
A be/src/service/control-service.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/in-process-servers.cc
M be/src/util/container-util.h
A be/src/util/error-util-internal.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/runtime-profile.cc
M be/src/util/uid-util.h
M common/protobuf/CMakeLists.txt
M common/protobuf/common.proto
A common/protobuf/control_service.proto
M common/protobuf/data_stream_service.proto
M common/protobuf/row_batch.proto
M common/protobuf/rpc_test.proto
M common/thrift/ImpalaInternalService.thrift
M tests/custom_cluster/test_rpc_timeout.py
60 files changed, 1,211 insertions(+), 764 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/14

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 14
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@310
PS8, Line 310: coord_
I feel that giving the Coordinator's child objects a non-const pointer to the Coordinator object is quite dangerous, as it makes it easy to accidentally introduce bugs in future changes. I'd prefer that a pointer to the DmlExecState be passed to this function, and that 'coord_' be left as a const ref.
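
A minimal sketch of the shape suggested here, under heavy
simplification. The classes below are hypothetical stand-ins that only
borrow the names from this review; AddRows(), query_id() and
ApplyExecStatusReport()'s signature are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in: the one piece of state a report needs to mutate.
class DmlExecState {
 public:
  void AddRows(int64_t n) { rows_modified_ += n; }
  int64_t rows_modified() const { return rows_modified_; }

 private:
  int64_t rows_modified_ = 0;
};

// Hypothetical stand-in for the parent object.
class Coordinator {
 public:
  DmlExecState* dml_exec_state() { return &dml_exec_state_; }
  int64_t query_id() const { return 42; }  // read-only state for children

 private:
  DmlExecState dml_exec_state_;
};

// The child keeps only a const reference to its parent.
class BackendState {
 public:
  explicit BackendState(const Coordinator& coord) : coord_(coord) {}

  // The only mutable state reachable from here is what the caller passes in.
  void ApplyExecStatusReport(int64_t rows_delta, DmlExecState* dml_exec_state) {
    assert(dml_exec_state != nullptr);
    dml_exec_state->AddRows(rows_delta);
  }

  int64_t query_id() const { return coord_.query_id(); }

 private:
  const Coordinator& coord_;
};
```

With this shape, a future change inside BackendState cannot call a
non-const Coordinator method without the compiler complaining.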


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@317
PS8, Line 317: MergeErrorMaps(instance_exec_status.error_log(), &error_log_);
This is less severe, but I guess that this is already wrong too. We can potentially merge the same error multiple times: if the coordinator's ACK is lost on the first try, the executor retries the ReportExecStatus() RPC and sends the same payload twice.

I'm wondering if it makes sense to also send this only on 'done = true'.
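
A small sketch of the concern, under simplifying assumptions: the error
log is reduced to a map from error code to count, MergeCounts() is a
hypothetical stand-in for the real merge, and InstanceErrorState is an
invented helper. It shows that an additive merge is not idempotent, and
one possible way to merge only the first final report.

```cpp
#include <cassert>
#include <map>

// Hypothetical simplification: error code -> number of occurrences.
using ErrorCounts = std::map<int, int>;

// Adds every count in 'src' to 'dst'. Calling it twice with the same
// payload doubles the counts, which is the retry problem described above.
void MergeCounts(const ErrorCounts& src, ErrorCounts* dst) {
  assert(dst != nullptr);
  for (const auto& entry : src) (*dst)[entry.first] += entry.second;
}

// Hypothetical report: 'error_log' is assumed to be cumulative.
struct StatusReport {
  bool done;
  ErrorCounts error_log;
};

// Merges the error log only for the first report with done == true, so a
// retried final report has no further effect.
class InstanceErrorState {
 public:
  void Apply(const StatusReport& report, ErrorCounts* merged) {
    if (!report.done || done_) return;
    done_ = true;
    MergeCounts(report.error_log, merged);
  }

 private:
  bool done_ = false;
};
```

This only works if the final report carries the complete error log for
the instance, which is an assumption of the sketch rather than something
stated in the patch.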


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@516
PS8, Line 516: lock_guard<SpinLock> l1(exec_summary->lock);
Are we violating any lock ordering here?

Previously, we would obtain 'exec_summary->lock' before 'lock_', but now we're doing the opposite.

I looked through the code and it doesn't seem like it, but just want to make sure.
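
For context on why the order matters, here is a generic sketch using
std::mutex rather than Impala's SpinLock. BackendState, ExecSummary and
UpdateBoth() are hypothetical stand-ins. If one code path took the two
locks in one order and another path took them in the opposite order, the
two paths could deadlock; acquiring both through std::lock() avoids that
regardless of the order in which the locks are named.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for the two structures guarded by separate locks.
struct ExecSummary {
  std::mutex lock;
  int num_updates = 0;
};

struct BackendState {
  std::mutex lock;
  int num_reports = 0;
};

// Acquires both locks with std::lock(), which uses a deadlock-avoidance
// algorithm, then hands ownership to lock_guards for exception safety.
void UpdateBoth(BackendState* state, ExecSummary* summary) {
  assert(state != nullptr && summary != nullptr);
  std::lock(state->lock, summary->lock);
  std::lock_guard<std::mutex> l1(state->lock, std::adopt_lock);
  std::lock_guard<std::mutex> l2(summary->lock, std::adopt_lock);
  ++state->num_reports;
  ++summary->num_updates;
}
```

The alternative, which is what the question above is checking, is to
make sure every code path takes the two locks in the same fixed order.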


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/service/control-service.cc@76
PS8, Line 76: 
I feel that we can still keep the deserialization here and just check for the (instance_exec_status_size() == 0) case.

That will keep the code simpler and remove the lock.unlock() call inside the CoordinatorBackendState.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 08 Aug 2018 22:02:08 +0000
Gerrit-HasComments: Yes