You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Andrew Sherman (Code Review)" <ge...@cloudera.org> on 2019/03/06 00:21:53 UTC

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Andrew Sherman has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12672


Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................

IMPALA-8143: Enhance DoRpcWithRetry().

Allow callers of RpcMgr::DoRpcWithRetry to specify a time to sleep if
the remote service is busy. Allow callers to specify a logging lambda
that is called for rpc failures. Using this feature, change QueryState
to use DoRpcWithRetry for reporting execution status.

Reviewers should note that the following changes to retry parameters
were made, somewhat arbitrarily. I welcome suggestions as to the correct
values.
 Cancel: add a 3 second sleep if service is busy.
 RemoteShutdown: add a 3 second sleep if service is busy.
 ReportExecStatus: Try up to 2 times to do the rpc, add a 3 second sleep
 if service is busy.

TESTING:

Ran all end-to-end tests.

Add a new test to rpc-mgr-test.cc which tests RpcMgr::DoRpcWithRetry in
two ways. Firstly we test the case where the remote service is busy
(which we force by filling up the queue), and secondly we exercise
DoRpcWithRetry by passing lambdas that simulate failures.

Change the service-side fault injection mechanism of
CancelQueryFInstances and RemoteShutdown to use DebugAction.  Clean up
unused values of FaultInjectionUtil.RpcCallType and match values with
test_rpc_timeout.py.

Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
---
M be/src/rpc/impala-service-pool.cc
M be/src/rpc/impala-service-pool.h
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/control-service.cc
M be/src/service/control-service.h
M be/src/testutil/fault-injection-util.h
M tests/custom_cluster/test_rpc_timeout.py
13 files changed, 326 insertions(+), 90 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/12672/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................

IMPALA-8143: Enhance DoRpcWithRetry().

Allow callers of RpcMgr::DoRpcWithRetry to specify a time to sleep if
the remote service is busy. Allow callers to specify a logging lambda
that is called for rpc failures. Using this feature, change QueryState
to use DoRpcWithRetry for reporting execution status.

Reviewers should note that the following changes to retry parameters
were made, somewhat arbitrarily. I welcome suggestions as to the correct
values.
 Cancel: add a 3 second sleep if service is busy.
 RemoteShutdown: add a 3 second sleep if service is busy.
 ReportExecStatus: Try up to 2 times to do the rpc, add a 3 second sleep
 if service is busy.

TESTING:

Ran all end-to-end tests.

Add a new test to rpc-mgr-test.cc which tests RpcMgr::DoRpcWithRetry in
two ways. Firstly we test the case where the remote service is busy
(which we force by filling up the queue), and secondly we exercise
DoRpcWithRetry by passing lambdas that simulate failures.

Change the service-side fault injection mechanism of
CancelQueryFInstances and RemoteShutdown to use DebugAction.  Clean up
unused values of FaultInjectionUtil.RpcCallType and match values with
test_rpc_timeout.py.

Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
---
M be/src/rpc/impala-service-pool.cc
M be/src/rpc/impala-service-pool.h
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/control-service.cc
M be/src/service/control-service.h
M be/src/testutil/fault-injection-util.h
M tests/custom_cluster/test_rpc_timeout.py
13 files changed, 326 insertions(+), 91 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/12672/2
-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 5:

(3 comments)

Thanks Michael. Please see the upcoming Patch Set 8.

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc
File be/src/rpc/rpc-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@333
PS5, Line 333:   // Call DoRpcWithRetry with no backoff on our busy service.
             :   const int retries = 3;
             :   impala::Status rpc_status = RpcMgr::DoRpcWithRetry(proxy, &PingServiceProxy::Ping,
             :       request, &response, query_ctx, "ping failed", retries, MonoDelta::FromSeconds(100));
             :   EXPECT_ERROR(rpc_status, TErrorCode::GENERAL);
             :   EXPECT_STR_CONTAINS(rpc_status.msg().msg(),
             :       "dropped due to backpressure. The service queue contains 1 items out of a maximum "
             :       "of 1; memory consumption is ");
             :   ASSERT_EQ(overflow_metric->GetValue(), retries);
             : 
             :   // Call DoRpcWithRetry, but this time we do backoff on a busy service, and this waiting
             :   // allows the outstanding aysnc calls to complete, so that then this call will succeed.
             :   impala::Status rpc_status_backoff =
             :       RpcMgr::DoRpcWithRetry(proxy, &PingServiceProxy::Ping, request, &response,
             :           query_ctx, "ping failed", 10, MonoDelta::FromSeconds(100), 2);
             :   ASSERT_OK(rpc_status_backoff);
             :   ASSERT_GT(overflow_metric->GetValue(), retries);
             : 
             :   // Check that async calls did complete.
             :   ASSERT_FALSE(async1_done.pending());
             :   ASSERT_FALSE(async2_done.pending());
> How about the following idea:
I did think about this. So the nice thing about your suggestion (thanks) is that we can set up the full queue. 
Now we want to have a call to DoRpcWithRetry that will block (because queue is full) and then succeed (after retry). 
So concurrently with the call to DoRpcWithRetry we need to unblock the queue.
We could start the call to DoRpcWithRetry in a thread, and then sleep for a while, and then unblock the queue. 
That would work but is functionally equivalent to what we have now in that it depends on a sleep call.
So I understand the desire for a completely deterministic test but I still don't see a simple way to achieve that. I am happy to listen to more suggestions, or explanations of what I am not understanding.


http://gerrit.cloudera.org:8080/#/c/12672/7/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/12672/7/be/src/rpc/rpc-mgr.h@186
PS7, Line 186: og' fun
> timeout_ms
Done


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@178
PS5, Line 178:   // A no-op method that is used as part of overloading DoRpcWithRetry().
             :   static void logging_noop(){};
             : 
             :   // The type of the log method passed to DoRpcWithRetry().
             :   typedef boost::function<void()> RpcLogFn;
> They still seem to be in PS7
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 02 Apr 2019 18:17:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4462/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 13
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Wed, 12 Jun 2019 20:27:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 5:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/2553/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Mar 2019 22:31:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12672/4/be/src/rpc/rpc-mgr.inline.h
File be/src/rpc/rpc-mgr.inline.h:

http://gerrit.cloudera.org:8080/#/c/12672/4/be/src/rpc/rpc-mgr.inline.h@56
PS4, Line 56:     const int times_to_try, const kudu::MonoDelta& timeout, const int server_busy_backoff_s,
> line too long (92 > 90)
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Mar 2019 21:51:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc
File be/src/rpc/rpc-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@333
PS5, Line 333:   // Call DoRpcWithRetry with no retry on our busy service.
             :   const int retries = 3;
             :   PingRequestPB request3;
             :   PingResponsePB response3;
             :   impala::Status rpc_status = RpcMgr::DoRpcWithRetry(proxy, &PingServiceProxy::Ping,
             :       request3, &response3, query_ctx, "ping failed", retries, 100 * MILLIS_PER_SEC);
             :   EXPECT_ERROR(rpc_status, TErrorCode::GENERAL);
             :   EXPECT_STR_CONTAINS(rpc_status.msg().msg(),
             :       "dropped due to backpressure. The service queue contains 1 items out of a maximum "
             :       "of 1; memory consumption is ");
             :   ASSERT_EQ(overflow_metric->GetValue(), retries);
             : 
             :   // Call DoRpcWithRetry, but this time we do retey on a busy service, and this waiting
             :   // allows the outstanding async calls to complete, so that then this call will succeed.
             :   PingRequestPB request4;
             :   PingResponsePB response4;
             :   impala::Status rpc_status_backoff =
             :       RpcMgr::DoRpcWithRetry(proxy, &PingServiceProxy::Ping, request4, &response4,
             :           query_ctx, "ping failed", 10, 100 * MILLIS_PER_SEC, 2 * MILLIS_PER_SEC);
             :   ASSERT_OK(rpc_status_backoff);
             :   ASSERT_GT(overflow_metric->GetValue(
> I reviewed the test and tried to make to more bulletproof. I think any test
How about the following idea:
- the RPC handler can share some sort of condition variable or barrier with the main test. The test can then indirectly control when the RPC handler will complete by signaling those condition variables.

In this way, as long as the main thread doesn't signal the CV, the first RPC will never finish and the second RPC will also block forever. This makes the test not timing dependent and less prone to failure due to random timing issue. What do you think ?


http://gerrit.cloudera.org:8080/#/c/12672/7/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/12672/7/be/src/rpc/rpc-mgr.h@178
PS7, Line 178:   // A no-op method that is used as part of overloading DoRpcWithRetry().
             :   static void logging_noop() {}
             : 
             :   // The type of the log method passed to DoRpcWithRetry().
             :   typedef boost::function<void()> RpcLogFn;
Not needed.


http://gerrit.cloudera.org:8080/#/c/12672/7/be/src/rpc/rpc-mgr.h@186
PS7, Line 186: timeout
timeout_ms


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@178
PS5, Line 178:   // A no-op method that is used as part of overloading DoRpcWithRetry().
             :   static void logging_noop() {}
             : 
             :   // The type of the log method passed to DoRpcWithRetry().
             :   typedef boost::function<void()> RpcLogFn;
> Removed
They still seem to be in PS7


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@201
PS5, Line 201: 
             :  private:
             :   /// One pool per registered service. scoped_refptr<> is dictated by the Kudu interface.
             :   std::vector<scoped_refptr<ImpalaServicePool>> service_pools_;
             : 
> OK I'll leave this alone and Thomas will clean up the mess? :-)
Sounds good to me.



-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 02 Apr 2019 02:08:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 1:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/2364/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Mar 2019 00:57:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 11: Code-Review+2

(6 comments)

LGTM. Please address the nits.

http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/impala-service-pool.h
File be/src/rpc/impala-service-pool.h:

http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/impala-service-pool.h@72
PS11, Line 72: string
char*

See https://google.github.io/styleguide/cppguide.html#Static_and_Global_Variables


http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/impala-service-pool.cc
File be/src/rpc/impala-service-pool.cc:

http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/impala-service-pool.cc@192
PS11, Line 192: !debug_status.ok()
UNLIKELY(!debug_status.ok())


http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr-test.h
File be/src/rpc/rpc-mgr-test.h:

http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr-test.h@281
PS11, Line 281: getNumberOfCalls()
nit: number_of_calls() or GetNumberOfCalls()


http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr-test.cc
File be/src/rpc/rpc-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr-test.cc@310
PS11, Line 310: NULL
nit: nullptr


http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr-test.cc@331
PS11, Line 331:     // DoRpcWithRetry will fail 1 in 2^40 times.
              :     ASSERT_TRUE(status.ok());
              :   }
              :   // There will be no overflows (i.e. service too busy) 1 in 2^40 times.
              :   ASSERT_GT(overflow_metric->GetValue(), 0);
:-).

Subtle nits: Although they are similar this is referring to the probability of the debug action not kicking in at all in the loop above is: (1/2)^40 times.

The other one on line 331 refers to the probability of the debugging action kicking in 40 times in a row.

I suppose it's slightly clearer to state in (1/2)^num_rpc_retry_calls and (1/2)^num_retries. Please feel free to ignore as they are rather minor nits.


http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr.h@183
PS11, Line 183:   /// Pass 'debug_action' to DebugAction() to potentially inject errors.
Please add a TODO:

TODO: Clean up this interface. Replace the debug action with fault injection in RPC callbacks or other places.



-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 11
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Mon, 03 Jun 2019 23:30:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 5:

(20 comments)

Thanks for the reviews. Please see patch set 7.

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.h
File be/src/rpc/rpc-mgr-test.h:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.h@296
PS5, Line 296: tha
> typo
Done


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc
File be/src/rpc/rpc-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@214
PS5, Line 214:   ASSERT_OK(static_cast<ScanMemServiceImpl*>(scan_mem_impl)
             :                 ->
> Not sure if it's output of clang-tidy or something but I find it a bit hard
Yes this is clang-format, but I see now how to avoid this getting "fixed".


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@263
PS5, Line 263: shoudl
> typo
Done


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@291
PS5, Line 291: NULL
> nullptr;
Done


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@298
PS5, Line 298: RpcController controller;
> not used ?
yes!


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@316
PS5, Line 316: sleep_ms = 100;
> Instead of sharing "sleep_ms" between RPCs, it may be cleaner to pass the s
Thanks, this is cleaner


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@322
PS5, Line 322: proxy.get()->
> proxy->
Done


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@333
PS5, Line 333:   // Call DoRpcWithRetry with no backoff on our busy service.
             :   const int retries = 3;
             :   impala::Status rpc_status = RpcMgr::DoRpcWithRetry(proxy, &PingServiceProxy::Ping,
             :       request, &response, query_ctx, "ping failed", retries, MonoDelta::FromSeconds(100));
             :   EXPECT_ERROR(rpc_status, TErrorCode::GENERAL);
             :   EXPECT_STR_CONTAINS(rpc_status.msg().msg(),
             :       "dropped due to backpressure. The service queue contains 1 items out of a maximum "
             :       "of 1; memory consumption is ");
             :   ASSERT_EQ(overflow_metric->GetValue(), retries);
             : 
             :   // Call DoRpcWithRetry, but this time we do backoff on a busy service, and this waiting
             :   // allows the outstanding aysnc calls to complete, so that then this call will succeed.
             :   impala::Status rpc_status_backoff =
             :       RpcMgr::DoRpcWithRetry(proxy, &PingServiceProxy::Ping, request, &response,
             :           query_ctx, "ping failed", 10, MonoDelta::FromSeconds(100), 2);
             :   ASSERT_OK(rpc_status_backoff);
             :   ASSERT_GT(overflow_metric->GetValue(), retries);
             : 
             :   // Check that async calls did complete.
             :   ASSERT_FALSE(async1_done.pending());
             :   ASSERT_FALSE(async2_done.pending());
> These set of tests seem rather timing dependent. For instance, it may occur
I reviewed the test and tried to make to more bulletproof. I think any test that really does fill up the queue and test that the retry works in this case will have some asynchrony. I don't like tests with sleep calls but I think this should be reliable. Let me know if you are not persuaded.


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@378
PS5, Line 378:   // If we fail 2x and retry 2x, then the call fails.
             :   unique_ptr<FailingPingServiceProxy> failing_proxy2 =
             :       make_unique<FailingPingServiceProxy>(2);
             : 
             :   Status rpc_status_fail =
             :       RpcMgr::DoRpcWithRetry(failing_proxy2, &FailingPingServiceProxy::Ping, request,
             :           &response, boost::bind(&RpcMgrTest::log, this), query_ctx, "ping failed", 2,
             :           MonoDelta::FromSeconds(10));
> Please see comments in rpc-mgr.inline.h that we probably should only retry 
Done


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@178
PS5, Line 178:   // A no-op method that is used as part of overloading DoRpcWithRetry().
             :   static void logging_noop(){};
             : 
             :   // The type of the log method passed to DoRpcWithRetry().
             :   typedef boost::function<void()> RpcLogFn;
> Not needed as discussed in person.
Removed


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@189
PS5, Line 189: class
> typename
Done


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@191
PS5, Line 191: rpc_call
> It's important to highlight that this only works iff rpc_call is idempotent
Thanks


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@193
PS5, Line 193: const kudu::MonoDelta& timeout,
             :       const int server_busy_backoff_s = 0
> It seems inconsistent to pass the timeout in as MonoDelta while passing the
Done


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@201
PS5, Line 201: const std::unique_ptr<Proxy>& proxy,
             :       const ProxyMethod& rpc_call, const Request& request, Response* response,
             :       const TQueryCtx& query_ctx, const char* error_msg, const int times_to_try,
             :       const kudu::MonoDelta& timeout, const int server_busy_backoff_s = 0,
             :       const char* debug_action = nullptr)
> My debug action patch will eliminate the 'error_msg', 'debug_action', and '
OK I'll leave this alone and Thomas will clean up the mess? :-)


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@201
PS5, Line 201: const std::unique_ptr<Proxy>& proxy,
             :       const ProxyMethod& rpc_call, const Request& request, Response* response,
             :       const TQueryCtx& query_ctx, const char* error_msg, const int times_to_try,
             :       const kudu::MonoDelta& timeout, const int server_busy_backoff_s = 0,
             :       const char* debug_action = nullptr)
> Although this function is inlined, It has grown over time to take too many 
see my answer to Thomas below


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.inline.h
File be/src/rpc/rpc-mgr.inline.h:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.inline.h@57
PS5, Line 57: const char* debug_action
> May make sense to group this argument next to query_ctx as they are related
These arguments are at the end because the have default values. And Thomas is going to make query_ctx and debug_action go away so I'll leave these alone for now,


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.inline.h@61
PS5, Line 61: i++
> ++i
Done


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.inline.h@72
PS5, Line 72: response
> Do we need to reset response between retry ?
My understanding is that any partial results in in the response will be overwritten during a successful call. What do you think? Should I call Clear() ?


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.inline.h@76
PS5, Line 76:     // Rpc failed, log it. In many cases this will be a no-op.
            :     log();
> Based on our offline discussion, this is not needed. Please consider removi
Done


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.inline.h@79
PS5, Line 79:     // Sleep if the server is busy and this is not the last time to try
            :     if (server_busy_backoff_s != 0 && i != times_to_try - 1
            :         && RpcMgr::IsServerTooBusy(rpc_controller)) {
            :       sleep(server_busy_backoff_s);
            :     }
> So, it's not clear to me whether we should retry for errors other than RpcM
OK, well good question. On reflection I have changed this to only retry if the service is busy. Are we all happy with this?



-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Fri, 29 Mar 2019 21:32:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................

IMPALA-8143: Enhance DoRpcWithRetry().

Allow callers of RpcMgr::DoRpcWithRetry to specify a time to sleep if
the remote service is busy. DoRpcWithRetry now only sleeps if the remote
service is busy.

TESTING:

Ran all end-to-end tests.

Add two new tests to rpc-mgr-test.cc which test RpcMgr::DoRpcWithRetry.
One test uses a fake Proxy class, the other uses a new DebugAction to
cause Krpc calls to be rejected as if the service was too busy.

Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Reviewed-on: http://gerrit.cloudera.org:8080/12672
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/rpc/impala-service-pool.cc
M be/src/rpc/impala-service-pool.h
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr-test.h
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/control-service.h
10 files changed, 189 insertions(+), 44 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 14
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2622/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Wed, 03 Apr 2019 16:55:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 12: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 12
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 11 Jun 2019 17:30:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2613/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 02 Apr 2019 18:58:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 9:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.h
File be/src/rpc/rpc-mgr-test.h:

http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.h@134
PS9, Line 134: get_current_queue_size
Coding style suggests this should be "GetCurrentQueueSize()". With that said, it may not be needed after all if you take the suggsetion in rpc-mgr-test.cc.


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.h@295
PS9, Line 295: kudu::Status::IOError(
             :         Substitute("ping failing, number of calls=$0.", number_of_calls_));
Instead of IOError, can we simulate the server busy error here ? Please see comments in rpc-mgr-test.cc for reason.


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.cc
File be/src/rpc/rpc-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.cc@269
PS9, Line 269:  until
typo


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.cc@285
PS9, Line 285:   // Find the counter which tracks the number of times the service queue is too full.
             :   const string& overflow_count = Substitute(
             :       ImpalaServicePool::RPC_QUEUE_OVERFLOW_METRIC_KEY, ping_service->service_name());
             :   IntCounter* overflow_metric =
             :       ExecEnv::GetInstance()->rpc_metrics()->FindMetricForTesting<IntCounter>(
             :           overflow_count);
             :   ASSERT_TRUE(overflow_metric != nullptr);
             :   // There has been no activity yet so this should be 0.
             :   EXPECT_EQ(overflow_metric->GetValue(), 0L);
             : 
             :   // Make a proxy for Ping rpc calls.
             :   unique_ptr<PingServiceProxy> proxy;
             :   ASSERT_OK(ping_service->GetProxy(krpc_address_, FLAGS_hostname, &proxy));
             :   TQueryCtx query_ctx;
             : 
             :   // Do an async call that will run in the service's single thread.
             :   RpcController async_controller;
             :   CountingBarrier async1_done(1);
             :   PingRequestPB request1;
             :   PingResponsePB response1;
             :   ResponseCallback async1_callback = [&]() { async1_done.Notify(); };
             :   proxy->PingAsync(request1, &response1, &async_controller, async1_callback);
             : 
             :   // wait for the async call to be running.
             :   bool timed_out = false;
             :   call1_started.Wait(10 * MILLIS_PER_SEC, &timed_out);
             :   ASSERT_FALSE(timed_out);
             :   ASSERT_EQ(0, get_current_queue_size(rpc_mgr_));
             :   // Now the async call is running in the service.
             : 
             :   // Do a second async call, this will set in the queue, which will be full.
             :   RpcController async_controller2;
             :   CountingBarrier async2_done(1);
             :   // The second call only has a short sleep.
             :   PingRequestPB request2;
             :   PingResponsePB response2;
             :   // Set the deadline to something reasonable otherwise the pings from DoRpcWithRetry
             :   // will replace this async call in the queue.
             :   MonoTime deadline = MonoTime::Now() + MonoDelta::FromSeconds(10);
             :   async_controller2.set_deadline(deadline);
             :   ResponseCallback async2_callback = [&]() { async2_done.Notify(); };
             :   proxy->PingAsync(request2, &response2, &async_controller2, async2_callback);
             : 
             :   // Wait for the second async call to get queued.
             :   int num_sleeps = 20;
             :   while (get_current_queue_size(rpc_mgr_) != 1) {
             :     SleepForMs(300);
             :     if (--num_sleeps < 0) {
             :       FAIL() << "Waited too long for async message to be queued";
             :     }
             :   }
             :   // Now the second async call is queued, and the queue is full.
             : 
             :   // Assert that there have been no overflows yet.
             :   EXPECT_EQ(overflow_metric->GetValue(), 0L);
             : 
             :   // Call DoRpcWithRetry with retry on our busy service.
             :   // This won't succeed, but we can observe that retries occurred.
             :   const int retries = 3;
             :   PingRequestPB request3;
             :   PingResponsePB response3;
             :   impala::Status rpc_status = RpcMgr::DoRpcWithRetry(proxy, &PingServiceProxy::Ping,
             :       request3, &response3, query_ctx, "ping failed", retries, 100 * MILLIS_PER_SEC);
             :   EXPECT_ERROR(rpc_status, TErrorCode::GENERAL);
             :   EXPECT_STR_CONTAINS(rpc_status.msg().msg(),
             :       "dropped due to backpressure. The service queue contains 1 items out of a maximum "
             :       "of 1; memory consumption is ");
             :   ASSERT_EQ(overflow_metric->GetValue(), retries);
Stepping back a bit, it's unclear the exact purposes of these tests are. 

It appears that these tests aim to exercise (1) the retry path in the client code and (2) the "server busy" code in the Kudu server. For (2), we already have SlowCallback() so I don't see the need for another test for it.

So, it seems that we only need a test which exercises the retry loop logic in the client code. In which case, please see comments below on how the test below can be modified slightly to provide sufficient coverage.

If you agree with the suggestion below, we can delete these tests and any auxiliary functions added to support them.


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.cc@376
PS9, Line 376:   ASSERT_FALSE(inject_status.ok());
             :   EXPECT_ERROR(inject_status, TErrorCode::INTERNAL_ERROR);
             :   ASSERT_EQ("Debug Action: DoRpcWithRetry:FAIL", inject_status.msg().msg());
Not entirely sure this test is needed as the EE test should exercise it already.


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.cc@380
PS9, Line 380:   // Test how DoRpcWithRetry retries by using a proxy that always fails.
             :   unique_ptr<FailingPingServiceProxy> failing_proxy =
             :       make_unique<FailingPingServiceProxy>();
             :   // A call that fails is not retried as the server is not busy.
             :   PingRequestPB request6;
             :   PingResponsePB response6;
             :   Status rpc_status_fail =
             :       RpcMgr::DoRpcWithRetry(failing_proxy, &FailingPingServiceProxy::Ping, request6,
             :           &response6, query_ctx, "ping failed", 3, 10 * MILLIS_PER_SEC);
             :   ASSERT_FALSE(rpc_status_fail.ok());
             :   // Assert that proxy was only called once.
             :   ASSERT_EQ(1, failing_proxy->getNumberOfCalls());
This test seems to be sufficient for the coverage we need if we make the following modifications:

- In FailingPingService (which could be renamed to CountingPingService), it can define an RPC which takes an integer as the RPC parameter. If the number of cumulative calls to that service is below that specified number, the call will return with the ServerBusy error status (instead of IOError).

- In the test, issue two different RPC calls to the CountingService: one with a parameter large enough that the call will fail after retrying n times and another subsequent RPC call which will succeed after n+m-1 times where n and m are the number of retries specified for each RPC calls respectively.

In this way, we will guarantee that the retry loop in the client code is exercised.


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr.inline.h
File be/src/rpc/rpc-mgr.inline.h:

http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr.inline.h@75
PS9, Line 75:     // If the call succeeded then return.
            :     if (rpc_status.ok()) break;
            : 
            :     // If server is not busy then give up and return result to caller.
            :     if (!RpcMgr::IsServerTooBusy(rpc_controller)) break;
if (rpc_status.ok() || !RpcMgr::IsServerTooBusy(rpc_controller)) {
    return rpc_status;
}


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/runtime/coordinator-backend-state.cc@465
PS9, Line 465: 3, 10 * MILLIS_PER_SEC,
             :           3 * MILLIS_PER_SEC, 
nit: May make for better documentation if we have these couple of parameters defined as variables like the following:

  const int num_retries = 3;
  const int64_t timeout_ms = 10 * MILLIS_PER_SEC;
  const int64_t backoff_time_ms = 3 * MILLIS_PER_SEC;


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/service/client-request-state.cc@684
PS9, Line 684: 3, 10 * MILLIS_PER_SEC,
             :       3 * MILLIS_PER_SEC
nit: May make for better documentation if we have these couple of parameters defined as variables like the following:

  const int num_retries = 3;
  const int64_t timeout_ms = 10 * MILLIS_PER_SEC;
  const int64_t backoff_time_ms = 3 * MILLIS_PER_SEC;


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/testutil/fault-injection-util.h
File be/src/testutil/fault-injection-util.h:

http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/testutil/fault-injection-util.h@37
PS9, Line 37:     RPC_TRANSMITDATA,
This can also be removed.


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/testutil/fault-injection-util.h@40
PS9, Line 40:     // If you add a new RpcCallType then bump the "--fault_injection_rpc_type=X" parameter
            :     // in the test test_random_rpc_timeout() in test_rpc_timeout.py.
Looks like we forgot to update the tests when adding fault injection for remote shutdowns. Thanks for the new comment.



-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Apr 2019 01:12:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................

IMPALA-8143: Enhance DoRpcWithRetry().

Allow callers of RpcMgr::DoRpcWithRetry to specify a time to sleep if
the remote service is busy. DoRpcWithRetry now only sleeps if the remote
service is busy.

TESTING:

Ran all end-to-end tests.

Add a new test to rpc-mgr-test.cc which tests RpcMgr::DoRpcWithRetry in
two ways. Firstly we test the case where the remote service is busy
(which we force by filling up the queue), and secondly we exercise
DoRpcWithRetry by using a fake Proxy that simulates failures.

Clean up unused values of FaultInjectionUtil.RpcCallType and match
values with test_rpc_timeout.py.

Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
---
M be/src/rpc/impala-service-pool.cc
M be/src/rpc/impala-service-pool.h
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr-test.h
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/control-service.cc
M be/src/service/control-service.h
M be/src/testutil/fault-injection-util.h
M tests/custom_cluster/test_rpc_timeout.py
13 files changed, 274 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/12672/9
-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 3:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/2546/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Mar 2019 19:06:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................

IMPALA-8143: Enhance DoRpcWithRetry().

Allow callers of RpcMgr::DoRpcWithRetry to specify a time to sleep if
the remote service is busy. DoRpcWithRetry now only sleeps if the remote
service is busy.

TESTING:

Ran all end-to-end tests.

Add two new tests to rpc-mgr-test.cc which test RpcMgr::DoRpcWithRetry.
One test uses a fake Proxy class, the other uses a new DebugAction to
cause Krpc calls to be rejected as if the service was too busy.

Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
---
M be/src/rpc/impala-service-pool.cc
M be/src/rpc/impala-service-pool.h
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr-test.h
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/control-service.h
10 files changed, 186 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/12672/11
-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 11
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 8:

Reviewers please wait for a future Patch Ste with a simplified test


-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 02 Apr 2019 21:15:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................

IMPALA-8143: Enhance DoRpcWithRetry().

Allow callers of RpcMgr::DoRpcWithRetry to specify a time to sleep if
the remote service is busy. Allow callers to specify a logging function
that is called for rpc failures.

Reviewers should note that the following changes to retry parameters
were made:
 Cancel: add a 3 second sleep if service is busy.
 RemoteShutdown: add a 3 second sleep if service is busy.

TESTING:

Ran all end-to-end tests.

Add a new test to rpc-mgr-test.cc which tests RpcMgr::DoRpcWithRetry in
two ways. Firstly we test the case where the remote service is busy
(which we force by filling up the queue), and secondly we exercise
DoRpcWithRetry by using a fake Proxy that simulates failures.

Clean up unused values of FaultInjectionUtil.RpcCallType and match
values with test_rpc_timeout.py.

Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
---
M be/src/rpc/impala-service-pool.cc
M be/src/rpc/impala-service-pool.h
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr-test.h
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/control-service.cc
M be/src/service/control-service.h
M be/src/testutil/fault-injection-util.h
M tests/custom_cluster/test_rpc_timeout.py
13 files changed, 303 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/12672/4
-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Thomas Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/12672/2/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/12672/2/be/src/rpc/rpc-mgr.h@29
PS2, Line 29: using kudu::MonoDelta;
Don't use 'using' in header files, just fully qualify it everywhere below.


http://gerrit.cloudera.org:8080/#/c/12672/2/be/src/rpc/rpc-mgr.h@185
PS2, Line 185: F&& rpc_call
Rather than passing in a lambda that performs the call, how would you feel about passing in the proxy object and the function to call, i.e. the way ClientCache::DoRpc() currently works?

The motivation for this is that I've written an equivalent DoAsyncRpcWithRetry() function (see https://gerrit.cloudera.org/#/c/12297/), and for the async case we can't take a lambda that calls the rpc because we need to wrap the callback that gets passed to the rpc function in order to simulate recv errors, and it would be nice for the async and sync cases to work the same.

Two downsides that I see to this:
- It makes the arg list very long, but this is alleviated somewhat by my patch which should prevent callers from having to specify 'error_msg' or 'debug_action'
- It prevents patterns like what you did in query-state.cc. That doesn't matter for now though (see my comments in query-state.cc) and I think we can come up with a solution if needed in the future.


http://gerrit.cloudera.org:8080/#/c/12672/2/be/src/rpc/rpc-mgr.h@194
PS2, Line 194: typename L
There's really only one signature we should be accepting for the log function (i.e. no args and returns void) so I think it might be cleaner just specify the type, something like:

typedef boost::function<void()> RpcLogFn;


http://gerrit.cloudera.org:8080/#/c/12672/2/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/12672/2/be/src/runtime/query-state.cc@400
PS2, Line 400:   rpc_status = RpcMgr::DoRpcWithRetry(report_exec_status, log_failure, query_ctx_,
We don't actually want to retry this rpc like this. Its low impact if the rpc doesn't succeed, because we'll send the missed info with the next report, so we prefer to minimize the number of report rpcs being sent by not retrying the same report.


http://gerrit.cloudera.org:8080/#/c/12672/2/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/12672/2/be/src/service/control-service.cc@171
PS2, Line 171: DebugActionNoFail(impala_server->default_query_options_, "RPC_CANCELQUERYFINSTANCES");
I think its probably easier if you leave this alone in this patch and let me handle it in my debug action patch.

For example, because this use of 'default_query_options_' is unfortunate. I think the right solution is to add a new flag, say 'krpc_debug_action', and then define a KrpcDebugActionNoFail() and a few other related things which is all stuff that I'll be touching in my patch anyways.



-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Mar 2019 21:07:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................

IMPALA-8143: Enhance DoRpcWithRetry().

Allow callers of RpcMgr::DoRpcWithRetry to specify a time to sleep if
the remote service is busy. Allow callers to specify a logging function
that is called for rpc failures.

Reviewers should note that the following changes to retry parameters
were made:
 Cancel: add a 3 second sleep if service is busy.
 RemoteShutdown: add a 3 second sleep if service is busy.

TESTING:

Ran all end-to-end tests.

Add a new test to rpc-mgr-test.cc which tests RpcMgr::DoRpcWithRetry in
two ways. Firstly we test the case where the remote service is busy
(which we force by filling up the queue), and secondly we exercise
DoRpcWithRetry by using a fake Proxy that simulates failures.

Clean up unused values of FaultInjectionUtil.RpcCallType and match
values with test_rpc_timeout.py.

Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
---
M be/src/rpc/impala-service-pool.cc
M be/src/rpc/impala-service-pool.h
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr-test.h
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/control-service.cc
M be/src/service/control-service.h
M be/src/testutil/fault-injection-util.h
M tests/custom_cluster/test_rpc_timeout.py
13 files changed, 302 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/12672/6
-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Thomas Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 3: Code-Review+1

(4 comments)

Looks good, thanks for doing this. +1 for now to give Michael a change to take a look

http://gerrit.cloudera.org:8080/#/c/12672/3/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/12672/3/be/src/rpc/rpc-mgr.h@29
PS3, Line 29: using kudu::MonoDelta;
please don't use 'using' in header files


http://gerrit.cloudera.org:8080/#/c/12672/3/be/src/rpc/rpc-mgr.h@181
PS3, Line 181: noop
maybe name this something more descriptive, eg. logging_noop() and/or mention in the comment that this is used as the logging function


http://gerrit.cloudera.org:8080/#/c/12672/3/be/src/rpc/rpc-mgr.inline.h
File be/src/rpc/rpc-mgr.inline.h:

http://gerrit.cloudera.org:8080/#/c/12672/3/be/src/rpc/rpc-mgr.inline.h@30
PS3, Line 30: using kudu::MonoDelta;
please don't use 'using' in header files


http://gerrit.cloudera.org:8080/#/c/12672/3/be/src/rpc/rpc-mgr.inline.h@61
PS3, Line 61: using namespace std
remove this



-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Mar 2019 19:04:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2393/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Mar 2019 18:03:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................

IMPALA-8143: Enhance DoRpcWithRetry().

Allow callers of RpcMgr::DoRpcWithRetry to specify a time to sleep if
the remote service is busy. Allow callers to specify a logging function
that is called for rpc failures.

Reviewers should note that the following changes to retry parameters
were made:
 Cancel: add a 3 second sleep if service is busy.
 RemoteShutdown: add a 3 second sleep if service is busy.

TESTING:

Ran all end-to-end tests.

Add a new test to rpc-mgr-test.cc which tests RpcMgr::DoRpcWithRetry in
two ways. Firstly we test the case where the remote service is busy
(which we force by filling up the queue), and secondly we exercise
DoRpcWithRetry by using a fake Proxy that simulates failures.

Clean up unused values of FaultInjectionUtil.RpcCallType and match
values with test_rpc_timeout.py.

Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
---
M be/src/rpc/impala-service-pool.cc
M be/src/rpc/impala-service-pool.h
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr-test.h
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/control-service.cc
M be/src/service/control-service.h
M be/src/testutil/fault-injection-util.h
M tests/custom_cluster/test_rpc_timeout.py
13 files changed, 302 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/12672/5
-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2554/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Mar 2019 23:17:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................

IMPALA-8143: Enhance DoRpcWithRetry().

Allow callers of RpcMgr::DoRpcWithRetry to specify a time to sleep if
the remote service is busy. DoRpcWithRetry now only sleeps if the remote
service is busy.

TESTING:

Ran all end-to-end tests.

Add a new test to rpc-mgr-test.cc which tests RpcMgr::DoRpcWithRetry in
two ways. Firstly we test the case where the remote service is busy
(which we force by filling up the queue), and secondly we exercise
DoRpcWithRetry by using a fake Proxy that simulates failures.

Clean up unused values of FaultInjectionUtil.RpcCallType and match
values with test_rpc_timeout.py.

Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
---
M be/src/rpc/impala-service-pool.cc
M be/src/rpc/impala-service-pool.h
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr-test.h
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/control-service.cc
M be/src/service/control-service.h
M be/src/testutil/fault-injection-util.h
M common/protobuf/rpc_test.proto
M tests/custom_cluster/test_rpc_timeout.py
14 files changed, 274 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/12672/7
-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................

IMPALA-8143: Enhance DoRpcWithRetry().

Allow callers of RpcMgr::DoRpcWithRetry to specify a time to sleep if
the remote service is busy. Allow callers to specify a logging function
that is called for rpc failures.

Reviewers should note that the following changes to retry parameters
were made:
 Cancel: add a 3 second sleep if service is busy.
 RemoteShutdown: add a 3 second sleep if service is busy.

TESTING:

Ran all end-to-end tests.

Add a new test to rpc-mgr-test.cc which tests RpcMgr::DoRpcWithRetry in
two ways. Firstly we test the case where the remote service is busy
(which we force by filling up the queue), and secondly we exercise
DoRpcWithRetry by using a fake Proxy that simulates failures.

Clean up unused values of FaultInjectionUtil.RpcCallType and match
values with test_rpc_timeout.py.

Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
---
M be/src/rpc/impala-service-pool.cc
M be/src/rpc/impala-service-pool.h
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr-test.h
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/control-service.cc
M be/src/service/control-service.h
M be/src/testutil/fault-injection-util.h
M tests/custom_cluster/test_rpc_timeout.py
13 files changed, 309 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/12672/3
-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................

IMPALA-8143: Enhance DoRpcWithRetry().

Allow callers of RpcMgr::DoRpcWithRetry to specify a time to sleep if
the remote service is busy. DoRpcWithRetry now only sleeps if the remote
service is busy.

TESTING:

Ran all end-to-end tests.

Add two new tests to rpc-mgr-test.cc which test RpcMgr::DoRpcWithRetry.
One test uses a fake Proxy class, the other uses a new DebugAction to
cause Krpc calls to be rejected as if the service was too busy.

Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
---
M be/src/rpc/impala-service-pool.cc
M be/src/rpc/impala-service-pool.h
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr-test.h
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/control-service.h
10 files changed, 189 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/12672/12
-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 12
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 13: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 13
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Thu, 13 Jun 2019 02:15:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 4:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/2552/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Mar 2019 22:27:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 12:

(5 comments)

Thanks Michael, I believe I addressed the nits. All tests pass.

http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/impala-service-pool.h
File be/src/rpc/impala-service-pool.h:

http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/impala-service-pool.h@72
PS11, Line 72: char* 
> char*
Done


http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/impala-service-pool.cc
File be/src/rpc/impala-service-pool.cc:

http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/impala-service-pool.cc@192
PS11, Line 192: UNLIKELY(!debug_st
> UNLIKELY(!debug_status.ok())
Done


http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr-test.h
File be/src/rpc/rpc-mgr-test.h:

http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr-test.h@281
PS11, Line 281: GetNumberOfCalls()
> nit: number_of_calls() or GetNumberOfCalls()
Done


http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr-test.cc
File be/src/rpc/rpc-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr-test.cc@331
PS11, Line 331:     // DoRpcWithRetry will fail with probability (1/2)^num_rpc_retry_calls.
              :     ASSERT_TRUE(status.ok());
              :   }
              :   // There will be no overflows (i.e. service too busy) with probability
              :   // (1/2)^num_retries.
> :-).
:-)
Thanks for thinking about this!


http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/12672/11/be/src/rpc/rpc-mgr.h@183
PS11, Line 183:   /// Pass 'debug_action' to DebugAction() to potentially inject errors.
> Please add a TODO:
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 12
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Jun 2019 16:49:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................

IMPALA-8143: Enhance DoRpcWithRetry().

Allow callers of RpcMgr::DoRpcWithRetry to specify a time to sleep if
the remote service is busy. DoRpcWithRetry now only sleeps if the remote
service is busy.

TESTING:

Ran all end-to-end tests.

Add a new test to rpc-mgr-test.cc which tests RpcMgr::DoRpcWithRetry in
two ways. Firstly we test the case where the remote service is busy
(which we force by filling up the queue), and secondly we exercise
DoRpcWithRetry by using a fake Proxy that simulates failures.

Clean up unused values of FaultInjectionUtil.RpcCallType and match
values with test_rpc_timeout.py.

Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
---
M be/src/rpc/impala-service-pool.cc
M be/src/rpc/impala-service-pool.h
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr-test.h
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/control-service.cc
M be/src/service/control-service.h
M be/src/testutil/fault-injection-util.h
M common/protobuf/rpc_test.proto
M tests/custom_cluster/test_rpc_timeout.py
14 files changed, 268 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/12672/8
-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................

IMPALA-8143: Enhance DoRpcWithRetry().

Allow callers of RpcMgr::DoRpcWithRetry to specify a time to sleep if
the remote service is busy. DoRpcWithRetry now only sleeps if the remote
service is busy.

TESTING:

Ran all end-to-end tests.

Add a new test to rpc-mgr-test.cc which tests RpcMgr::DoRpcWithRetry.
by using two fake Proxy classes.

Clean up unused values of FaultInjectionUtil.RpcCallType and match
values with test_rpc_timeout.py.

Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
---
M be/src/kudu/rpc/outbound_call.h
M be/src/kudu/rpc/rpc_controller.h
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr-test.h
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/control-service.cc
M be/src/service/control-service.h
M be/src/testutil/fault-injection-util.h
M tests/custom_cluster/test_rpc_timeout.py
13 files changed, 229 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/12672/10
-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 13: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 13
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Wed, 12 Jun 2019 20:27:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 10:

(1 comment)

Patch set 11 uses the agreed approach.

http://gerrit.cloudera.org:8080/#/c/12672/10/be/src/rpc/rpc-mgr-test.h
File be/src/rpc/rpc-mgr-test.h:

http://gerrit.cloudera.org:8080/#/c/12672/10/be/src/rpc/rpc-mgr-test.h@322
PS10, Line 322:     kudu::rpc::ResponseCallback callback = []() {};
              :     kudu::rpc::ErrorStatusPB* error_status_pb = new kudu::rpc::ErrorStatusPB();
              :     error_status_pb->set_code(kudu::rpc::ErrorStatusPB::ERROR_SERVER_TOO_BUSY);
              :     kudu::rpc::OutboundCall* outbound_call = new kudu::rpc::OutboundCall(
              :         *conn_id_, *remote_method_, resp, controller, callback);
              :     outbound_call->status_ = kudu::Status::RemoteError("Remote error");
              :     outbound_call->state_ = kudu::rpc::OutboundCall::FINISHED_ERROR;
              :     outbound_call->error_pb_.reset(error_status_pb);
              :     controller->call_.reset(outbound_call);
> As discussed offline last week, we may be better off using a debug action i
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Thu, 23 May 2019 18:38:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3362/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 11
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Thu, 23 May 2019 19:19:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12672/10/be/src/rpc/rpc-mgr-test.h
File be/src/rpc/rpc-mgr-test.h:

http://gerrit.cloudera.org:8080/#/c/12672/10/be/src/rpc/rpc-mgr-test.h@322
PS10, Line 322:     kudu::rpc::ResponseCallback callback = []() {};
              :     kudu::rpc::ErrorStatusPB* error_status_pb = new kudu::rpc::ErrorStatusPB();
              :     error_status_pb->set_code(kudu::rpc::ErrorStatusPB::ERROR_SERVER_TOO_BUSY);
              :     kudu::rpc::OutboundCall* outbound_call = new kudu::rpc::OutboundCall(
              :         *conn_id_, *remote_method_, resp, controller, callback);
              :     outbound_call->status_ = kudu::Status::RemoteError("Remote error");
              :     outbound_call->state_ = kudu::rpc::OutboundCall::FINISHED_ERROR;
              :     outbound_call->error_pb_.reset(error_status_pb);
              :     controller->call_.reset(outbound_call);
As discussed offline last week, we may be better off using a debug action in ImpalaServicePool to simulate the busy queue behavior instead of cooking this failure up on the client side.



-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 May 2019 17:43:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 10:

(12 comments)

Thanks for the review. See patch set 10.

http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.h
File be/src/rpc/rpc-mgr-test.h:

http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.h@134
PS9, Line 134: ultipleServicesTest(Rp
> Coding style suggests this should be "GetCurrentQueueSize()". With that sai
Done


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.h@295
PS9, Line 295: ingServiceProxy {
             :  public:
> Instead of IOError, can we simulate the server busy error here ? Please see
I'm using your idea but keeping this class as-is for simplicity


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.cc
File be/src/rpc/rpc-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.cc@269
PS9, Line 269: uery_c
> typo
Done


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.cc@285
PS9, Line 285: 
             :   // Test injection of failures with DebugAction.
             :   query_ctx.client_request.query_options.__set_debug_action("DoRpcWithRetry:FAIL");
             :   PingRequestPB request3;
             :   PingResponsePB response3;
             :   Status inject_status = RpcMgr::DoRpcWithRetry(busy_proxy, &BusyPingServiceProxy::Ping,
             :       request3, &response3, query_ctx, "ping failed", num_retries, timeout_ms, 0,
             :       "DoRpcWithRetry");
             :   ASSERT_FALSE(inject_status.ok());
             :   EXPECT_ERROR(inject_status, TErrorCode::INTERNAL_ERROR);
             :   ASSERT_EQ("Debug Action: DoRpcWithRetry:FAIL", inject_status.msg().msg());
             : }
             : 
             : } // namespace impala
             : 
             : int main(int argc, char** argv) {
             :   ::testing::InitGoogleTest(&argc, argv);
             :   impala::InitCommonRuntime(argc, argv, true, impala::TestInfo::BE_TEST);
             :   impala::InitFeSupport();
             : 
             :   // Fill in the path of the current binary for use by the tests.
             :   CURRENT_EXECUTABLE_PATH = argv[0];
             :   return RUN_ALL_TESTS();
             : }
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
> Stepping back a bit, it's unclear the exact purposes of these tests are. 
Done


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.cc@376
PS9, Line 376: 
             : 
             : 
> Not entirely sure this test is needed as the EE test should exercise it alr
This test is small and ensures that the DebugAction code is working correctly. The EE test seems fragile to me in that it does not check if an error actually was injected.


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.cc@380
PS9, Line 380: 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
> This test seems to be sufficient for the coverage we need if we make the fo
Done, though note that there is some slightly tricky code to make the service appear busy - we can't just return a ServerBusy status, instead we have to inject an error into the Rpc Controller.


http://gerrit.cloudera.org:8080/#/c/12672/10/be/src/rpc/rpc-mgr.inline.h
File be/src/rpc/rpc-mgr.inline.h:

http://gerrit.cloudera.org:8080/#/c/12672/10/be/src/rpc/rpc-mgr.inline.h@67
PS10, Line 67:       rpc_status = DebugAction(query_ctx.client_request.query_options, debug_action);
I think this is still needed by existing tests. I'll leave Thomas to sort out the future solution.


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr.inline.h
File be/src/rpc/rpc-mgr.inline.h:

http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr.inline.h@75
PS9, Line 75:     // If the call succeeded, or the server is not busy, then return result to caller.
            :     if (rpc_status.ok() || !RpcMgr::IsServerTooBusy(rpc_controller)) {
            :       return rpc_status;
            :     }
            : 
> if (rpc_status.ok() || !RpcMgr::IsServerTooBusy(rpc_controller)) {
Done


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/runtime/coordinator-backend-state.cc@465
PS9, Line 465: 
             :   Status rpc_status =
> nit: May make for better documentation if we have these couple of parameter
much nicer, thanks


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/service/client-request-state.cc@684
PS9, Line 684: 
             :   Status shutdown_status
> nit: May make for better documentation if we have these couple of parameter
Thanks


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/testutil/fault-injection-util.h
File be/src/testutil/fault-injection-util.h:

http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/testutil/fault-injection-util.h@37
PS9, Line 37:     RPC_TRANSMITDATA,
> This can also be removed.
Seems to be used in data-stream-service.cc ?


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/testutil/fault-injection-util.h@40
PS9, Line 40:     // If you add a new RpcCallType then bump the "--fault_injection_rpc_type=X" parameter
            :     // in the test test_random_rpc_timeout() in test_rpc_timeout.py.
> Looks like we forgot to update the tests when adding fault injection for re
Thanks



-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Apr 2019 16:19:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 3:

(4 comments)

Thanks for the quick review

http://gerrit.cloudera.org:8080/#/c/12672/3/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/12672/3/be/src/rpc/rpc-mgr.h@29
PS3, Line 29: using kudu::MonoDelta;
> please don't use 'using' in header files
Done


http://gerrit.cloudera.org:8080/#/c/12672/3/be/src/rpc/rpc-mgr.h@181
PS3, Line 181: noop
> maybe name this something more descriptive, eg. logging_noop() and/or menti
Done


http://gerrit.cloudera.org:8080/#/c/12672/3/be/src/rpc/rpc-mgr.inline.h
File be/src/rpc/rpc-mgr.inline.h:

http://gerrit.cloudera.org:8080/#/c/12672/3/be/src/rpc/rpc-mgr.inline.h@30
PS3, Line 30: using kudu::MonoDelta;
> please don't use 'using' in header files
Done


http://gerrit.cloudera.org:8080/#/c/12672/3/be/src/rpc/rpc-mgr.inline.h@61
PS3, Line 61: using namespace std
> remove this
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Mar 2019 21:43:23 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2844/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Apr 2019 16:55:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Thomas Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@201
PS5, Line 201: const std::unique_ptr<Proxy>& proxy,
             :       const ProxyMethod& rpc_call, const Request& request, Response* response,
             :       const TQueryCtx& query_ctx, const char* error_msg, const int times_to_try,
             :       const kudu::MonoDelta& timeout, const int server_busy_backoff_s = 0,
             :       const char* debug_action = nullptr)
> Although this function is inlined, It has grown over time to take too many 
My debug action patch will eliminate the 'error_msg', 'debug_action', and 'query_ctx' parameters, assuming that the approach I've taken there will pass review.



-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Thu, 28 Mar 2019 16:56:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 12:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3538/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 12
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Jun 2019 17:28:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 2:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.h
File be/src/rpc/rpc-mgr-test.h:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.h@296
PS5, Line 296: ERR
typo


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc
File be/src/rpc/rpc-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@214
PS5, Line 214:   ASSERT_OK(rpc_mgr_.RegisterService(10, 10, scan_mem_impl,
             :       static_cast<
Not sure if it's output of clang-tidy or something but I find it a bit hard to ready to split it like this. Can't we keep these 2 lines unchanged ?


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@263
PS5, Line 263: ueue f
typo


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@291
PS5, Line 291: 
nullptr;

Also do you want to assert that it's zero ?


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@298
PS5, Line 298: // Configure the service 
not used ?


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@316
PS5, Line 316: ASSERT_OK(ping_
Instead of sharing "sleep_ms" between RPCs, it may be cleaner to pass the sleep time as RPC argument instead.


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@322
PS5, Line 322: // A lambda t
proxy->


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@333
PS5, Line 333: 
             :   // wait for async call to be running
             :   sleep_started.Wait();
             :   ASSERT_EQ(0, get_current_queue_size(rpc_mgr_));
             : 
             :   // Do a second async call that will fill up the queue
             :   RpcController async_controller2;
             :   CountingBarrier async2_done(1);
             :   // Reset the sleep time
             :   sleep_ms = 100;
             :   // Set the deadline to something reasonable otherwise the pings from DoRpcWithRetry
             :   // will replace this async call in the queue.
             :   MonoTime deadline = MonoTime::Now() + MonoDelta::FromSeconds(10);
             :   async_controller2.set_deadline(deadline);
             :   ResponseCallback async2_callback = [&]() { async2_done.Notify(); };
             :   proxy.get()->PingAsync(request, &response, &async_controller2, async2_callback);
             : 
             :   // Wait for second async to get queued
             :   SleepForMs(300);
             :   // Check the queue size
             :   ASSERT_EQ(1, get_current_queue_size(
These set of tests seem rather timing dependent. For instance, it may occur that the first time DoRpcWithRetry() is called will succeed because the async call somehow finishes by then due to some weird scheduling order. May be better to rethink a better way to test it.


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@378
PS5, Line 378:   ASSERT_FALSE(async2_done.pending());
             : 
             :   // Test injection of failures with DebugAction.
             :   query_ctx.client_request.query_options.__set_debug_action("DoRpcWithRetry:FAIL");
             :   Status inject_status = RpcMgr::DoRpcWithRetry(ping_rpc, query_ctx, "ping failed", 1,
             :       MonoDelta::FromSeconds(100), 0, "DoRpcWithRetry");
             :   ASSERT_FALSE(inject_status.ok());
             :   EXPECT_ERROR(inject_status, TErrorCo
Please see comments in rpc-mgr.inline.h that we probably should only retry if the remote server is busy. For all other errors, we probably should report it right away to callers. We can consider transparently handling other types of errors in the future.


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@178
PS5, Line 178:   void ToJson(rapidjson::Document* document);
             : 
             :   /// Retry the Rpc 'rpc_call' up to 'times_to_try' times.
             :   /// Each Rpc has a timeout of 'timeout'.
             :   /// If the service is busy then sleep 'se
Not needed as discussed in person.


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@189
PS5, Line 189: the R
typename


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@191
PS5, Line 191: n the 'l
It's important to highlight that this only works iff rpc_call is idempotent in the comments above. Any non-idempotent rpc_call using this interface may have undesired consequence.


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@193
PS5, Line 193: ebugAction() to potentially inject errors.
             :   template <typename F, typename L>
It seems inconsistent to pass the timeout in as MonoDelta while passing the backoff time in second. To be consistent with the rest of the code in Impala, it may be better to pass the timeout as int64_t instead (e.g int64_t timeout_ms, int64_t backoff_ms)

In addition, the backoff's unit is second, which may be too coarse grained. I would suggest using millisecond instead.


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@201
PS5, Line 201: ) before destroying RpcMgr";
             :   }
             : 
             :  private:
             :   /// One pool per registered service. sc
Although this function is inlined, It has grown over time to take too many arguments. It may benefit from some refactoring if possible. 

One simplification is to considering hiding the query_ctx and debug action together into a simple lambda function which gets invoked inside DoRpcWithRetry().


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.inline.h
File be/src/rpc/rpc-mgr.inline.h:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.inline.h@57
PS5, Line 57: times_to_try, const Mono
May make sense to group this argument next to query_ctx as they are related.


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.inline.h@61
PS5, Line 61: , d
++i


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.inline.h@72
PS5, Line 72: 
Do we need to reset response between retry ?


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.inline.h@76
PS5, Line 76:       // Check for injected failures.
            :       rpc_
Based on our offline discussion, this is not needed. Please consider removing it.


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.inline.h@79
PS5, Line 79:     }
            : 
            :     const kudu::Status& k_status = rpc_call(&rpc_controller);
            :     rpc_status = FromKuduStatus(k_status, error_msg);
            :     i
So, it's not clear to me whether we should retry for errors other than RpcMgr::IsServerTooBusy(). In other words, if we hit other types of fatal errors, should we report them back to the caller right away ?

Another possible way to treat it is that the code will retry regardless of the error types, assuming that any kind of error was due to some transient errors which may go away. In which case, it may make sense to backoff regardless of the error types, right ?

What's your thought ?



-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Thu, 28 Mar 2019 00:15:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12672/4/be/src/rpc/rpc-mgr.inline.h
File be/src/rpc/rpc-mgr.inline.h:

http://gerrit.cloudera.org:8080/#/c/12672/4/be/src/rpc/rpc-mgr.inline.h@56
PS4, Line 56:     const int times_to_try, const kudu::MonoDelta& timeout, const int server_busy_backoff_s,
line too long (92 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Mar 2019 21:44:00 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2594/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Fri, 29 Mar 2019 22:17:39 +0000
Gerrit-HasComments: No