You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2017/06/01 00:43:42 UTC

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Alexey Serbin has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7037

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................

KUDU-2027 retry scan RPC if negotiation times out

This patch addresses KUDU-2027, i.e. with this patch the Kudu C++ client
retries a scan RPC with other tablet replica if the connection
negotiation with current replica timed out.

Added new integration test to cover the updated client's behavior.

This patch is a follow-up for e3c5dd18c22b9e20358f05dcd301e7736c0a3321,
since KUDU-2027 is a scan-related counterpart of KUDU-1580.

Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
---
M src/kudu/client/scanner-internal.cc
M src/kudu/integration-tests/client-negotiation-failover-itest.cc
2 files changed, 93 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/7037/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change.

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7037/4/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

PS4, Line 79: (deadline < MonoTime::Now() ||
            :        (policy == TimeoutHandlingPolicy::FAILOVER_IF_FAULT_TOLERANT &&
            :         !co
> You mean: putting every part of the statement into separate line?  I'm not 
Ah, I see, I was referring to indenting with one extra space for each line but I see now that you're just aligning with each open paren.

I was thinking it would be four-space indents for all three lines, but this is valid too, so no worries :)


-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7037/2/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

PS2, Line 216: if (overall_deadline == deadline && overall_deadline <= MonoTime::Now()) {
so basically this is making sure that even in the case where the overall_deadline is the deadline but we haven't exceeded it we still return a (retriable) ScanRpcStatus::RPC_DEADLINE_EXCEEDED error, right?
I think we need to at least document that behavior reasonably well.


PS2, Line 407: now + KuduClient::Data::ComputeExponentialBackoff(attempt)) - now
not sure I'm following this. isn't now + KuduClient::Data::ComputeExponentialBackoff(attempt)) - now
just equivalent to KuduClient::Data::ComputeExponentialBackoff(attempt))?


http://gerrit.cloudera.org:8080/#/c/7037/2/src/kudu/integration-tests/client-negotiation-failover-itest.cc
File src/kudu/integration-tests/client-negotiation-failover-itest.cc:

PS2, Line 40: using kudu::client::ScanToStrings;
ordering


PS2, Line 113: other
nit: not your fault but "another"


-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7037/3/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

PS3, Line 423:     ScanRpcStatus scan_status = SendScanRpc(deadline, configuration_.is_fault_tolerant());
on the 'Open()' RPC, we should be able to fail-over the scan even if it's not fault-tolerant. ie if we haven't yet returned any rows from the tablet, then it's safe to keep trying to open the scanner on other replicas.


http://gerrit.cloudera.org:8080/#/c/7037/3/src/kudu/integration-tests/client-negotiation-failover-itest.cc
File src/kudu/integration-tests/client-negotiation-failover-itest.cc:

PS3, Line 152: hopefully
hrm, I think you are being pretty optimistic here. If you assume hashing produces a random distribution, then I think there is only a 22% chance that they all end up in different partitions (3!/3^3 = 6/27 = 2/9). Does that affect the correctness of the test?


Line 191:     ASSERT_OK(ScanToStrings(&scanner, &row_strings));
shouldn't this still be assertable like the above, given we set READ_AT_SNAPSHOT? The replica should block until that snapshot is safe, if it isn't safe yet.


Line 196:     SCOPED_TRACE(Substitute("snapshot read at closest replica, iteration $0", i));
this should say "read-latest" not "snapshot" right?


-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................


Patch Set 14:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7037/14//COMMIT_MSG
Commit Message:

PS14, Line 15: handling. Now, regardless of whether it was the RPC or the overall
             : operation timeout, the corresponding tablet server is marked as failed
             : and blacklisted.
> we don't currently invalidate this, right? Given that systems like Impala u
Right -- without this patch (i.e. currently) we don't invalidate corresponding entries in the client's meta-cache on read timeout.

As I understand, if the scan source is set to anything than LEADER_ONLY, the client should try other available replicas.  And if it runs out of replicas, marking them failed, then it will not be able to continue -- that's a good observation.

Should we re-fetch information on replicas in case of running out of available replicas in client meta-cache (stopping doing that after some tries)?

What else can we do to handle this situation if not marking failed ones?  Do some round-robin among available sources?


http://gerrit.cloudera.org:8080/#/c/7037/14/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

PS14, Line 1991:   // The scanner should timeout on the next batch not fetched by the Open().
               :   // The client should not retry until the overall timeout expires, giving up
               :   // after the fist RPC timeout error: the only available replica is marked
               :   // as failed by the first timeout error.
> I'm not really understanding this comment.
Agreed, it sounds like some mumbling.  Let me rephrase it:

In the scenario below, the scanner has some rows not fetched upon calling KuduScanner::Open() because scanner_max_batch_size_bytes is set to a very small value.  There will be timeout on first call of KuduScanner::NextBatch() after scanner is opened.  Since the only available entry in the client meta-cache is marked as failed after timing out on the call of KuduScanner::NextBatch(), the client has no other sources for the scanner to try and will fail right after the KuduScanner::NextBatch() call.

Is it better?


Line 2021:     ASSERT_LT(sw.elapsed().wall_millis(), kScanTimeoutMs);
> in the case that we know we won't be able to retry a scan elsewhere, should
It sounds like a good idea.

Now there is allow_time_for_failover provision as well, it's just not activated in this case because I simplified the logic at a few call sites.  I'll take a look.


-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has uploaded a new patch set (#10).

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................

KUDU-2027 retry scan RPC if negotiation times out

This patch addresses KUDU-2027, i.e. with this patch the Kudu C++ client
retries a scan RPC with other tablet replica if the connection
negotiation with current replica timed out.  Also addressed a TODO
in KuduScanner::Data::OpenTablet().

As a part of the fix for KUDU-2027, I updated the logic of the timeout
handling. Now, regardless of whether it was the RPC or the overall
operation timeout, the corresponding tablet server is marked as failed
and blacklisted.

Added new integration test to cover the updated client's behavior.

This patch is a follow-up for e3c5dd18c22b9e20358f05dcd301e7736c0a3321,
since KUDU-2027 is a scan-related counterpart of KUDU-1580.

Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/integration-tests/client-negotiation-failover-itest.cc
5 files changed, 269 insertions(+), 103 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/7037/10
-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7037/5/src/kudu/integration-tests/client-negotiation-failover-itest.cc
File src/kudu/integration-tests/client-negotiation-failover-itest.cc:

PS5, Line 155: ASSERT_OK(ins->mutable_row()->SetInt32(0, kNumTabletServers * i + j));
can you add assertions that you don't get per-row errors back?


PS5, Line 218: eventially
typo


-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7037

to look at the new patch set (#5).

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................

KUDU-2027 retry scan RPC if negotiation times out

This patch addresses KUDU-2027, i.e. with this patch the Kudu C++ client
retries a scan RPC with other tablet replica if the connection
negotiation with current replica timed out.  Also addressed a TODO
in KuduScanner::Data::OpenTablet().

As a part of the fix for KUDU-2027, I updated the logic of the timeout
handling. Now, regardless of whether it was the RPC or the overall
operation timeout, the corresponding tablet server is marked as failed
and blacklisted.

Added new integration test to cover the updated client's behavior.

This patch is a follow-up for e3c5dd18c22b9e20358f05dcd301e7736c0a3321,
since KUDU-2027 is a scan-related counterpart of KUDU-1580.

Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/integration-tests/client-negotiation-failover-itest.cc
5 files changed, 235 insertions(+), 96 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/7037/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7037

to look at the new patch set (#6).

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................

KUDU-2027 retry scan RPC if negotiation times out

This patch addresses KUDU-2027, i.e. with this patch the Kudu C++ client
retries a scan RPC with other tablet replica if the connection
negotiation with current replica timed out.  Also addressed a TODO
in KuduScanner::Data::OpenTablet().

As a part of the fix for KUDU-2027, I updated the logic of the timeout
handling. Now, regardless of whether it was the RPC or the overall
operation timeout, the corresponding tablet server is marked as failed
and blacklisted.

Added new integration test to cover the updated client's behavior.

This patch is a follow-up for e3c5dd18c22b9e20358f05dcd301e7736c0a3321,
since KUDU-2027 is a scan-related counterpart of KUDU-1580.

Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/integration-tests/client-negotiation-failover-itest.cc
5 files changed, 254 insertions(+), 96 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/7037/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................


Patch Set 11:

sorry, forgot I had this in my series to test my fix, will remove before I re-push.

-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7037/5/src/kudu/integration-tests/client-negotiation-failover-itest.cc
File src/kudu/integration-tests/client-negotiation-failover-itest.cc:

PS5, Line 155: ASSERT_OK(ins->mutable_row()->SetInt32(0, kNumTabletServers * i + j));
> can you add assertions that you don't get per-row errors back?
Done


PS5, Line 218: eventially
> typo
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change.

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7037/4/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

PS4, Line 79: (deadline < MonoTime::Now() ||
            :        (policy == TimeoutHandlingPolicy::FAILOVER_IF_FAULT_TOLERANT &&
            :         !co
I can't seem to find it in the style guide, is this the convention for multi-line statements?


PS4, Line 84: return last_error_.ok()
            :         ? err.status : err.status.CloneAndAppend(last_error_.ToString());
nit: should be able to wrap here, but this may be better for readability, so up to you


PS4, Line 272: configuration_.is_fault_tolerant()) {
nit: should be able to join lines


http://gerrit.cloudera.org:8080/#/c/7037/4/src/kudu/integration-tests/client-negotiation-failover-itest.cc
File src/kudu/integration-tests/client-negotiation-failover-itest.cc:

Line 45: using kudu::client::KuduSchema;
nit: not yours, but ordering


-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................


Patch Set 14:

(3 comments)

didn't look in detail at the code yet, but got some higher level design questions here

http://gerrit.cloudera.org:8080/#/c/7037/14//COMMIT_MSG
Commit Message:

PS14, Line 15: handling. Now, regardless of whether it was the RPC or the overall
             : operation timeout, the corresponding tablet server is marked as failed
             : and blacklisted.
we don't currently invalidate this, right? Given that systems like Impala use a long-running client in the backend, it seems slightly dangerous that a single read timeout will "permanently" blacklist a server (at least until the other remaining replicas get blacklisted too).

With writes this isn't that impactful, since we could never get "stuck" on a non-leader, and thus we'll always end up going back to the leader. But with reads I think it's quite plausible, no?


http://gerrit.cloudera.org:8080/#/c/7037/14/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

PS14, Line 1991:   // The scanner should timeout on the next batch not fetched by the Open().
               :   // The client should not retry until the overall timeout expires, giving up
               :   // after the fist RPC timeout error: the only available replica is marked
               :   // as failed by the first timeout error.
I'm not really understanding this comment.


Line 2021:     ASSERT_LT(sw.elapsed().wall_millis(), kScanTimeoutMs);
in the case that we know we won't be able to retry a scan elsewhere, shouldn't we set the timeout to be the full remaining user-specified timeout?

i.e I think if the user specifies 60sec timeout, and we have a non-retriable operation like "nextbatch" on a scan, then we should use that entire 60sec timeout on the one server they gave. It's always possible that it will come back after 45sec or whatever, no?

I think this is what the 'allow_time_for_failover' thing was doing before?


-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................


Patch Set 11:

> sorry, forgot I had this in my series to test my fix, will remove
 > before I re-push.

np

Thank you for reminder -- it seems I need to ping the reviewers regarding this patch :)

-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has uploaded a new patch set (#9).

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................

KUDU-2027 retry scan RPC if negotiation times out

This patch addresses KUDU-2027, i.e. with this patch the Kudu C++ client
retries a scan RPC with other tablet replica if the connection
negotiation with current replica timed out.  Also addressed a TODO
in KuduScanner::Data::OpenTablet().

As a part of the fix for KUDU-2027, I updated the logic of the timeout
handling. Now, regardless of whether it was the RPC or the overall
operation timeout, the corresponding tablet server is marked as failed
and blacklisted.

Added new integration test to cover the updated client's behavior.

This patch is a follow-up for e3c5dd18c22b9e20358f05dcd301e7736c0a3321,
since KUDU-2027 is a scan-related counterpart of KUDU-1580.

Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/integration-tests/client-negotiation-failover-itest.cc
5 files changed, 267 insertions(+), 101 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/7037/9
-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7037

to look at the new patch set (#8).

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................

KUDU-2027 retry scan RPC if negotiation times out

This patch addresses KUDU-2027, i.e. with this patch the Kudu C++ client
retries a scan RPC with other tablet replica if the connection
negotiation with current replica timed out.  Also addressed a TODO
in KuduScanner::Data::OpenTablet().

As a part of the fix for KUDU-2027, I updated the logic of the timeout
handling. Now, regardless of whether it was the RPC or the overall
operation timeout, the corresponding tablet server is marked as failed
and blacklisted.

Added new integration test to cover the updated client's behavior.

This patch is a follow-up for e3c5dd18c22b9e20358f05dcd301e7736c0a3321,
since KUDU-2027 is a scan-related counterpart of KUDU-1580.

Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/integration-tests/client-negotiation-failover-itest.cc
5 files changed, 266 insertions(+), 100 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/7037/8
-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7037

to look at the new patch set (#7).

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................

KUDU-2027 retry scan RPC if negotiation times out

This patch addresses KUDU-2027, i.e. with this patch the Kudu C++ client
retries a scan RPC with other tablet replica if the connection
negotiation with current replica timed out.  Also addressed a TODO
in KuduScanner::Data::OpenTablet().

As a part of the fix for KUDU-2027, I updated the logic of the timeout
handling. Now, regardless of whether it was the RPC or the overall
operation timeout, the corresponding tablet server is marked as failed
and blacklisted.

Added new integration test to cover the updated client's behavior.

This patch is a follow-up for e3c5dd18c22b9e20358f05dcd301e7736c0a3321,
since KUDU-2027 is a scan-related counterpart of KUDU-1580.

Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/integration-tests/client-negotiation-failover-itest.cc
5 files changed, 263 insertions(+), 96 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/7037/7
-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7037

to look at the new patch set (#3).

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................

KUDU-2027 retry scan RPC if negotiation times out

This patch addresses KUDU-2027, i.e. with this patch the Kudu C++ client
retries a scan RPC with other tablet replica if the connection
negotiation with current replica timed out.

Added new integration test to cover the updated client's behavior.

This patch is a follow-up for e3c5dd18c22b9e20358f05dcd301e7736c0a3321,
since KUDU-2027 is a scan-related counterpart of KUDU-1580.

Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
---
M src/kudu/client/scanner-internal.cc
M src/kudu/integration-tests/client-negotiation-failover-itest.cc
2 files changed, 99 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/7037/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................


Patch Set 8: Verified+1

Unrelated flake in org.apache.kudu.spark.kudu.DefaultSourceTest.Test

-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7037/3/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

PS3, Line 423:     CHECK(ts->proxy());
> on the 'Open()' RPC, we should be able to fail-over the scan even if it's n
Yep, I had some doubts about this change (too restrictive, but safe).  Thank you for the clarification.  I'll update this and other places correspondingly.


http://gerrit.cloudera.org:8080/#/c/7037/3/src/kudu/integration-tests/client-negotiation-failover-itest.cc
File src/kudu/integration-tests/client-negotiation-failover-itest.cc:

PS3, Line 152: hopefully
> hrm, I think you are being pretty optimistic here. If you assume hashing pr
Well, I assumed the fact that the numbers are sequential adds some bias to the distribution of the results across partitions.

From the other side, during the test, the leader tablets tend to switch from one server to another due to re-elections.  If they flock to the same tablet server, that get us into the situation where even all operations are put into different partitions, the behaviour of the test is the same as if all operations are put into the same tablet. 

If all the values are put into the same partition (i.e. the same tablet) and that particular tablet is not paused, the test does not exercise the expected negotiation timeout behavior for that particular run.  So, assuming there is an issue with the negotiation timeouts, the test would give false negative in that case.  However, the test does not produce false positives (I assume by 'correctness' you meant false positives).

Since the test runs multiple iterations and it's run multiple times itself, I feel confident it gives us the necessary coverage.

Of course, it's possible to do add some code to make sure the writes go to different partitions and do tablet 'un-herding', but I think it's better to keep the code simpler and rely on high number or runs.


Line 191:     ASSERT_OK(ScanToStrings(&scanner, &row_strings));
> shouldn't this still be assertable like the above, given we set READ_AT_SNA
In the first revision it was exactly like that (I assumed it should be like that).  However, I found that it fails sometimes: i.e. CLOSEST_REPLICA and READ_AT_SNAPSHOT does not guarantee RYW behaviour.  It might be a bug -- I asked David about that on the HipChat Kudu-dev channel, but I haven't gotten response.  I'll ping him regarding this.


Line 196:     SCOPED_TRACE(Substitute("read-latest at closest replica, iteration $0", i));
> this should say "read-latest" not "snapshot" right?
good catch -- yes, it should.


-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7037/4/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

PS4, Line 79: G(INFO) << "Scan of tablet " << remote_->tablet_id() << " at "
            :         << ts_->ToString() << " deadline expired.";
            : 
> I can't seem to find it in the style guide, is this the convention for mult
You mean: putting every part of the statement into separate line?  I'm not sure that's kind of established convention.  Ideally I would like it to be like

if (err.result == ScanRpcStatus::TIMED_OUT &&
    (policy == TimeoutHandlingPolicy::FAILOVER_IF_FAULT_TOLERANT && !configuration_.is_fault_tolerant()))) {
  ...
}

However, there is a limit on the number of characters in one line.

What formatting would you suggest instead?


PS4, Line 84: // the black list.
            :     table_->client()->data_->meta_cache_->MarkTSFailed(ts_, err.status);
> nit: should be able to wrap here, but this may be better for readability, s
I just moved these lines since the shift was not up-to-the coding style.


PS4, Line 272: ach individual RPC call. This gives u
> nit: should be able to join lines
Done


http://gerrit.cloudera.org:8080/#/c/7037/4/src/kudu/integration-tests/client-negotiation-failover-itest.cc
File src/kudu/integration-tests/client-negotiation-failover-itest.cc:

Line 45: using kudu::client::KuduSession;
> nit: not yours, but ordering
That's all mine :)  I added this test.

Done.


-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7037

to look at the new patch set (#4).

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................

KUDU-2027 retry scan RPC if negotiation times out

This patch addresses KUDU-2027, i.e. with this patch the Kudu C++ client
retries a scan RPC with other tablet replica if the connection
negotiation with current replica timed out.  Also addressed a TODO
in KuduScanner::Data::OpenTablet().

Added new integration test to cover the updated client's behavior.

This patch is a follow-up for e3c5dd18c22b9e20358f05dcd301e7736c0a3321,
since KUDU-2027 is a scan-related counterpart of KUDU-1580.

Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
---
M src/kudu/client/client.cc
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/integration-tests/client-negotiation-failover-itest.cc
4 files changed, 168 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/7037/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7037

to look at the new patch set (#2).

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................

KUDU-2027 retry scan RPC if negotiation times out

This patch addresses KUDU-2027, i.e. with this patch the Kudu C++ client
retries a scan RPC with other tablet replica if the connection
negotiation with current replica timed out.

Added new integration test to cover the updated client's behavior.

This patch is a follow-up for e3c5dd18c22b9e20358f05dcd301e7736c0a3321,
since KUDU-2027 is a scan-related counterpart of KUDU-1580.

Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
---
M src/kudu/client/scanner-internal.cc
M src/kudu/integration-tests/client-negotiation-failover-itest.cc
2 files changed, 95 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/7037/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7037/2/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

PS2, Line 216: if (overall_deadline == deadline && overall_deadline <= MonoTime::Now()) {
> so basically this is making sure that even in the case where the overall_de
Right.

The issue with the original code was that even if the timeout error happened prior to that overall deadline, this code would assume the whole timeout interval has passed.

I was also thinking of employing the new functionality to distinguish between negotiation timeout and the RPC timeout itself, but in that case it would be the same criteria plus that RpcController::negotiation_failed() anyway, so I decided to not involve the new RpcController::negotiation_failed().

What is the best place to document this new behavior?


PS2, Line 407: now + KuduClient::Data::ComputeExponentialBackoff(attempt)) - now
> not sure I'm following this. isn't now + KuduClient::Data::ComputeExponenti
That's all about parentheses (i.e. braces) :)

This code does the following:

  sleep = Earliest(deadline, now + backoff) - now;


http://gerrit.cloudera.org:8080/#/c/7037/2/src/kudu/integration-tests/client-negotiation-failover-itest.cc
File src/kudu/integration-tests/client-negotiation-failover-itest.cc:

PS2, Line 40: using kudu::client::ScanToStrings;
> ordering
Done


PS2, Line 113: other
> nit: not your fault but "another"
The original version was written by me as well :)

Done.


-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes