You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Dan Burkert (Code Review)" <ge...@cloudera.org> on 2017/09/08 23:16:00 UTC

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Hello Mike Percy, Adar Dembo, Todd Lipcon,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/8016

to review the following change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................

KUDU-2125: Tablet copy client does not retry on failures

Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy_client_session-itest.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
5 files changed, 125 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/16/8016/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8016/4/src/kudu/integration-tests/tablet_copy_client_session-itest.cc
File src/kudu/integration-tests/tablet_copy_client_session-itest.cc:

PS4, Line 305: ASSERT_OK
> Doesn't seem to be an issue in practice, is there something in particular t
That's interesting; I too was under the impression that the main test thread would not see ASSERTs triggered by other threads.

The gtest docs (https://github.com/google/googletest/blob/master/googletest/docs/AdvancedGuide.md) say: "Assertions from multiple threads are currently not supported." In your experimentation, did the test halt immediately? What exactly was the behavior?


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/8016

to look at the new patch set (#6).

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................

KUDU-2125: Tablet copy client does not retry on failures

The tablet copy client would fail the tablet copy after encountering an
RPC error. This commit adds retry logic for tablet copy operations when
encountering SERVER_TOO_BUSY or UNAVAILABLE errors, which are retriable.

Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy_client_session-itest.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
5 files changed, 133 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/16/8016/6
-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 4:

retriggered; failure seemingly due to KUDU-1736

-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8016/4/src/kudu/integration-tests/tablet_copy_client_session-itest.cc
File src/kudu/integration-tests/tablet_copy_client_session-itest.cc:

PS4, Line 305: ASSERT_OK
> I'm surprised at this. I've had weird issues trying to do this before and n
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/8016

to look at the new patch set (#2).

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................

KUDU-2125: Tablet copy client does not retry on failures

Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy_client_session-itest.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
5 files changed, 123 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/16/8016/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 4:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/8016/4//COMMIT_MSG
Commit Message:

Line 8: 
> Could you elaborate a bit on the way the issue has been addressed by the pa
Done


http://gerrit.cloudera.org:8080/#/c/8016/4/src/kudu/integration-tests/cluster_itest_util.cc
File src/kudu/integration-tests/cluster_itest_util.cc:

PS4, Line 854: if (only_running) {
> nit: does it make to skip this if !s.ok() ?
Done


http://gerrit.cloudera.org:8080/#/c/8016/4/src/kudu/integration-tests/cluster_itest_util.h
File src/kudu/integration-tests/cluster_itest_util.h:

Line 330:     bool only_running = false);
> Maybe it'd be better to pass a `boost::optional<TabletStatePB>` instead? Th
Done


http://gerrit.cloudera.org:8080/#/c/8016/4/src/kudu/integration-tests/tablet_copy_client_session-itest.cc
File src/kudu/integration-tests/tablet_copy_client_session-itest.cc:

PS4, Line 274: TEST_F(TabletCopyClientSessionITest, TestTabletCopyWithBusySource)
> How fast does this test run?  If it's relatively long, consider running it 
Done


PS4, Line 305: ASSERT_OK
> I've heard ASSERT_OK() does not work properly with non-main test threads.  
Doesn't seem to be an issue in practice, is there something in particular to watch out for?  Adding an ASSERT_TRUE(false) to the thread routines causes the test to fail with the expected output.


http://gerrit.cloudera.org:8080/#/c/8016/4/src/kudu/tserver/tablet_copy_client.cc
File src/kudu/tserver/tablet_copy_client.cc:

PS4, Line 669: uint64_t offset = 0;
> nit: move it under the 'while(!done) {...}' loop scope?
The offset is updated each time through the loop.


PS4, Line 672: FetchDataRequestPB req;
> nit: move it under the 'while(!done) {...}' loop scope?
Took this in a bit different direction, but it should make it clear why it's outside the loop.


PS4, Line 702:     done = offset + chunk_size >= resp.chunk().total_data_length();
             :     offset += resp.chunk().data().size();
> nit: consider removing the extra '+' operation (at least for clarity)
I kept around the addition since I think it makes it more readable to compare to offset + chunk_size, but I did replace the redundant field accesses on 703 with 'chunk_size'.


Line 736:   MonoTime deadline = MonoTime::Now() + MonoDelta::FromMilliseconds(session_idle_timeout_millis_);
> nit: const?
Any reason to prefer const here?


Line 748:       double kJitterPct = 0.5;
> nit: static const?
This is the only place these constants are used, so I don't see an advantage to making them static.


PS4, Line 753: MonoTime::Now() + backoff > deadline
> Is it worth giving it the last chance if, say, (MonoTime::Now() + MonoDelta
I don't personally think it's worth complicating with a special case.


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/8016

to look at the new patch set (#10).

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................

KUDU-2125: Tablet copy client does not retry on failures

The tablet copy client would fail the tablet copy after encountering an
RPC error. This commit adds retry logic for tablet copy operations when
encountering SERVER_TOO_BUSY or UNAVAILABLE errors, which are retriable.

Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy_client_session-itest.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
5 files changed, 133 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/16/8016/10
-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 10:

(5 comments)

Just one thing I'm wondering about, in the tablet copy client.

http://gerrit.cloudera.org:8080/#/c/8016/10/src/kudu/integration-tests/cluster_itest_util.cc
File src/kudu/integration-tests/cluster_itest_util.cc:

PS10, Line 856: std::remove_if
fancy


PS10, Line 858:  != 
Neat: I didn't know you could check for equality between a thing and an optional of a thing.


http://gerrit.cloudera.org:8080/#/c/8016/4/src/kudu/integration-tests/tablet_copy_client_session-itest.cc
File src/kudu/integration-tests/tablet_copy_client_session-itest.cc:

PS4, Line 305: or<thread
> Turns out it was caught because of some existing calls to HasFailure() in e
I'm surprised at this. I've had weird issues trying to do this before and now I always use CHECK_OK outside of the main thread. Since the docs claim otherwise, do you have any evidence that this is safe to do?


http://gerrit.cloudera.org:8080/#/c/8016/10/src/kudu/tserver/tablet_copy_client.cc
File src/kudu/tserver/tablet_copy_client.cc:

PS10, Line 702: >=
why >= and not == ?


Line 747:       // Polynomial backoff with 50% jitter.
Have you used this before? I plotted y=10x^2 and it seems pretty good -- well behaved and no need for a cap.


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 4:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/8016/4//COMMIT_MSG
Commit Message:

Line 8: 
Could you elaborate a bit on the way the issue has been addressed by the patch?  Also, I think it would be beneficial to add concise description of the problem as well,.


http://gerrit.cloudera.org:8080/#/c/8016/4/src/kudu/integration-tests/cluster_itest_util.cc
File src/kudu/integration-tests/cluster_itest_util.cc:

PS4, Line 854: if (only_running) {
nit: does it make to skip this if !s.ok() ?


http://gerrit.cloudera.org:8080/#/c/8016/4/src/kudu/integration-tests/tablet_copy_client_session-itest.cc
File src/kudu/integration-tests/tablet_copy_client_session-itest.cc:

PS4, Line 274: TEST_F(TabletCopyClientSessionITest, TestTabletCopyWithBusySource)
How fast does this test run?  If it's relatively long, consider running it only if AllowSlowTests() returns true.


PS4, Line 305: ASSERT_OK
I've heard ASSERT_OK() does not work properly with non-main test threads.  If that is still true, consider changing this to CHECK_OK() or exiting from the thread function and then reporting on the failure in the main thread instead.


http://gerrit.cloudera.org:8080/#/c/8016/4/src/kudu/tserver/tablet_copy_client.cc
File src/kudu/tserver/tablet_copy_client.cc:

PS4, Line 669: uint64_t offset = 0;
nit: move it under the 'while(!done) {...}' loop scope?


PS4, Line 672: FetchDataRequestPB req;
nit: move it under the 'while(!done) {...}' loop scope?


PS4, Line 702:     done = offset + chunk_size >= resp.chunk().total_data_length();
             :     offset += resp.chunk().data().size();
nit: consider removing the extra '+' operation (at least for clarity)

offset += chunk_size;
done = offset >= resp.chunk().total_data_length();


Line 736:   MonoTime deadline = MonoTime::Now() + MonoDelta::FromMilliseconds(session_idle_timeout_millis_);
nit: const?


Line 748:       double kJitterPct = 0.5;
nit: static const?


PS4, Line 753: MonoTime::Now() + backoff > deadline
Is it worth giving it the last chance if, say, (MonoTime::Now() + MonoDelta::FromMilliseconds(kBackoffBaseMs) < deadline) but (MonoTime::Now() + backoff > deadline) ?


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has submitted this change and it was merged.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


KUDU-2125: Tablet copy client does not retry on failures

The tablet copy client would fail the tablet copy after encountering an
RPC error. This commit adds retry logic for tablet copy operations when
encountering SERVER_TOO_BUSY or UNAVAILABLE errors, which are retriable.

Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Reviewed-on: http://gerrit.cloudera.org:8080/8016
Reviewed-by: Dan Burkert <da...@apache.org>
Tested-by: Kudu Jenkins
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy_client_session-itest.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
5 files changed, 139 insertions(+), 33 deletions(-)

Approvals:
  Dan Burkert: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8016/10/src/kudu/tserver/tablet_copy_client.cc
File src/kudu/tserver/tablet_copy_client.cc:

PS10, Line 702: >=
> It can happen if the server sends the wrong total_data_length, or sends a c
I'll just fix it.


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8016/8/src/kudu/integration-tests/tablet_copy_client_session-itest.cc
File src/kudu/integration-tests/tablet_copy_client_session-itest.cc:

Line 320:   ASSERT_FALSE(HasFailure());
> Is this the same as NO_FATALS()? If so, we generally use the latter. If not
NO_FATALS, through some unknowable black magic, only triggers when there are new fatal errors.  This is also checking for non fatal errors (expect).


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/8016

to look at the new patch set (#7).

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................

KUDU-2125: Tablet copy client does not retry on failures

The tablet copy client would fail the tablet copy after encountering an
RPC error. This commit adds retry logic for tablet copy operations when
encountering SERVER_TOO_BUSY or UNAVAILABLE errors, which are retriable.

Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy_client_session-itest.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
5 files changed, 133 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/16/8016/7
-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 2: Verified+1

bogus iwyu

-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8016/8/src/kudu/integration-tests/tablet_copy_client_session-itest.cc
File src/kudu/integration-tests/tablet_copy_client_session-itest.cc:

Line 320:   ASSERT_FALSE(HasFailure());
Is this the same as NO_FATALS()? If so, we generally use the latter. If not, what's different?


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8016/4/src/kudu/tserver/tablet_copy_client.cc
File src/kudu/tserver/tablet_copy_client.cc:

PS4, Line 669: uint64_t offset = 0;
> The offset is updated each time through the loop.
ugh, right -- my bad.


Line 736:   MonoTime deadline = MonoTime::Now() + MonoDelta::FromMilliseconds(session_idle_timeout_millis_);
> Any reason to prefer const here?
To explicitly state it's not going to change in the code below.


PS4, Line 753: MonoTime::Now() + backoff > deadline
> I don't personally think it's worth complicating with a special case.
SGTM


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8016/2/src/kudu/integration-tests/cluster_itest_util.h
File src/kudu/integration-tests/cluster_itest_util.h:

PS2, Line 330: running
Nit: maybe 'only_running'?


http://gerrit.cloudera.org:8080/#/c/8016/2/src/kudu/integration-tests/tablet_copy_client_session-itest.cc
File src/kudu/integration-tests/tablet_copy_client_session-itest.cc:

PS2, Line 277:   // Make the service queue artificially short on the destination server so that
             :   // ERROR_SERVER_TOO_BUSY is likely to occur.
Misplaced comment?


Line 296:   ASSERT_OK(WaitForNumTabletsOnTS(ts0, kNumTablets, kDefaultTimeout, &tablets, true));
Why is running=true important for this test?


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8016/2/src/kudu/integration-tests/tablet_copy_client_session-itest.cc
File src/kudu/integration-tests/tablet_copy_client_session-itest.cc:

Line 296: 
> Attempting to tablet copy a non-running tablet will fail, and that's not co
I understand, but I don't see any code here to delete or otherwise fail tablets, so shouldn't all tablets post-restart basically be running?


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has uploaded a new patch set (#3).

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................

KUDU-2125: Tablet copy client does not retry on failures

Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy_client_session-itest.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
5 files changed, 121 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/16/8016/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 10: Code-Review+1

Mike should review this.

-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8016/8/src/kudu/integration-tests/tablet_copy_client_session-itest.cc
File src/kudu/integration-tests/tablet_copy_client_session-itest.cc:

Line 320:   ASSERT_FALSE(HasFailure());
> I see, thanks.  Maybe, remove it then (or the idea is to catch some other e
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8016/4/src/kudu/integration-tests/cluster_itest_util.h
File src/kudu/integration-tests/cluster_itest_util.h:

Line 330:     bool only_running = false);
Maybe it'd be better to pass a `boost::optional<TabletStatePB>` instead? Then you won't hear from me about "using an enum is more descriptive than a bool".

Function comment could be:

"Repeatedly invoke ListTablets(), waiting for up to 'timeout' time for the specified 'count' number of replicas. If 'state' is provided, the replicas must also be in the specified state for the wait to be considered successful."


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8016/2/src/kudu/integration-tests/cluster_itest_util.h
File src/kudu/integration-tests/cluster_itest_util.h:

PS2, Line 330: running
> Nit: maybe 'only_running'?
Done


http://gerrit.cloudera.org:8080/#/c/8016/2/src/kudu/integration-tests/tablet_copy_client_session-itest.cc
File src/kudu/integration-tests/tablet_copy_client_session-itest.cc:

PS2, Line 277:   // Make the service queue artificially short on the destination server so that
             :   // ERROR_SERVER_TOO_BUSY is likely to occur.
> Misplaced comment?
Done


Line 296:   ASSERT_OK(WaitForNumTabletsOnTS(ts0, kNumTablets, kDefaultTimeout, &tablets, true));
> Why is running=true important for this test?
Attempting to tablet copy a non-running tablet will fail, and that's not considered a retriable error.


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8016/10/src/kudu/tserver/tablet_copy_client.cc
File src/kudu/tserver/tablet_copy_client.cc:

PS10, Line 702: >=
> Can that actually happen? I think the checksum would be invalid on the bloc
It can happen if the server sends the wrong total_data_length, or sends a chunk that exceeds the total_data_length.  We aren't currently defensive about checking this.


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Adar Dembo, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/8016

to look at the new patch set (#12).

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................

KUDU-2125: Tablet copy client does not retry on failures

The tablet copy client would fail the tablet copy after encountering an
RPC error. This commit adds retry logic for tablet copy operations when
encountering SERVER_TOO_BUSY or UNAVAILABLE errors, which are retriable.

Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy_client_session-itest.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
5 files changed, 133 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/16/8016/12
-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 14: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/8016

to look at the new patch set (#5).

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................

KUDU-2125: Tablet copy client does not retry on failures

Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy_client_session-itest.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
5 files changed, 125 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/16/8016/5
-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Adar Dembo, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/8016

to look at the new patch set (#14).

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................

KUDU-2125: Tablet copy client does not retry on failures

The tablet copy client would fail the tablet copy after encountering an
RPC error. This commit adds retry logic for tablet copy operations when
encountering SERVER_TOO_BUSY or UNAVAILABLE errors, which are retriable.

Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy_client_session-itest.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
5 files changed, 139 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/16/8016/14
-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8016/2/src/kudu/integration-tests/tablet_copy_client_session-itest.cc
File src/kudu/integration-tests/tablet_copy_client_session-itest.cc:

Line 296:   ASSERT_OK(WaitForNumTabletsOnTS(ts0, kNumTablets, kDefaultTimeout, &tablets, true));
> I understand, but I don't see any code here to delete or otherwise fail tab
Without this the tablet copies were failing due to the tablets being in initializing state.


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8016/8/src/kudu/integration-tests/tablet_copy_client_session-itest.cc
File src/kudu/integration-tests/tablet_copy_client_session-itest.cc:

Line 320:   ASSERT_FALSE(HasFailure());
> I'm curious why it's necessary to add this assert.  What happens if there i
It still fails even without this statement.


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8016/8/src/kudu/integration-tests/tablet_copy_client_session-itest.cc
File src/kudu/integration-tests/tablet_copy_client_session-itest.cc:

Line 320:   ASSERT_FALSE(HasFailure());
> NO_FATALS, through some unknowable black magic, only triggers when there ar
I'm curious why it's necessary to add this assert.  What happens if there is a failure at least in one thread above and this ASSERT_FALSE(HasFailure()) is not present?  Does the test pass in that case or there is some other misreporting?


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/8016

to look at the new patch set (#8).

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................

KUDU-2125: Tablet copy client does not retry on failures

The tablet copy client would fail the tablet copy after encountering an
RPC error. This commit adds retry logic for tablet copy operations when
encountering SERVER_TOO_BUSY or UNAVAILABLE errors, which are retriable.

Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy_client_session-itest.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
5 files changed, 135 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/16/8016/8
-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 8: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8016/8/src/kudu/integration-tests/tablet_copy_client_session-itest.cc
File src/kudu/integration-tests/tablet_copy_client_session-itest.cc:

Line 320:   ASSERT_FALSE(HasFailure());
> It still fails even without this statement.
I see, thanks.  Maybe, remove it then (or the idea is to catch some other errors)?


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8016/10/src/kudu/integration-tests/cluster_itest_util.cc
File src/kudu/integration-tests/cluster_itest_util.cc:

PS10, Line 858:  != 
> Neat: I didn't know you could check for equality between a thing and an opt
Yah, it auto derefs it.  Very dangerous if you don't do a check up front, though.  I don't think it throws an exception or anything.


http://gerrit.cloudera.org:8080/#/c/8016/10/src/kudu/tserver/tablet_copy_client.cc
File src/kudu/tserver/tablet_copy_client.cc:

PS10, Line 702: >=
> why >= and not == ?
I thought about it, and mildly preferred >= so that the client doesn't continue looping indefinitely if the server sends a block with a too-long length.  I'm not sure what would happen in that scenario, but infinite loop doesn't seem good.


Line 747:       // Polynomial backoff with 50% jitter.
> Have you used this before? I plotted y=10x^2 and it seems pretty good -- we
I briefly tried to see if we had an abstraction for exponential backoff with jitter calculation, and it seems we don't.  This was just gut feeling.  I also graphed it initially, for anyone else following along: https://www.wolframcloud.com/objects/b532668a-64c0-45d4-8fad-dfce1718cda6


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Adar Dembo, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/8016

to look at the new patch set (#13).

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................

KUDU-2125: Tablet copy client does not retry on failures

The tablet copy client would fail the tablet copy after encountering an
RPC error. This commit adds retry logic for tablet copy operations when
encountering SERVER_TOO_BUSY or UNAVAILABLE errors, which are retriable.

Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy_client_session-itest.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
5 files changed, 139 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/16/8016/13
-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8016/4/src/kudu/integration-tests/tablet_copy_client_session-itest.cc
File src/kudu/integration-tests/tablet_copy_client_session-itest.cc:

PS4, Line 305: ASSERT_OK
> That's interesting; I too was under the impression that the main test threa
Turns out it was caught because of some existing calls to HasFailure() in external_mini_cluster-itest-base and test_util.  I've added a call in this test to make it explicit.  I think that gtest manual is way outdated aobut thread safety.


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 15: Code-Review+2

Carrying over Mike's +2.  Had to rebase due to test conflict.

-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-2125: Tablet copy client does not retry on failures

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-2125: Tablet copy client does not retry on failures
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8016/10/src/kudu/tserver/tablet_copy_client.cc
File src/kudu/tserver/tablet_copy_client.cc:

PS10, Line 702: >=
> I thought about it, and mildly preferred >= so that the client doesn't cont
Can that actually happen? I think the checksum would be invalid on the block. I'm nervous about adding fuzzy math to our data copying code.


-- 
To view, visit http://gerrit.cloudera.org:8080/8016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c8454fc600a841bd15306a2b3b06ddf53130be6
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes