You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2020/12/14 18:39:57 UTC

[kudu-CR] rpc: fix non-retriable error when starting up or shutting down

Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16876


Change subject: rpc: fix non-retriable error when starting up or shutting down
......................................................................

rpc: fix non-retriable error when starting up or shutting down

I found in testing a later patch that when a server is starting up or
being destructed, we may run into non-retriable error messages like:

Bad status: Remote error: Failed to write to server: ff6a8ff4d4bd456c99cc60b4ab35b6cc (127.0.0.1:60603): Service unavailable: service kudu.tserver.TabletServerAdminService not registered on TabletServer

This is because during startup or destruction, there are points at which
RPC services may not be registered, during which time RPCs will run into
non-retriable errors like that posted.

This patch addresses this by replacing the 'closing_' flag in
rpc::Messenger with a state enum, allowing us to express windows during
which the Messenger may not have all services registered.

Change-Id: I04e2379de4cf632d257b93cd701d7c73cb2bbaed
---
M src/kudu/client/client-test.cc
M src/kudu/rpc/messenger.cc
M src/kudu/rpc/messenger.h
M src/kudu/server/rpc_server.cc
4 files changed, 81 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/16876/1
-- 
To view, visit http://gerrit.cloudera.org:8080/16876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I04e2379de4cf632d257b93cd701d7c73cb2bbaed
Gerrit-Change-Number: 16876
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>

[kudu-CR] rpc: fix non-retriable error when starting up or shutting down

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16876

to look at the new patch set (#3).

Change subject: rpc: fix non-retriable error when starting up or shutting down
......................................................................

rpc: fix non-retriable error when starting up or shutting down

I found in testing a later patch that when a server is starting up or
being destructed, we may run into non-retriable error messages like:

Bad status: Remote error: Failed to write to server: ff6a8ff4d4bd456c99cc60b4ab35b6cc (127.0.0.1:60603): Service unavailable: service kudu.tserver.TabletServerAdminService not registered on TabletServer

This is because during startup or destruction, there are points at which
RPC services may not be registered, during which time RPCs will run into
non-retriable errors like that posted.

This patch addresses this by replacing the 'closing_' flag in
rpc::Messenger with a state enum, allowing us to express windows during
which the Messenger may not have all services registered.

Change-Id: I04e2379de4cf632d257b93cd701d7c73cb2bbaed
---
M src/kudu/client/client-test.cc
M src/kudu/rpc/messenger.cc
M src/kudu/rpc/messenger.h
M src/kudu/rpc/rpc-test.cc
M src/kudu/server/rpc_server.cc
5 files changed, 96 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/16876/3
-- 
To view, visit http://gerrit.cloudera.org:8080/16876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I04e2379de4cf632d257b93cd701d7c73cb2bbaed
Gerrit-Change-Number: 16876
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] rpc: fix non-retriable error when starting up or shutting down

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16876 )

Change subject: rpc: fix non-retriable error when starting up or shutting down
......................................................................

rpc: fix non-retriable error when starting up or shutting down

I found in testing a later patch that when a server is starting up or
being destructed, we may run into non-retriable error messages like:

Bad status: Remote error: Failed to write to server: ff6a8ff4d4bd456c99cc60b4ab35b6cc (127.0.0.1:60603): Service unavailable: service kudu.tserver.TabletServerAdminService not registered on TabletServer

This is because during startup or destruction, there are points at which
RPC services may not be registered, during which time RPCs will run into
non-retriable errors like that posted.

This patch addresses this by replacing the 'closing_' flag in
rpc::Messenger with a state enum, allowing us to express windows during
which the Messenger may not have all services registered.

Change-Id: I04e2379de4cf632d257b93cd701d7c73cb2bbaed
Reviewed-on: http://gerrit.cloudera.org:8080/16876
Tested-by: Kudu Jenkins
Reviewed-by: Hao Hao <ha...@cloudera.com>
---
M src/kudu/client/client-test.cc
M src/kudu/rpc/messenger.cc
M src/kudu/rpc/messenger.h
M src/kudu/rpc/rpc-test.cc
M src/kudu/server/rpc_server.cc
5 files changed, 96 insertions(+), 11 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Hao Hao: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/16876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I04e2379de4cf632d257b93cd701d7c73cb2bbaed
Gerrit-Change-Number: 16876
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] rpc: fix non-retriable error when starting up or shutting down

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16876 )

Change subject: rpc: fix non-retriable error when starting up or shutting down
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16876/1/src/kudu/rpc/messenger.cc
File src/kudu/rpc/messenger.cc:

http://gerrit.cloudera.org:8080/#/c/16876/1/src/kudu/rpc/messenger.cc@382
PS1, Line 382: Status::ServiceUnavailable
Now, when there is more detailed information on the state of the messenger, should we switch here to Status::NotFound() in case if state_ == kRunning?


http://gerrit.cloudera.org:8080/#/c/16876/1/src/kudu/server/rpc_server.cc
File src/kudu/server/rpc_server.cc:

http://gerrit.cloudera.org:8080/#/c/16876/1/src/kudu/server/rpc_server.cc@222
PS1, Line 222: messenger_->set_running();
What about RPC messenger at the client side?  Should we toggle its state as well?



-- 
To view, visit http://gerrit.cloudera.org:8080/16876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04e2379de4cf632d257b93cd701d7c73cb2bbaed
Gerrit-Change-Number: 16876
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 14 Dec 2020 18:56:44 +0000
Gerrit-HasComments: Yes

[kudu-CR] rpc: fix non-retriable error when starting up or shutting down

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16876 )

Change subject: rpc: fix non-retriable error when starting up or shutting down
......................................................................


Patch Set 4: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/16876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04e2379de4cf632d257b93cd701d7c73cb2bbaed
Gerrit-Change-Number: 16876
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 15 Dec 2020 20:10:28 +0000
Gerrit-HasComments: No

[kudu-CR] rpc: fix non-retriable error when starting up or shutting down

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16876

to look at the new patch set (#2).

Change subject: rpc: fix non-retriable error when starting up or shutting down
......................................................................

rpc: fix non-retriable error when starting up or shutting down

I found in testing a later patch that when a server is starting up or
being destructed, we may run into non-retriable error messages like:

Bad status: Remote error: Failed to write to server: ff6a8ff4d4bd456c99cc60b4ab35b6cc (127.0.0.1:60603): Service unavailable: service kudu.tserver.TabletServerAdminService not registered on TabletServer

This is because during startup or destruction, there are points at which
RPC services may not be registered, during which time RPCs will run into
non-retriable errors like that posted.

This patch addresses this by replacing the 'closing_' flag in
rpc::Messenger with a state enum, allowing us to express windows during
which the Messenger may not have all services registered.

Change-Id: I04e2379de4cf632d257b93cd701d7c73cb2bbaed
---
M src/kudu/client/client-test.cc
M src/kudu/rpc/messenger.cc
M src/kudu/rpc/messenger.h
M src/kudu/rpc/rpc-test.cc
M src/kudu/server/rpc_server.cc
5 files changed, 96 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/16876/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I04e2379de4cf632d257b93cd701d7c73cb2bbaed
Gerrit-Change-Number: 16876
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] rpc: fix non-retriable error when starting up or shutting down

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16876 )

Change subject: rpc: fix non-retriable error when starting up or shutting down
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16876/1/src/kudu/rpc/messenger.cc
File src/kudu/rpc/messenger.cc:

http://gerrit.cloudera.org:8080/#/c/16876/1/src/kudu/rpc/messenger.cc@382
PS1, Line 382: Status::ServiceUnavailable
> Now, when there is more detailed information on the state of the messenger,
Done


http://gerrit.cloudera.org:8080/#/c/16876/1/src/kudu/server/rpc_server.cc
File src/kudu/server/rpc_server.cc:

http://gerrit.cloudera.org:8080/#/c/16876/1/src/kudu/server/rpc_server.cc@222
PS1, Line 222: messenger_->set_running();
> What about RPC messenger at the client side?  Should we toggle its state as
Good point. I actually don't think there's a good reason for this, other than it seeming natural, given the state is about "running". Instead, I'll rename the states to be more service-specific, and mention that not all Messengers will register/unregister services.



-- 
To view, visit http://gerrit.cloudera.org:8080/16876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04e2379de4cf632d257b93cd701d7c73cb2bbaed
Gerrit-Change-Number: 16876
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 15 Dec 2020 02:41:27 +0000
Gerrit-HasComments: Yes

[kudu-CR] rpc: fix non-retriable error when starting up or shutting down

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16876 )

Change subject: rpc: fix non-retriable error when starting up or shutting down
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16876/1/src/kudu/server/rpc_server.cc
File src/kudu/server/rpc_server.cc:

http://gerrit.cloudera.org:8080/#/c/16876/1/src/kudu/server/rpc_server.cc@222
PS1, Line 222: messenger_->SetServicesReg
> Good point. I actually don't think there's a good reason for this, other th
Indeed, I think that's a better approach.



-- 
To view, visit http://gerrit.cloudera.org:8080/16876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04e2379de4cf632d257b93cd701d7c73cb2bbaed
Gerrit-Change-Number: 16876
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 15 Dec 2020 17:17:34 +0000
Gerrit-HasComments: Yes