Posted to issues@ozone.apache.org by "Attila Doroszlai (Jira)" <ji...@apache.org> on 2022/05/05 15:25:00 UTC

[jira] [Updated] (HDDS-6703) Install snapshot should wait for RPC server to stop

     [ https://issues.apache.org/jira/browse/HDDS-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Doroszlai updated HDDS-6703:
-----------------------------------
    Description: 
HDDS-6685 changed the install snapshot logic in OM so that various OM components are restarted during the process.  However, it does not wait for the OM RPC server to stop, which leads to an intermittent BindException shortly afterwards, when the RPC server is started again.

 * https://github.com/adoroszlai/ozone-build-results/tree/master/2022/05/05/14835/it-flaky/hadoop-ozone/integration-test/
 * https://github.com/adoroszlai/ozone-build-results/tree/master/2022/05/05/14832/it-flaky/hadoop-ozone/integration-test/
 * https://github.com/adoroszlai/ozone-build-results/tree/master/2022/05/05/14826/it-flaky/hadoop-ozone/integration-test/

{code}
2022-05-05 13:41:05,653 [pool-2368-thread-1] ERROR om.OzoneManager (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to start RPC Server.
java.net.BindException: Problem binding to [localhost:39581] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:913)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:809)
	at org.apache.hadoop.ipc.Server.bind(Server.java:640)
	at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1225)
	at org.apache.hadoop.ipc.Server.<init>(Server.java:3117)
	at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:1062)
	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server.<init>(ProtobufRpcEngine2.java:464)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:434)
	at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:361)
	at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:853)
	at org.apache.hadoop.ozone.om.OzoneManager.startRpcServer(OzoneManager.java:1084)
	at org.apache.hadoop.ozone.om.OzoneManager.getRpcServer(OzoneManager.java:1054)
	at org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3366)
	at org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3254)
	at org.apache.hadoop.ozone.om.OzoneManager.installSnapshotFromLeader(OzoneManager.java:3231)
{code}

  was:
HDDS-6685 changed install snapshot logic in OM, restarting various OM components during the process.  However, it does not wait for OM RPC server to stop, leading to intermittent BindException a bit later during start.

 * https://github.com/adoroszlai/ozone-build-results/tree/master/2022/05/05/14835/it-flaky/hadoop-ozone/integration-test/
 * https://github.com/adoroszlai/ozone-build-results/tree/master/2022/05/05/14832/it-flaky/hadoop-ozone/integration-test/
 * https://github.com/adoroszlai/ozone-build-results/tree/master/2022/05/05/14826/it-flaky/hadoop-ozone/integration-test/


> Install snapshot should wait for RPC server to stop
> ---------------------------------------------------
>
>                 Key: HDDS-6703
>                 URL: https://issues.apache.org/jira/browse/HDDS-6703
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Manager
>    Affects Versions: 1.3.0
>            Reporter: Attila Doroszlai
>            Assignee: Attila Doroszlai
>            Priority: Major


