Posted to issues@ozone.apache.org by "Vivek Ratnavel Subramanian (Jira)" <ji...@apache.org> on 2021/08/19 06:08:00 UTC

[jira] [Commented] (HDDS-5556) GrpcReplication Client may fail in SCM HA Cluster

    [ https://issues.apache.org/jira/browse/HDDS-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401492#comment-17401492 ] 

Vivek Ratnavel Subramanian commented on HDDS-5556:
--------------------------------------------------

I was able to reproduce the failure in the secure docker-compose HA setup.


{code:java}
datanode3_1  | 2021-08-19 05:30:51,318 [grpc-default-executor-11] ERROR replication.GrpcReplicationClient: Download of container 1 was unsuccessful
datanode3_1  | org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
datanode3_1  | Channel Pipeline: [SslHandler#0, ProtocolNegotiators$ClientTlsHandler#0, WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.Status.asRuntimeException(Status.java:533)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:478)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener$3.run(DelayedClientCall.java:463)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.delayOrExecute(DelayedClientCall.java:427)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onClose(DelayedClientCall.java:460)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:616)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:69)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:802)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:781)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
datanode3_1  | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
datanode3_1  | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
datanode3_1  | 	at java.base/java.lang.Thread.run(Thread.java:834)
datanode3_1  | Caused by: javax.net.ssl.SSLHandshakeException: General OpenSslEngine problem
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.ReferenceCountedOpenSslEngine.handshakeException(ReferenceCountedOpenSslEngine.java:1860)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.ReferenceCountedOpenSslEngine.wrap(ReferenceCountedOpenSslEngine.java:815)
datanode3_1  | 	at java.base/javax.net.ssl.SSLEngine.wrap(SSLEngine.java:522)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:1059)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.SslHandler.wrapNonAppData(SslHandler.java:944)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1421)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1253)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1300)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:508)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:447)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
datanode3_1  | 	... 1 more
datanode3_1  | Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
datanode3_1  | 	at java.base/sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:439)
datanode3_1  | 	at java.base/sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:306)
datanode3_1  | 	at java.base/sun.security.validator.Validator.validate(Validator.java:264)
datanode3_1  | 	at java.base/sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:313)
datanode3_1  | 	at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:276)
datanode3_1  | 	at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:141)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.ReferenceCountedOpenSslClientContext$ExtendedTrustManagerVerifyCallback.verify(ReferenceCountedOpenSslClientContext.java:234)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.ReferenceCountedOpenSslContext$AbstractCertificateVerifier.verify(ReferenceCountedOpenSslContext.java:717)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.internal.tcnative.SSL.readFromSSL(Native Method)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.ReferenceCountedOpenSslEngine.readPlaintextData(ReferenceCountedOpenSslEngine.java:634)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1258)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1384)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1427)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(SslHandler.java:208)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1358)
datanode3_1  | 	... 19 more
datanode3_1  | 	Suppressed: javax.net.ssl.SSLHandshakeException: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
datanode3_1  | 		at org.apache.ratis.thirdparty.io.netty.handler.ssl.ReferenceCountedOpenSslEngine.sslReadErrorResult(ReferenceCountedOpenSslEngine.java:1347)
datanode3_1  | 		at org.apache.ratis.thirdparty.io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1308)
datanode3_1  | 		... 23 more
datanode3_1  | Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
datanode3_1  | 	at java.base/sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
datanode3_1  | 	at java.base/sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
datanode3_1  | 	at java.base/java.security.cert.CertPathBuilder.build(CertPathBuilder.java:297)
datanode3_1  | 	at java.base/sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:434)
datanode3_1  | 	... 33 more
datanode3_1  | 2021-08-19 05:30:51,320 [grpc-default-executor-8] ERROR replication.SimpleContainerDownloader: Error on replicating container: 1
datanode3_1  | java.util.concurrent.CompletionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
datanode3_1  | Channel Pipeline: [SslHandler#0, ProtocolNegotiators$ClientTlsHandler#0, WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
datanode3_1  | 	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
datanode3_1  | 	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
datanode3_1  | 	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632)
datanode3_1  | 	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
datanode3_1  | 	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
datanode3_1  | 	at org.apache.hadoop.ozone.container.replication.GrpcReplicationClient$StreamDownloader.onError(GrpcReplicationClient.java:173)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:478)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener$3.run(DelayedClientCall.java:463)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.delayOrExecute(DelayedClientCall.java:427)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onClose(DelayedClientCall.java:460)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:616)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:69)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:802)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:781)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
datanode3_1  | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
datanode3_1  | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
datanode3_1  | 	at java.base/java.lang.Thread.run(Thread.java:834)
datanode3_1  | Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
datanode3_1  | Channel Pipeline: [SslHandler#0, ProtocolNegotiators$ClientTlsHandler#0, WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
datanode3_1  | 	at org.apache.ratis.thirdparty.io.grpc.Status.asRuntimeException(Status.java:533)
datanode3_1  | 	... 13 more
datanode3_1  | Caused by: javax.net.ssl.SSLHandshakeException: General OpenSslEngine problem
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.ReferenceCountedOpenSslEngine.handshakeException(ReferenceCountedOpenSslEngine.java:1860)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.ReferenceCountedOpenSslEngine.wrap(ReferenceCountedOpenSslEngine.java:815)
datanode3_1  | 	at java.base/javax.net.ssl.SSLEngine.wrap(SSLEngine.java:522)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:1059)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.SslHandler.wrapNonAppData(SslHandler.java:944)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1421)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1253)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1300)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:508)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:447)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
datanode3_1  | 	... 1 more
datanode3_1  | Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
datanode3_1  | 	at java.base/sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:439)
datanode3_1  | 	at java.base/sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:306)
datanode3_1  | 	at java.base/sun.security.validator.Validator.validate(Validator.java:264)
datanode3_1  | 	at java.base/sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:313)
datanode3_1  | 	at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:276)
datanode3_1  | 	at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:141)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.ReferenceCountedOpenSslClientContext$ExtendedTrustManagerVerifyCallback.verify(ReferenceCountedOpenSslClientContext.java:234)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.ReferenceCountedOpenSslContext$AbstractCertificateVerifier.verify(ReferenceCountedOpenSslContext.java:717)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.internal.tcnative.SSL.readFromSSL(Native Method)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.ReferenceCountedOpenSslEngine.readPlaintextData(ReferenceCountedOpenSslEngine.java:634)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1258)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1384)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1427)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(SslHandler.java:208)
datanode3_1  | 	at org.apache.ratis.thirdparty.io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1358)
datanode3_1  | 	... 19 more
datanode3_1  | 	Suppressed: javax.net.ssl.SSLHandshakeException: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
datanode3_1  | 		at org.apache.ratis.thirdparty.io.netty.handler.ssl.ReferenceCountedOpenSslEngine.sslReadErrorResult(ReferenceCountedOpenSslEngine.java:1347)
datanode3_1  | 		at org.apache.ratis.thirdparty.io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1308)
datanode3_1  | 		... 23 more
datanode3_1  | Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
datanode3_1  | 	at java.base/sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
datanode3_1  | 	at java.base/sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
datanode3_1  | 	at java.base/java.security.cert.CertPathBuilder.build(CertPathBuilder.java:297)
datanode3_1  | 	at java.base/sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:434)
datanode3_1  | 	... 33 more
datanode3_1  | 2021-08-19 05:30:51,322 [ContainerReplicationThread-1] ERROR replication.DownloadAndImportReplicator: Container 1 replication was unsuccessful.
datanode3_1  | java.lang.NullPointerException
datanode3_1  | 	at java.base/java.nio.file.Files.provider(Files.java:101)
datanode3_1  | 	at java.base/java.nio.file.Files.readAttributes(Files.java:1764)
datanode3_1  | 	at java.base/java.nio.file.Files.size(Files.java:2381)
datanode3_1  | 	at org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.replicate(DownloadAndImportReplicator.java:119)
datanode3_1  | 	at org.apache.hadoop.ozone.container.replication.MeasuredReplicator.replicate(MeasuredReplicator.java:69)
datanode3_1  | 	at org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:161)
datanode3_1  | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
datanode3_1  | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
datanode3_1  | 	at java.base/java.lang.Thread.run(Thread.java:834)
datanode3_1  | 2021-08-19 05:30:51,322 [ContainerReplicationThread-1] ERROR replication.ReplicationSupervisor: Container 1 can't be downloaded from any of the datanodes.
{code}


> GrpcReplication Client may fail in SCM HA Cluster
> -------------------------------------------------
>
>                 Key: HDDS-5556
>                 URL: https://issues.apache.org/jira/browse/HDDS-5556
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Datanode
>            Reporter: Bharat Viswanadham
>            Assignee: Vivek Ratnavel Subramanian
>            Priority: Blocker
>
> Scenario:
> 1. DN1 got cert from SCM1
> 2. DN2 got cert from SCM2
> 3. DN3 got cert from SCM3
> 4. DN4 got cert from SCM3
> Now one of the closed containers is under-replicated due to a DN3 failure, and DN4 is chosen for replication. The replication will fail during secure channel setup.
> {code:java}
>  sslContextBuilder
>             .trustManager(certClient.getCACertificate())
>             .clientAuth(ClientAuth.REQUIRE)
>             .keyManager(certClient.getPrivateKey(),
>                 certClient.getCertificate()); 
> {code}
> In an SCM HA setup, all the CA certs should be passed to the truststore to set up a secure channel.
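
A minimal sketch of the truststore suggestion above, using only JDK classes. `CaTrustStore`, its method names and the `scm-ca-` alias prefix are hypothetical and not part of Ozone, and how the list of SCM CA certificates is obtained is left as an assumption. The resulting `TrustManagerFactory` could then be handed to Netty's `SslContextBuilder.trustManager(TrustManagerFactory)` instead of a single CA certificate.

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import java.util.List;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical helper (not an Ozone class): trusts every CA certificate it is given. */
final class CaTrustStore {

  private CaTrustStore() { }

  /** Builds an in-memory KeyStore with one trusted entry per CA certificate. */
  static KeyStore fromCaCertificates(List<X509Certificate> caCerts)
      throws GeneralSecurityException, IOException {
    KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
    // Passing null for both arguments initializes an empty store.
    trustStore.load(null, null);
    int index = 0;
    for (X509Certificate ca : caCerts) {
      // Each CA gets its own alias so that no entry overwrites another.
      trustStore.setCertificateEntry("scm-ca-" + index++, ca);
    }
    return trustStore;
  }

  /** Wraps the trust store in a TrustManagerFactory for use by a TLS stack. */
  static TrustManagerFactory trustManagerFactory(KeyStore trustStore)
      throws GeneralSecurityException {
    TrustManagerFactory tmf =
        TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
    tmf.init(trustStore);
    return tmf;
  }
}
```

With the certificates of SCM1, SCM2 and SCM3 all in the list, a datanode whose own cert was issued by SCM3 would also be able to build a certification path for peers whose certs were issued by SCM1 or SCM2.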


