Posted to issues@ratis.apache.org by "Rajeshbabu Chintaguntla (JIRA)" <ji...@apache.org> on 2019/07/11 13:56:00 UTC
[jira] [Commented] (RATIS-556) Detect node failures and add other workers to group serving the log and replicate the data of the log

    [ https://issues.apache.org/jira/browse/RATIS-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882983#comment-16882983 ] 

Rajeshbabu Chintaguntla commented on RATIS-556:
-----------------------------------------------

Currently, when we create a log, 3 worker nodes are assigned to the log as a group.
1) When one of the worker nodes goes down, we are not able to operate on the log.
{noformat}
logservice> read 'testlog1'
Failed to read from log
org.apache.ratis.logservice.common.LogNotFoundException: 'testlog1'
	at org.apache.ratis.logservice.server.MetaStateMachine.processGetLogRequest(MetaStateMachine.java:382)
	at org.apache.ratis.logservice.server.MetaStateMachine.query(MetaStateMachine.java:213)
	at org.apache.ratis.server.impl.RaftServerImpl.submitClientRequestAsync(RaftServerImpl.java:547)
	at org.apache.ratis.server.impl.RaftServerProxy.lambda$submitClientRequestAsync$7(RaftServerProxy.java:333)
	at org.apache.ratis.server.impl.RaftServerProxy.lambda$null$5(RaftServerProxy.java:328)
	at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:109)
	at org.apache.ratis.server.impl.RaftServerProxy.lambda$submitRequest$6(RaftServerProxy.java:328)
	at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
	at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2124)
	at org.apache.ratis.server.impl.RaftServerProxy.submitRequest(RaftServerProxy.java:327)
	at org.apache.ratis.server.impl.RaftServerProxy.submitClientRequestAsync(RaftServerProxy.java:333)
	at org.apache.ratis.grpc.client.GrpcClientProtocolService$RequestStreamObserver.processClientRequest(GrpcClientProtocolService.java:220)
	at org.apache.ratis.grpc.client.GrpcClientProtocolService$UnorderedRequestStreamObserver.processClientRequest(GrpcClientProtocolService.java:276)
	at org.apache.ratis.grpc.client.GrpcClientProtocolService$RequestStreamObserver.onNext(GrpcClientProtocolService.java:240)
	at org.apache.ratis.grpc.client.GrpcClientProtocolService$RequestStreamObserver.onNext(GrpcClientProtocolService.java:168)
	at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:248)
	at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:263)
	at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:686)
	at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
logservice> read 'testlog2'
Failed to read from log
{noformat} 
2) When one of the worker nodes in the group goes down, we are not able to detect the failure and add other workers to the group to maintain replication. We need to handle this.
3) When a new worker is added to the group, the data needs to be replicated to it automatically.
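
As a rough illustration of points 2) and 3), failure detection could be sketched as a heartbeat timeout tracked for each worker, with a dead group member swapped for a live worker outside the group. Everything below is an assumption made for illustration: the names WorkerFailureDetector, heartbeat, isAlive and repairGroup are hypothetical and are not part of Ratis or the LogService, and the sketch only computes the proposed membership. Applying that membership to the Raft group, and catching the new member up on the log data, is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not Ratis API: heartbeat-timeout detection plus group repair. */
final class WorkerFailureDetector {
  private final long timeoutMs;
  // Last heartbeat time per worker id; TreeMap keeps candidate order deterministic.
  private final Map<String, Long> lastHeartbeatMs = new TreeMap<>();

  WorkerFailureDetector(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  /** Records a heartbeat received from a worker at the given time. */
  void heartbeat(String workerId, long nowMs) {
    lastHeartbeatMs.put(workerId, nowMs);
  }

  /** A worker counts as alive if its last heartbeat is within the timeout. */
  boolean isAlive(String workerId, long nowMs) {
    Long last = lastHeartbeatMs.get(workerId);
    return last != null && nowMs - last <= timeoutMs;
  }

  /**
   * Proposes a new membership for a group: live members are kept, and each
   * dead member is replaced by a live worker that is not already in the group.
   * If there are not enough spare workers, the result is smaller than the input.
   */
  List<String> repairGroup(List<String> group, long nowMs) {
    List<String> repaired = new ArrayList<>();
    for (String worker : group) {
      if (isAlive(worker, nowMs)) {
        repaired.add(worker);
      }
    }
    for (String candidate : lastHeartbeatMs.keySet()) {
      if (repaired.size() >= group.size()) {
        break;
      }
      if (!group.contains(candidate) && isAlive(candidate, nowMs)) {
        repaired.add(candidate);
      }
    }
    return repaired;
  }
}
```

With a group of w1, w2, w3 and a spare worker w4, if w3 stops sending heartbeats then repairGroup proposes w1, w2, w4. The timeout value and the policy for choosing the replacement (here simply the first live spare in sorted order) are placeholders that would need to be decided as part of this issue.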

> Detect node failures and add other workers to group serving the log and replicate the data of the log
> -----------------------------------------------------------------------------------------------------
>
>                 Key: RATIS-556
>                 URL: https://issues.apache.org/jira/browse/RATIS-556
>             Project: Ratis
>          Issue Type: Improvement
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Major
>
> Currently there is no way to detect node failures at the master log servers and add new nodes to the group serving the log. We need to analyze how Ozone handles this case.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)