Posted to issues@ratis.apache.org by "Hanisha Koneru (Jira)" <ji...@apache.org> on 2019/11/07 04:17:00 UTC

[jira] [Commented] (RATIS-649) Add metrics related to ClientRequests

    [ https://issues.apache.org/jira/browse/RATIS-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968910#comment-16968910 ] 

Hanisha Koneru commented on RATIS-649:
--------------------------------------

After this patch, RaftServer restart is failing. 

In HDDS-2392, {{RaftServer#start()}} fails with the following exception:
{code:java}
java.io.IOException: java.lang.IllegalStateException: Not started
	at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
	at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
	at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70)
	at org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:284)
	at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:296)
	at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:421)
	at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:215)
	at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:110)
	at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Not started
	at org.apache.ratis.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:504)
	at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.getPort(ServerImpl.java:176)
	at org.apache.ratis.grpc.server.GrpcService.lambda$new$2(GrpcService.java:143)
	at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
	at org.apache.ratis.grpc.server.GrpcService.getInetSocketAddress(GrpcService.java:182)
	at org.apache.ratis.server.impl.RaftServerImpl.lambda$new$0(RaftServerImpl.java:84)
	at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
	at org.apache.ratis.server.impl.RaftServerImpl.getPeer(RaftServerImpl.java:136)
	at org.apache.ratis.server.impl.RaftServerMetrics.<init>(RaftServerMetrics.java:70)
	at org.apache.ratis.server.impl.RaftServerMetrics.getRaftServerMetrics(RaftServerMetrics.java:62)
	at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:119)
	at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590){code}
 

I traced the error back, and the root cause is the new {{RaftServerMetrics}} initialization in {{RaftServerImpl}} (line 119). During RaftServerMetrics initialization, we pass {{server.getPeer()}} to {{addPeerCommitIndexGauge()}}. But the server is not started yet, and this causes the _IllegalStateException_ in {{GrpcService#addressSupplier}}.

Without the {{addPeerCommitIndexGauge()}} call in RaftServerMetrics, {{RaftServer#start()}} succeeds.
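
To illustrate the failure mode, here is a minimal, self-contained sketch. It is not Ratis code: {{Service}}, {{EagerMetrics}}, {{LazyMetrics}}, the gauge name and the port number are all hypothetical stand-ins. It only shows why resolving a start-dependent value inside a constructor throws "Not started", and how deferring the lookup until after start avoids it. Whether the actual fix should take this shape is an assumption.

```java
import java.util.function.Supplier;

public class LazyPeerSketch {
  /** Hypothetical stand-in for a transport service whose port is only known after start(). */
  static class Service {
    private boolean started;

    void start() {
      started = true;
    }

    int getPort() {
      if (!started) {
        throw new IllegalStateException("Not started");
      }
      return 9858; // arbitrary port, chosen for illustration only
    }
  }

  /** Eager variant: resolves the port inside the constructor, before the service is started. */
  static class EagerMetrics {
    final String gaugeName;

    EagerMetrics(Service service) {
      this.gaugeName = "peerCommitIndex_" + service.getPort();
    }
  }

  /** Lazy variant: keeps a supplier and resolves the port only when the name is first read. */
  static class LazyMetrics {
    private final Supplier<Integer> port;

    LazyMetrics(Service service) {
      this.port = service::getPort;
    }

    String gaugeName() {
      return "peerCommitIndex_" + port.get();
    }
  }

  /** Returns true if constructing EagerMetrics on an unstarted service throws. */
  static boolean eagerFailsBeforeStart() {
    try {
      new EagerMetrics(new Service());
      return false;
    } catch (IllegalStateException e) {
      return true;
    }
  }

  /** Builds LazyMetrics before start, then reads the gauge name after start. */
  static String lazyNameAfterStart() {
    Service service = new Service();
    LazyMetrics metrics = new LazyMetrics(service); // safe: nothing is resolved yet
    service.start();
    return metrics.gaugeName();
  }

  public static void main(String[] args) {
    System.out.println("eager fails before start: " + eagerFailsBeforeStart());
    System.out.println("lazy gauge name after start: " + lazyNameAfterStart());
  }
}
```

In this sketch the eager variant mirrors the stack trace above (the metrics constructor forces the memoized address lookup), while the lazy variant only touches the service once it is running.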



cc. [~avijayan], [~shashikant]

 

> Add metrics related to ClientRequests 
> --------------------------------------
>
>                 Key: RATIS-649
>                 URL: https://issues.apache.org/jira/browse/RATIS-649
>             Project: Ratis
>          Issue Type: Sub-task
>          Components: server
>    Affects Versions: 0.4.0
>            Reporter: Shashikant Banerjee
>            Assignee: Aravindan Vijayan
>            Priority: Major
>             Fix For: 0.5.0
>
>         Attachments: RATIS-649-000.patch, RATIS-649-001.patch, RATIS-649-002.patch
>
>
> The following metrics would be good to have to measure the load and the processing time of client requests:
>  
> |numReadRequestCount|Number of read type requests received on the leader|
> |numWriteRequestCount|Number of write type requests received on the leader|
> |numWatchForMajorityRequestCount|Number of Watch for Majority type requests received on the leader.|
> |numWatchForAllRequestCount|Number of Watch for All type requests received on the leader.|
> |raftClientReadRequestLatency|Time required to process read type requests |
> |raftClientWriteRequestLatency|Time required to process write type requests|
> |raftClientWatchForMajority|Time required to process WatchForMajority requests|
> |raftClientWatchForAllRequests|Time required to process WatchForAll requests|
> |requestQueueLimitHitCount|Number of times the number of pending requests in the leader hit the configured limit.|
> |numRequestRetryCacheHitCount|Number of Request Retry Cache hits. This gives an idea of retries via Raft clients because of request timeouts or exceptions.|


