Posted to user@ratis.apache.org by Riguz Lee <dr...@riguz.com> on 2022/06/27 10:28:27 UTC

Ratis start failed due to "OverlappingFileLockException"

Hi there,

I get an error when starting a Raft node that is deployed inside a Kubernetes cluster. Here is the error:



Caused by: java.io.IOException: Failed to lock storage /data/ratis-data/dynamic-service-2.dynamic-service-gcek/43dea5d8-f076-11ec-8ea0-0242ac120002. The directory is already locked
    at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.tryLock(RaftStorageDirectoryImpl.java:236) ~[ratis-server-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.lambda$lock$0(RaftStorageDirectoryImpl.java:194) ~[ratis-server-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.util.JavaUtils.attempt(JavaUtils.java:166) ~[ratis-common-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.util.FileUtils.attempt(FileUtils.java:40) ~[ratis-common-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.lock(RaftStorageDirectoryImpl.java:194) ~[ratis-server-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.analyzeStorage(RaftStorageDirectoryImpl.java:153) ~[ratis-server-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.server.storage.RaftStorageImpl.analyzeAndRecoverStorage(RaftStorageImpl.java:97) ~[ratis-server-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:67) ~[ratis-server-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:52) ~[ratis-server-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:116) ~[ratis-server-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:201) ~[ratis-server-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$5(RaftServerProxy.java:274) ~[ratis-server-2.3.0.jar!/:2.3.0]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: java.nio.channels.OverlappingFileLockException
    at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.tryLock(RaftStorageDirectoryImpl.java:227) ~[ratis-server-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.lambda$lock$0(RaftStorageDirectoryImpl.java:194) ~[ratis-server-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.util.JavaUtils.attempt(JavaUtils.java:166) ~[ratis-common-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.util.FileUtils.attempt(FileUtils.java:40) ~[ratis-common-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.lock(RaftStorageDirectoryImpl.java:194) ~[ratis-server-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.analyzeStorage(RaftStorageDirectoryImpl.java:153) ~[ratis-server-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.server.storage.RaftStorageImpl.analyzeAndRecoverStorage(RaftStorageImpl.java:97) ~[ratis-server-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:67) ~[ratis-server-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:52) ~[ratis-server-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:116) ~[ratis-server-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:201) ~[ratis-server-2.3.0.jar!/:2.3.0]
    at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$5(RaftServerProxy.java:274) ~[ratis-server-2.3.0.jar!/:2.3.0]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]




I have tried recreating the Raft directory (by recreating the PVC) and restarting the pod, but I still get the same error.




Each pod has its own data storage, so there is no reason for the directory to be locked by two Ratis processes.

So I guess it might be some kind of bug? I found a JIRA issue that looks almost the same: https://issues.apache.org/jira/browse/RATIS-538
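
For context on why a second process seems unlikely: as far as I understand the JDK documentation, FileChannel.tryLock() throws OverlappingFileLockException only when the same JVM already holds an overlapping lock on the file, while a lock held by another process makes tryLock() return null instead. Below is a minimal sketch of that behaviour. It is not Ratis code; the class name, method name, and lock file name are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of Ratis.
public class OverlappingLockDemo {

    /**
     * Locks the file through one channel, then tries to lock it again
     * through a second channel in the same JVM. Returns true if the
     * second attempt throws OverlappingFileLockException.
     */
    static boolean secondLockOverlaps(Path lockFile) throws IOException {
        try (FileChannel first = FileChannel.open(lockFile,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileChannel second = FileChannel.open(lockFile,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock held = first.tryLock();
            if (held == null) {
                // null means a different process holds the lock.
                throw new IOException("Lock is held by another process: " + lockFile);
            }
            try {
                // Same JVM, same file, overlapping region.
                second.tryLock();
                return false;
            } catch (OverlappingFileLockException e) {
                return true;
            } finally {
                held.release();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lock-demo");
        Path lockFile = dir.resolve("demo.lock"); // hypothetical file name
        System.out.println("overlap detected: " + secondLockOverlaps(lockFile));
    }
}
```

If this reading is right, the exception in my trace would point to the same storage directory being locked twice from within one JVM, rather than to a second process or a stale lock left on the volume.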




Any ideas on how to fix it?

Thanks,
Riguz Lee