Posted to issues@nifi.apache.org by "Mark Payne (JIRA)" <ji...@apache.org> on 2018/05/10 14:15:00 UTC

[jira] [Commented] (NIFI-5177) Failed to merge Journal Files leads to LockObtainFailedException: Lock obtain timed out exception

    [ https://issues.apache.org/jira/browse/NIFI-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470429#comment-16470429 ] 

Mark Payne commented on NIFI-5177:
----------------------------------

[~AmitC15] I would recommend that you update your instances to run the WriteAheadProvenanceRepository instead of the PersistentProvenanceRepository. You can do this by updating the "nifi.provenance.repository.implementation" property in conf/nifi.properties from "org.apache.nifi.provenance.PersistentProvenanceRepository" to "org.apache.nifi.provenance.WriteAheadProvenanceRepository".

Also of note: if you are running Java 8 with the Garbage First Garbage Collector (G1GC), you'll probably want to disable it, because there are known bugs in JDK 8 that can cause segmentation faults with memory-mapped files. While this can occur with the PersistentProvenanceRepository as well, it seems to happen more often with the WriteAheadProvenanceRepository. To check or modify this, look at conf/bootstrap.conf. If you see the line "java.arg.13=-XX:+UseG1GC", you should comment it out.
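
As a rough sketch of the two edits described above (the property name, class names, and G1GC line are taken from this comment; the "java.arg" number may differ in your bootstrap.conf, so match on the -XX:+UseG1GC flag rather than the number):

```properties
# conf/nifi.properties
# Previously:
#   nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository

# conf/bootstrap.conf
# Commented out to disable G1GC on Java 8 (assumes the line is present in your file)
#java.arg.13=-XX:+UseG1GC
```

Both files are read at startup, so each node needs a restart for the changes to take effect.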

The WriteAheadProvenanceRepository is much newer. It's known to be more stable and is far faster than the PersistentProvenanceRepository. I have actually created a Jira (NIFI-5181) to update the default to use WriteAheadProvenanceRepository. I suspect we will subsequently deprecate the PersistentProvenanceRepository and stop maintaining it.

Also of note, if you change to the WriteAheadProvenanceRepository, it will honor the provenance data that was stored in the Persistent Provenance Repository, so the migration should be painless.

> Failed to merge Journal Files leads to LockObtainFailedException: Lock obtain timed out exception
> -------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-5177
>                 URL: https://issues.apache.org/jira/browse/NIFI-5177
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.5.0
>            Reporter: AmitC15
>            Priority: Critical
>
> NiFi version: 1.5
> Cluster setup + external ZooKeeper on each one of them.
> Log: 
> [ Date ] 2018-05-08 15:53:12,193 [ Priority ] ERROR [ Text 3 ] [Provenance Repository Rollover Thread-1] o.a.n.p.PersistentProvenanceRepository org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/nifi/nifi-1.5.0/provenance_repository/index-1524294029000/write.lock
>     at org.apache.lucene.store.Lock.obtain(Lock.java:89)
>     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:755)
>     at org.apache.nifi.provenance.lucene.SimpleIndexManager.createWriter(SimpleIndexManager.java:198)
>     at org.apache.nifi.provenance.lucene.SimpleIndexManager.borrowIndexWriter(SimpleIndexManager.java:227)
>     at org.apache.nifi.provenance.PersistentProvenanceRepository.mergeJournals(PersistentProvenanceRepository.java:1712)
>     at org.apache.nifi.provenance.PersistentProvenanceRepository$8.run(PersistentProvenanceRepository.java:1300)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>  Happened twice this week on 2 different environments.
> After effects:   
>  * specific node disconnects from cluster (requires restart)
>  * UI not accessible from all nodes.
>  * Also led once to a different issue -  failed to connect node to cluster due to: java.lang.IllegalStateException: Signaled to end recovery, but there are more recovery files for Partition in directory



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)