Posted to oak-issues@jackrabbit.apache.org by "Chetan Mehrotra (JIRA)" <ji...@apache.org> on 2017/07/27 07:51:00 UTC
[jira] [Commented] (OAK-6500) NRTIndex leaks file handles due to unclosed IndexReader

    [ https://issues.apache.org/jira/browse/OAK-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102882#comment-16102882 ] 

Chetan Mehrotra commented on OAK-6500:
--------------------------------------

The leak happens because {{IndexNode#refreshReaders}} does not close the LuceneIndexReader when switching to a new IndexSearcher. Closing it there would be tricky: the old IndexSearcher may still be in use, and closing its reader at that point would break any query still running against it.
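
To make the failure mode concrete, here is a minimal, self-contained Java sketch of the pattern described above. {{LeakDemoReader}} and {{LeakyIndexNode}} are hypothetical stand-ins, not Oak's actual {{IndexNode}} or Lucene's {{IndexReader}}; the static counter simply tracks how many readers (i.e. sets of file handles) are still open.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a Lucene IndexReader; each instance models a set of open file handles.
class LeakDemoReader implements Closeable {
    static final AtomicInteger OPEN = new AtomicInteger();
    private volatile boolean closed;

    LeakDemoReader() {
        OPEN.incrementAndGet();
    }

    // Models a query running against this reader.
    void search() {
        if (closed) {
            throw new IllegalStateException("reader already closed");
        }
    }

    @Override
    public synchronized void close() {
        if (!closed) {
            closed = true;
            OPEN.decrementAndGet();
        }
    }
}

// Hypothetical simplification of the refresh step: it swaps in a new reader
// but never closes the previous one, so its handles stay open.
class LeakyIndexNode {
    private volatile LeakDemoReader current = new LeakDemoReader();

    LeakDemoReader acquire() {
        return current;
    }

    void refreshReaders() {
        // Calling the old reader's close() here would stop the leak, but any
        // query that already acquired it would then fail in search().
        current = new LeakDemoReader();
    }
}
```

After two refreshes, three readers are open even though only the newest is reachable through the node, while a query that grabbed the first reader can still use it. Closing the old reader inside {{refreshReaders}} would fix the count but make that query fail.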

Possible fixes
*A - Close readers upon NRTIndex close*

Collect all open IndexReaders in NRTIndex and close them when the NRTIndex is closed. [patch|^OAK-6500-v1.patch] implements this approach; a test case is still pending. In normal operation, where an NRTIndex instance stays alive for only 10-30 seconds, this should not cause any issue. But if for some reason an async indexer cycle takes a long time (say an hour), readers can accumulate and would require pruning.

For pruning, we could close the last few reader instances, but then we cannot be sure whether some query is still using an IndexSearcher associated with that reader.
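
Option A can be sketched as a registry that remembers every reader opened during the lifetime of an NRTIndex and closes them all at once. This is a hypothetical illustration that assumes readers can be treated as plain {{java.io.Closeable}} objects; {{NrtReaderRegistry}} and {{TrackedReader}} are made-up names, and this is not the code in the attached patch.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader stand-in that records whether it has been closed.
class TrackedReader implements Closeable {
    private boolean closed;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Sketch of option A: remember every reader opened for this NRT index and
// close all of them when the index itself is closed.
class NrtReaderRegistry implements Closeable {
    private final List<Closeable> openReaders = new ArrayList<>();
    private boolean closed;

    synchronized <R extends Closeable> R register(R reader) {
        if (closed) {
            throw new IllegalStateException("NRT index already closed");
        }
        openReaders.add(reader);
        return reader;
    }

    synchronized int openCount() {
        return openReaders.size();
    }

    @Override
    public synchronized void close() {
        closed = true;
        IOException failure = null;
        for (Closeable reader : openReaders) {
            try {
                reader.close();
            } catch (IOException e) {
                // Keep closing the remaining readers; report the first failure.
                if (failure == null) {
                    failure = e;
                } else {
                    failure.addSuppressed(e);
                }
            }
        }
        openReaders.clear();
        if (failure != null) {
            throw new UncheckedIOException(failure);
        }
    }
}
```

Since nothing is released before close, a long async indexing cycle lets {{openReaders}} grow without bound, which is the accumulation concern raised for this option.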

*B - Implement ref counting*

Lucene's IndexReader supports reference counting, which can be leveraged to detect whether an IndexReader is still in use. After a refresh, we can use that to check whether the old IndexReader is still in use and, if not, close it.
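
Option B follows the acquire/release discipline Lucene itself exposes via {{IndexReader#incRef}}, {{IndexReader#tryIncRef}} and {{IndexReader#decRef}} (Lucene's {{SearcherManager}} builds on the same idea). The sketch below is a hypothetical, dependency-free model: {{RefCountedReader}} mimics the reader's counter rather than wrapping a real Lucene reader, {{RefCountingIndexNode}} is a made-up name, and refreshes are assumed to be serialized.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reader modelled on Lucene's IndexReader ref counting: it starts
// with one reference (held by its owner) and is closed when the count hits 0.
class RefCountedReader {
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed;

    // Takes a reference unless the reader has already been closed.
    boolean tryIncRef() {
        int count;
        while ((count = refCount.get()) > 0) {
            if (refCount.compareAndSet(count, count + 1)) {
                return true;
            }
        }
        return false;
    }

    void decRef() {
        int count = refCount.decrementAndGet();
        if (count == 0) {
            closed = true; // a real reader would release its file handles here
        } else if (count < 0) {
            throw new IllegalStateException("decRef called too many times");
        }
    }

    boolean isClosed() {
        return closed;
    }

    int getRefCount() {
        return refCount.get();
    }
}

// Sketch of option B: queries acquire/release the current reader, and a
// refresh only drops the owner's reference to the old one.
class RefCountingIndexNode {
    private volatile RefCountedReader current = new RefCountedReader();

    RefCountedReader acquire() {
        while (true) {
            RefCountedReader reader = current;
            if (reader.tryIncRef()) {
                return reader;
            }
            // The reader was closed by a concurrent refresh; retry with the new one.
        }
    }

    void release(RefCountedReader reader) {
        reader.decRef();
    }

    // synchronized keeps two refreshes from dropping the owner's
    // reference to the same reader twice.
    synchronized void refreshReaders() {
        RefCountedReader old = current;
        current = new RefCountedReader();
        old.decRef(); // closes now if unused, else when the last query releases it
    }
}
```

Provided every query calls {{release}} in a finally block, the old reader is closed either during the refresh (if nothing holds it) or by the last release, so no handle outlives its users and no running query sees a closed reader.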

My plan is to go with option A for now and backport it, while continuing work on option B.



> NRTIndex leaks file handles due to unclosed IndexReader
> -------------------------------------------------------
>
>                 Key: OAK-6500
>                 URL: https://issues.apache.org/jira/browse/OAK-6500
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene
>    Affects Versions: 1.6.0
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>            Priority: Critical
>             Fix For: 1.8, 1.7.5, 1.6.4
>
>         Attachments: OAK-6500-v1.patch
>
>
> On some setups under stress, NRTIndex has been seen to leak file handles over time.
> Checking with lsof shows that more than 3 nrt folders per index are in use. However, by design there can be at most 3, and once the system is idle at most 1 should be present.
> {noformat}
> $ lsof -p 9550 | grep '/nrt' | gawk 'match($0, /.*crx-quickstart\/repository\/index\/(.*?)\/\_.*$/, m) { print m[1]; }' | sort | uniq
> cqPageLucene-1501065263331/nrt1501065335930
> cqPageLucene-1501065263331/nrt1501065374667
> cqPageLucene-1501065263331/nrt1501065392492
> cqPageLucene-1501065263331/nrt1501065440955
> cqPageLucene-1501065263331/nrt1501065473286
> cqPageLucene-1501065263331/nrt1501065507345
> slingeventJob-1501065263330/nrt1501065356975
> slingeventJob-1501065263330/nrt1501065373229
> slingeventJob-1501065263330/nrt1501065394142
> slingeventJob-1501065263330/nrt1501065440953
> slingeventJob-1501065263330/nrt1501065473282
> slingeventJob-1501065263330/nrt1501065507342
> versionStoreIndex-1501065263332/nrt1501065335925
> versionStoreIndex-1501065263332/nrt1501065366781
> versionStoreIndex-1501065263332/nrt1501065392490
> versionStoreIndex-1501065263332/nrt1501065441232
> versionStoreIndex-1501065263332/nrt1501065473285
> {noformat}
> Further, checking the index folder shows that those folders have actually been deleted, so somewhere a file handle is still referring to them.


