Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2010/11/17 18:52:13 UTC

[jira] Commented: (LUCENE-2762) Don't leak deleted open file handles with pooled readers

    [ https://issues.apache.org/jira/browse/LUCENE-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933051#action_12933051 ] 

Michael McCandless commented on LUCENE-2762:
--------------------------------------------


So with this patch, we now build the CFS for a merged segment before
adding that segment to the segment infos.

This is important, to prevent an NRT reader from opening the pre-CFS
version, thus tying open the files, using up extra disk space, and
leaking deleted-but-open files even once all NRT readers are closed.
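The ordering change can be sketched as a toy simulation (not actual Lucene code; the file names and functions below are made up for illustration): an NRT reader pins whatever file set is published at open time, so publishing the raw files before the CFS exists lets a reader tie them open.

```python
# Sketch (not Lucene code): why building the CFS before publishing the
# segment matters.  An NRT reader snapshots whatever file set is
# "published" at open time; names here are illustrative only.

def publish_then_build_cfs(published):
    # Old ordering: the raw (non-CFS) files become visible first.
    published.append({"seg1.fdt", "seg1.tis"})   # NRT reader may open these
    snapshot = set(published[-1])                 # reader pins pre-CFS files
    published[-1] = {"seg1.cfs"}                  # CFS replaces them later
    return snapshot

def build_cfs_then_publish(published):
    # New ordering: only the finished CFS ever becomes visible.
    cfs = {"seg1.cfs"}                            # built before publishing
    published.append(cfs)
    return set(published[-1])                     # reader pins only the CFS

assert publish_then_build_cfs([]) == {"seg1.fdt", "seg1.tis"}  # leak risk
assert build_cfs_then_publish([]) == {"seg1.cfs"}              # no leak
```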

But, unfortunately, this means the worst-case temporary peak free disk
space required when using CFS has gone up... this worst case is hit if
you 1) open an existing index, 2) call optimize on it, and 3) the
index needs more than one merge to become optimized; the peak then
occurs during the final merge of that optimize, just after the CFS has
been built but before it's committed to the segment infos.  At that
point you have 1X from the starting segments (which cannot be deleted
until commit), another 1X from the segments created by the prior merge
(now being merged), another 1X from the newly merged single segment,
and a final 1X from its CFS.  In this worst case that means we require
3X of your index size in temporary space, on top of the 1X of the
index itself.
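To make that arithmetic concrete, here's a back-of-envelope sketch of the worst case for a hypothetical 10 GB index (all numbers are illustrative, and the simplifying assumption is that each copy is roughly the size of the index):

```python
# Worked example of the worst-case peak described above, for a 10 GB
# index being optimized with compound files enabled.
index_gb = 10.0
starting_segments = index_gb     # 1X: originals, undeletable until commit
prior_merge_output = index_gb    # 1X: segments produced by the prior merge
final_merged_segment = index_gb  # 1X: the newly merged single segment
final_cfs = index_gb             # 1X: its compound file, just built

peak = (starting_segments + prior_merge_output
        + final_merged_segment + final_cfs)
extra_temp = peak - index_gb     # temporary space beyond the index itself

assert peak == 4 * index_gb          # four copies exist at once
assert extra_temp == 3 * index_gb    # the "3X" worst case
```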

In other cases (e.g. the NRT case) we use less disk space.

And of course if CFS is off there's no change to the temp disk space.

I've noted this in the javadocs and will add to CHANGES...

But... I think we should improve our default merge policy.  First,
maybe we should set a maxMergeMB by default?  Immense merges cause all
sorts of problems and likely don't help search perf much anyway.
Second, if a newly merged segment would be more than X% of the index,
I think we should leave it in non-compound-file format even if
"useCompoundFile" is enabled... I think there's a separate issue open
somewhere for that second suggestion.
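That second suggestion could look roughly like the following predicate; the 20% threshold, function name, and parameters are hypothetical, not an actual Lucene API:

```python
# Hypothetical sketch of the "leave huge merged segments non-compound"
# idea.  The threshold and names are made up for illustration.
def use_compound_file(merged_segment_mb, total_index_mb,
                      cfs_enabled=True, max_segment_pct=0.20):
    """Return True if the newly merged segment should be packed into a CFS."""
    if not cfs_enabled:
        return False
    # Leave very large segments in non-compound format even when CFS is
    # on: copying them into a .cfs temporarily doubles their footprint.
    return merged_segment_mb <= max_segment_pct * total_index_mb

assert use_compound_file(100, 1000) is True    # 10% of index: compound it
assert use_compound_file(500, 1000) is False   # 50% of index: leave as-is
assert use_compound_file(100, 1000, cfs_enabled=False) is False
```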


> Don't leak deleted open file handles with pooled readers
> --------------------------------------------------------
>
>                 Key: LUCENE-2762
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2762
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 2.9.4, 3.0.3, 3.1, 4.0
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-2762.patch
>
>
> If you have CFS enabled today, and pooling is enabled (either directly
> or because you've pulled an NRT reader), IndexWriter will hold open
> SegmentReaders against the non-CFS format of each merged segment.
> So even if you close all NRT readers you've pulled from the writer,
> you'll still see file handles open against files that have been
> deleted.
> This count will not grow unbounded, since it's limited by the number
> of segments in the index, but it's still a serious problem since the
> app had turned on CFS in the first place presumably to avoid risk of
> too-many-open-files.  It's also bad because it ties up disk space
> since these files would otherwise be deleted.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org