Posted to java-user@lucene.apache.org by sol myr <so...@yahoo.com> on 2011/01/16 16:15:40 UTC

Question on writer optimize() / file merging?

Hi,

I'm trying to understand the behavior of file merging / optimization.
I see that whenever my IndexWriter calls 'commit()', it creates a new file (or several files).
I also see these files merged when calling 'optimize()', as much as allowed by the parameter 'NoCFSRatio'.

But I'm still trying to figure out:

1) Will my writer still perform some file merging, even if I don't explicitly call 'optimize()'?

2) Is there a way to configure the number of files, or their size?

3) I always keep an open IndexSearcher (and IndexReader). I know they should be re-opened when a change occurs, but it's not crucial to see changes immediately, so I just poll periodically, and it might be a few minutes before my reader is re-opened and allowed to see changes.
But will this approach disturb the writer's ability to optimize / merge files? If a reader is open, will it prevent file merging?

Thanks

Re: Question on writer optimize() / file merging?

Posted by Erick Erickson <er...@gmail.com>.
See below:

On Sun, Jan 16, 2011 at 10:15 AM, sol myr <so...@yahoo.com> wrote:

> Hi,
>
> I'm trying to understand the behavior of file merging / optimization.
> I see that whenever my IndexWriter calls 'commit()', it creates a new file
> (or several files).
> I also see these files merged when calling 'optimize()', as much as
> allowed by the parameter 'NoCFSRatio'.
>
> But I'm still trying to figure out:
>
> 1) Will my writer still perform some file merging, even if I don't
> explicitly call 'optimize()'?
>
>
Yes. The merge factor controls this, so you don't end up with a huge number
of files. There are some nifty diagrams floating around on the net, but I
don't have one right at hand...
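
As a rough illustration of the idea, here is a toy model in plain Java (no
Lucene dependency). It assumes a simplified rule: each flush writes one small
segment, and whenever the last mergeFactor segments have the same size they
are merged into one. The class and method names are made up for this sketch,
and Lucene's real merge policy is more involved than this.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Lucene's actual merge policy.
class MergeFactorSketch {

    // Returns the segment sizes (counted in flushes) left after `flushes` flushes.
    static List<Integer> simulate(int mergeFactor, int flushes) {
        List<Integer> segments = new ArrayList<Integer>();
        for (int i = 0; i < flushes; i++) {
            segments.add(1); // each flush writes one new small segment
            // Cascade: keep merging while the newest mergeFactor segments are equal in size.
            while (tailIsMergeable(segments, mergeFactor)) {
                int merged = 0;
                for (int k = 0; k < mergeFactor; k++) {
                    merged += segments.remove(segments.size() - 1);
                }
                segments.add(merged);
            }
        }
        return segments;
    }

    // True if the last mergeFactor segments all have the same size.
    static boolean tailIsMergeable(List<Integer> segments, int mergeFactor) {
        int n = segments.size();
        if (n < mergeFactor) {
            return false;
        }
        int size = segments.get(n - 1);
        for (int k = 2; k <= mergeFactor; k++) {
            if (segments.get(n - k) != size) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(simulate(10, 57));   // [10, 10, 10, 10, 10, 1, 1, 1, 1, 1, 1, 1]
        System.out.println(simulate(10, 1000)); // [1000]
    }
}
```

In this model the segment count stays small without any explicit optimize()
call: 57 flushes leave 12 segments rather than 57. A lower merge factor gives
fewer segments at the cost of more frequent merging.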



> 2) Is there a way to configure the number of files, or their size?
>

IndexWriter.setMergeFactor controls the number of segments. There's no way
I know of to control by size, however.

> 3) I always keep an open IndexSearcher (and IndexReader). I know they
> should be re-opened when a change occurs, but it's not crucial to see
> changes immediately, so I just poll periodically, and it might be a few
> minutes before my reader is re-opened and allowed to see changes.
> But will this approach disturb the writer's ability to optimize / merge
> files? If a reader is open, will it prevent file merging?
>
>
No, this is a fine approach. Lucene index segments are never changed. A
merge will *copy* the segments being merged to a new segment, and when you
open a new reader it will look at the new segment while the old reader
merrily looks at the old segments. This is why the disk space may double
during a merge.
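
To make that concrete, here is a toy model in plain Java (no Lucene
dependency). It assumes a reader simply holds the list of segment names that
were live when it was opened, and that a segment's files can be removed only
once neither the latest commit nor any open reader refers to them. All names
here are hypothetical, and this is a sketch of the idea rather than how
Lucene manages its files internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: segments are write-once, readers see a fixed snapshot.
class SegmentSnapshotSketch {
    final Set<String> onDisk = new TreeSet<String>();       // segment names present on "disk"
    private List<String> live = new ArrayList<String>();    // segments in the latest commit
    private final List<List<String>> openSnapshots = new ArrayList<List<String>>();

    void addSegment(String name) {
        onDisk.add(name);
        live.add(name);
    }

    // A reader sees exactly the segments that were live when it was opened.
    List<String> openReader() {
        List<String> snapshot = Collections.unmodifiableList(new ArrayList<String>(live));
        openSnapshots.add(snapshot);
        return snapshot;
    }

    void closeReader(List<String> snapshot) {
        openSnapshots.remove(snapshot);
        deleteUnreferenced();
    }

    // A merge writes a NEW segment; the old ones are left untouched.
    void mergeAll(String mergedName) {
        onDisk.add(mergedName);
        live = new ArrayList<String>();
        live.add(mergedName);
        deleteUnreferenced();
    }

    // Drop segments that neither the latest commit nor any open reader uses.
    private void deleteUnreferenced() {
        Set<String> referenced = new TreeSet<String>(live);
        for (List<String> snapshot : openSnapshots) {
            referenced.addAll(snapshot);
        }
        onDisk.retainAll(referenced);
    }

    public static void main(String[] args) {
        SegmentSnapshotSketch index = new SegmentSnapshotSketch();
        index.addSegment("_0");
        index.addSegment("_1");
        List<String> oldReader = index.openReader();
        index.mergeAll("_2");
        System.out.println(index.onDisk); // [_0, _1, _2]  old and merged copies coexist
        index.closeReader(oldReader);
        System.out.println(index.onDisk); // [_2]
    }
}
```

The open reader does not block the merge in this model. It only delays
cleanup of the old segments, which is where the temporary extra disk usage
comes from.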

Best
Erick


> Thanks
>
>
>
>
>