You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Dino Korah <dc...@gmail.com> on 2007/03/10 12:03:29 UTC

index optimisation - disk fill-up

Hi All,

I understand lucene has a requirement of double the size of index available
free on the disk on which the index is being optimised. But if in case the
disk gets filled up during optimisation, what will happen to the index,
theoretically? Is there an effective way of avoiding this?

Many Thanks
Dino


-- 
d i n o    k o r a h
51°24'26.01"N   0°16'38.56"W

Re: index optimisation - disk fill-up

Posted by Dino Korah <dc...@gmail.com>.
Cheers Michael.


On 10/03/07, Michael McCandless <lu...@mikemccandless.com> wrote:
>
> "Dino Korah" <dc...@gmail.com> wrote:
> > I understand lucene has a requirement of double the size of index
> > available
> > free on the disk on which the index is being optimised. But if in case
> > the
> > disk gets filled up during optimisation, what will happen to the index,
> > theoretically? Is there an effective way of avoiding this?
>
> Beware, if you have readers using the index while optimize is running,
> the disk space required is 3X the index size.  Worse, if those readers
> are refreshing while optimize is running (by checking isCurrent() for
> example), it will be even more than 3X.  It's best not to refresh
> readers during an optimize.  LUCENE-710 (patch pending) aims to make
> this easier by adding a "commit only on close" mode to the IndexWriter
> such that readers don't see any changes until the writer is closed.
>
> Also note that addDocument() periodically merges segments.  Every so
> often (maxBufferedDocs * powers of mergeFactor) that merging will
> completely cascade (produce 1 segment), making it the equivalent of an
> optimize() call.  So the same disk usage caveat applies for those
> addDocument() calls.
>
> On disk full during addDocument() or during optimize(), the index will
> not become corrupt.  However: prior to Lucene 2.1, you will find extra
> unused partially written large files in your index (often they end
> with .tmp).  As of Lucene 2.1, Lucene tries to remove such files on
> hitting an exception (disk full or otherwise).
>
> Note that if you hit disk full during addIndexes(), prior to Lucene
> 2.1 this could corrupt your index.  As of Lucene 2.1, it won't.  (See
> LUCENE-702 for details.)
>
> The only way to avoid disk full is to get more disk space, I think.
> Unfortunately, simply "never calling optimize" won't do it because of
> the case above where addDocument effectively results in an optimize.
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
d i n o    k o r a h
Tel: +44 795 66 65 283
--------------------------------
51°24'26.01"N   0°16'38.56"W

Re: index optimisation - disk fill-up

Posted by Michael McCandless <lu...@mikemccandless.com>.
"Dino Korah" <dc...@gmail.com> wrote:
> I understand lucene has a requirement of double the size of index
> available
> free on the disk on which the index is being optimised. But if in case
> the
> disk gets filled up during optimisation, what will happen to the index,
> theoretically? Is there an effective way of avoiding this?

Beware, if you have readers using the index while optimize is running,
the disk space required is 3X the index size.  Worse, if those readers
are refreshing while optimize is running (by checking isCurrent() for
example), it will be even more than 3X.  It's best not to refresh
readers during an optimize.  LUCENE-710 (patch pending) aims to make
this easier by adding a "commit only on close" mode to the IndexWriter
such that readers don't see any changes until the writer is closed.

Also note that addDocument() periodically merges segments.  Every so
often (maxBufferedDocs * powers of mergeFactor) that merging will
completely cascade (produce 1 segment), making it the equivalent of an
optimize() call.  So the same disk usage caveat applies for those
addDocument() calls.

On disk full during addDocument() or during optimize(), the index will
not become corrupt.  However: prior to Lucene 2.1, you will find extra
unused partially written large files in your index (often they end
with .tmp).  As of Lucene 2.1, Lucene tries to remove such files on
hitting an exception (disk full or otherwise).

Note that if you hit disk full during addIndexes(), prior to Lucene
2.1 this could corrupt your index.  As of Lucene 2.1, it won't.  (See
LUCENE-702 for details.)

The only way to avoid disk full is to get more disk space, I think.
Unfortunately, simply "never calling optimize" won't do it because of
the case above where addDocument effectively results in an optimize.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org