You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by dan at gmail <da...@gmail.com> on 2008/07/18 19:47:39 UTC

How did Lucene clean out the deleted documents from the disk?

Hello,

Could someone please confirm that calling indexWriter.optimize() is the only
way to clean out the deleted documents from the disk?

I understand that indexWriter.deleteDocuments() does not clean the disk
space, and I tested that calling after indexWriter.flush() and
indexWriter.close() after indexWriter.deleteDocuments() does not clean the
disk space either.  So do they only mark the documents have been deleted,
then which method really do the "delete" the documents (from the disk)?

Thanks,
Dan
-- 
View this message in context: http://www.nabble.com/How-did-Lucene-clean-out-the-deleted-documents-from-the-disk--tp18534562p18534562.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How did Lucene clean out the deleted documents from the disk?

Posted by Michael McCandless <lu...@mikemccandless.com>.
That's right, you do not need to run optimize.

Over time the disk space will gradually be reclaimed through Lucene's  
normal merging...

Mike

dan at gmail wrote:

>
>> Also, over time, as segments that have marked deletions are merged,
>> the disk space is also reclaimed.
>
> Thanks Mike.
>
> So can I say that calling optimzie() is really optional?  Because I  
> was
> worrying that these deleted documents would never get cleaned if I  
> don't run
> optimize() and eventually it would run out of the disk space.  What  
> I am
> trying to figure out here is if optimization is mandatory.  I know  
> that
> optimization would improve the search performance, however it would  
> take a
> long time to run and the application I am writing needs to update  
> the index
> very frequently.  According to "Lucene in Action", running  
> optimization is
> not recommended in my case.  So would you please advise?
>
> Thanks,
> Dan
>
>
> Michael McCandless-2 wrote:
>>
>>
>> Either optimize() or expungeDeletes() will reclaim the disk space  
>> used
>> by deleted documents.
>>
>> Also, over time, as segments that have marked deletions are merged,
>> the disk space is also reclaimed.
>>
>> Mike
>>
>> dan at gmail wrote:
>>
>>>
>>> Hello,
>>>
>>> Could someone please confirm that calling indexWriter.optimize() is
>>> the only
>>> way to clean out the deleted documents from the disk?
>>>
>>> I understand that indexWriter.deleteDocuments() does not clean the
>>> disk
>>> space, and I tested that calling after indexWriter.flush() and
>>> indexWriter.close() after indexWriter.deleteDocuments() does not
>>> clean the
>>> disk space either.  So do they only mark the documents have been
>>> deleted,
>>> then which method really do the "delete" the documents (from the
>>> disk)?
>>>
>>> Thanks,
>>> Dan
>>> -- 
>>> View this message in context:
>>> http://www.nabble.com/How-did-Lucene-clean-out-the-deleted-documents-from-the-disk--tp18534562p18534562.html
>>> Sent from the Lucene - Java Users mailing list archive at  
>>> Nabble.com.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/How-did-Lucene-clean-out-the-deleted-documents-from-the-disk--tp18534562p18537767.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How did Lucene clean out the deleted documents from the disk?

Posted by dan at gmail <da...@gmail.com>.
> Also, over time, as segments that have marked deletions are merged,  
> the disk space is also reclaimed.

Thanks Mike.

So can I say that calling optimzie() is really optional?  Because I was
worrying that these deleted documents would never get cleaned if I don't run
optimize() and eventually it would run out of the disk space.  What I am
trying to figure out here is if optimization is mandatory.  I know that
optimization would improve the search performance, however it would take a
long time to run and the application I am writing needs to update the index
very frequently.  According to "Lucene in Action", running optimization is
not recommended in my case.  So would you please advise?

Thanks,
Dan


Michael McCandless-2 wrote:
> 
> 
> Either optimize() or expungeDeletes() will reclaim the disk space used  
> by deleted documents.
> 
> Also, over time, as segments that have marked deletions are merged,  
> the disk space is also reclaimed.
> 
> Mike
> 
> dan at gmail wrote:
> 
>>
>> Hello,
>>
>> Could someone please confirm that calling indexWriter.optimize() is  
>> the only
>> way to clean out the deleted documents from the disk?
>>
>> I understand that indexWriter.deleteDocuments() does not clean the  
>> disk
>> space, and I tested that calling after indexWriter.flush() and
>> indexWriter.close() after indexWriter.deleteDocuments() does not  
>> clean the
>> disk space either.  So do they only mark the documents have been  
>> deleted,
>> then which method really do the "delete" the documents (from the  
>> disk)?
>>
>> Thanks,
>> Dan
>> -- 
>> View this message in context:
>> http://www.nabble.com/How-did-Lucene-clean-out-the-deleted-documents-from-the-disk--tp18534562p18534562.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/How-did-Lucene-clean-out-the-deleted-documents-from-the-disk--tp18534562p18537767.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How did Lucene clean out the deleted documents from the disk?

Posted by Michael McCandless <lu...@mikemccandless.com>.
Either optimize() or expungeDeletes() will reclaim the disk space used  
by deleted documents.

Also, over time, as segments that have marked deletions are merged,  
the disk space is also reclaimed.

Mike

dan at gmail wrote:

>
> Hello,
>
> Could someone please confirm that calling indexWriter.optimize() is  
> the only
> way to clean out the deleted documents from the disk?
>
> I understand that indexWriter.deleteDocuments() does not clean the  
> disk
> space, and I tested that calling after indexWriter.flush() and
> indexWriter.close() after indexWriter.deleteDocuments() does not  
> clean the
> disk space either.  So do they only mark the documents have been  
> deleted,
> then which method really do the "delete" the documents (from the  
> disk)?
>
> Thanks,
> Dan
> -- 
> View this message in context: http://www.nabble.com/How-did-Lucene-clean-out-the-deleted-documents-from-the-disk--tp18534562p18534562.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org