Posted to java-user@lucene.apache.org by Stadler Hans-Christian <ha...@psi.ch> on 2007/05/08 11:30:00 UTC

Is it necessary to optimize?

If mergeFactor is set to 2 and no optimize() is ever done on the index,
what is the impact on

1) the number of open files during indexing
2) the number of open files during searching
3) the search speed
4) the indexing speed

??

HC

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Is it necessary to optimize?

Posted by Grant Ingersoll <gs...@apache.org>.
The contrib/benchmark addition can help you characterize many of  
these scenarios, especially if you write a DocMaker and QueryMaker  
for your collection.

On May 8, 2007, at 5:30 AM, Stadler Hans-Christian wrote:

> If mergeFactor is set to 2 and no optimize() is ever done on the  
> index,
> what is the impact on
>
> 1) the number of open files during indexing
> 2) the number of open files during searching
> 3) the search speed
> 4) the indexing speed
>
> ??
>
> HC

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ





Re: Is it necessary to optimize?

Posted by "Aleksander M. Stensby" <al...@integrasco.no>.
I would say that, over time, the number of files will grow, and keep
growing if you never perform an optimize(). After some very helpful mails
from Erick, I settled on a mergeFactor of 30, and since I do the indexing
in large batches, I perform an optimize() only at the end of the indexing
run. This works really well and reduced the indexing time greatly!

As to your points, the number of open files is closely related to the
mergeFactor, since:
"With the default value of 10, Lucene will store 10 documents in memory  
before writing them to a single segment on the disk. The mergeFactor value  
of 10 also means that once the number of segments on the disk has reached  
the power of 10, Lucene will merge these segments into a single segment."
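To make the mergeFactor / segment-count relation concrete, here is a toy model (an assumption for illustration, not Lucene's actual merge code): every maxBufferedDocs documents are flushed as one new segment (maxBufferedDocs is just a model parameter named after the IndexWriter setting), and whenever mergeFactor segments of the same size pile up they are merged into one. Under that model the number of segments behaves like the digit sum of a base-mergeFactor counter, so without optimize() it keeps growing, but only slowly. The class and method names are hypothetical.

```java
/**
 * Toy model (not Lucene code) of log-style merging: every maxBufferedDocs
 * documents become one segment, and mergeFactor equal-sized segments are
 * merged into one. The result behaves like a base-mergeFactor counter.
 */
final class SegmentCountSketch {

    /** Number of segments on disk after indexing numDocs documents, with no optimize(). */
    static long segmentCount(long numDocs, int maxBufferedDocs, int mergeFactor) {
        long flushes = numDocs / maxBufferedDocs; // leftover documents stay buffered in RAM
        long segments = 0;
        while (flushes > 0) {
            segments += flushes % mergeFactor; // segments left waiting on this level
            flushes /= mergeFactor;            // each group of mergeFactor became one segment on the next level
        }
        return segments;
    }

    public static void main(String[] args) {
        for (int mf : new int[] {2, 10, 30}) {
            System.out.println("mergeFactor=" + mf + " -> segments: "
                    + segmentCount(999_990, 10, mf));
        }
    }
}
```

With maxBufferedDocs = 10 and 999,990 documents, this model ends with 10 segments for mergeFactor 2, 45 for 10 and 36 for 30. Roughly one file per segment has to be opened with the compound file format, several without it, and optimize() reduces everything to a single segment.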

It is also true that a higher mergeFactor means less file I/O, and hence
faster indexing:
"MergeFactor - Determines the minimal number of documents required before  
the buffered in-memory documents are merged and a new Segment is created.  
Since Documents are merged in a RAMDirectory, large value gives faster  
indexing. "
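The "less file I/O" effect can be illustrated with the same kind of toy model (again an assumption, not Lucene code): count how many document copies get written, one for the initial flush plus one each time a merge rewrites the document into a larger segment. A document passes through roughly log base mergeFactor of (numDocs / maxBufferedDocs) merge levels, so a larger mergeFactor means fewer rewrites. The names below are hypothetical.

```java
/**
 * Toy model (not Lucene code) of log-style merging that counts how many
 * document copies are written: once per flush plus once per merge.
 */
final class MergeCostSketch {

    static long documentsWritten(long numDocs, int maxBufferedDocs, int mergeFactor) {
        long flushes = numDocs / maxBufferedDocs; // leftover documents stay buffered
        long[] waitingAtLevel = new long[64];     // segments not yet merged, per level
        long written = 0;
        for (long i = 0; i < flushes; i++) {
            written += maxBufferedDocs;           // flushing writes each buffered doc once
            waitingAtLevel[0]++;
            long segmentSize = maxBufferedDocs;
            int level = 0;
            // mergeFactor segments on one level are merged, rewriting all their documents.
            while (waitingAtLevel[level] == mergeFactor) {
                waitingAtLevel[level] = 0;
                segmentSize *= mergeFactor;
                written += segmentSize;
                level++;
                waitingAtLevel[level]++;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        for (int mf : new int[] {2, 10, 30}) {
            System.out.println("mergeFactor=" + mf + " -> documents written: "
                    + documentsWritten(1_000_000, 10, mf));
        }
    }
}
```

For 1,000,000 documents and maxBufferedDocs = 10, the model counts 16,560,000 document writes for mergeFactor 2, 6,000,000 for 10 and 3,808,900 for 30. It ignores RAM use, file sizes and how many files each merge touches, so it only shows the trend.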

The downside is that with too large a mergeFactor you may run into the
"Too many open files" exception:
"For instance, with a default mergeFactor of 10 and an index of 1 million  
documents, Lucene will require 110 open files on an unoptimized index.  
When IndexWriter's optimize() method is called, all segments are merged  
into a single segment, which minimizes the number of open files that  
Lucene needs."

"using a higher value for mergeFactor will cause Lucene to use more RAM,  
but will let Lucene write data to disk less frequently, which will speed  
up the indexing process. A smaller mergeFactor will use less memory and  
will cause the index to be updated more frequently, which will make it  
more up-to-date, but will also slow down the indexing process. Similarly,  
a larger maxMergeDocs is better suited for batch indexing, and a smaller  
maxMergeDocs is better for more interactive indexing."
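The maxMergeDocs trade-off in the last quote can be sketched by adding a cap to the toy merge model: a merge is skipped when the resulting segment would hold more than maxMergeDocs documents. This is an assumed simplification of IndexWriter's behaviour, and the names are hypothetical. A small cap keeps each individual merge (and so each indexing pause) short, at the cost of leaving more segments behind.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Toy model (not Lucene code): log-style merging where a merge is skipped if
 * the merged segment would contain more than maxMergeDocs documents.
 */
final class MaxMergeDocsSketch {

    /** Sizes (in documents) of the segments left after indexing numDocs documents. */
    static List<Long> segmentSizes(long numDocs, int maxBufferedDocs, int mergeFactor, long maxMergeDocs) {
        List<Long> segments = new ArrayList<>();
        long flushes = numDocs / maxBufferedDocs; // leftover documents stay buffered
        for (long i = 0; i < flushes; i++) {
            segments.add((long) maxBufferedDocs); // flush one new small segment
            while (segments.size() >= mergeFactor) {
                int n = segments.size();
                long size = segments.get(n - 1);
                // Only merge if the newest mergeFactor segments all have the same size...
                boolean sameSize = true;
                for (int j = n - mergeFactor; j < n; j++) {
                    if (segments.get(j) != size) {
                        sameSize = false;
                        break;
                    }
                }
                long merged = size * mergeFactor;
                // ...and only if the merged segment stays within maxMergeDocs.
                if (!sameSize || merged > maxMergeDocs) {
                    break;
                }
                segments.subList(n - mergeFactor, n).clear();
                segments.add(merged);
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        for (long cap : new long[] {10_000L, Long.MAX_VALUE}) {
            List<Long> sizes = segmentSizes(1_000_000, 10, 10, cap);
            System.out.println("maxMergeDocs=" + cap + " -> " + sizes.size()
                    + " segments, largest " + Collections.max(sizes) + " docs");
        }
    }
}
```

With 1,000,000 documents, maxBufferedDocs = 10 and mergeFactor = 10, a cap of 10,000 leaves 100 segments of 10,000 documents each, whereas an unlimited cap (Long.MAX_VALUE) ends with a single 1,000,000-document segment.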

- Aleksander

On Tue, 08 May 2007 11:30:00 +0200, Stadler Hans-Christian  
<ha...@psi.ch> wrote:

> If mergeFactor is set to 2 and no optimize() is ever done on the index,
> what is the impact on
>
> 1) the number of open files during indexing
> 2) the number of open files during searching
> 3) the search speed
> 4) the indexing speed
>
> ??
>
> HC



-- 
Aleksander M. Stensby
Software Developer
Integrasco A/S
aleksander.stensby@integrasco.no
