Posted to java-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2008/01/13 18:24:37 UTC

[Lucene-java Wiki] Update of "PainlessIndexing" by MikeMcCandless

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The following page has been changed by MikeMcCandless:
http://wiki.apache.org/lucene-java/PainlessIndexing

------------------------------------------------------------------------------
+ See ImproveIndexingSpeed.
- IndexWriter has a useful method, called (at least for now) '''setMinMergeDocs''',
- that should be used to avoid file-handle problems and to reduce
- indexing time.
  
- The file-handle problem is often caused by people using large '''mergeFactor'''
- values to speed up indexing.  The maximum number of files open while merging is roughly mergeFactor * (5 + number of indexed fields),
- which can exceed the operating system's limit on open files when using an FSDirectory.
- 
- By setting a higher value for '''minMergeDocs''', you index and merge documents in a
- RAMDirectory that the IndexWriter uses internally. When the limit set by '''minMergeDocs''' is reached (e.g. 1000 documents), a segment is written to
- the file system. '''mergeFactor''' controls how many segments are merged at once, so with a mergeFactor of 10, once
- there are 10 segments on disk (already 10 x 1000 = 10,000 docs), the
- IndexWriter merges them all into a single segment. I believe this is roughly equivalent to
- an optimize of those segments. The process continues this way until indexing is finished.
- 
- Tuning these two parameters together should be enough to achieve good performance.
- The advantage of '''minMergeDocs''' is that you make heavy use of the
- IndexWriter's internal RAMDirectory (which is fast) without having to manage
- RAM carefully yourself, as you would if you indexed into a RAMDirectory directly. At the
- same time, keeping '''mergeFactor''' low limits the risk of running out of file
- handles.
- 
- <hint given by JulienNioche>
-
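For readers of the removed text above, a toy Java model of the scheme it describes may help. This is a sketch under stated assumptions, not Lucene's implementation: the class MergeSimulation and its methods are hypothetical, it tracks only per-segment document counts, and it assumes that a segment is flushed every minMergeDocs documents, that mergeFactor same-sized segments are merged into one (possibly cascading), and that leftover buffered documents are flushed on close. It also encodes the page's rough open-file estimate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the minMergeDocs/mergeFactor scheme (not Lucene code).
public class MergeSimulation {

    // Rough estimate from the text: files open during a merge
    // ~ mergeFactor * (5 + number of indexed fields).
    static int estimatedOpenFiles(int mergeFactor, int indexedFields) {
        return mergeFactor * (5 + indexedFields);
    }

    // Returns the document counts of the on-disk segments, oldest (largest) first,
    // after indexing totalDocs documents.
    static List<Integer> simulate(int totalDocs, int minMergeDocs, int mergeFactor) {
        if (minMergeDocs < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("need minMergeDocs >= 1 and mergeFactor >= 2");
        }
        List<Integer> segments = new ArrayList<>();
        int buffered = 0; // documents held in the in-memory buffer
        for (int i = 0; i < totalDocs; i++) {
            buffered++;
            if (buffered == minMergeDocs) {
                segments.add(buffered); // flush the buffer as a new on-disk segment
                buffered = 0;
                mergeTail(segments, mergeFactor);
            }
        }
        if (buffered > 0) {
            segments.add(buffered); // assumption: leftover documents are flushed on close
        }
        return segments;
    }

    // While the newest mergeFactor segments all have the same size, replace them
    // with one merged segment; a merge can cascade into the next level.
    private static void mergeTail(List<Integer> segments, int mergeFactor) {
        while (segments.size() >= mergeFactor) {
            int n = segments.size();
            int size = segments.get(n - 1);
            int total = 0;
            for (int j = n - mergeFactor; j < n; j++) {
                if (segments.get(j) != size) {
                    return; // not enough same-sized segments yet
                }
                total += segments.get(j);
            }
            segments.subList(n - mergeFactor, n).clear();
            segments.add(total);
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(25500, 1000, 10));
        // prints [10000, 10000, 1000, 1000, 1000, 1000, 1000, 500]
        System.out.println(estimatedOpenFiles(10, 3));   // prints 80
        System.out.println(estimatedOpenFiles(100, 20)); // prints 2500
    }
}
```

With a mergeFactor of 10 and 3 indexed fields the estimate is about 80 open files; raising mergeFactor to 100 with 20 indexed fields pushes it to about 2,500, which illustrates why large merge factors run into file-handle limits while a larger minMergeDocs does not add open files.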