Posted to java-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2008/01/13 18:24:37 UTC
[Lucene-java Wiki] Update of "PainlessIndexing" by MikeMcCandless
The following page has been changed by MikeMcCandless:
http://wiki.apache.org/lucene-java/PainlessIndexing
------------------------------------------------------------------------------
+ See ImproveIndexingSpeed.
- IndexWriter has a useful method called (at least temporarily) '''setMinMergeDocs'''
- that can be used to avoid file-handle problems and reduce
- indexing time.
- File-handle problems are often caused by using large '''mergeFactor'''
- values to speed up indexing. The maximum number of open files while merging is around mergeFactor * (5 + number of indexed fields),
- which can be too many for an FSDirectory.
-
- By setting '''minMergeDocs''' to a higher value, you index and merge in a
- RAMDirectory that the IndexWriter uses internally. When the limit set by '''minMergeDocs''' is reached (e.g. 1000), a segment is written to
- the file system. '''mergeFactor''' controls the number of segments to be merged, so when
- you have 10 segments on the file system (with a mergeFactor of 10, that is already 10 x 1000 docs), the
- IndexWriter will merge them all into a single segment. This is equivalent to
- an optimize, I think. The process continues like that until indexing is finished.
-
- Combining these parameters should be enough to achieve good performance.
- The advantage of '''minMergeDocs''' is that you make heavy use of the
- RAMDirectory used by your IndexWriter (== fast) without having to be too
- careful with RAM (as you would have to be when using a RAMDirectory directly). At the
- same time, keeping your mergeFactor low limits the risk of running into too-many-file-handles
- problems.
-
- <hint given by JulienNioche>
-
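The arithmetic in the removed text can be sketched in plain Java. This is only an illustration of the open-file estimate and of the flush-then-merge cascade described above; it does not call Lucene, the class and method names (MergeSketch, estimateOpenFiles, segmentsOnDisk) are hypothetical, and the segment count assumes a simplified model in which every group of mergeFactor equal-sized segments is merged into one as soon as it exists.

```java
public class MergeSketch {

    // Rough estimate quoted in the text: mergeFactor * (5 + number of indexed fields).
    static int estimateOpenFiles(int mergeFactor, int indexedFields) {
        return mergeFactor * (5 + indexedFields);
    }

    // Simplified model (an assumption, not Lucene's actual code): a segment is
    // flushed to disk every minMergeDocs documents, and whenever mergeFactor
    // segments of the same size exist they are merged into one larger segment.
    // Documents still buffered in RAM are not counted.
    static int segmentsOnDisk(int docs, int minMergeDocs, int mergeFactor) {
        int flushed = docs / minMergeDocs; // number of full buffers flushed so far
        int segments = 0;
        while (flushed > 0) {
            segments += flushed % mergeFactor; // segments left unmerged at this level
            flushed /= mergeFactor;            // merged segments move up one level
        }
        return segments;
    }

    public static void main(String[] args) {
        System.out.println("open files, mergeFactor=10, 5 fields:  " + estimateOpenFiles(10, 5));
        System.out.println("open files, mergeFactor=100, 5 fields: " + estimateOpenFiles(100, 5));
        System.out.println("segments after 10000 docs: " + segmentsOnDisk(10000, 1000, 10));
        System.out.println("segments after 25000 docs: " + segmentsOnDisk(25000, 1000, 10));
    }
}
```

Under these assumptions, raising mergeFactor from 10 to 100 multiplies the open-file estimate by ten, while raising minMergeDocs only reduces how often segments are flushed, which is the trade-off the hint relies on.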