Posted to java-user@lucene.apache.org by Che Dong <ch...@chedong.com> on 2005/04/22 04:52:32 UTC

Increase IndexWriter.mergeFactor if you have enough memory Re: Lucene bulk indexing

Hi all:
Have you tried increasing IndexWriter.mergeFactor? I raised it to 1000,
and indexing was about 10 times faster than with the default of 10.
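
To illustrate why a larger mergeFactor can reduce indexing work, here is a minimal sketch of a toy merge model. It is an assumption-laden simplification, not Lucene's actual code: every added document starts as its own one-document segment, and whenever mergeFactor segments of equal size accumulate they are merged into one. The class name MergeFactorSketch, the Result type, and the simulate method are hypothetical names made up for this example; the model ignores in-memory buffering and every other IndexWriter setting.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of segment merging. NOT Lucene's implementation (assumption). */
public class MergeFactorSketch {

    /** Outcome of one simulated indexing run. */
    public static final class Result {
        public final int merges;        // number of merge operations performed
        public final long docsCopied;   // documents rewritten across all merges
        public final int segmentsLeft;  // segments remaining at the end

        Result(int merges, long docsCopied, int segmentsLeft) {
            this.merges = merges;
            this.docsCopied = docsCopied;
            this.segmentsLeft = segmentsLeft;
        }
    }

    /** Adds numDocs one-document segments, merging equal-sized runs of mergeFactor. */
    public static Result simulate(int numDocs, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        List<Integer> segments = new ArrayList<>(); // segment sizes, oldest first
        int merges = 0;
        long docsCopied = 0;

        for (int i = 0; i < numDocs; i++) {
            segments.add(1); // each new document starts as its own segment

            // Sizes never increase from oldest to newest, so if the segment
            // mergeFactor places from the end equals the newest one, every
            // segment in between has that same size as well.
            while (segments.size() >= mergeFactor
                    && segments.get(segments.size() - mergeFactor)
                               .equals(segments.get(segments.size() - 1))) {
                int merged = 0;
                for (int k = 0; k < mergeFactor; k++) {
                    merged += segments.remove(segments.size() - 1);
                }
                segments.add(merged);
                merges++;
                docsCopied += merged; // a merge rewrites every document it contains
            }
        }
        return new Result(merges, docsCopied, segments.size());
    }

    public static void main(String[] args) {
        for (int mf : new int[] {10, 1000}) {
            Result r = simulate(10_000, mf);
            System.out.println("mergeFactor=" + mf
                    + " merges=" + r.merges
                    + " docsCopied=" + r.docsCopied
                    + " segmentsLeft=" + r.segmentsLeft);
        }
    }
}
```

Under this model, indexing 10,000 documents with mergeFactor 10 takes 1,111 merges and rewrites 40,000 documents in total, while mergeFactor 1000 takes 10 merges and rewrites 10,000. The larger setting leaves 10 segments behind instead of 1, which hints at the trade-off: less merge work during indexing, but more segments to keep around until the index is optimized. Real speedups depend on disk, memory, and the Lucene version, so treat these counts as an illustration only.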

Regards

Che Dong
http://www.chedong.com/


Aalap Parikh wrote:
> My machine is pretty good and fairly new. The disk is
> certainly not slow, and I am not indexing large
> Documents: each has 27 fields, and each field value is a
> string no more than 15-20 characters long.
> 
> I tried setting the maxFieldLength value of the
> IndexWriter to a low value but that didn't help.
> 
> Also, I am not using Hibernate at all.
> 
> Thanks,
> Aalap.
> 
> --- Otis Gospodnetic <ot...@yahoo.com>
> wrote:
> 
>>That sounds way too long, unless you have veeery slow disks, veeery
>>large Documents (long fields that you analyze, index, and store in
>>Lucene), or some such.
>>If you have very loooong fiiiiieeeelds you could try setting
>>
>>http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#maxFieldLength
>>
>>to a very small number and see if that changes performance drastically.
>>There are other IndexWriter knobs you can fiddle with.
>>
>>I've seen Hibernate 2.* get sluggish once its
>>Session gets filled up
>>with a lot of objects.
>>
>>Otis
>>
>>
>>--- Aalap Parikh <al...@yahoo.com> wrote:
>>
>>>Hi,
>>>
>>>I have similar issues in indexing time.
>>>
>>>I am doing a SELECT from database and getting back
>>>10,000 rows. I then start indexing each row and hence
>>>would have 10,000 documents in my Lucene index. Each
>>>doc has 27 fields.
>>>
>>>I added some timing code to my indexing process. The
>>>DB select call takes around 23 seconds and the
>>>indexing process takes 567 seconds. Also, I profiled
>>>the app using JProfiler and found out that 90% of time
>>>is spent in the IndexWriter.addDocument call. As
>>>expected, there were 10,000 invocations of that method
>>>(one for each doc) and the profiler showed that the
>>>method took 90% of the processing time.
>>>
>>>I am concerned that it is taking around 9.5 minutes
>>>for 10,000 docs and I am expecting to have around
>>>600,000 docs to index. So that would take 570 minutes
>>>(9-10 hours) to index, which is HUGE!!!
>>>
>>>My machine: Pentium 4 CPU 2.40 GHz
>>>            RAM 1 GB
>>>
>>>Any help appreciated.
>>>
>>>Thanks,
>>>Aalap.
>>>
>>>
>>>--- skoptelov@fis.ru wrote:
>>>
>>>>In a message of Wednesday 20 April 2005 04:07, Mufaddal Khumri
>>>>wrote:
>>>>
>>>>>The 20000 products I mentioned are 20000 rows. I get the products in
>>>>>bulk by using a limit clause.
>>>>>
>>>>>I am using hibernate with MySQL server on a 2.8GHz, 1.00GB Ram machine.
>>>>
>>>>Maybe your session-level cache in hibernate grows incredibly. Do you do
>>>>Session.clear() sometimes while doing indexing? Here's a link about
>>>>batching & hibernate:
>>>>
>>>>http://blog.hibernate.org/cgi-bin/blosxom.cgi/2004/08/
>>>>
>>>>---------------------------------------------------------------------
>>>>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>For additional commands, e-mail: java-user-help@lucene.apache.org