Posted to java-user@lucene.apache.org by Mufaddal Khumri <MK...@allegromedical.com> on 2005/04/19 20:10:41 UTC

Lucene bulk indexing

Hi,

I am sure this question has been raised before and may even have been
answered. I would be grateful if someone could point me in the right
direction or share their thoughts on this topic.

The problem:

I have a little over 20,000 products that I need to index. At the
moment I fetch X products at a time and index them. The whole process
takes about 26 minutes (I am indexing the database id, product name,
and product description).

I have been thinking of ways to make this indexing faster. One idea is
to write a threaded module that indexes several batches of X products
simultaneously. For instance, I could spawn (number of products / X)
threads and have each one index its own batch. I am guessing this would
be faster, but by what factor? (I understand that writes to the index
are synchronized by Lucene.)
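
To make the idea concrete, here is a rough sketch of the threaded design,
not working Lucene code. IndexSink and Product are hypothetical stand-ins
(in a real setup the sink would wrap Lucene's IndexWriter), and the sketch
assumes a fixed-size thread pool rather than one thread per batch. Document
preparation runs in parallel, while writes to the sink are serialized to
mirror the synchronized index writes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadedIndexSketch {

    /** Hypothetical stand-in for the index; not a Lucene type. */
    public interface IndexSink {
        void add(String doc);
    }

    /** Hypothetical product record holding the three indexed fields. */
    public static final class Product {
        final long id;
        final String name;
        final String description;

        public Product(long id, String name, String description) {
            this.id = id;
            this.name = name;
            this.description = description;
        }
    }

    /**
     * Splits the products into batches of batchSize, indexes the batches on a
     * fixed pool of worker threads, and returns the number of products indexed.
     */
    public static int indexAll(List<Product> products, int batchSize,
                               int threads, IndexSink sink) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> results = new ArrayList<>();
        try {
            for (int start = 0; start < products.size(); start += batchSize) {
                int end = Math.min(start + batchSize, products.size());
                List<Product> batch = products.subList(start, end);
                results.add(pool.submit(() -> {
                    for (Product p : batch) {
                        // Building the document can run in parallel.
                        String doc = p.id + "|" + p.name + "|" + p.description;
                        // Writes are serialized, so only one thread adds at a time.
                        synchronized (sink) {
                            sink.add(doc);
                        }
                    }
                    return batch.size();
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // rethrows any worker failure
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape, any speed-up would come only from the work done outside
the synchronized block, so the gain depends on how much of the 26 minutes
is spent there.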

Is there any other approach by which I could speed up the indexing?
Thoughts? Suggestions?

Thanks,
Mufaddal.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene bulk indexing

Posted by Volodymyr Bychkoviak <vb...@i-hypergrid.com>.
Hi,

The best way to find bottlenecks is profiling. (JProfiler is a very
good tool for that. It is a commercial product with a free evaluation.)

I was indexing 1.5 million documents in 45 minutes. Before optimizing,
indexing took much longer. The optimization was done by changing the
'select' query.
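
Short of a full profiler, a rough first check is to time the database
fetch and the indexing step separately. The sketch below is only an
illustration: PhaseTimerSketch, fetchBatch and indexOne are hypothetical
names, and the real fetch and index calls would be plugged in by the
caller. It assumes the fetch returns an empty list when no rows remain.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class PhaseTimerSketch {

    /** Accumulated wall-clock time per phase, plus the number of items seen. */
    public static final class Timings {
        public long fetchNanos;
        public long indexNanos;
        public int docs;
    }

    /**
     * Repeatedly fetches a batch and indexes it, timing each phase separately.
     * Stops when fetchBatch returns an empty list.
     */
    public static <T> Timings run(Supplier<List<T>> fetchBatch, Consumer<T> indexOne) {
        Timings t = new Timings();
        while (true) {
            long fetchStart = System.nanoTime();
            List<T> batch = fetchBatch.get();
            t.fetchNanos += System.nanoTime() - fetchStart;
            if (batch.isEmpty()) {
                break;
            }
            long indexStart = System.nanoTime();
            for (T item : batch) {
                indexOne.accept(item);
            }
            t.indexNanos += System.nanoTime() - indexStart;
            t.docs += batch.size();
        }
        return t;
    }
}
```

If fetchNanos dominates, the query is the place to look first, which is
what happened in my case.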

