You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by 治江 王 <wa...@yahoo.com.cn> on 2009/02/16 06:48:33 UTC
the efficiency of creating indexes
As i know, the time effciency of creating index is non-linearity with the size of documents. For example, if the size of indexes is 1G, the time cost is 2 hours, If the size of indexes is 10G, the time cost may be 30 hours. Who can tell me what is the reason? Any tips will be appreciated.
___________________________________________________________
好玩贺卡等你发,邮箱贺卡全新上线!
http://card.mail.cn.yahoo.com/
Re: the efficiency of creating indexes
Posted by Michael McCandless <lu...@mikemccandless.com>.
If not for merging, I believe indexing is simply linear.
Merging adds only a logarithmic (in total index size) cost.
Using as large an IndexWriter RAM buffer as you can will minimize the
amount of merging. (Also increasing mergeFactor, or decreasing
maxMergeMB/Docs, but these will impact search performance as well).
Mike
emc.com wrote:
> Did you try?
> The cost of index merging grows when indexes are getting bigger.
> Try to limit the max document size in a segment by setting
> setMaxMergeDocs in IndexWriter.
>
> -----Original Message-----
> From: 治江 王 [mailto:wangzhijiang999@yahoo.com.cn]
> Sent: Monday, February 16, 2009 1:49 PM
> To: java-user@lucene.apache.org
> Subject: the efficiency of creating indexes
>
> As i know, the time effciency of creating index is non-linearity
> with the size of documents. For example, if the size of indexes is
> 1G, the time cost is 2 hours, If the size of indexes is 10G, the
> time cost may be 30 hours. Who can tell me what is the reason? Any
> tips will be appreciated.
>
>
> ___________________________________________________________
> 好玩贺卡等你发,邮箱贺卡全新上线!
> http://card.mail.cn.yahoo.com/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
RE: the efficiency of creating indexes
Posted by Fa...@emc.com.
Did you try?
The cost of index merging grows when indexes are getting bigger.
Try to limit the max document size in a segment by setting setMaxMergeDocs in IndexWriter.
-----Original Message-----
From: 治江 王 [mailto:wangzhijiang999@yahoo.com.cn]
Sent: Monday, February 16, 2009 1:49 PM
To: java-user@lucene.apache.org
Subject: the efficiency of creating indexes
As i know, the time effciency of creating index is non-linearity with the size of documents. For example, if the size of indexes is 1G, the time cost is 2 hours, If the size of indexes is 10G, the time cost may be 30 hours. Who can tell me what is the reason? Any tips will be appreciated.
___________________________________________________________
好玩贺卡等你发,邮箱贺卡全新上线!
http://card.mail.cn.yahoo.com/
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org