Posted to dev@lucene.apache.org by "Michael Busch (JIRA)" <ji...@apache.org> on 2008/05/23 19:23:55 UTC
[jira] Resolved: (LUCENE-1195) Performance improvement for TermInfosReader
[ https://issues.apache.org/jira/browse/LUCENE-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Busch resolved LUCENE-1195.
-----------------------------------
Resolution: Fixed
Committed.
> Performance improvement for TermInfosReader
> -------------------------------------------
>
> Key: LUCENE-1195
> URL: https://issues.apache.org/jira/browse/LUCENE-1195
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael Busch
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.4
>
> Attachments: lucene-1195.patch, lucene-1195.patch, lucene-1195.patch
>
>
> Currently we have a bottleneck for multi-term queries: the dictionary lookup is done
> twice for each term. The first lookup happens in Similarity.idf(), which calls searcher.docFreq();
> the second happens when the posting list is opened (TermDocs or TermPositions).
> The dictionary lookup is not cheap, so avoiding the second lookup can yield a significant
> performance improvement. An easy way to do this is to add a small LRU
> cache to TermInfosReader.
> I ran some performance experiments with an LRU cache of size 20 and a mid-size index of
> 500,000 documents from Wikipedia. Here are some test results:
> 50,000 AND queries with 3 terms each:
> old: 152 secs
> new (with LRU cache): 112 secs (26% faster)
> 50,000 OR queries with 3 terms each:
> old: 175 secs
> new (with LRU cache): 133 secs (24% faster)
> For bigger indexes this patch will probably have less impact; for smaller ones, more.
> I will attach a patch soon.
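The double lookup and the proposed fix can be illustrated with a small LRU cache placed in front of the term dictionary. This is a minimal sketch under stated assumptions, not the committed patch: TermInfo, LruCache and CachedTermDictionary are hypothetical names, a TreeMap stands in for the costly on-disk dictionary seek, and the cache size of 20 is borrowed from the experiment described in the issue. The cache is a java.util.LinkedHashMap in access order that evicts its least recently used entry once the size limit is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class TermCacheSketch {

    /** Hypothetical stand-in for Lucene's TermInfo: only the fields this sketch needs. */
    static final class TermInfo {
        final int docFreq;
        final long postingsPointer;

        TermInfo(int docFreq, long postingsPointer) {
            this.docFreq = docFreq;
            this.postingsPointer = postingsPointer;
        }
    }

    /** Bounded LRU map: an access-ordered LinkedHashMap that drops its eldest entry. */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxSize;

        LruCache(int maxSize) {
            super(16, 0.75f, true); // accessOrder = true: iteration order is least recently used first
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxSize; // evict once the cache grows past its limit
        }
    }

    /** Hypothetical term dictionary; the TreeMap plays the role of the expensive seek. */
    static final class CachedTermDictionary {
        private final TreeMap<String, TermInfo> dictionary = new TreeMap<>();
        private final LruCache<String, TermInfo> cache;
        int expensiveLookups = 0; // how often we fell through to the "slow" dictionary

        CachedTermDictionary(int cacheSize) {
            this.cache = new LruCache<>(cacheSize);
        }

        void add(String term, TermInfo info) {
            dictionary.put(term, info);
        }

        TermInfo get(String term) {
            TermInfo info = cache.get(term);
            if (info != null) {
                return info; // repeated lookup of a recent term is served from the cache
            }
            expensiveLookups++;
            info = dictionary.get(term);
            if (info != null) {
                cache.put(term, info); // remember it for the next lookup of the same term
            }
            return info;
        }
    }

    public static void main(String[] args) {
        CachedTermDictionary dict = new CachedTermDictionary(20);
        dict.add("lucene", new TermInfo(42, 1000L));
        dict.add("java", new TermInfo(17, 2000L));

        // First lookup: mimics Similarity.idf() asking for the document frequency.
        int df = dict.get("lucene").docFreq;
        // Second lookup: mimics opening the posting list (TermDocs/TermPositions).
        long pointer = dict.get("lucene").postingsPointer;

        System.out.println("docFreq=" + df + ", pointer=" + pointer
                + ", expensiveLookups=" + dict.expensiveLookups);
    }
}
```

Running main prints docFreq=42, pointer=1000, expensiveLookups=1: the idf() lookup pays for the dictionary seek, and the posting-list lookup for the same term is answered from the cache. Note that an access-ordered LinkedHashMap is not thread-safe (even get() reorders entries), so a reader shared by several search threads would need synchronization or per-thread caches; this sketch provides neither.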
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
RE: [jira] Resolved: (LUCENE-1195) Performance improvement for TermInfosReader
Posted by bruce <be...@earthlink.net>.
hi...
is this the email list for user questions regarding lucene/nutch/hadoop??
thanks
-bruce
RE: test msg...
Posted by bruce <be...@earthlink.net>.
thanks steve!
-----Original Message-----
From: Steven A Rowe [mailto:sarowe@syr.edu]
Sent: Friday, May 23, 2008 10:39 AM
To: java-dev@lucene.apache.org
Subject: RE: test msg...
Hi Bruce,
On 05/23/2008 at 1:34 PM, bruce wrote:
> is this the email list for user questions regarding lucene/nutch/hadoop??
No. You want the *-user mailing lists, not the *-dev ones.
More info:
Lucene Java mailing lists:
<http://lucene.apache.org/java/docs/mailinglists.html>
Nutch mailing lists:
<http://lucene.apache.org/nutch/mailing_lists.html>
Hadoop mailing lists:
<http://hadoop.apache.org/core/mailing_lists.html>
Steve
test msg...
Posted by bruce <be...@earthlink.net>.
hi...
is this the email list for user questions regarding lucene/nutch/hadoop??
thanks
-bruce