Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/11/15 23:52:14 UTC
[jira] Created: (LUCENE-2765) Optimize scanning in DocsEnum
Optimize scanning in DocsEnum
-----------------------------
Key: LUCENE-2765
URL: https://issues.apache.org/jira/browse/LUCENE-2765
Project: Lucene - Java
Issue Type: Improvement
Reporter: Robert Muir
Fix For: 4.0
Similar to LUCENE-2761:
When we call advance(), it scans after skipping, but this scan can be optimized beyond simply calling nextDoc() as it does today:
{noformat}
// scan for the rest:
do {
  nextDoc();
} while (target > doc);
{noformat}
In particular, the freq can be "skipVinted" (its vInt skipped rather than decoded), and the skipDocs (deletedDocs) don't need to be checked during this scanning.
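A minimal, self-contained sketch of the idea follows. It is not the attached patch and does not use Lucene's classes: `ScanSketch`, `Postings`, `advanceNaive`, and the byte-array layout are hypothetical names chosen for illustration. It assumes the classic encoding where each entry is a vInt of `delta << 1`, with the low bit set when freq is 1 and a separate freq vInt otherwise, and it models only the scan phase (no skip list). The optimized `advance()` skips freq bytes and ignores deletions until it reaches the first doc at or beyond the target.

```java
import java.io.ByteArrayOutputStream;
import java.util.BitSet;

/** Hypothetical sketch; not Lucene's actual DocsEnum implementation. */
public class ScanSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    static final class Postings {
        final byte[] buf;       // vInt-encoded (docCode, optional freq) entries
        final int docCount;     // number of entries in buf
        final BitSet deleted;   // stand-in for skipDocs / deletedDocs
        int pos, read, doc, freq;

        Postings(byte[] buf, int docCount, BitSet deleted) {
            this.buf = buf; this.docCount = docCount; this.deleted = deleted;
        }

        int readVInt() {
            byte b = buf[pos++];
            int v = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = buf[pos++];
                v |= (b & 0x7F) << shift;
            }
            return v;
        }

        /** Moves past one vInt without decoding its value. */
        void skipVInt() {
            while ((buf[pos++] & 0x80) != 0) { /* continuation byte */ }
        }

        /** Baseline: decodes freq and checks deletions for every entry. */
        int nextDoc() {
            while (read < docCount) {
                int code = readVInt();
                read++;
                doc += code >>> 1;
                freq = (code & 1) != 0 ? 1 : readVInt();
                if (!deleted.get(doc)) return doc;
            }
            return doc = NO_MORE_DOCS;
        }

        /** The scan loop quoted in the issue description. */
        int advanceNaive(int target) {
            do { nextDoc(); } while (target > doc);
            return doc;
        }

        /** Optimized scan: freq and deletions are only looked at once doc >= target. */
        int advance(int target) {
            while (read < docCount) {
                int code = readVInt();
                read++;
                doc += code >>> 1;
                if (doc >= target) {
                    freq = (code & 1) != 0 ? 1 : readVInt();
                    if (!deleted.get(doc)) return doc;
                } else if ((code & 1) == 0) {
                    skipVInt(); // freq of a doc we will never return
                }
            }
            return doc = NO_MORE_DOCS;
        }
    }

    static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) { out.write((v & 0x7F) | 0x80); v >>>= 7; }
        out.write(v);
    }

    static byte[] encode(int[] docs, int[] freqs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int i = 0; i < docs.length; i++) {
            int delta = docs[i] - last;
            last = docs[i];
            if (freqs[i] == 1) {
                writeVInt(out, (delta << 1) | 1);
            } else {
                writeVInt(out, delta << 1);
                writeVInt(out, freqs[i]);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] docs = {3, 7, 8, 20, 500};
        int[] freqs = {1, 4, 1, 300, 2};
        BitSet deleted = new BitSet();
        deleted.set(8); // doc 8 is deleted, so advance(8) should land on 20
        byte[] buf = encode(docs, freqs);
        System.out.println(new Postings(buf, docs.length, deleted).advanceNaive(8));
        System.out.println(new Postings(buf, docs.length, deleted).advance(8));
    }
}
```

Both calls in `main` land on the same document; the optimized version simply does less work per entry before the target is reached.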
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
[jira] Commented: (LUCENE-2765) Optimize scanning in DocsEnum
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932920#action_12932920 ]
Robert Muir commented on LUCENE-2765:
-------------------------------------
Here are Mike's results on his Wikipedia index (multi-segment, 5% deletions) with the patch.
||Query||QPS base||QPS spec||Pct diff||
|"unit state"|7.94|7.84|-1.3%|
|state|36.15|35.81|-1.0%|
|spanNear([unit, state], 10, true)|4.46|4.42|-0.9%|
|spanFirst(unit, 5)|16.51|16.45|-0.4%|
|unit state|10.76|10.78|0.1%|
|unit~2.0|13.83|14.06|1.7%|
|unit~1.0|14.36|14.69|2.3%|
|uni*|15.57|16.02|2.9%|
|unit*|27.29|28.26|3.5%|
|+unit +state|11.73|12.31|4.9%|
|united~1.0|29.01|30.86|6.4%|
|un*d|66.52|70.99|6.7%|
|u*d|21.29|22.98|7.9%|
|united~2.0|6.48|7.07|9.1%|
|+nebraska +state|169.87|188.95|11.2%|
[jira] Commented: (LUCENE-2765) Optimize scanning in DocsEnum
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932247#action_12932247 ]
Robert Muir commented on LUCENE-2765:
-------------------------------------
Also, another idea like LUCENE-2761 is to specialize the omitTF case here...
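To illustrate why the omitTF case is a candidate for its own loop, here is a rough sketch under the assumption that, when term frequencies are omitted, each entry is just a vInt doc delta with no shift and no freq. `OmitTfScanSketch`, `scanDocsOnly`, and `encodeDocsOnly` are hypothetical names, not Lucene API, and the deleted-docs check on the landing doc is left out for brevity.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of a docs-only (omitTF) scan; not Lucene code. */
public class OmitTfScanSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Scans plain vInt doc deltas to the first doc >= target. */
    static int scanDocsOnly(byte[] buf, int docCount, int target) {
        int pos = 0;
        int doc = 0;
        for (int i = 0; i < docCount; i++) {
            // Inline vInt decode: there is no freq bit to test and no freq to skip.
            byte b = buf[pos++];
            int delta = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = buf[pos++];
                delta |= (b & 0x7F) << shift;
            }
            doc += delta;
            if (doc >= target) return doc;
        }
        return NO_MORE_DOCS;
    }

    /** Encodes ascending doc IDs as vInt deltas (assumed docs-only layout). */
    static byte[] encodeDocsOnly(int[] docs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int d : docs) {
            int v = d - last;
            last = d;
            while ((v & ~0x7F) != 0) { out.write((v & 0x7F) | 0x80); v >>>= 7; }
            out.write(v);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] docs = {3, 7, 8, 20, 500};
        byte[] buf = encodeDocsOnly(docs);
        System.out.println(scanDocsOnly(buf, docs.length, 9));
    }
}
```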
[jira] Assigned: (LUCENE-2765) Optimize scanning in DocsEnum
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir reassigned LUCENE-2765:
-----------------------------------
Assignee: Robert Muir
[jira] Commented: (LUCENE-2765) Optimize scanning in DocsEnum
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932291#action_12932291 ]
Robert Muir commented on LUCENE-2765:
-------------------------------------
I ran a quick, very rough check with an AND query (3149 results for this query)...
I didn't benchmark the omitTF case (but it should be better too).
All times are in milliseconds.
{code}
// searcher is an already-open IndexSearcher (setup not shown)
QueryParser qp = new QueryParser(Version.LUCENE_CURRENT, "body", new MockAnalyzer());
Query query = qp.parse("+the +america");
// sanity check: print the hit count once before timing
System.out.println(searcher.search(query, 10).totalHits);
// time 1000 repetitions of the same search
long ms = System.currentTimeMillis();
for (int i = 0; i < 1000; i++) {
  searcher.search(query, 10);
}
long ms2 = System.currentTimeMillis();
System.out.println("time = " + (ms2 - ms));
{code}
||setup||run1||run2||run3||run4||run5||run6||
|trunk|1707|1706|1709|1704|1704|1703|
|LUCENE-2765|1628|1623|1641|1624|1627|1628|
seems worth it to me.
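For a rough sense of scale, the six runs above can be averaged and compared. The snippet below only does that arithmetic on the numbers from the table; `TimingSummary` and its method names are made up for this illustration.

```java
import java.util.Locale;

/** Hypothetical helper that summarizes the timings reported in the table. */
public class TimingSummary {
    static double mean(int[] xs) {
        long sum = 0;
        for (int x : xs) sum += x;
        return (double) sum / xs.length;
    }

    /** Percentage reduction in mean time of 'patched' relative to 'base'. */
    static double percentFaster(int[] base, int[] patched) {
        double b = mean(base);
        return (b - mean(patched)) / b * 100.0;
    }

    public static void main(String[] args) {
        int[] trunk = {1707, 1706, 1709, 1704, 1704, 1703};
        int[] patch = {1628, 1623, 1641, 1624, 1627, 1628};
        // Means are 1705.5 ms and 1628.5 ms, a reduction of about 4.5%.
        System.out.println(String.format(Locale.ROOT,
            "trunk=%.1f ms, LUCENE-2765=%.1f ms, reduction=%.1f%%",
            mean(trunk), mean(patch), percentFaster(trunk, patch)));
    }
}
```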
[jira] Updated: (LUCENE-2765) Optimize scanning in DocsEnum
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-2765:
--------------------------------
Attachment: LUCENE-2765.patch
Here's a patch; it can probably be beautified/optimized further.
It needs benchmarking.
[jira] Updated: (LUCENE-2765) Optimize scanning in DocsEnum
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-2765:
--------------------------------
Attachment: LUCENE-2765.patch
My mistake, I left an extra check in the code... here's the updated one.