Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/11/15 23:52:14 UTC

[jira] Created: (LUCENE-2765) Optimize scanning in DocsEnum

Optimize scanning in DocsEnum
-----------------------------

                 Key: LUCENE-2765
                 URL: https://issues.apache.org/jira/browse/LUCENE-2765
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Robert Muir
             Fix For: 4.0


Similar to LUCENE-2761:

When we call advance(), it scans after skipping, but this scan can be done more cheaply than by calling nextDoc() in a loop, as it does today:

{noformat}
      // scan for the rest:
      do {
        nextDoc();
      } while (target > doc);
{noformat}

In particular, the freq can be "skipVInt-ed" (stepped over without decoding it), and the skipDocs (deletedDocs) don't need to be checked during this scan.
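
As a rough illustration of the idea above (not the attached patch), here is a minimal Java sketch. It assumes a toy postings layout in which every posting is a VInt doc delta followed by a VInt freq, which is simpler than Lucene's real codec format. The class PostingsScanSketch, its fields, and the methods advanceByNextDoc/advanceByScan are hypothetical names made up for this example. The scan only decodes the freq and consults the deleted-docs array for the document it finally lands on; documents before the target are discarded either way.

```java
import java.io.ByteArrayOutputStream;

// Toy postings reader. The layout (VInt doc delta, then VInt freq) is a
// simplifying assumption for illustration, not Lucene's actual codec format.
class PostingsScanSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final byte[] data;       // encoded postings
    private final boolean[] deleted; // hypothetical stand-in for skipDocs
    private final int docCount;      // number of postings in data
    private int pos;                 // read position in data
    private int read;                // postings consumed so far
    private int base;                // last decoded doc ID (deleted or not)
    int doc = -1;                    // current doc as seen by the caller
    int freq;                        // freq of the current doc

    PostingsScanSketch(int[] docs, int[] freqs, boolean[] deleted) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int i = 0; i < docs.length; i++) {
            writeVInt(out, docs[i] - last);
            writeVInt(out, freqs[i]);
            last = docs[i];
        }
        this.data = out.toByteArray();
        this.deleted = deleted;
        this.docCount = docs.length;
    }

    private static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    private int readVInt() {
        byte b = data[pos++];
        int v = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = data[pos++];
            v |= (b & 0x7F) << shift;
        }
        return v;
    }

    // Step over one VInt without decoding its value.
    private void skipVInt() {
        while ((data[pos++] & 0x80) != 0) { }
    }

    // Regular iteration: decodes freq and checks deletions for every doc.
    int nextDoc() {
        while (read < docCount) {
            base += readVInt();
            freq = readVInt();
            read++;
            if (!deleted[base]) {
                return doc = base;
            }
        }
        return doc = NO_MORE_DOCS;
    }

    // Scan as in the issue description: call nextDoc() until target is reached.
    int advanceByNextDoc(int target) {
        do {
            nextDoc();
        } while (target > doc);
        return doc;
    }

    // Specialized scan: skip freqs and ignore deletions until target is reached.
    int advanceByScan(int target) {
        while (read < docCount) {
            base += readVInt();
            read++;
            if (base >= target) {
                freq = readVInt();
                // Only the doc we land on needs the deleted check.
                return deleted[base] ? nextDoc() : (doc = base);
            }
            skipVInt(); // freq of a doc we are passing over
        }
        return doc = NO_MORE_DOCS;
    }
}
```

Both methods should return the same document for the same target; the second one simply does less work per skipped posting. Skip lists are left out of the sketch, since the issue only concerns the linear scan that follows the skip.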

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] Commented: (LUCENE-2765) Optimize scanning in DocsEnum

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932920#action_12932920 ] 

Robert Muir commented on LUCENE-2765:
-------------------------------------

Here are Mike's results on his Wikipedia index (multi-segment, 5% deletions) with the patch:

||Query||QPS base||QPS spec||Pct diff||
|"unit state"|7.94|7.84|-1.3%|
|state|36.15|35.81|-1.0%|
|spanNear([unit, state], 10, true)|4.46|4.42|-0.9%|
|spanFirst(unit, 5)|16.51|16.45|-0.4%|
|unit state|10.76|10.78|0.1%|
|unit~2.0|13.83|14.06|1.7%|
|unit~1.0|14.36|14.69|2.3%|
|uni*|15.57|16.02|2.9%|
|unit*|27.29|28.26|3.5%|
|+unit +state|11.73|12.31|4.9%|
|united~1.0|29.01|30.86|6.4%|
|un*d|66.52|70.99|6.7%|
|u*d|21.29|22.98|7.9%|
|united~2.0|6.48|7.07|9.1%|
|+nebraska +state|169.87|188.95|11.2%|



[jira] Commented: (LUCENE-2765) Optimize scanning in DocsEnum

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932247#action_12932247 ] 

Robert Muir commented on LUCENE-2765:
-------------------------------------

Also, another idea like LUCENE-2761 is to specialize the omitTF case here...
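
To make the omitTF idea concrete, here is a small hedged sketch in Java. It assumes a toy postings list that stores only VInt doc deltas, as a stand-in for a field indexed without term frequencies; OmitTfScanSketch, encode, and scanTo are hypothetical names, and deleted docs and skip lists are ignored to keep it short. The point is only that a specialized loop never touches freq data and never tests an omitTF flag per document.

```java
import java.io.ByteArrayOutputStream;

// Toy scan over a postings list that stores no freqs (omitTF-style).
// The layout is an assumption for illustration, not Lucene's codec format.
class OmitTfScanSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Encode ascending doc IDs as VInt deltas only.
    static byte[] encode(int[] docs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int d : docs) {
            int v = d - last;
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80);
                v >>>= 7;
            }
            out.write(v);
            last = d;
        }
        return out.toByteArray();
    }

    // Return the first doc >= target, scanning from the start of the list.
    // Every VInt is a doc delta, so there is nothing to skip or branch on.
    static int scanTo(byte[] data, int target) {
        int pos = 0;
        int doc = 0;
        while (pos < data.length) {
            byte b = data[pos++];
            int delta = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = data[pos++];
                delta |= (b & 0x7F) << shift;
            }
            doc += delta;
            if (doc >= target) {
                return doc;
            }
        }
        return NO_MORE_DOCS;
    }
}
```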





[jira] Assigned: (LUCENE-2765) Optimize scanning in DocsEnum

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir reassigned LUCENE-2765:
-----------------------------------

    Assignee: Robert Muir



[jira] Commented: (LUCENE-2765) Optimize scanning in DocsEnum

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932291#action_12932291 ] 

Robert Muir commented on LUCENE-2765:
-------------------------------------

I ran a quick, very rough check with an AND query (3149 results for this query).
I didn't benchmark the omitTF case (but it should be better too).

All times are in milliseconds.

{code}
    // Rough timing loop; `searcher` is assumed to be an already-open IndexSearcher.
    QueryParser qp = new QueryParser(Version.LUCENE_CURRENT, "body", new MockAnalyzer());
    Query query = qp.parse("+the +america");
    // Print the hit count once, before the timer starts.
    System.out.println(searcher.search(query, 10).totalHits);
    // Time 1000 identical searches.
    long start = System.currentTimeMillis();
    for (int i = 0; i < 1000; i++) {
      searcher.search(query, 10);
    }
    long end = System.currentTimeMillis();
    System.out.println("time = " + (end - start));
{code}

||setup||run1||run2||run3||run4||run5||run6||
|trunk|1707|1706|1709|1704|1704|1703|
|LUCENE-2765|1628|1623|1641|1624|1627|1628|

Seems worth it to me.
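
To put a number on the table above, the runs can be averaged and compared. This is just arithmetic on the figures already posted; the class RunSummary and its helper methods are hypothetical names for this sketch.

```java
// Summarizes the six timing runs from the table above.
class RunSummary {
    // Arithmetic mean of the run times, in milliseconds.
    static double mean(int[] runsMs) {
        long sum = 0;
        for (int ms : runsMs) {
            sum += ms;
        }
        return (double) sum / runsMs.length;
    }

    // Percentage by which `patched` is lower than `base`.
    static double pctReduction(double base, double patched) {
        return 100.0 * (base - patched) / base;
    }

    public static void main(String[] args) {
        int[] trunk = {1707, 1706, 1709, 1704, 1704, 1703};
        int[] patch = {1628, 1623, 1641, 1624, 1627, 1628};
        double t = mean(trunk);
        double p = mean(patch);
        System.out.println("trunk mean (ms): " + t);
        System.out.println("patch mean (ms): " + p);
        System.out.println("reduction (%):   " + pctReduction(t, p));
    }
}
```

The means work out to 1705.5 ms for trunk and 1628.5 ms for the patch, a reduction of roughly 4.5% for this one query. With a single query and wall-clock timing this is only a rough indication.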




[jira] Updated: (LUCENE-2765) Optimize scanning in DocsEnum

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2765:
--------------------------------

    Attachment: LUCENE-2765.patch

Here's a patch; it can probably be beautified/optimized further.

Needs benchmarking.



[jira] Updated: (LUCENE-2765) Optimize scanning in DocsEnum

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2765:
--------------------------------

    Attachment: LUCENE-2765.patch

My mistake, I left an extra check in the code. Here's the updated patch.

