
Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/11/14 22:29:14 UTC

[jira] Created: (LUCENE-2760) optimize spanfirstquery, spanpositionrangequery

optimize spanfirstquery, spanpositionrangequery
-----------------------------------------------

                 Key: LUCENE-2760
                 URL: https://issues.apache.org/jira/browse/LUCENE-2760
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Robert Muir
             Fix For: 3.1, 4.0


SpanFirstQuery and SpanPositionRangeQuery are currently inefficient (SpanFirstQuery is just a special case of SpanPositionRangeQuery).

Take this worst-case example: SpanFirstQuery("the").
Currently the code reads all the positions for the term "the" in every matching document.

But when enumerating spans, once we have passed the allowable range we should stop reading positions and move on to the next document (skipTo).
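
To illustrate the idea, here is a minimal sketch that counts how many positions get read with and without the early exit. It does not use Lucene's API: the class name, the method names, and the representation of a document as a sorted int[] of term positions are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for position enumeration; not Lucene code.
public class SpanFirstSketch {

    /** Positions read when every position of every document is scanned. */
    public static int readAll(List<int[]> docs) {
        int read = 0;
        for (int[] positions : docs) {
            read += positions.length;
        }
        return read;
    }

    /**
     * Positions read when scanning a document stops at the first position
     * at or beyond 'end'. Assumes positions within a document are ascending.
     */
    public static int readWithEarlyExit(List<int[]> docs, int end) {
        int read = 0;
        for (int[] positions : docs) {
            for (int pos : positions) {
                read++;            // a position must be read before it can be checked
                if (pos >= end) {
                    break;         // past the allowable range: go to the next document
                }
            }
        }
        return read;
    }

    public static void main(String[] args) {
        List<int[]> docs = Arrays.asList(
            new int[] {0, 4, 9, 15},
            new int[] {7, 8, 30},
            new int[] {1, 2, 50, 60, 70});
        System.out.println(readAll(docs));              // 12
        System.out.println(readWithEarlyExit(docs, 3)); // 6
    }
}
```

The saving grows with the number of occurrences per document, which is why a very common term such as "the" is the worst case for the current behaviour.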
 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] Commented: (LUCENE-2760) optimize spanfirstquery, spanpositionrangequery

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931896#action_12931896 ] 

Robert Muir commented on LUCENE-2760:
-------------------------------------

Admittedly, I don't have a good benchmarking setup for these span queries yet.

But in a quick test on a 125k-doc corpus, the SpanFirstQuery on a common term like "the" took
about half the time with the patch, because it read/evaluated 117,556 positions instead of 1,029,622 positions.


[jira] Updated: (LUCENE-2760) optimize spanfirstquery, spanpositionrangequery

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2760:
--------------------------------

    Attachment: LUCENE-2760.patch

Here's the patch. SpanPositionCheckQuery now has logic similar to FilteredTermsEnum:
instead of returning a boolean true/false for whether a match is acceptable,
it can return YES, NO, or NO_AND_ADVANCE.
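
A rough sketch of how such a three-valued result can drive the enumeration is below. It is a standalone illustration, not the patch itself: the Span class, the PositionCheck interface, and the filter method are hypothetical names, and spans are assumed to arrive sorted by document and then by start position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a YES / NO / NO_AND_ADVANCE filter loop.
public class AcceptStatusSketch {

    public enum AcceptStatus { YES, NO, NO_AND_ADVANCE }

    /** Minimal stand-in for a span: document id plus [start, end) positions. */
    public static final class Span {
        public final int doc, start, end;
        public Span(int doc, int start, int end) {
            this.doc = doc;
            this.start = start;
            this.end = end;
        }
    }

    public interface PositionCheck {
        AcceptStatus accept(Span span);
    }

    /** Keeps accepted spans; on NO_AND_ADVANCE, skips the rest of that document. */
    public static List<Span> filter(List<Span> spans, PositionCheck check) {
        List<Span> out = new ArrayList<>();
        int i = 0;
        while (i < spans.size()) {
            Span s = spans.get(i);
            switch (check.accept(s)) {
                case YES: {
                    out.add(s);
                    i++;
                    break;
                }
                case NO: {
                    i++; // reject this span, but later spans in the doc may match
                    break;
                }
                case NO_AND_ADVANCE: {
                    int doc = s.doc; // nothing later in this doc can match
                    while (i < spans.size() && spans.get(i).doc == doc) {
                        i++;
                    }
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(
            new Span(0, 0, 1), new Span(0, 5, 6), new Span(0, 7, 8), new Span(1, 1, 2));
        // Accept spans that start before position 3; otherwise give up on the doc.
        List<Span> hits = filter(spans,
            s -> s.start >= 3 ? AcceptStatus.NO_AND_ADVANCE : AcceptStatus.YES);
        System.out.println(hits.size()); // 2
    }
}
```

With a plain boolean, the loop would have to keep checking the spans at positions 5 and 7 in document 0; the third status lets the check say that the rest of the document cannot match.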


[jira] Updated: (LUCENE-2760) optimize spanfirstquery, spanpositionrangequery

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2760:
--------------------------------

    Attachment: LUCENE-2760.patch

Here's an updated patch, with javadocs.

Additionally, I now check for spans.start() *>=* end() instead of spans.start() *>* end().

I believe it's invalid to have a zero-length span (e.g. for a single term, end = start+1), so a span that starts at end() can never fit inside the range.
I added an assert to check for this, and all tests still pass.
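
The reasoning behind the *>=* check can be sketched as follows. This is an illustration under assumptions, not the patch: the class and method names are made up, positions are passed as plain ints, the range end is treated as exclusive, and an explicit exception stands in for the assert so the check also fires when assertions are disabled.

```java
// Hypothetical sketch of a position-range check; not Lucene code.
public class PositionRangeSketch {

    public enum AcceptStatus { YES, NO, NO_AND_ADVANCE }

    /**
     * Decides whether the span [spanStart, spanEnd) lies within
     * [rangeStart, rangeEnd).
     */
    public static AcceptStatus accept(int spanStart, int spanEnd,
                                      int rangeStart, int rangeEnd) {
        if (spanStart >= spanEnd) {
            // Stand-in for the assert: zero-length spans are treated as invalid.
            throw new IllegalArgumentException("zero-length span");
        }
        if (spanStart >= rangeEnd) {
            // A span starting at rangeEnd ends at rangeEnd + 1 or later, so
            // neither it nor any later span in this document can fit.
            return AcceptStatus.NO_AND_ADVANCE;
        }
        if (spanStart >= rangeStart && spanEnd <= rangeEnd) {
            return AcceptStatus.YES;
        }
        return AcceptStatus.NO;
    }

    public static void main(String[] args) {
        System.out.println(accept(2, 3, 0, 3)); // YES
        System.out.println(accept(2, 4, 0, 3)); // NO
        System.out.println(accept(3, 4, 0, 3)); // NO_AND_ADVANCE
    }
}
```

With a strict *>* comparison, the span (3, 4) would only return NO, and one more position per document would be read before advancing.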


[jira] Commented: (LUCENE-2760) optimize spanfirstquery, spanpositionrangequery

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932107#action_12932107 ] 

Robert Muir commented on LUCENE-2760:
-------------------------------------

I'd like to commit this soon if there are no objections.

These APIs (SpanPositionCheckQuery) are new in 3.x/trunk, so there's no backwards-compatibility break.


[jira] Assigned: (LUCENE-2760) optimize spanfirstquery, spanpositionrangequery

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir reassigned LUCENE-2760:
-----------------------------------

    Assignee: Robert Muir

[jira] Resolved: (LUCENE-2760) optimize spanfirstquery, spanpositionrangequery

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-2760.
---------------------------------

    Resolution: Fixed

Committed revisions 1035397, 1035411 (3x)
