Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/11/14 22:29:14 UTC
[jira] Created: (LUCENE-2760) optimize spanfirstquery,
spanpositionrangequery
optimize spanfirstquery, spanpositionrangequery
-----------------------------------------------
Key: LUCENE-2760
URL: https://issues.apache.org/jira/browse/LUCENE-2760
Project: Lucene - Java
Issue Type: Improvement
Reporter: Robert Muir
Fix For: 3.1, 4.0
SpanFirstQuery and SpanPositionRangeQuery (SpanFirst is just a special case of the latter) are currently inefficient.
Take this worst-case example: SpanFirstQuery("the").
Currently the code reads all the positions for the term "the".
But when enumerating spans, once we have passed the allowable range we should stop reading positions and move on to the next document (skipTo).
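To make the cost difference concrete, here is a minimal sketch that counts how many positions get read with and without the early exit. It does not use Lucene at all: the class EarlyExitSketch, its methods, and the int[][] layout (one sorted position array per document) are assumptions made up for illustration, and a single-term span at position pos is assumed to cover [pos, pos+1).

```java
class EarlyExitSketch {
    // Naive enumeration: every stored position is read, matching or not.
    static int positionsReadNaive(int[][] docs) {
        int read = 0;
        for (int[] positions : docs) {
            read += positions.length;
        }
        return read;
    }

    // Early exit: positions within a document are sorted, so once one
    // starts at or beyond `end`, no later position in that document can
    // match and we move on to the next document.
    static int positionsReadEarlyExit(int[][] docs, int end) {
        int read = 0;
        for (int[] positions : docs) {
            for (int pos : positions) {
                read++;
                if (pos >= end) {
                    break; // past the allowable range: skip to next doc
                }
            }
        }
        return read;
    }

    // Number of single-term spans [pos, pos+1) that end at or before `end`.
    static int matches(int[][] docs, int end) {
        int n = 0;
        for (int[] positions : docs) {
            for (int pos : positions) {
                if (pos + 1 <= end) {
                    n++;
                }
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Hypothetical postings for one term in three documents.
        int[][] docs = { {0, 5, 9, 12}, {3, 4}, {7} };
        System.out.println("naive reads:      " + positionsReadNaive(docs));
        System.out.println("early-exit reads: " + positionsReadEarlyExit(docs, 2));
        System.out.println("matches:          " + matches(docs, 2));
    }
}
```

Both strategies find the same matches; only the number of positions touched differs. In a real index the saving comes from not decoding the remaining positions of a document, which this toy loop can only approximate by counting.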
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
[jira] Commented: (LUCENE-2760) optimize spanfirstquery,
spanpositionrangequery
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931896#action_12931896 ]
Robert Muir commented on LUCENE-2760:
-------------------------------------
Admittedly, I don't have a good benchmarking setup for these span queries yet.
But in a quick test on a 125k-doc corpus, the SpanFirstQuery on a common term like "the" took
about half the time. This is because it read/evaluated 117,556 positions instead of 1,029,622 positions.
[jira] Updated: (LUCENE-2760) optimize spanfirstquery,
spanpositionrangequery
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-2760:
--------------------------------
Attachment: LUCENE-2760.patch
Here's the patch. SpanPositionCheckQuery now has logic similar to FilteredTermsEnum:
instead of returning a boolean true/false for whether a match is acceptable,
it can return YES, NO, or NO_AND_ADVANCE.
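The three-way result can be sketched roughly as follows. This is not the code from the patch: only the names YES, NO, and NO_AND_ADVANCE come from the message above, while the Span class, the accept and filter methods, and the list-based iteration are hypothetical stand-ins so that the example runs without Lucene. Spans are assumed to be half-open [start, end) and ordered by document, then by start.

```java
import java.util.ArrayList;
import java.util.List;

class AcceptStatusSketch {
    enum AcceptStatus { YES, NO, NO_AND_ADVANCE }

    // Hypothetical stand-in for one span occurrence inside a document.
    static final class Span {
        final int doc, start, end;
        Span(int doc, int start, int end) {
            this.doc = doc;
            this.start = start;
            this.end = end;
        }
    }

    // Position-range check for the window [rangeStart, rangeEnd).
    static AcceptStatus accept(Span s, int rangeStart, int rangeEnd) {
        if (s.start >= rangeEnd) {
            // Nothing later in this document can match either.
            return AcceptStatus.NO_AND_ADVANCE;
        }
        if (s.start >= rangeStart && s.end <= rangeEnd) {
            return AcceptStatus.YES;
        }
        return AcceptStatus.NO; // no match here, but a later span might
    }

    // Enumerates spans, skipping the rest of a document on NO_AND_ADVANCE.
    static List<Span> filter(List<Span> spans, int rangeStart, int rangeEnd) {
        List<Span> out = new ArrayList<>();
        int i = 0;
        while (i < spans.size()) {
            Span s = spans.get(i);
            switch (accept(s, rangeStart, rangeEnd)) {
                case YES:
                    out.add(s);
                    i++;
                    break;
                case NO:
                    i++;
                    break;
                case NO_AND_ADVANCE:
                    // Stand-in for skipTo: jump to the next document.
                    int doc = s.doc;
                    while (i < spans.size() && spans.get(i).doc == doc) {
                        i++;
                    }
                    break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Span> spans = new ArrayList<>();
        spans.add(new Span(0, 0, 1));
        spans.add(new Span(0, 3, 4));
        spans.add(new Span(0, 8, 9));
        spans.add(new Span(1, 2, 3));
        spans.add(new Span(1, 6, 7));
        for (Span s : filter(spans, 2, 5)) {
            System.out.println("doc=" + s.doc + " [" + s.start + "," + s.end + ")");
        }
    }
}
```

A plain boolean cannot distinguish "this span fails, keep looking in the same document" from "this span fails and so will everything after it"; the third value is what lets the caller abandon the document early.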
[jira] Updated: (LUCENE-2760) optimize spanfirstquery,
spanpositionrangequery
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-2760:
--------------------------------
Attachment: LUCENE-2760.patch
Here's an updated patch, with javadocs.
Additionally, I now check for spans.start() *>=* end() instead of spans.start() *>* end().
I believe it's invalid to have a zero-length span (e.g. for a single term, end = start+1).
I added an assert to check for this, and all tests still pass.
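The reasoning behind the switch from > to >= can be checked with a small sketch. It assumes half-open spans whose end is strictly greater than their start (the no-zero-length-span assumption stated above) and a SpanFirst-style rule where a span matches when its end is at or before the limit. The class BoundaryCheckSketch and all its method names are hypothetical and not taken from the patch.

```java
class BoundaryCheckSketch {
    // A span [spanStart, spanEnd) matches a SpanFirst-style limit `end`
    // when it finishes at or before that limit.
    static boolean matchesFirst(int spanStart, int spanEnd, int end) {
        if (spanStart >= spanEnd) {
            // Mirrors the idea of asserting that spans are never zero-length.
            throw new IllegalArgumentException("zero-length or inverted span");
        }
        return spanEnd <= end;
    }

    // Old-style stop condition: only stop when strictly past the limit.
    static boolean canStopStrict(int spanStart, int end) {
        return spanStart > end;
    }

    // New-style stop condition: stop as soon as the start reaches the limit.
    static boolean canStopInclusive(int spanStart, int end) {
        return spanStart >= end;
    }

    // Exhaustively checks that the inclusive condition never discards a
    // span that would have matched, for small positions and lengths.
    static boolean inclusiveIsSafe(int maxPos, int maxLen) {
        for (int end = 0; end <= maxPos; end++) {
            for (int start = 0; start <= maxPos; start++) {
                for (int len = 1; len <= maxLen; len++) {
                    if (canStopInclusive(start, end)
                            && matchesFirst(start, start + len, end)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("inclusive check is safe: " + inclusiveIsSafe(20, 5));
        // At the boundary the two conditions differ: start == end.
        System.out.println("strict stops at boundary:    " + canStopStrict(3, 3));
        System.out.println("inclusive stops at boundary: " + canStopInclusive(3, 3));
    }
}
```

A span that starts exactly at the limit must end after it, so it can never match; the inclusive comparison simply lets the enumeration give up one span earlier. If zero-length spans were allowed, that argument would no longer hold, which is why the two changes go together.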
[jira] Commented: (LUCENE-2760) optimize spanfirstquery,
spanpositionrangequery
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932107#action_12932107 ]
Robert Muir commented on LUCENE-2760:
-------------------------------------
I'd like to commit this soon if there are no objections.
These APIs (SpanPositionCheckQuery) are new in 3.x/trunk, so there's no backwards break.
[jira] Assigned: (LUCENE-2760) optimize spanfirstquery,
spanpositionrangequery
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir reassigned LUCENE-2760:
-----------------------------------
Assignee: Robert Muir
[jira] Resolved: (LUCENE-2760) optimize spanfirstquery,
spanpositionrangequery
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir resolved LUCENE-2760.
---------------------------------
Resolution: Fixed
Committed revision 1035397, 1035411 (3x)