Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/11/15 16:38:13 UTC

[jira] Commented: (LUCENE-2761) specialize payload processing from of DocsAndPositionsEnum

    [ https://issues.apache.org/jira/browse/LUCENE-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932079#action_12932079 ] 

Robert Muir commented on LUCENE-2761:
-------------------------------------

Here's my result from a crappy benchmark that just runs a SpanFirstQuery on a very common term:

{code}
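    // 'searcher' is assumed to be an already-open searcher over the test index.
    // Match "the" in field "body" only where the span ends at position <= 1.
    SpanQuery sq = new SpanFirstQuery(new SpanTermQuery(new Term("body", "the")), 1);
    // Untimed first search: prints the hit count and warms things up.
    System.out.println(searcher.search(sq, 10).totalHits);
    long ms = System.currentTimeMillis();
    for (int i = 0; i < 100; i++) {
      searcher.search(sq, 10);
    }
    long ms2 = System.currentTimeMillis();
    // Total elapsed time for the 100 timed searches.
    System.out.println("time = " + (ms2 - ms));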
{code}

All times below are in milliseconds (total for the 100 searches in the loop).

||setup||run1||run2||run3||run4||run5||run6||
|TRUNK|13055|13054|13061|13068|13070|13058|
|LUCENE-2760|7987|7993|7995|7987|8012|7989|
|LUCENE-2760+LUCENE-2761|7741|7723|7701|7702|7693|7702|

I think it sucks to introduce duplication, but if we can eke out a few % faster phrasequeries/spanqueries for the common case, I think this is worth it for codecs.
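
For reference, the relative savings implied by the table can be worked out with a few lines of arithmetic. The sketch below is only an illustration: the class and method names (BenchmarkSummary, mean, percentTimeSaved) are made up for this note, and the only inputs are the run times from the table. It averages the six runs per setup and reports how much time each setup saves relative to a baseline, which comes to roughly 39% for LUCENE-2760 over trunk and about a further 3.5% for LUCENE-2761 on top of that.

```java
public class BenchmarkSummary {

    // Run times in milliseconds, copied from the table above.
    static final long[] TRUNK = {13055, 13054, 13061, 13068, 13070, 13058};
    static final long[] LUCENE_2760 = {7987, 7993, 7995, 7987, 8012, 7989};
    static final long[] LUCENE_2760_AND_2761 = {7741, 7723, 7701, 7702, 7693, 7702};

    /** Arithmetic mean of the given run times. */
    static double mean(long[] runs) {
        long sum = 0;
        for (long r : runs) {
            sum += r;
        }
        return (double) sum / runs.length;
    }

    /** Percentage of the baseline time that the candidate saves. */
    static double percentTimeSaved(double baseline, double candidate) {
        return 100.0 * (baseline - candidate) / baseline;
    }

    public static void main(String[] args) {
        double trunk = mean(TRUNK);
        double p2760 = mean(LUCENE_2760);
        double both = mean(LUCENE_2760_AND_2761);

        System.out.printf("LUCENE-2760 vs trunk:             %.1f%% less time%n",
                percentTimeSaved(trunk, p2760));
        System.out.printf("LUCENE-2761 on top of LUCENE-2760: %.1f%% less time%n",
                percentTimeSaved(p2760, both));
        System.out.printf("both patches vs trunk:            %.1f%% less time%n",
                percentTimeSaved(trunk, both));
    }
}
```

So the "few %" here refers to the second step only; most of the gain in this particular benchmark comes from LUCENE-2760.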


> specialize payload processing from of DocsAndPositionsEnum
> ----------------------------------------------------------
>
>                 Key: LUCENE-2761
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2761
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2761.patch
>
>
> In LUCENE-2760 I started working on improving the speed of a few spanqueries.
> In general the trick there is to avoid processing positions if you don't have to.
> But we can further improve queries that read lots of positions by cleaning up SegmentDocsAndPositionsEnum:
> nextPosition() has no fewer than 3 payload-related checks.
> However, a large majority of users/fields have no payloads at all.
> I think we should specialize this case into a separate implementation and speed up the common case.
> edit: dyslexia with the jira issue number.
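
The idea in the description, paying for payload handling only on fields that actually store payloads, can be illustrated with a small standalone sketch. This is not the code from the attached patch and does not use Lucene classes: PositionReader, PayloadAwareReader, NoPayloadReader and open() are hypothetical names, and the "postings" are plain int arrays rather than a real index stream. The encoding is a simplified assumption: with payloads, each entry is (delta << 1) with the low bit flagging a payload-length change; without payloads, each entry is just the position delta.

```java
public class PositionsSpecializationSketch {

    /** Hypothetical stand-in for the position-reading part of a postings enum. */
    interface PositionReader {
        int nextPosition();
    }

    /** General case: every call goes through the payload bookkeeping. */
    static final class PayloadAwareReader implements PositionReader {
        private final int[] stream; // assumed pre-decoded ints, not real index bytes
        private int upto;
        private int position;
        private int payloadLength;

        PayloadAwareReader(int[] stream) {
            this.stream = stream;
        }

        @Override
        public int nextPosition() {
            int code = stream[upto++];
            if ((code & 1) != 0) {
                // Low bit set: a new payload length follows.
                payloadLength = stream[upto++];
            }
            // Skip the payload itself (one int per payload "byte" in this sketch).
            upto += payloadLength;
            position += code >>> 1;
            return position;
        }
    }

    /** Specialized case: the field has no payloads, so there is nothing to check. */
    static final class NoPayloadReader implements PositionReader {
        private final int[] stream;
        private int upto;
        private int position;

        NoPayloadReader(int[] stream) {
            this.stream = stream;
        }

        @Override
        public int nextPosition() {
            position += stream[upto++];
            return position;
        }
    }

    /** The payload decision is made once here instead of on every position. */
    static PositionReader open(int[] stream, boolean fieldStoresPayloads) {
        return fieldStoresPayloads ? new PayloadAwareReader(stream) : new NoPayloadReader(stream);
    }

    /** Reads 'count' positions from the reader into an array. */
    static int[] readAll(PositionReader reader, int count) {
        int[] positions = new int[count];
        for (int i = 0; i < count; i++) {
            positions[i] = reader.nextPosition();
        }
        return positions;
    }

    public static void main(String[] args) {
        // Positions 3, 7, 12 stored as plain deltas.
        int[] plain = {3, 4, 5};
        // Same positions with payloads: code 7 = delta 3 + new length 1, payload 99;
        // code 8 = delta 4, same length, payload 98; code 11 = delta 5 + new length 0.
        int[] withPayloads = {7, 1, 99, 8, 98, 11, 0};

        System.out.println(java.util.Arrays.toString(readAll(open(plain, false), 3)));
        System.out.println(java.util.Arrays.toString(readAll(open(withPayloads, true), 3)));
    }
}
```

Both readers return the same positions; the specialized one simply has no payload branches left in its inner loop, at the cost of the duplication mentioned in the comment.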


