Posted to dev@lucene.apache.org by "Grant Ingersoll (Created) (JIRA)" <ji...@apache.org> on 2011/12/06 16:27:40 UTC

[jira] [Created] (SOLR-2950) QueryElevationComponent needlessly looks up document ids

QueryElevationComponent needlessly looks up document ids
--------------------------------------------------------

                 Key: SOLR-2950
                 URL: https://issues.apache.org/jira/browse/SOLR-2950
             Project: Solr
          Issue Type: Improvement
            Reporter: Grant Ingersoll
            Priority: Minor
             Fix For: 3.6, 4.0


The QueryElevationComponent needlessly instantiates a FieldCache and does lookups in it for every document.  If we flipped things around a bit and resolved the Lucene internal doc ids in inform(), we could then do a much smaller and faster lookup during the sort.
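
A rough sketch of the idea described above, in plain Java rather than against Lucene's API: resolve the handful of elevated uniqueKeys to internal doc ids once, then test each hit against a small set during the sort. Everything named here (ElevationLookupSketch, resolveDocIds, isElevated, and the Map standing in for the index) is a hypothetical stand-in for illustration, not code from any patch on this issue.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: a Map stands in for the index's uniqueKey -> doc id mapping.
class ElevationLookupSketch {

    // Resolve the elevated uniqueKeys to internal doc ids once, up front.
    // Keys that are not present in the index are skipped.
    static Set<Integer> resolveDocIds(List<String> elevatedKeys, Map<String, Integer> keyToDocId) {
        Set<Integer> docIds = new HashSet<>();
        for (String key : elevatedKeys) {
            Integer docId = keyToDocId.get(key);
            if (docId != null) {
                docIds.add(docId);
            }
        }
        return docIds;
    }

    // During the sort, the per-hit check is a membership test on a tiny set
    // instead of a per-document lookup of the uniqueKey value.
    static boolean isElevated(int docId, Set<Integer> elevatedDocIds) {
        return elevatedDocIds.contains(docId);
    }

    public static void main(String[] args) {
        Map<String, Integer> index = new HashMap<>();
        index.put("doc-a", 0);
        index.put("doc-b", 1);
        index.put("doc-c", 2);

        Set<Integer> elevated = resolveDocIds(Arrays.asList("doc-c", "missing"), index);
        for (int docId = 0; docId < 3; docId++) {
            System.out.println(docId + " elevated=" + isElevated(docId, elevated));
        }
    }
}
```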

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] [Commented] (SOLR-2950) QueryElevationComponent needlessly looks up document ids

Posted by "Grant Ingersoll (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174258#comment-13174258 ] 

Grant Ingersoll commented on SOLR-2950:
---------------------------------------

+1, go ahead and commit.
                

[jira] [Updated] (SOLR-2950) QueryElevationComponent needlessly looks up document ids

Posted by "Yonik Seeley (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-2950:
-------------------------------

    Attachment: SOLR-2950.patch

OK, just had a chance to review the comparator part of this patch.

Here's a patch that fixes the following:
 - adds null checks for fields() and terms(), which can return null
 - even when a docsEnum is returned, its doc may be deleted (i.e. we need to check for NO_MORE_DOCS)
 - uses liveDocs when requesting the docsEnum so we won't use a deleted (overwritten) doc.

The last two issues would both cause us to miss elevated documents if they have been updated and an old deleted version still exists in the index.
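
To make the three fixes concrete, here is a minimal sketch in plain Java. It does not use Lucene's classes: the Map of postings, the BitSet of live docs, the firstLiveDoc helper, and the locally defined NO_MORE_DOCS constant are all assumptions made for illustration, and only the shape of the checks follows the notes above.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for a segment; this is NOT Lucene's API.
class LiveDocLookupSketch {
    // Local sentinel meaning "no matching live document".
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // postings may be null (no terms for the field), the entry for a key may be
    // absent (key not in this segment), and liveDocs may be null (no deletions).
    static int firstLiveDoc(Map<String, int[]> postings, BitSet liveDocs, String uniqueKey) {
        if (postings == null) {
            return NO_MORE_DOCS;
        }
        int[] docs = postings.get(uniqueKey);
        if (docs == null) {
            return NO_MORE_DOCS;
        }
        for (int doc : docs) {
            // Skip deleted (overwritten) versions of the document.
            if (liveDocs == null || liveDocs.get(doc)) {
                return doc;
            }
        }
        return NO_MORE_DOCS; // every posting for this key was deleted
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("doc-a", new int[] {3, 7}); // doc 3 is the old, overwritten version
        postings.put("doc-b", new int[] {5});    // deleted and never re-added

        BitSet liveDocs = new BitSet(8);
        liveDocs.set(0, 8);
        liveDocs.clear(3);
        liveDocs.clear(5);

        System.out.println(firstLiveDoc(postings, liveDocs, "doc-a"));
        System.out.println(firstLiveDoc(postings, liveDocs, "doc-b") == NO_MORE_DOCS);
        System.out.println(firstLiveDoc(null, liveDocs, "doc-a") == NO_MORE_DOCS);
    }
}
```

Without the live-docs filter, the lookup for "doc-a" in this sketch would return the deleted doc 3, which never appears in the results, so the elevation would be silently lost.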
                

[jira] [Commented] (SOLR-2950) QueryElevationComponent needlessly looks up document ids

Posted by "Grant Ingersoll (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169377#comment-13169377 ] 

Grant Ingersoll commented on SOLR-2950:
---------------------------------------

Also, I need to double-check my understanding of the comparator, because the doc ids may be off due to getting a top-level reader.
                

[jira] [Commented] (SOLR-2950) QueryElevationComponent needlessly looks up document ids

Posted by "Yonik Seeley (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168900#comment-13168900 ] 

Yonik Seeley commented on SOLR-2950:
------------------------------------

It would probably be most performant to do the lookup per segment (i.e. in setNextReader) and remove documents as they are found (i.e. if the doc exists in segment1, don't bother looking it up in further segments).  This also means that we only do hash lookups in the SentinelIntSet when a boosted doc actually exists in the segment.
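
A minimal sketch of that per-segment approach, in plain Java. Each "segment" is modeled as a Map from uniqueKey to a segment-local doc id, and a HashSet of Integer takes the place of SentinelIntSet; the class and method names (PerSegmentElevationSketch, enterSegment, pendingCount) are hypothetical and only illustrate the bookkeeping.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Illustrative only: each "segment" is a Map from uniqueKey to segment-local doc id.
class PerSegmentElevationSketch {
    private final Set<String> pendingKeys;

    PerSegmentElevationSketch(Set<String> elevatedKeys) {
        this.pendingKeys = new HashSet<>(elevatedKeys);
    }

    // Plays the role of the work done when moving to a new segment: look up only
    // the keys that have not been found yet, and drop each key once it is located.
    // Returns null when the segment holds no elevated docs.
    Set<Integer> enterSegment(Map<String, Integer> segment) {
        Set<Integer> found = null;
        Iterator<String> it = pendingKeys.iterator();
        while (it.hasNext()) {
            Integer docId = segment.get(it.next());
            if (docId != null) {
                if (found == null) {
                    found = new HashSet<>();
                }
                found.add(docId);
                it.remove(); // no need to look this key up in later segments
            }
        }
        return found;
    }

    int pendingCount() {
        return pendingKeys.size();
    }

    public static void main(String[] args) {
        Map<String, Integer> segment1 = new HashMap<>();
        segment1.put("doc-a", 4);
        Map<String, Integer> segment2 = new HashMap<>();
        segment2.put("doc-x", 0);

        PerSegmentElevationSketch sketch =
            new PerSegmentElevationSketch(new HashSet<>(Arrays.asList("doc-a", "doc-b")));
        System.out.println(sketch.enterSegment(segment1)); // contains doc id 4
        System.out.println(sketch.enterSegment(segment2)); // null: nothing elevated here
        System.out.println(sketch.pendingCount());         // doc-b was never found
    }
}
```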
                

[jira] [Commented] (SOLR-2950) QueryElevationComponent needlessly looks up document ids

Posted by "Yonik Seeley (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169670#comment-13169670 ] 

Yonik Seeley commented on SOLR-2950:
------------------------------------

bq. I can't imagine it will really make much difference given the small number of items that we typically would expect to be elevated.

The fact that it will be a small number of elevated docs is entirely my point - it means that if we do it per segment, there will normally be *no* documents elevated in a specific segment and the hash lookup can be skipped (and that would be a sizeable gain for something simple like a term query).  You're right about small sets - it doesn't matter if the set size is 1 or 10 if you do need to do the lookup.
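
The saving can be sketched as follows, in plain Java with hypothetical names (SkipLookupSketch, setSegment, isElevated). A counter tracks how many hash lookups actually happen; for a segment with no elevated docs the check returns before touching the set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: counts how many hash lookups the per-hit check performs.
class SkipLookupSketch {
    private Set<Integer> elevatedInSegment; // null when the current segment has none
    private int hashLookups = 0;

    // Called once per segment with the elevated doc ids found in it (may be null).
    void setSegment(Set<Integer> elevatedDocIds) {
        this.elevatedInSegment =
            (elevatedDocIds == null || elevatedDocIds.isEmpty()) ? null : elevatedDocIds;
    }

    // Called once per hit.
    boolean isElevated(int docId) {
        if (elevatedInSegment == null) {
            return false; // common case: a null check only, no hashing
        }
        hashLookups++;
        return elevatedInSegment.contains(docId);
    }

    int hashLookups() {
        return hashLookups;
    }

    public static void main(String[] args) {
        SkipLookupSketch comparator = new SkipLookupSketch();

        comparator.setSegment(null); // large segment, nothing elevated
        for (int doc = 0; doc < 1000; doc++) {
            comparator.isElevated(doc);
        }

        comparator.setSegment(new HashSet<>(Arrays.asList(3))); // small segment, one elevated doc
        for (int doc = 0; doc < 10; doc++) {
            comparator.isElevated(doc);
        }

        System.out.println(comparator.hashLookups()); // 10, not 1010
    }
}
```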
                

[jira] [Updated] (SOLR-2950) QueryElevationComponent needlessly looks up document ids

Posted by "Grant Ingersoll (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-2950:
----------------------------------

    Attachment: SOLR-2950.patch

This patch moves the work to setNextReader
                

[jira] [Commented] (SOLR-2950) QueryElevationComponent needlessly looks up document ids

Posted by "Grant Ingersoll (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169360#comment-13169360 ] 

Grant Ingersoll commented on SOLR-2950:
---------------------------------------

Note, I'm going to try the way Yonik suggested too, but wanted to put this up as a first draft.
                

[jira] [Resolved] (SOLR-2950) QueryElevationComponent needlessly looks up document ids

Posted by "Grant Ingersoll (Resolved) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll resolved SOLR-2950.
-----------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 3.6)
    

[jira] [Assigned] (SOLR-2950) QueryElevationComponent needlessly looks up document ids

Posted by "Grant Ingersoll (Assigned) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned SOLR-2950:
-------------------------------------

    Assignee: Grant Ingersoll
    

[jira] [Updated] (SOLR-2950) QueryElevationComponent needlessly looks up document ids

Posted by "Grant Ingersoll (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-2950:
----------------------------------

    Attachment: SOLR-2950.patch

Minor cleanup.  I think this is ready to go and will likely commit later today or tomorrow.
                

[jira] [Updated] (SOLR-2950) QueryElevationComponent needlessly looks up document ids

Posted by "Grant Ingersoll (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-2950:
----------------------------------

    Attachment: SOLR-2950.patch

First draft.  This just does the mapping in the comparator's constructor.  We could do it in setNextReader, but I can't imagine it will really make much difference given the small number of items that we typically would expect to be elevated.
                

[jira] [Commented] (SOLR-2950) QueryElevationComponent needlessly looks up document ids

Posted by "Yonik Seeley (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163661#comment-13163661 ] 

Yonik Seeley commented on SOLR-2950:
------------------------------------

Instead of doing everything in inform() (which isn't great for NRT), we should just do it on demand in the comparator's setNextReader(), for only those uniqueKeys that were boosted.

We could cache the uniqueKey -> docid across queries, but not sure it's worth it at this point (assuming at most a handful of docs are boosted per-query).  And if we did want some sort of uniqueKey -> docid cache it would make most sense to be an internal cache in SolrIndexSearcher, not private to the QEC.
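
For what such a cache could look like if it were ever wanted, here is a small sketch in plain Java. SearcherCacheSketch is a hypothetical stand-in for a searcher that owns the cache; it is not SolrIndexSearcher's API, and the Function passed to the constructor simply represents "look the key up in the index". Because the cache lives inside the searcher object, it goes away when a new searcher is opened, which matters because internal doc ids are only meaningful for one view of the index.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for a searcher that owns a uniqueKey -> doc id cache.
class SearcherCacheSketch {
    private final Map<String, Integer> docIdCache = new ConcurrentHashMap<>();
    private final Function<String, Integer> indexLookup;
    private final AtomicInteger indexLookups = new AtomicInteger();

    SearcherCacheSketch(Function<String, Integer> indexLookup) {
        this.indexLookup = indexLookup;
    }

    // Returns the cached doc id, consulting the index only on a cache miss.
    // Keys that are not in the index return null and are not cached.
    Integer docIdFor(String uniqueKey) {
        return docIdCache.computeIfAbsent(uniqueKey, key -> {
            indexLookups.incrementAndGet();
            return indexLookup.apply(key);
        });
    }

    int indexLookups() {
        return indexLookups.get();
    }

    public static void main(String[] args) {
        Map<String, Integer> index = new HashMap<>();
        index.put("doc-a", 7);

        SearcherCacheSketch searcher = new SearcherCacheSketch(index::get);
        System.out.println(searcher.docIdFor("doc-a")); // first query: hits the index
        System.out.println(searcher.docIdFor("doc-a")); // second query: served from the cache
        System.out.println(searcher.indexLookups());    // 1
    }
}
```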
                