Posted to solr-dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2009/04/13 16:19:15 UTC

[jira] Created: (SOLR-1111) fix FieldCache usage in Solr

fix FieldCache usage in Solr
----------------------------

                 Key: SOLR-1111
                 URL: https://issues.apache.org/jira/browse/SOLR-1111
             Project: Solr
          Issue Type: Bug
            Reporter: Yonik Seeley
             Fix For: 1.4


Recent changes in Lucene have altered how the FieldCache is used; as-is, they could lead to previously working Solr installations blowing up when they upgrade to 1.4.  We need to fix this, or document the effects of these changes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1111) fix FieldCache usage in Solr

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713285#action_12713285 ] 

Yonik Seeley commented on SOLR-1111:
------------------------------------

For FieldCache issues, I've opened LUCENE-1662



[jira] Updated: (SOLR-1111) fix FieldCache usage in Solr

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-1111:
-------------------------------

    Attachment: SOLR-1111_sort.patch

Attaching updated patch.  All tests now pass.




[jira] Updated: (SOLR-1111) fix FieldCache usage in Solr

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-1111:
-------------------------------

    Attachment: SOLR-1111_sort.patch

Attaching updated patch - multiple test cases still failing.
- fixed the sort-last comparator sources
- fixed SolrIndexReader.sortDocSet()



[jira] Updated: (SOLR-1111) fix FieldCache usage in Solr

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-1111:
-------------------------------

    Fix Version/s:     (was: 1.4)
                   1.5

Moving the rest of this to 1.5



[jira] Updated: (SOLR-1111) fix FieldCache usage in Solr

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-1111:
-------------------------------

    Attachment: SOLR-1111_sort.patch

Attaching SOLR-1111_sort.patch to use new Lucene Collector classes, including sorting collectors that will use FieldCache entries at the segment level instead of the top level reader.

Unfortunately, tests don't currently pass - NPE caused by sort=a_i asc.
Looks like we'll need to port any custom comparators over to the new FieldComparatorSource (I hadn't thought about this before, but of course it makes sense that the old custom comparators wouldn't work since there isn't a method to compare docs from different segments).
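
To make the point about comparing docs from different segments concrete, here is a minimal sketch in plain Java. It is not Lucene's API: SlotIntComparator, setNextSegment, copy and compareToSlot are hypothetical names chosen for illustration. The idea it shows is that a value is copied out of the current segment into a slot, so that later comparisons never need a document id from another segment.

```java
// Hypothetical, simplified comparator; it only models the slot idea and is
// not Lucene's comparator API.
final class SlotIntComparator {
    private final int[] slotValues; // values copied out of segments, one per queue slot
    private int[] currentSegment;   // field values of the current segment, by segment-local doc id

    SlotIntComparator(int numSlots) {
        slotValues = new int[numSlots];
    }

    // Switch to the next segment; doc ids passed afterwards are local to it.
    void setNextSegment(int[] segmentValues) {
        currentSegment = segmentValues;
    }

    // Remember the value of a segment-local doc so it survives the segment switch.
    void copy(int slot, int localDoc) {
        slotValues[slot] = currentSegment[localDoc];
    }

    // Compare a remembered value against a doc of the current segment.
    int compareToSlot(int slot, int localDoc) {
        return Integer.compare(slotValues[slot], currentSegment[localDoc]);
    }

    int value(int slot) {
        return slotValues[slot];
    }
}

final class SlotComparatorSketch {
    // Smallest field value across all segments (a top-1 ascending sort).
    // Assumes at least one document exists.
    static int smallest(int[][] segments) {
        SlotIntComparator cmp = new SlotIntComparator(1);
        boolean filled = false;
        for (int[] segment : segments) {
            cmp.setNextSegment(segment);
            for (int doc = 0; doc < segment.length; doc++) {
                if (!filled || cmp.compareToSlot(0, doc) > 0) {
                    cmp.copy(0, doc);
                    filled = true;
                }
            }
        }
        return cmp.value(0);
    }

    public static void main(String[] args) {
        System.out.println(smallest(new int[][] {{7, 3}, {5, 1, 9}})); // prints 1
    }
}
```

A comparator that can only answer "compare doc A with doc B" has no way to do this once A and B live in different per-segment arrays, which is the limitation described above.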



[jira] Commented: (SOLR-1111) fix FieldCache usage in Solr

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713053#action_12713053 ] 

Yonik Seeley commented on SOLR-1111:
------------------------------------

TODO reminder: FieldCache.DEFAULT and ExtendedFieldCache.EXT_DEFAULT are different instances... make sure that we are using the same instance everywhere to avoid more memory being used than necessary.
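
A rough illustration of why two cache instances matter, in plain Java. SharedCacheSketch and its Cache class are hypothetical stand-ins, not the Lucene classes named above; the sketch only shows that two separate instances each load and hold their own copy of the same field.

```java
import java.util.HashMap;
import java.util.Map;

final class SharedCacheSketch {
    // Hypothetical cache: one array per field name, loaded on first request.
    static final class Cache {
        private final Map<String, int[]> byField = new HashMap<>();
        int loads = 0; // how many times a field had to be loaded

        int[] getInts(String field, int maxDoc) {
            return byField.computeIfAbsent(field, f -> {
                loads++;
                return new int[maxDoc]; // stand-in for un-inverting the field
            });
        }
    }

    public static void main(String[] args) {
        Cache a = new Cache();
        Cache b = new Cache();

        // Two instances: the same field is loaded twice and stored twice.
        boolean sameArray = a.getInts("price", 10) == b.getInts("price", 10);
        System.out.println(sameArray);         // false
        System.out.println(a.loads + b.loads); // 2

        // One instance used everywhere: a single load, a single array.
        System.out.println(a.getInts("price", 10) == a.getInts("price", 10)); // true
        System.out.println(a.loads);                                          // 1
    }
}
```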



[jira] Commented: (SOLR-1111) fix FieldCache usage in Solr

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713271#action_12713271 ] 

Yonik Seeley commented on SOLR-1111:
------------------------------------

committed.  Leaving this issue open for now - need to look at RandomSortField, FieldCache.DEFAULT, and perhaps some tests (something to show that FieldCache entries are being shared).



[jira] Updated: (SOLR-1111) fix FieldCache usage in Solr

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-1111:
-------------------------------

    Attachment: SOLR-1111-distrib.patch

Here's a patch for distributed search to retrieve sort field values from the lowest level index readers.
I plan on committing shortly.
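
For readers unfamiliar with the idea, a minimal sketch of looking up a sort value through the lowest-level readers, in plain Java. This is not the patch itself: SegmentLookupSketch, docBases and perSegmentValues are hypothetical names, and the sketch assumes no empty segments. A top-level document id is mapped to its segment and a segment-local id, so only per-segment value arrays are ever consulted.

```java
import java.util.Arrays;

final class SegmentLookupSketch {
    // docBases[i] is the top-level id of the first document in segment i
    // (strictly ascending, docBases[0] == 0).
    static int segmentFor(int globalDoc, int[] docBases) {
        int idx = Arrays.binarySearch(docBases, globalDoc);
        // Not found: binarySearch returns -(insertionPoint) - 1, and the
        // owning segment is the one just before the insertion point.
        return idx >= 0 ? idx : -idx - 2;
    }

    // Fetch the sort value from the per-segment array only.
    static int sortValue(int globalDoc, int[] docBases, int[][] perSegmentValues) {
        int seg = segmentFor(globalDoc, docBases);
        return perSegmentValues[seg][globalDoc - docBases[seg]];
    }

    public static void main(String[] args) {
        int[] docBases = {0, 3, 5};
        int[][] values = {{10, 20, 30}, {40, 50}, {60, 70, 80}};
        System.out.println(sortValue(4, docBases, values)); // 50 (segment 1, local doc 1)
        System.out.println(sortValue(7, docBases, values)); // 80 (segment 2, local doc 2)
    }
}
```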



[jira] Issue Comment Edited: (SOLR-1111) fix FieldCache usage in Solr

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698399#action_12698399 ] 

Yonik Seeley edited comment on SOLR-1111 at 4/13/09 2:50 PM:
-------------------------------------------------------------

The major issue is that Lucene now creates scorers per-segment, and if you use Lucene's searcher.search(...,sort) then the FieldCache populations will also be per-segment.

The biggest issue:  If the FieldCache gets populated at both the top-level reader and per-segment, memory usage doubles (as does un-inversion time).
 - Faceting on single-valued fields uses the FieldCache at the top-level (and would be
   - This is non-trivial to change...  if we started counting per-segment, counts would somehow have to be merged across segments.
 - Sorting in Solr currently uses the FieldCache at the top level
   - This can't easily be changed to use Lucene's searcher.search(...,sort) since we are using a hit collector (which can be wrapped in a time limited collector).
 - Distributed search uses the top-level FieldCache to retrieve sort field values.
 - FunctionQuery now derives values at the segment level
   - This also applies to the function range query

Another issue for function query is the use of ord()... it won't be valid in multi-segment indexes if evaluated at the segment level.
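
The doubling described above can be modeled in a few lines of plain Java. SketchReader and SketchFieldCache are hypothetical stand-ins, not Lucene classes; the sketch assumes a cache keyed by reader identity that holds one int per document, which is enough to show the arithmetic.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-in for an index reader; not a Lucene class.
final class SketchReader {
    final int maxDoc;
    final SketchReader[] segments; // empty for a segment-level reader

    SketchReader(int maxDoc, SketchReader... segments) {
        this.maxDoc = maxDoc;
        this.segments = segments;
    }

    static SketchReader topLevel(SketchReader... segments) {
        int total = 0;
        for (SketchReader s : segments) total += s.maxDoc;
        return new SketchReader(total, segments);
    }
}

// Hypothetical cache keyed by reader identity, one int per document.
final class SketchFieldCache {
    private final Map<SketchReader, int[]> entries = new IdentityHashMap<>();

    int[] getInts(SketchReader reader) {
        // Stand-in for un-inverting the field for this reader.
        return entries.computeIfAbsent(reader, r -> new int[r.maxDoc]);
    }

    long cachedValues() {
        long n = 0;
        for (int[] a : entries.values()) n += a.length;
        return n;
    }
}

final class FieldCacheDoublingSketch {
    public static void main(String[] args) {
        SketchReader seg1 = new SketchReader(1000);
        SketchReader seg2 = new SketchReader(500);
        SketchReader top = SketchReader.topLevel(seg1, seg2);

        SketchFieldCache cache = new SketchFieldCache();
        cache.getInts(top);                                   // e.g. faceting at the top level
        for (SketchReader s : top.segments) cache.getInts(s); // e.g. sorting per segment

        System.out.println(cache.cachedValues()); // 3000, twice the 1500 documents
    }
}
```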



[jira] Commented: (SOLR-1111) fix FieldCache usage in Solr

Posted by "Jayson Minard (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705414#action_12705414 ] 

Jayson Minard commented on SOLR-1111:
-------------------------------------

Is this Lucene version in the current 1.4 trunk, or is it a version not yet integrated into Solr libs?

And also, the description makes it sound like an upgrade issue, but really any 1.4 version could blow up due to this problem.

Lastly, define "blow up"...  Uses double memory, or some other side effect?

> fix FieldCache usage in Solr
> ----------------------------
>
>                 Key: SOLR-1111
>                 URL: https://issues.apache.org/jira/browse/SOLR-1111
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>             Fix For: 1.4
>
>
> Recent changes in Lucene have altered how the FieldCache is used and as-is could lead to previously working Solr installations blowing up when they upgrade to 1.4.  We need to fix, or document the affects of these changes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Updated: (SOLR-1111) fix FieldCache usage in Solr

Posted by Mark Miller <ma...@gmail.com>.
>  I fixed it by changing "super(r)" to "super(wrap(r))"
Nice. Where was that suggestion the other day?

-- 
- Mark

http://www.lucidimagination.com




[jira] Updated: (SOLR-1111) fix FieldCache usage in Solr

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-1111:
-------------------------------

    Attachment: SOLR-1111_sort.patch

Latest patch - some tests still fail.
- fixed/implemented sort-missing-last as a new FieldComparatorSource
- fixed distributed search for sorting missing last
- fixed function query when scores are NaN or -infinity... had to map to -max_val
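
As an illustration of the last two kinds of fix, here is a sketch in plain Java; it is not the code from the patch, and MissingLastSortSketch, sortableScore and MISSING_LAST are hypothetical names. In Java, every ordered comparison against NaN is false, so a comparison-based sort cannot place such a score consistently; the sketch assumes that is one motivation for mapping to the most negative finite float.

```java
import java.util.Arrays;
import java.util.Comparator;

final class MissingLastSortSketch {
    // Map NaN and -infinity to the most negative finite float ("-max_val").
    static float sortableScore(float score) {
        return (Float.isNaN(score) || score == Float.NEGATIVE_INFINITY)
                ? -Float.MAX_VALUE
                : score;
    }

    // Ascending order with missing (null) values placed after all present values.
    static final Comparator<Integer> MISSING_LAST =
            Comparator.nullsLast(Comparator.<Integer>naturalOrder());

    public static void main(String[] args) {
        System.out.println(sortableScore(Float.NaN) == -Float.MAX_VALUE); // true
        System.out.println(sortableScore(1.5f));                          // 1.5

        Integer[] values = {3, null, 1};
        Arrays.sort(values, MISSING_LAST);
        System.out.println(Arrays.toString(values)); // [1, 3, null]
    }
}
```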

This won't apply to trunk because it clashes with the reversion of SolrIndexSearcher to use delegation rather than inheritance.  I fixed it by changing "super(r)" to "super(wrap(r))"



[jira] Commented: (SOLR-1111) fix FieldCache usage in Solr

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698399#action_12698399 ] 

Yonik Seeley commented on SOLR-1111:
------------------------------------

The major issue is that Lucene now creates scorers per-segment, and if you use Lucene's searcher.search(...,sort) then the FieldCache populations will also be per-segment.

The biggest issue:  If the FieldCache gets populated at both the top-level reader and per-segment, memory usage doubles (as does un-inversion time).
 - Faceting on single-valued fields uses the FieldCache at the top-level (and would be
   - This is non-trivial to change...  if we started counting per-segment, counts would somehow have to be merged across segments.
 - Sorting in Solr currently uses the FieldCache at the top level
   - This can't easily be changed to use Lucene's searcher.search(...,sort) since we are using a hit collector (which can be wrapped in a time limited collector).
 - Distributed search uses the top-level FieldCache to retrieve sort field values.
 - FunctionQuery now derives values at the segment level
   - This also applies to the function range query
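
The merge step mentioned for per-segment faceting could look roughly like the following plain-Java sketch. It is only one possible approach and not what Solr does: SegmentFacetMergeSketch is a hypothetical name, and the sketch assumes each segment can report term-to-count pairs, merged by term value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SegmentFacetMergeSketch {
    // Each map holds term -> count for one segment. Counts are merged by
    // term value, since positions within one segment say nothing about another.
    static Map<String, Integer> merge(List<Map<String, Integer>> perSegmentCounts) {
        Map<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> segment : perSegmentCounts) {
            for (Map.Entry<String, Integer> e : segment.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> seg1 = new TreeMap<>();
        seg1.put("apple", 3);
        seg1.put("pear", 1);
        Map<String, Integer> seg2 = new TreeMap<>();
        seg2.put("apple", 2);
        seg2.put("kiwi", 4);

        System.out.println(merge(Arrays.asList(seg1, seg2))); // {apple=5, kiwi=4, pear=1}
    }
}
```

The extra pass over every segment's terms is the cost that makes the change non-trivial compared with counting once against a top-level cache entry.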



[jira] Commented: (SOLR-1111) fix FieldCache usage in Solr

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709518#action_12709518 ] 

Yonik Seeley commented on SOLR-1111:
------------------------------------

I just committed the distributed search part of this patch.



[jira] Issue Comment Edited: (SOLR-1111) fix FieldCache usage in Solr

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698399#action_12698399 ] 

Yonik Seeley edited comment on SOLR-1111 at 4/16/09 7:46 AM:
-------------------------------------------------------------

The major issue is that Lucene now creates scorers per-segment, and if you use Lucene's searcher.search(...,sort) then the FieldCache populations will also be per-segment.

The biggest issue:  If the FieldCache gets populated at both the top-level reader and per-segment, memory usage doubles (as does un-inversion time).
 - Faceting on single-valued fields uses the FieldCache at the top-level (and would be
   - This is non-trivial to change...  if we started counting per-segment, counts would somehow have to be merged across segments.
 - Sorting in Solr currently uses the FieldCache at the top level
   - This can't easily be changed to use Lucene's searcher.search(...,sort) since we are using a hit collector (which can be wrapped in a time limited collector).
 - Distributed search uses the top-level FieldCache to retrieve sort field values.
 - FunctionQuery now derives values at the segment level
   - This also applies to the function range query

Another issue for function query is the use of ord()... it won't be valid in multi-segment indexes if evaluated at the segment level.

Evaluate custom sorters (like query elevation, etc) to ensure that they still work at the segment level.  Solr doesn't currently do segment-level sorting like Lucene now does, but perhaps we should switch for more near-real-time support.
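
The ord() concern noted above can be seen with a toy example in plain Java. SegmentOrdSketch and its ord method are hypothetical, and ordinals are 0-based here purely as a choice for the sketch; the point is only that a term's position among one segment's terms differs from its position in the whole index.

```java
import java.util.Arrays;
import java.util.TreeSet;

final class SegmentOrdSketch {
    // Ordinal = position of the term among the distinct, sorted terms of an index.
    // Assumes the term is present.
    static int ord(String[] indexTerms, String term) {
        String[] sorted = new TreeSet<String>(Arrays.asList(indexTerms)).toArray(new String[0]);
        return Arrays.binarySearch(sorted, term);
    }

    public static void main(String[] args) {
        String[] seg1 = {"apple", "melon"};
        String[] seg2 = {"banana", "melon", "zebra"};
        String[] whole = {"apple", "melon", "banana", "melon", "zebra"};

        // Per segment, "apple" and "banana" both get ordinal 0 ...
        System.out.println(ord(seg1, "apple"));  // 0
        System.out.println(ord(seg2, "banana")); // 0
        // ... but over the whole index they are distinct and ordered.
        System.out.println(ord(whole, "apple"));  // 0
        System.out.println(ord(whole, "banana")); // 1
    }
}
```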




[jira] Commented: (SOLR-1111) fix FieldCache usage in Solr

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705418#action_12705418 ] 

Yonik Seeley commented on SOLR-1111:
------------------------------------

> Is this Lucene version in the current 1.4 trunk

Yes.

> define "blow up"... Uses double memory, or some other side effect?

Yep - which can cause OOM errors on previously working systems.
