You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2010/11/19 19:44:13 UTC

[jira] Created: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Remove norms() support from non-atomic IndexReaders
---------------------------------------------------

                 Key: LUCENE-2771
                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Uwe Schindler
             Fix For: 4.0


Spin-off from LUCENE-2769:
Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.

The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933925#action_12933925 ] 

Michael McCandless commented on LUCENE-2771:
--------------------------------------------

I think there may be hope for ParallelReader.

Ie, if the readers added to it a "congruent" (consist same-sized
sub-readers), which I think (?) is common, then we can implement
getSequentialSubReaders as simply returning an array of
ParallelReaders of the sub-readers.

If they are not congruent then it's effectively a
SlowMultiReaderWrapper.


> Remove norms() support from non-atomic IndexReaders
> ---------------------------------------------------
>
>                 Key: LUCENE-2771
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>
> Spin-off from LUCENE-2769:
> Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.
> The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934188#action_12934188 ] 

Uwe Schindler commented on LUCENE-2771:
---------------------------------------

I am just collecting ideas!

> Remove norms() support from non-atomic IndexReaders
> ---------------------------------------------------
>
>                 Key: LUCENE-2771
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771_needsCache.patch
>
>
> Spin-off from LUCENE-2769:
> Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.
> The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2771:
--------------------------------

    Attachment: LUCENE-2771.patch

here is a patch with a cache on ParallelReader.
i also fixed some bugs in my previous patch :)

in general the patch should be reviewed, as things are mostly unoptimized.
e.g. the ParallelReader norms methods might not need to be entirely synchronized
and maybe it can have a ctor that re-uses parts of the norms cache like DirectoryReader had before (though that was complicated)



> Remove norms() support from non-atomic IndexReaders
> ---------------------------------------------------
>
>                 Key: LUCENE-2771
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2771.patch, LUCENE-2771_needsCache.patch
>
>
> Spin-off from LUCENE-2769:
> Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.
> The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934167#action_12934167 ] 

Uwe Schindler edited comment on LUCENE-2771 at 11/20/10 2:03 PM:
-----------------------------------------------------------------

Updated patch after commit of subissue LUCENE-2772.

Some throughts: The cache currently dont support reopening readers, as FilterIndexReader throws UOE on reopen (which is fine for most cases). But for this reader we should support reopen and implement it in the FilterIndexReader with optimized norms recreation (copy over in the map only reopened segments?).

Another thing about MultiNorms: We are inconsistent now: We are using MultiFields everywhere in core queries but not MultiNorms. E.g. for a TermQuery you can currently get a Scorer, but as soon as this scorer requests norms, it will throw UOE. We should be consistent. As we have now the SlowMultiReaderWrapper, we should remove MultiFields support from everywhere else in core (Filters and Queries, but also FieldCache?). *+1 for that from my side!*

*EDIT* We use MultiFields nor everywhere anymore, eg no longer in TermQuery/Scorer. But still in filters and other parts. So we should open issue to remove all usage!

      was (Author: thetaphi):
    Updated patch after commit of subissue LUCENE-2772.

Some throughts: The cache currently dont support reopening readers, as FilterIndexReader throws UOE on reopen (which is fine for most cases). But for this reader we should support reopen and implement it in the FilterIndexReader with optimized norms recreation (copy over in the map only reopened segments?).

Another thing about MultiNorms: We are inconsistent now: We are using MultiFields everywhere in core queries but not MultiNorms. E.g. for a TermQuery you can currently get a Scorer, but as soon as this scorer requests norms, it will throw UOE. We should be consistent. As we have now the SlowMultiReaderWrapper, we should remove MultiFields support from everywhere else in core (Filters and Queries, but also FieldCache?). *+1 for that from my side!*
  
> Remove norms() support from non-atomic IndexReaders
> ---------------------------------------------------
>
>                 Key: LUCENE-2771
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771_needsCache.patch
>
>
> Spin-off from LUCENE-2769:
> Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.
> The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2771:
--------------------------------

    Attachment: LUCENE-2771_needsCache.patch

here is a start to the new approach i described.

it needs some cleanup and optimization, but mostly just to add a cache to ParallelReader
like i did to SlowMultiReaderWrapper... 

all tests pass, this is much simpler.

> Remove norms() support from non-atomic IndexReaders
> ---------------------------------------------------
>
>                 Key: LUCENE-2771
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2771_needsCache.patch
>
>
> Spin-off from LUCENE-2769:
> Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.
> The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933911#action_12933911 ] 

Uwe Schindler commented on LUCENE-2771:
---------------------------------------

Here the relevant comments:

{quote}
*Robert Muir added a comment - 19/Nov/10 04:28 AM*

here is a hack patch for Uwe's idea about the norms.
we need to change SegmentMerger to not call norms on the top-level IR but populate its normBuffer from the subs. 
in my opinion it seems crazy we are currently creating these big arrays this way (yeah there is the hairy code for re-open that re-uses the big merged cache for the NRT case, but still).

Maybe i am missing something.

*Robert Muir added a comment - 19/Nov/10 04:28 AM*

here is a hack patch for Uwe's idea about the norms. we need to change SegmentMerger to not call norms on the top-level IR but populate its normBuffer from the subs. in my opinion it seems crazy we are currently creating these big arrays this way (yeah there is the hairy code for re-open that re-uses the big merged cache for the NRT case, but still). Maybe i am missing something. 

*Robert Muir added a comment - 19/Nov/10 04:58 AM*

here's another hacky update: but still a few tests explicitly check these norms and need to be fixed. 
maybe we could add an uncached "MultiNorms" or something at least in src/test for convenience,
just to fill the byte arrays so these tests can assertEquals

otherwise we are going to have to put a lot of SlowMultiReaderWrappers in these tests.

*Robert Muir added a comment - 19/Nov/10 04:58 AM*

here's another hacky update: but still a few tests explicitly check these norms and need to be fixed. maybe we could add an uncached "MultiNorms" or something at least in src/test for convenience, just to fill the byte arrays so these tests can assertEquals otherwise we are going to have to put a lot of SlowMultiReaderWrappers in these tests. 

*Uwe Schindler added a comment - 19/Nov/10 07:17 AM*

Here a better patch for the segment merger. We should even apply this if we not remove top-level norms! 
It saves lots of memory during merging by using ReaderUtil to go down to segment level! Don't wonder about BytesRef, but we need a reference here because of the anonymous inner class.

*Uwe Schindler added a comment - 19/Nov/10 07:17 AM*

Here a better patch for the segment merger. We should even apply this if we not remove top-level norms! It saves lots of memory during merging by using ReaderUtil to go down to segment level! Don't wonder about BytesRef, but we need a reference here because of the anonymous inner class. 

*Robert Muir added a comment - 19/Nov/10 07:45 AM*

here is an updated patch, with core/contrib/solr tests passing. 
For ParallelReader i forced it to require non-composite readers only (e.g. SlowMultiReaderWrap them if thats not the case).

TODO: 
- ParallelReader shouldnt need multifields etc anymore 
- there are 5 @Ignore'd ParallelReader-related tests, because of things like reopen/isOptimized/isCurrent 
- merge in Uwe's improved SegmentsMerger 
- clean up code. 

*Robert Muir added a comment - 19/Nov/10 07:45 AM*

here is an updated patch, with core/contrib/solr tests passing. For ParallelReader i forced it to require non-composite readers only (e.g. SlowMultiReaderWrap them if thats not the case). TODO: 
ParallelReader shouldnt need multifields etc anymore 
there are 5 @Ignore'd ParallelReader-related tests, because of things like reopen/isOptimized/isCurrent 
merge in Uwe's improved SegmentsMerger 
clean up code. 
{quote}

> Remove norms() support from non-atomic IndexReaders
> ---------------------------------------------------
>
>                 Key: LUCENE-2771
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>
> Spin-off from LUCENE-2769:
> Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.
> The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977348#action_12977348 ] 

Robert Muir commented on LUCENE-2771:
-------------------------------------

I was just only waiting for a review... i'll merge the patch and test again and commit shortly.

> Remove norms() support from non-atomic IndexReaders
> ---------------------------------------------------
>
>                 Key: LUCENE-2771
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771_needsCache.patch
>
>
> Spin-off from LUCENE-2769:
> Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.
> The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934078#action_12934078 ] 

Earwin Burrfoot commented on LUCENE-2771:
-----------------------------------------

SegmentReader and AllOtherReaders are becoming less and less similar. Is it time to remove their common parent class?

> Remove norms() support from non-atomic IndexReaders
> ---------------------------------------------------
>
>                 Key: LUCENE-2771
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771_needsCache.patch
>
>
> Spin-off from LUCENE-2769:
> Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.
> The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964732#action_12964732 ] 

Robert Muir commented on LUCENE-2771:
-------------------------------------

{quote}
We should in all cases open another issue and remove Multi* support from all core queries and filters (where it is left). If you use a filter with a MultiReader, it can throw UOE. Then you can always use SlowMultiReaderWrapper.
{quote}

+1


> Remove norms() support from non-atomic IndexReaders
> ---------------------------------------------------
>
>                 Key: LUCENE-2771
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771_needsCache.patch
>
>
> Spin-off from LUCENE-2769:
> Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.
> The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933911#action_12933911 ] 

Uwe Schindler edited comment on LUCENE-2771 at 11/19/10 1:49 PM:
-----------------------------------------------------------------

Here the relevant comments:

{quote}
*Robert Muir added a comment - 19/Nov/10 04:28 AM*

here is a hack patch for Uwe's idea about the norms.
we need to change SegmentMerger to not call norms on the top-level IR but populate its normBuffer from the subs. 
in my opinion it seems crazy we are currently creating these big arrays this way (yeah there is the hairy code for re-open that re-uses the big merged cache for the NRT case, but still).

Maybe i am missing something.

*Robert Muir added a comment - 19/Nov/10 04:28 AM*

here is a hack patch for Uwe's idea about the norms. we need to change SegmentMerger to not call norms on the top-level IR but populate its normBuffer from the subs. in my opinion it seems crazy we are currently creating these big arrays this way (yeah there is the hairy code for re-open that re-uses the big merged cache for the NRT case, but still). Maybe i am missing something. 

*Robert Muir added a comment - 19/Nov/10 04:58 AM*

here's another hacky update: but still a few tests explicitly check these norms and need to be fixed. 
maybe we could add an uncached "MultiNorms" or something at least in src/test for convenience,
just to fill the byte arrays so these tests can assertEquals

otherwise we are going to have to put a lot of SlowMultiReaderWrappers in these tests.

*Robert Muir added a comment - 19/Nov/10 04:58 AM*

here's another hacky update: but still a few tests explicitly check these norms and need to be fixed. maybe we could add an uncached "MultiNorms" or something at least in src/test for convenience, just to fill the byte arrays so these tests can assertEquals otherwise we are going to have to put a lot of SlowMultiReaderWrappers in these tests. 

*Robert Muir added a comment - 19/Nov/10 07:45 AM*

here is an updated patch, with core/contrib/solr tests passing. 
For ParallelReader i forced it to require non-composite readers only (e.g. SlowMultiReaderWrap them if thats not the case).

TODO: 
- ParallelReader shouldnt need multifields etc anymore 
- there are 5 @Ignore'd ParallelReader-related tests, because of things like reopen/isOptimized/isCurrent 
- merge in Uwe's improved SegmentsMerger 
- clean up code. 

*Robert Muir added a comment - 19/Nov/10 07:45 AM*

here is an updated patch, with core/contrib/solr tests passing. For ParallelReader i forced it to require non-composite readers only (e.g. SlowMultiReaderWrap them if thats not the case). TODO: 
ParallelReader shouldnt need multifields etc anymore 
there are 5 @Ignore'd ParallelReader-related tests, because of things like reopen/isOptimized/isCurrent 
merge in Uwe's improved SegmentsMerger 
clean up code. 
{quote}

      was (Author: thetaphi):
    Here the relevant comments:

{quote}
*Robert Muir added a comment - 19/Nov/10 04:28 AM*

here is a hack patch for Uwe's idea about the norms.
we need to change SegmentMerger to not call norms on the top-level IR but populate its normBuffer from the subs. 
in my opinion it seems crazy we are currently creating these big arrays this way (yeah there is the hairy code for re-open that re-uses the big merged cache for the NRT case, but still).

Maybe i am missing something.

*Robert Muir added a comment - 19/Nov/10 04:28 AM*

here is a hack patch for Uwe's idea about the norms. we need to change SegmentMerger to not call norms on the top-level IR but populate its normBuffer from the subs. in my opinion it seems crazy we are currently creating these big arrays this way (yeah there is the hairy code for re-open that re-uses the big merged cache for the NRT case, but still). Maybe i am missing something. 

*Robert Muir added a comment - 19/Nov/10 04:58 AM*

here's another hacky update: but still a few tests explicitly check these norms and need to be fixed. 
maybe we could add an uncached "MultiNorms" or something at least in src/test for convenience,
just to fill the byte arrays so these tests can assertEquals

otherwise we are going to have to put a lot of SlowMultiReaderWrappers in these tests.

*Robert Muir added a comment - 19/Nov/10 04:58 AM*

here's another hacky update: but still a few tests explicitly check these norms and need to be fixed. maybe we could add an uncached "MultiNorms" or something at least in src/test for convenience, just to fill the byte arrays so these tests can assertEquals otherwise we are going to have to put a lot of SlowMultiReaderWrappers in these tests. 

*Uwe Schindler added a comment - 19/Nov/10 07:17 AM*

Here a better patch for the segment merger. We should even apply this if we not remove top-level norms! 
It saves lots of memory during merging by using ReaderUtil to go down to segment level! Don't wonder about BytesRef, but we need a reference here because of the anonymous inner class.

*Uwe Schindler added a comment - 19/Nov/10 07:17 AM*

Here a better patch for the segment merger. We should even apply this if we not remove top-level norms! It saves lots of memory during merging by using ReaderUtil to go down to segment level! Don't wonder about BytesRef, but we need a reference here because of the anonymous inner class. 

*Robert Muir added a comment - 19/Nov/10 07:45 AM*

here is an updated patch, with core/contrib/solr tests passing. 
For ParallelReader i forced it to require non-composite readers only (e.g. SlowMultiReaderWrap them if thats not the case).

TODO: 
- ParallelReader shouldnt need multifields etc anymore 
- there are 5 @Ignore'd ParallelReader-related tests, because of things like reopen/isOptimized/isCurrent 
- merge in Uwe's improved SegmentsMerger 
- clean up code. 

*Robert Muir added a comment - 19/Nov/10 07:45 AM*

here is an updated patch, with core/contrib/solr tests passing. For ParallelReader i forced it to require non-composite readers only (e.g. SlowMultiReaderWrap them if thats not the case). TODO: 
ParallelReader shouldnt need multifields etc anymore 
there are 5 @Ignore'd ParallelReader-related tests, because of things like reopen/isOptimized/isCurrent 
merge in Uwe's improved SegmentsMerger 
clean up code. 
{quote}
  
> Remove norms() support from non-atomic IndexReaders
> ---------------------------------------------------
>
>                 Key: LUCENE-2771
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>
> Spin-off from LUCENE-2769:
> Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.
> The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934169#action_12934169 ] 

Robert Muir commented on LUCENE-2771:
-------------------------------------

bq. Some throughts: The cache currently dont support reopening readers, as FilterIndexReader throws UOE on reopen

I don't see why SlowMultiReaderWrapper needs to support reopen, is there really a use case for this?
But parallelreader still does, with the patch (and its tests pass).

> Remove norms() support from non-atomic IndexReaders
> ---------------------------------------------------
>
>                 Key: LUCENE-2771
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771_needsCache.patch
>
>
> Spin-off from LUCENE-2769:
> Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.
> The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964728#action_12964728 ] 

Robert Muir commented on LUCENE-2771:
-------------------------------------

Whats going on here, should i commit the patch?


> Remove norms() support from non-atomic IndexReaders
> ---------------------------------------------------
>
>                 Key: LUCENE-2771
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771_needsCache.patch
>
>
> Spin-off from LUCENE-2769:
> Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.
> The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934181#action_12934181 ] 

Uwe Schindler commented on LUCENE-2771:
---------------------------------------

bq. In fact, reopen is completely unrelated to this issue, or the cache here at all... so i don't understand your comment. 

If you reopen an reader you can reuse the cache partly (so instead of copying all norms from all subreaders again, you can sometimes simply override a part of only a refreshed subreader).

> Remove norms() support from non-atomic IndexReaders
> ---------------------------------------------------
>
>                 Key: LUCENE-2771
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771_needsCache.patch
>
>
> Spin-off from LUCENE-2769:
> Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.
> The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977345#action_12977345 ] 

Michael McCandless commented on LUCENE-2771:
--------------------------------------------

This looks great!  Any reason not to commit now?

> Remove norms() support from non-atomic IndexReaders
> ---------------------------------------------------
>
>                 Key: LUCENE-2771
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771_needsCache.patch
>
>
> Spin-off from LUCENE-2769:
> Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.
> The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2771:
----------------------------------

    Attachment: LUCENE-2771.patch

Updated patch after commit of subissue LUCENE-2772.

Some throughts: The cache currently dont support reopening readers, as FilterIndexReader throws UOE on reopen (which is fine for most cases). But for this reader we should support reopen and implement it in the FilterIndexReader with optimized norms recreation (copy over in the map only reopened segments?).

Another thing about MultiNorms: We are inconsistent now: We are using MultiFields everywhere in core queries but not MultiNorms. E.g. for a TermQuery you can currently get a Scorer, but as soon as this scorer requests norms, it will throw UOE. We should be consistent. As we have now the SlowMultiReaderWrapper, we should remove MultiFields support from everywhere else in core (Filters and Queries, but also FieldCache?). *+1 for that from my side!*

> Remove norms() support from non-atomic IndexReaders
> ---------------------------------------------------
>
>                 Key: LUCENE-2771
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771_needsCache.patch
>
>
> Spin-off from LUCENE-2769:
> Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.
> The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964729#action_12964729 ] 

Uwe Schindler commented on LUCENE-2771:
---------------------------------------

I had no time to further look into it, I just have to recapitulate my comments and review the patch again.

We should in all cases open another issue and remove Multi* support from all core queries and filters (where it is left). If you use a filter with a MultiReader, it can throw UOE. Then you can always use SlowMultiReaderWrapper.

> Remove norms() support from non-atomic IndexReaders
> ---------------------------------------------------
>
>                 Key: LUCENE-2771
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771_needsCache.patch
>
>
> Spin-off from LUCENE-2769:
> Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.
> The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934185#action_12934185 ] 

Robert Muir commented on LUCENE-2771:
-------------------------------------

right but slowmultiwrapper never supported reopen before... so its unrelated.

> Remove norms() support from non-atomic IndexReaders
> ---------------------------------------------------
>
>                 Key: LUCENE-2771
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771_needsCache.patch
>
>
> Spin-off from LUCENE-2769:
> Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.
> The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934171#action_12934171 ] 

Robert Muir commented on LUCENE-2771:
-------------------------------------

In fact, reopen is completely unrelated to this issue, or the cache here at all... so i don't understand your comment.


> Remove norms() support from non-atomic IndexReaders
> ---------------------------------------------------
>
>                 Key: LUCENE-2771
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771_needsCache.patch
>
>
> Spin-off from LUCENE-2769:
> Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.
> The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2771:
--------------------------------

    Attachment: LUCENE-2771.patch

Ok, not really related, but i just can't stand it:

In the contrib/demo "SearchFiles" example, there is a OneNorms FilterIndexReader.
I think this entire thing is a no-op since per-segment search, and not a good
thing to have in an example.  I removed this here.

i also turned some nocommits into TODO's: really this Slow* stuff doesn't need to be
hyper-optimized.

> Remove norms() support from non-atomic IndexReaders
> ---------------------------------------------------
>
>                 Key: LUCENE-2771
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771_needsCache.patch
>
>
> Spin-off from LUCENE-2769:
> Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.
> The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-2771) Remove norms() support from non-atomic IndexReaders

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-2771.
---------------------------------

    Resolution: Fixed

Committed revision 1055238.

> Remove norms() support from non-atomic IndexReaders
> ---------------------------------------------------
>
>                 Key: LUCENE-2771
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2771
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771.patch, LUCENE-2771_needsCache.patch
>
>
> Spin-off from LUCENE-2769:
> Currently all IndexReaders support norms(), but the core of Lucene never uses it and its even dangerous because of memory usage. We should do the same like with MultiFields and factor it out and throw UOE on non-atomic readers.
> The SlowMultiReaderWrapper can then manage the norms. Also ParallelReader needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org