You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Uwe Schindler (Created) (JIRA)" <ji...@apache.org> on 2012/03/13 10:27:39 UTC

[jira] [Created] (LUCENE-3866) Make CompositeReader.getSequntialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List

Make CompositeReader.getSequntialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List<R extends IndexReader>
----------------------------------------------------------------------------------------------------------------------------------------------

                 Key: LUCENE-3866
                 URL: https://issues.apache.org/jira/browse/LUCENE-3866
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Uwe Schindler
            Assignee: Uwe Schindler
             Fix For: 4.0


Since Lucene 2.9 we have the method getSequentialSubReader() returning IndexReader[]. Based on hardly-to-debug errors in user's code, Robert and me realized that returning an array from a public API is an anti-pattern. If the array is intended to be modifiable (like byte[] in terms,...), it is fine to use arrays in public APIs, but not, if the array must be protected from modification. As IndexReaders are 100% unmodifiable in trunk code (no deletions,...), the only possibility to corrumpt the reader is by modifying the array returned by getSequentialSubReaders(). We should prevent this.

The same theoretically applies to FieldCache, too - but the party that is afraid of performance problems is too big to fight against that :(

For getSequentialSubReaders there is no performance problem at all. The binary search of reader-ids inside BaseCompositeReader can still be done with the internal protected array, but public APIs should expose only a unmodifiable List. The same applies to leaves() and children() in IndexReaderContext. This change to list would also allow to make CompositeReader and CompositeReaderContext Iterable<R extends IndexReader>, so some loops would look nice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3866) Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3866:
----------------------------------

    Attachment: LUCENE-3866-step2.patch

This is the final patch, now also moving IndexReaderContext to collections and make them unmodifiable.

For safety, IRC.leaves() now throws UnsupportedOperationException, if the context is not top-level (returned null before), this helps to find bugs.

All tests pass, JavaDocs are updated.

Robert suggested to me yesterday, to maybe make getSequentialSubReaders() protected, as it is no longer used by user code (only tests). All code should use the topReaderContext returned by the IndexReader to get all *atomic* IRC.leaves(), or IRC.children() to get the sub-contexts/readers.

For easy use it might be a good idea to add some "easy-use methods" in ReaderUtil to get a view on all AtomicReader leaves (without docBase, so returns List<AtomicReader>). Some code not needing the whole info would get simplier. This is stuff for a new issue.

In my opinion we should also move and hide ReaderSlice and BitSlice to index package, those classes are solely privately used from there.

I think this is ready to commit to trunk and for backport to 4.x. I will not wait too long (max 24hrs), as the patch may get outdated very quickly. Maybe Mike can do some perf benchmarks with beast to show that it does not affect performance (some parts like creating Multi* should be much more effective now).
                
> Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List<R extends IndexReader>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3866
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3866
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-3866-step1.patch, LUCENE-3866-step2.patch
>
>
> Since Lucene 2.9 we have the method getSequentialSubReader() returning IndexReader[]. Based on hardly-to-debug errors in user's code, Robert and me realized that returning an array from a public API is an anti-pattern. If the array is intended to be modifiable (like byte[] in terms,...), it is fine to use arrays in public APIs, but not, if the array must be protected from modification. As IndexReaders are 100% unmodifiable in trunk code (no deletions,...), the only possibility to corrumpt the reader is by modifying the array returned by getSequentialSubReaders(). We should prevent this.
> The same theoretically applies to FieldCache, too - but the party that is afraid of performance problems is too big to fight against that :(
> For getSequentialSubReaders there is no performance problem at all. The binary search of reader-ids inside BaseCompositeReader can still be done with the internal protected array, but public APIs should expose only a unmodifiable List. The same applies to leaves() and children() in IndexReaderContext. This change to list would also allow to make CompositeReader and CompositeReaderContext Iterable<R extends IndexReader>, so some loops would look nice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3866) Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3866:
----------------------------------

    Attachment: LUCENE-3866-step1.patch

Here the first step of this refactoring (affects only few core classes):

- getSequentialSubReaders() returns a List<? extends IndexReader> (unmodifiable)
- Nuked ReaderUtil.Gather completely (this one is ineffective and slow because of useless recursion). We can use ReaderContext.leaves() to get all atomic sub readers including their docStarts
- Moved ReaderUtil.Slice to separate class (it's somehow unrelated), but has nothing to do with ReaderUtil anymore -> Should move to index package + hidden

There is still one (slow) nocommit, because step 2 will refactor the ReaderContexts to use Collections, too.

Yonik:

bq. I think we should allow users to access in the most low-level efficient manner, just as lucene can. Remember, this is expert level stuff!  Seems like at most we would need a javadoc comment saying "don't modify this".

Actually it's more efficient than before. And the collection views are still backed by arrays and are never used in inner loops (it's just when IndexSearcher initializes or executes searches on all segments). Without any real benchmark showing a slowness (please do this *after* step 2 -> nocommit) I don't think we should risk corrumpt readers.
                
> Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List<R extends IndexReader>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3866
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3866
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-3866-step1.patch
>
>
> Since Lucene 2.9 we have the method getSequentialSubReader() returning IndexReader[]. Based on hardly-to-debug errors in user's code, Robert and me realized that returning an array from a public API is an anti-pattern. If the array is intended to be modifiable (like byte[] in terms,...), it is fine to use arrays in public APIs, but not, if the array must be protected from modification. As IndexReaders are 100% unmodifiable in trunk code (no deletions,...), the only possibility to corrumpt the reader is by modifying the array returned by getSequentialSubReaders(). We should prevent this.
> The same theoretically applies to FieldCache, too - but the party that is afraid of performance problems is too big to fight against that :(
> For getSequentialSubReaders there is no performance problem at all. The binary search of reader-ids inside BaseCompositeReader can still be done with the internal protected array, but public APIs should expose only a unmodifiable List. The same applies to leaves() and children() in IndexReaderContext. This change to list would also allow to make CompositeReader and CompositeReaderContext Iterable<R extends IndexReader>, so some loops would look nice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3866) Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396237#comment-13396237 ] 

Uwe Schindler commented on LUCENE-3866:
---------------------------------------

I checked CHANGES.txt and MIGRATE.txt, there are no explicit mentions of the datatypes. No need to change anything. But MIGRATE.txt is missing the documentation about how to get atomic leaves from a reader. We should do this in the followup issue, when we simplify some names.

I will commit this patch tomorrow morning to trunk and 4.x.
                
> Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List<R extends IndexReader>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3866
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3866
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-3866-step1.patch, LUCENE-3866-step2.patch, LUCENE-3866-step2.patch
>
>
> Since Lucene 2.9 we have the method getSequentialSubReader() returning IndexReader[]. Based on hardly-to-debug errors in user's code, Robert and me realized that returning an array from a public API is an anti-pattern. If the array is intended to be modifiable (like byte[] in terms,...), it is fine to use arrays in public APIs, but not, if the array must be protected from modification. As IndexReaders are 100% unmodifiable in trunk code (no deletions,...), the only possibility to corrumpt the reader is by modifying the array returned by getSequentialSubReaders(). We should prevent this.
> The same theoretically applies to FieldCache, too - but the party that is afraid of performance problems is too big to fight against that :(
> For getSequentialSubReaders there is no performance problem at all. The binary search of reader-ids inside BaseCompositeReader can still be done with the internal protected array, but public APIs should expose only a unmodifiable List. The same applies to leaves() and children() in IndexReaderContext. This change to list would also allow to make CompositeReader and CompositeReaderContext Iterable<R extends IndexReader>, so some loops would look nice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3866) Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228339#comment-13228339 ] 

Uwe Schindler commented on LUCENE-3866:
---------------------------------------

Yeah, its both cases: The protected inner reader[] is of course a problem. But in general we should protect anybody outside (except subclasses) to do harm here. So it should be a simple List<R extends IndexReader>.
                
> Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List<R extends IndexReader>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3866
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3866
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>
> Since Lucene 2.9 we have the method getSequentialSubReader() returning IndexReader[]. Based on hardly-to-debug errors in user's code, Robert and me realized that returning an array from a public API is an anti-pattern. If the array is intended to be modifiable (like byte[] in terms,...), it is fine to use arrays in public APIs, but not, if the array must be protected from modification. As IndexReaders are 100% unmodifiable in trunk code (no deletions,...), the only possibility to corrumpt the reader is by modifying the array returned by getSequentialSubReaders(). We should prevent this.
> The same theoretically applies to FieldCache, too - but the party that is afraid of performance problems is too big to fight against that :(
> For getSequentialSubReaders there is no performance problem at all. The binary search of reader-ids inside BaseCompositeReader can still be done with the internal protected array, but public APIs should expose only a unmodifiable List. The same applies to leaves() and children() in IndexReaderContext. This change to list would also allow to make CompositeReader and CompositeReaderContext Iterable<R extends IndexReader>, so some loops would look nice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3866) Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13395995#comment-13395995 ] 

Michael McCandless commented on LUCENE-3866:
--------------------------------------------

+1, patch looks good!  Thanks Uwe.
                
> Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List<R extends IndexReader>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3866
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3866
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-3866-step1.patch, LUCENE-3866-step2.patch, LUCENE-3866-step2.patch
>
>
> Since Lucene 2.9 we have the method getSequentialSubReader() returning IndexReader[]. Based on hardly-to-debug errors in user's code, Robert and me realized that returning an array from a public API is an anti-pattern. If the array is intended to be modifiable (like byte[] in terms,...), it is fine to use arrays in public APIs, but not, if the array must be protected from modification. As IndexReaders are 100% unmodifiable in trunk code (no deletions,...), the only possibility to corrumpt the reader is by modifying the array returned by getSequentialSubReaders(). We should prevent this.
> The same theoretically applies to FieldCache, too - but the party that is afraid of performance problems is too big to fight against that :(
> For getSequentialSubReaders there is no performance problem at all. The binary search of reader-ids inside BaseCompositeReader can still be done with the internal protected array, but public APIs should expose only a unmodifiable List. The same applies to leaves() and children() in IndexReaderContext. This change to list would also allow to make CompositeReader and CompositeReaderContext Iterable<R extends IndexReader>, so some loops would look nice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (LUCENE-3866) Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-3866.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 5.0

Committed trunk revision: 1351590
Committed 4.x revision: 1351591
                
> Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List<R extends IndexReader>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3866
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3866
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0, 5.0
>
>         Attachments: LUCENE-3866-step1.patch, LUCENE-3866-step2.patch, LUCENE-3866-step2.patch
>
>
> Since Lucene 2.9 we have the method getSequentialSubReader() returning IndexReader[]. Based on hardly-to-debug errors in user's code, Robert and me realized that returning an array from a public API is an anti-pattern. If the array is intended to be modifiable (like byte[] in terms,...), it is fine to use arrays in public APIs, but not, if the array must be protected from modification. As IndexReaders are 100% unmodifiable in trunk code (no deletions,...), the only possibility to corrumpt the reader is by modifying the array returned by getSequentialSubReaders(). We should prevent this.
> The same theoretically applies to FieldCache, too - but the party that is afraid of performance problems is too big to fight against that :(
> For getSequentialSubReaders there is no performance problem at all. The binary search of reader-ids inside BaseCompositeReader can still be done with the internal protected array, but public APIs should expose only a unmodifiable List. The same applies to leaves() and children() in IndexReaderContext. This change to list would also allow to make CompositeReader and CompositeReaderContext Iterable<R extends IndexReader>, so some loops would look nice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3866) Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13395968#comment-13395968 ] 

Michael McCandless commented on LUCENE-3866:
--------------------------------------------

I ran quick perf test:
{noformat}
                Task    QPS base StdDev base   QPS patchStdDev patch      Pct diff
             Respell       92.15        1.40       90.30        2.70   -6% -    2%
            PKLookup      130.68        4.16      128.24        2.28   -6% -    3%
              Fuzzy2       41.79        0.40       41.14        1.11   -5% -    2%
              Fuzzy1      108.97        2.40      107.33        2.87   -6% -    3%
         AndHighHigh       16.19        0.48       15.97        0.31   -6% -    3%
              Phrase       12.90        0.32       12.74        0.36   -6% -    4%
          AndHighMed       64.18        1.81       63.46        1.83   -6% -    4%
        SloppyPhrase        8.37        0.29        8.29        0.11   -5% -    3%
            SpanNear        5.51        0.12        5.46        0.18   -6% -    4%
         TermGroup1M       36.19        0.60       35.89        0.74   -4% -    2%
        TermBGroup1M       70.64        0.49       70.74        0.57   -1% -    1%
             Prefix3       61.07        3.58       61.25        1.35   -7% -    8%
            Wildcard       40.84        2.20       41.00        0.98   -7% -    8%
                Term      147.32        3.85      149.65        4.87   -4% -    7%
      TermBGroup1M1P       50.32        1.58       51.29        0.88   -2% -    7%
              IntNRQ        9.96        1.40       10.18        0.56  -15% -   25%
          OrHighHigh       10.31        0.74       10.56        0.57   -9% -   16%
           OrHighMed       12.95        1.01       13.26        0.79  -10% -   17%
{noformat}

Basically no real change ... good!
                
> Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List<R extends IndexReader>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3866
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3866
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-3866-step1.patch, LUCENE-3866-step2.patch, LUCENE-3866-step2.patch
>
>
> Since Lucene 2.9 we have the method getSequentialSubReader() returning IndexReader[]. Based on hardly-to-debug errors in user's code, Robert and me realized that returning an array from a public API is an anti-pattern. If the array is intended to be modifiable (like byte[] in terms,...), it is fine to use arrays in public APIs, but not, if the array must be protected from modification. As IndexReaders are 100% unmodifiable in trunk code (no deletions,...), the only possibility to corrumpt the reader is by modifying the array returned by getSequentialSubReaders(). We should prevent this.
> The same theoretically applies to FieldCache, too - but the party that is afraid of performance problems is too big to fight against that :(
> For getSequentialSubReaders there is no performance problem at all. The binary search of reader-ids inside BaseCompositeReader can still be done with the internal protected array, but public APIs should expose only a unmodifiable List. The same applies to leaves() and children() in IndexReaderContext. This change to list would also allow to make CompositeReader and CompositeReaderContext Iterable<R extends IndexReader>, so some loops would look nice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3866) Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3866:
----------------------------------

    Attachment: LUCENE-3866-step2.patch

Samll update with JavaDocs (remove the "don't modify returned array" @ getSequentialSubReaders)
                
> Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List<R extends IndexReader>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3866
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3866
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-3866-step1.patch, LUCENE-3866-step2.patch, LUCENE-3866-step2.patch
>
>
> Since Lucene 2.9 we have the method getSequentialSubReader() returning IndexReader[]. Based on hardly-to-debug errors in user's code, Robert and me realized that returning an array from a public API is an anti-pattern. If the array is intended to be modifiable (like byte[] in terms,...), it is fine to use arrays in public APIs, but not, if the array must be protected from modification. As IndexReaders are 100% unmodifiable in trunk code (no deletions,...), the only possibility to corrumpt the reader is by modifying the array returned by getSequentialSubReaders(). We should prevent this.
> The same theoretically applies to FieldCache, too - but the party that is afraid of performance problems is too big to fight against that :(
> For getSequentialSubReaders there is no performance problem at all. The binary search of reader-ids inside BaseCompositeReader can still be done with the internal protected array, but public APIs should expose only a unmodifiable List. The same applies to leaves() and children() in IndexReaderContext. This change to list would also allow to make CompositeReader and CompositeReaderContext Iterable<R extends IndexReader>, so some loops would look nice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3866) Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List

Posted by "Uwe Schindler (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3866:
----------------------------------

    Summary: Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List<R extends IndexReader>  (was: Make CompositeReader.getSequntialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List<R extends IndexReader>)
    
> Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List<R extends IndexReader>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3866
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3866
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>
> Since Lucene 2.9 we have the method getSequentialSubReader() returning IndexReader[]. Based on hardly-to-debug errors in user's code, Robert and me realized that returning an array from a public API is an anti-pattern. If the array is intended to be modifiable (like byte[] in terms,...), it is fine to use arrays in public APIs, but not, if the array must be protected from modification. As IndexReaders are 100% unmodifiable in trunk code (no deletions,...), the only possibility to corrumpt the reader is by modifying the array returned by getSequentialSubReaders(). We should prevent this.
> The same theoretically applies to FieldCache, too - but the party that is afraid of performance problems is too big to fight against that :(
> For getSequentialSubReaders there is no performance problem at all. The binary search of reader-ids inside BaseCompositeReader can still be done with the internal protected array, but public APIs should expose only a unmodifiable List. The same applies to leaves() and children() in IndexReaderContext. This change to list would also allow to make CompositeReader and CompositeReaderContext Iterable<R extends IndexReader>, so some loops would look nice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3866) Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228335#comment-13228335 ] 

Robert Muir commented on LUCENE-3866:
-------------------------------------

Thanks for opening!

{quote}
The binary search of reader-ids inside BaseCompositeReader can still be done with the internal protected array,
{quote}

That's unaffected here right? Because starts[] is a separate problem. Why isnt this private? I made it private,
only MultiPassIndexSplitter.FakeDeleteIndexReader is affected. In my opinion we should fix this too, the fix would
be to just have a protected method in BaseComposite: int getDocStart(int readerIndex). Between this and readerIndex(),
subclasses then have all they need.

                
> Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List<R extends IndexReader>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3866
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3866
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>
> Since Lucene 2.9 we have the method getSequentialSubReader() returning IndexReader[]. Based on hardly-to-debug errors in user's code, Robert and me realized that returning an array from a public API is an anti-pattern. If the array is intended to be modifiable (like byte[] in terms,...), it is fine to use arrays in public APIs, but not, if the array must be protected from modification. As IndexReaders are 100% unmodifiable in trunk code (no deletions,...), the only possibility to corrumpt the reader is by modifying the array returned by getSequentialSubReaders(). We should prevent this.
> The same theoretically applies to FieldCache, too - but the party that is afraid of performance problems is too big to fight against that :(
> For getSequentialSubReaders there is no performance problem at all. The binary search of reader-ids inside BaseCompositeReader can still be done with the internal protected array, but public APIs should expose only a unmodifiable List. The same applies to leaves() and children() in IndexReaderContext. This change to list would also allow to make CompositeReader and CompositeReaderContext Iterable<R extends IndexReader>, so some loops would look nice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3866) Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13395975#comment-13395975 ] 

Uwe Schindler commented on LUCENE-3866:
---------------------------------------

Thanks, I was not expecting a change - otherwise all my believes would have been devastated... :-)
                
> Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List<R extends IndexReader>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3866
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3866
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-3866-step1.patch, LUCENE-3866-step2.patch, LUCENE-3866-step2.patch
>
>
> Since Lucene 2.9 we have the method getSequentialSubReader() returning IndexReader[]. Based on hardly-to-debug errors in user's code, Robert and me realized that returning an array from a public API is an anti-pattern. If the array is intended to be modifiable (like byte[] in terms,...), it is fine to use arrays in public APIs, but not, if the array must be protected from modification. As IndexReaders are 100% unmodifiable in trunk code (no deletions,...), the only possibility to corrumpt the reader is by modifying the array returned by getSequentialSubReaders(). We should prevent this.
> The same theoretically applies to FieldCache, too - but the party that is afraid of performance problems is too big to fight against that :(
> For getSequentialSubReaders there is no performance problem at all. The binary search of reader-ids inside BaseCompositeReader can still be done with the internal protected array, but public APIs should expose only a unmodifiable List. The same applies to leaves() and children() in IndexReaderContext. This change to list would also allow to make CompositeReader and CompositeReaderContext Iterable<R extends IndexReader>, so some loops would look nice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Comment Edited] (LUCENE-3866) Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393639#comment-13393639 ] 

Uwe Schindler edited comment on LUCENE-3866 at 6/17/12 11:40 PM:
-----------------------------------------------------------------

Here the first step of this refactoring (affects only few core classes):

- getSequentialSubReaders() returns a List<? extends IndexReader> (unmodifiable)
- Nuked ReaderUtil.Gather completely (this one is ineffective and slow because of useless recursion). We can use ReaderContext.leaves() to get all atomic sub readers including their docStarts
- Improved MultiFields and MultiDocValues by usage of ReaderContext.leaves(). Code much easier to read!
- Moved ReaderUtil.Slice to separate class (it's somehow unrelated), but has nothing to do with ReaderUtil anymore -> Should move to index package + hidden

There is still one (slow) nocommit, because step 2 will refactor the ReaderContexts to use Collections, too.

Yonik:

bq. I think we should allow users to access in the most low-level efficient manner, just as lucene can. Remember, this is expert level stuff!  Seems like at most we would need a javadoc comment saying "don't modify this".

Actually it's more efficient than before. And the collection views are still backed by arrays and are never used in inner loops (it's just when IndexSearcher initializes or executes searches on all segments). Without any real benchmark showing a slowness (please do this *after* step 2 -> nocommit) I don't think we should risk corrumpt readers.
                
      was (Author: thetaphi):
    Here the first step of this refactoring (affects only few core classes):

- getSequentialSubReaders() returns a List<? extends IndexReader> (unmodifiable)
- Nuked ReaderUtil.Gather completely (this one is ineffective and slow because of useless recursion). We can use ReaderContext.leaves() to get all atomic sub readers including their docStarts
- Moved ReaderUtil.Slice to separate class (it's somehow unrelated), but has nothing to do with ReaderUtil anymore -> Should move to index package + hidden

There is still one (slow) nocommit, because step 2 will refactor the ReaderContexts to use Collections, too.

Yonik:

bq. I think we should allow users to access in the most low-level efficient manner, just as lucene can. Remember, this is expert level stuff!  Seems like at most we would need a javadoc comment saying "don't modify this".

Actually it's more efficient than before. And the collection views are still backed by arrays and are never used in inner loops (it's just when IndexSearcher initializes or executes searches on all segments). Without any real benchmark showing a slowness (please do this *after* step 2 -> nocommit) I don't think we should risk corrumpt readers.
                  
> Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List<R extends IndexReader>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3866
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3866
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-3866-step1.patch
>
>
> Since Lucene 2.9 we have the method getSequentialSubReader() returning IndexReader[]. Based on hardly-to-debug errors in user's code, Robert and me realized that returning an array from a public API is an anti-pattern. If the array is intended to be modifiable (like byte[] in terms,...), it is fine to use arrays in public APIs, but not, if the array must be protected from modification. As IndexReaders are 100% unmodifiable in trunk code (no deletions,...), the only possibility to corrumpt the reader is by modifying the array returned by getSequentialSubReaders(). We should prevent this.
> The same theoretically applies to FieldCache, too - but the party that is afraid of performance problems is too big to fight against that :(
> For getSequentialSubReaders there is no performance problem at all. The binary search of reader-ids inside BaseCompositeReader can still be done with the internal protected array, but public APIs should expose only a unmodifiable List. The same applies to leaves() and children() in IndexReaderContext. This change to list would also allow to make CompositeReader and CompositeReaderContext Iterable<R extends IndexReader>, so some loops would look nice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3866) Make CompositeReader.getSequntialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228291#comment-13228291 ] 

Uwe Schindler commented on LUCENE-3866:
---------------------------------------

In revision 1300016 I already committed some JavaDocs warnings about this. But the issue should be solved and discussed before Lucene 4.0 is released.
                
> Make CompositeReader.getSequntialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List<R extends IndexReader>
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3866
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3866
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>
> Since Lucene 2.9 we have the method getSequentialSubReader() returning IndexReader[]. Based on hardly-to-debug errors in user's code, Robert and me realized that returning an array from a public API is an anti-pattern. If the array is intended to be modifiable (like byte[] in terms,...), it is fine to use arrays in public APIs, but not, if the array must be protected from modification. As IndexReaders are 100% unmodifiable in trunk code (no deletions,...), the only possibility to corrumpt the reader is by modifying the array returned by getSequentialSubReaders(). We should prevent this.
> The same theoretically applies to FieldCache, too - but the party that is afraid of performance problems is too big to fight against that :(
> For getSequentialSubReaders there is no performance problem at all. The binary search of reader-ids inside BaseCompositeReader can still be done with the internal protected array, but public APIs should expose only a unmodifiable List. The same applies to leaves() and children() in IndexReaderContext. This change to list would also allow to make CompositeReader and CompositeReaderContext Iterable<R extends IndexReader>, so some loops would look nice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3866) Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List

Posted by "Yonik Seeley (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228415#comment-13228415 ] 

Yonik Seeley commented on LUCENE-3866:
--------------------------------------

I think we should allow users to access in the most low-level efficient manner, just as lucene can.
Remember, this is expert level stuff!  Seems like at most we would need a javadoc comment saying "don't modify this".

                
> Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List<R extends IndexReader>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3866
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3866
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>
> Since Lucene 2.9 we have the method getSequentialSubReader() returning IndexReader[]. Based on hardly-to-debug errors in user's code, Robert and me realized that returning an array from a public API is an anti-pattern. If the array is intended to be modifiable (like byte[] in terms,...), it is fine to use arrays in public APIs, but not, if the array must be protected from modification. As IndexReaders are 100% unmodifiable in trunk code (no deletions,...), the only possibility to corrumpt the reader is by modifying the array returned by getSequentialSubReaders(). We should prevent this.
> The same theoretically applies to FieldCache, too - but the party that is afraid of performance problems is too big to fight against that :(
> For getSequentialSubReaders there is no performance problem at all. The binary search of reader-ids inside BaseCompositeReader can still be done with the internal protected array, but public APIs should expose only a unmodifiable List. The same applies to leaves() and children() in IndexReaderContext. This change to list would also allow to make CompositeReader and CompositeReaderContext Iterable<R extends IndexReader>, so some loops would look nice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3866) Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13395847#comment-13395847 ] 

Uwe Schindler commented on LUCENE-3866:
---------------------------------------

I will later also upgrade MIGRATE.txt markdown.
                
> Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List<R extends IndexReader>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3866
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3866
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-3866-step1.patch, LUCENE-3866-step2.patch, LUCENE-3866-step2.patch
>
>
> Since Lucene 2.9 we have the method getSequentialSubReader() returning IndexReader[]. Based on hardly-to-debug errors in user's code, Robert and me realized that returning an array from a public API is an anti-pattern. If the array is intended to be modifiable (like byte[] in terms,...), it is fine to use arrays in public APIs, but not, if the array must be protected from modification. As IndexReaders are 100% unmodifiable in trunk code (no deletions,...), the only possibility to corrumpt the reader is by modifying the array returned by getSequentialSubReaders(). We should prevent this.
> The same theoretically applies to FieldCache, too - but the party that is afraid of performance problems is too big to fight against that :(
> For getSequentialSubReaders there is no performance problem at all. The binary search of reader-ids inside BaseCompositeReader can still be done with the internal protected array, but public APIs should expose only a unmodifiable List. The same applies to leaves() and children() in IndexReaderContext. This change to list would also allow to make CompositeReader and CompositeReaderContext Iterable<R extends IndexReader>, so some loops would look nice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org