You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2011/01/11 18:56:46 UTC

[jira] Created: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Separate SegmentReaders (and other atomic readers) from composite IndexReaders
------------------------------------------------------------------------------

                 Key: LUCENE-2858
                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
             Project: Lucene - Java
          Issue Type: Task
            Reporter: Uwe Schindler
             Fix For: 4.0


With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.

This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.

On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).

We should decide about good names, i have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982134#action_12982134 ] 

Michael McCandless commented on LUCENE-2858:
--------------------------------------------

I don't think we should remove setNorm/deleteDocuments, even from the composite reader class.

Deleting docs from IR has advantages over deleting from IW: the change is "live" to any searches running on that IR; you get an immediate count of how many docs were deleted; you can delete by docID.

setNorm is also useful in that it can be use to boost docs (globally), live, if that reader is being used for searching.  When/if we cutover norms -> doc values we'll have to decide what to do about setNorm...

At a higher level, for this "strong typing of atomic vs composite IRs", we shouldn't try to change functionality -- let's just do a rote refactoring, such that methods that now throw UOE on IR are moved to the atomic reader only.

Separately we can think about whether existing functions should be dropped...

> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980215#action_12980215 ] 

Michael McCandless commented on LUCENE-2858:
--------------------------------------------

+1

We very much need this now -- it should be strongly (statically) typed, whether you are using an AtomicIndexReader vs CompositeIndexReader.

And I agree CompositeIndexReader should feel more like a Collection than what IR is today.

> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287563#comment-13287563 ] 

Michael McCandless commented on LUCENE-2858:
--------------------------------------------

Not entirely sure why but it looks like the commits here slowed down our NRT reopen latency.  If you look at the nightly bench graph: http://people.apache.org/~mikemccand/lucenebench/nrt.html and click + drag from Jan 2012 to today, annotation R shows we increased from ~46 msec NRT reopen latency to ~50 msec ... could just be hotspot being upset...
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2858-FCinsanity.patch, LUCENE-2858-FixSlowEnsureOpen.patch, LUCENE-2858.patch, LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288135#comment-13288135 ] 

Michael McCandless commented on LUCENE-2858:
--------------------------------------------

bq. This explains the change back after 4055, I thought un-compressed oops responsible for the speedup.

That's what I thought too (and I was very confused that uncompressed oops would improve things)!  But LUCENE-4055 landed on the same day I turned off uncompressed oops... so I think all mysteries are now explained.
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2858-FCinsanity.patch, LUCENE-2858-FixSlowEnsureOpen.patch, LUCENE-2858.patch, LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982132#action_12982132 ] 

Earwin Burrfoot commented on LUCENE-2858:
-----------------------------------------

bq. Still, i think we would need this method (somewhere) even with CSF, so that people can change the norms and they instantly take effect for searches.
This still puzzles me. I can strain my imagination, and get people who just need to change norms without reindexing.
But doing this and *requiring* instant turnaround? Kid me not :)


> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-2858.
-----------------------------------

    Resolution: Fixed

Committed trunk revision: 1238085
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2858.patch, LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2858:
----------------------------------

    Priority: Blocker  (was: Major)
    
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197134#comment-13197134 ] 

Robert Muir commented on LUCENE-2858:
-------------------------------------

I think somehow ensureOpen is broken here in some situations (possibly slowmultireaderwrapper):

{noformat}
ant test -Dtestcase=TestReaderClosed -Dtestmethod=test -Dtests.seed=42907a63342da06b:-1490dd5f0d0f26d9:-7e3c360ea2d32539 -Dargs="-Dfile.encoding=UTF-8"
{noformat}


                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2858-FCinsanity.patch, LUCENE-2858.patch, LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188076#comment-13188076 ] 

Uwe Schindler commented on LUCENE-2858:
---------------------------------------

I will hopefully start this weekend. I just have a schedule currently that has no slots left for that.

Robert and I wanted to start a branch soon, maybe I simply do that as first step soon!
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2858:
----------------------------------

    Attachment: LUCENE-2858.patch

Patch with nocommits removed.
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2858.patch, LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2858:
----------------------------------

    Attachment: LUCENE-2858-FCinsanity.patch

Patch that adds FC insanity checking for slow wrappers back. Will commit now.
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2858-FCinsanity.patch, LUCENE-2858.patch, LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288133#comment-13288133 ] 

Uwe Schindler commented on LUCENE-2858:
---------------------------------------

Thanks Mike for taking care. For me this looked crazy, because refactoring like this should not change perf.

This explains the change back  after 4055, I thought un-compressed oops responsible for the speedup.
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2858-FCinsanity.patch, LUCENE-2858-FixSlowEnsureOpen.patch, LUCENE-2858.patch, LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Issue Comment Edited] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196479#comment-13196479 ] 

Uwe Schindler edited comment on LUCENE-2858 at 1/30/12 10:56 PM:
-----------------------------------------------------------------

Here the final patch. Thanks for the good work and lots of help from Robert and Mike.

As this patch contains heavy changes, we will commit it as soon as possible so work can go on. It would be nice, if you would not commit anything until this is done.

There are some minor issues open:
- DirectoryReader is final and has only static factory methods. It is not possible to subclass it in any way. The problem is mainly Solr, as Solr accesses directory(), IndexCommits,... and therefore cannot work on abstract IndexReader anymore. This should be changed, by e.g. handling reopening in the IRFactory, also versions, commits,... Currently its not possible to implement any other IRFactory that returns something else.
- The PayloadProcessorProvider has a broken API, this should be fixed. The current patch mimics the old behaviour, but not 100%.
- ParallelReader is now atomic. We should add a sugar wrapper method to allow synchronized composite readers (with same segment sizes) to be aligned with MultiReaders or wrapped by Slow.

The remaining issues will be fixed in later issues!
                
      was (Author: thetaphi):
    Here the final patch. Thanks for the good work and lots of help from Robert and Mike.

As this patch contains heavy changes, we will commit it as soon as possible so work can go on. It would be nice, if you would not commit anything until this is done.

There are some minor issues open:
- Directory is final and has only static factory methods. It is not possible to subclass it in any way. The problem is mainly Solr, as Solr accesses directory(), IndexCommits,... and therefore cannot work on abstract IndexReader anymore. This should be changed, by e.g. handling reopening in the IRFactory, also versions, commits,... Currently its not possible to implement any other IRFactory that returns something else.
- The PayloadProcessorProvider has a broken API, this should be fixed. The current patch mimics the old behaviour, but not 100%.
- ParallelReader is now atomic. We should add a sugar wrapper method to allow synchronized composite readers (with same segment sizes) to be aligned with MultiReaders or wrapped by Slow.

The remaining issues will be fixed in later issues!
                  
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2858.patch, LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Yonik Seeley (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195768#comment-13195768 ] 

Yonik Seeley commented on LUCENE-2858:
--------------------------------------

bq.  SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*

+1, the Slow* is misleading as it makes it seem like there's a faster way you should be doing it.
CompositeReaderWrapper should be fine.  And no, it doesn't sound too "cool" for the hypothetical developers who use that as a criteria when coding ;-)

Other possibilities include AtomicReaderEmulator, AtomicEmulatorReader, CompositeAsAtomicReader, etc
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980216#action_12980216 ] 

Robert Muir commented on LUCENE-2858:
-------------------------------------

bq. (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*)

My problem with Multi* is that this makes things sound cool and powerful.
If they are really slow transition mechanisms, I prefer Slow* as it will actually discourage their use.

But I agree with this issue in general and it would be great for it to be cleaned up.


> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Issue Comment Edited] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190559#comment-13190559 ] 

Uwe Schindler edited comment on LUCENE-2858 at 1/21/12 11:51 PM:
-----------------------------------------------------------------

I created the branch at [https://svn.apache.org/repos/asf/lucene/dev/branches/lucene2858] and committed my first steps:

- Add CompositeIndexReader and AtomicIndexReader
- Moved methods around, still not yet finished (see below)
- DirectoryReader is public now and is returned by IR.open() and IW.getReader()

TODO:

- IR.openIfChanged makes no sense for any reader other than DirectoryReader, let's move it also there
- isCurrent and getVersion() is also useless for atomic readers and composite readers except DR
- The strange generics in ReaderContext caused by the final field will go away, when changing reader field to aaccessor method returning the correct type (by return type overloading).

Comments welcome and also heavy committing.

                
      was (Author: thetaphi):
    I created the branch at [https://svn.apache.org/repos/asf/lucene/dev/branches/lucene2858] and committed my first steps:

- Add CompositeIndexReader and AtomicIndexReader
- Moved methods around, still now finished (see below)
- DirectoryReader is public now and is returned by IR.open() and IW.getReader()

TODO:

- IR.openIfChanged makes no sense for any reader other than DirectoryReader, let's move it also there
- isCurrent and getVersion() is also useless for atomic readers and composite readers except DR
- The strange generics in ReaderContext caused by the final field will go away, when changing reader field to aaccessor method returning the correct type (by return type overloading).

Comments welcome and also heavy committing.

                  
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Simon Willnauer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188005#comment-13188005 ] 

Simon Willnauer commented on LUCENE-2858:
-----------------------------------------

uwe are you gonna work on this?
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2858:
----------------------------------

    Attachment: LUCENE-2858-FixSlowEnsureOpen.patch

Hi Robert, this patch fixes the problem:

SlowMultiReaderWrapper creates the fields and liveDocs on its ctor (as its expensive) and reuses. This is no problem, but the test did something very bad: It closed not the slow reader but the wrapped DirectoryReader. As fields() was not delegated to the DR or MultiFields with DR, Slow returned its own cached instance. Slow's ensureOpen was not throwing Ex, as it was still open.

Theoretically, Slow* should incRef the underlying indexreader and decRef on close(). But that would require, that you close the SlowReader after wrapping.

The current solution is not optimal but makes it easy to wrap without explicitely closing the slow wrapper.

The fix was to call in.ensureOpen() in slow before the cached instance was returned.
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2858-FCinsanity.patch, LUCENE-2858-FixSlowEnsureOpen.patch, LUCENE-2858.patch, LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982115#action_12982115 ] 

Robert Muir commented on LUCENE-2858:
-------------------------------------

Ah, ok. sorry i was confused. Still, i think we would need this method (somewhere) even 
with CSF, so that people can change the norms and they instantly take effect for searches.


> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190559#comment-13190559 ] 

Uwe Schindler commented on LUCENE-2858:
---------------------------------------

I created the branch at [https://svn.apache.org/repos/asf/lucene/dev/branches/lucene2858] and committed my first steps:

- Add CompositeIndexReader and AtomicIndexReader
- Moved methods around, still now finished (see below)
- DirectoryReader is public now and is returned by IR.open() and IW.getReader()

TODO:

- IR.openIfChanged makes no sense for any reader other than DirectoryReader, let's move it also there
- isCurrent and getVersion() is also useless for atomic readers and composite readers except DR
- The strange generics in ReaderContext caused by the final field will go away, when changing reader field to aaccessor method returning the correct type (by return type overloading).

Comments welcome and also heavy committing.

                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197187#comment-13197187 ] 

Uwe Schindler commented on LUCENE-2858:
---------------------------------------

Committed fix in revision: 1238788
I removed the extra in.ensureOpen() for hasDeletions(), as this was only a null-check.
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2858-FCinsanity.patch, LUCENE-2858-FixSlowEnsureOpen.patch, LUCENE-2858.patch, LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195868#comment-13195868 ] 

Robert Muir commented on LUCENE-2858:
-------------------------------------

Can we please do some eclipse-renames like:

AtomicIndexReader -> AtomicReader
AtomicIndexReader.AtomicReaderContext -> AtomicReader.Context

The verbosity of the api is killing me :)

                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196583#comment-13196583 ] 

Uwe Schindler commented on LUCENE-2858:
---------------------------------------

Committed trunk revision: 1238112
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2858-FCinsanity.patch, LUCENE-2858.patch, LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190456#comment-13190456 ] 

Uwe Schindler commented on LUCENE-2858:
---------------------------------------

Simon: Just to inform you, I am working on this. Currently I have a heavy broken checkout that does no longer compile at all :( Working, working, working... It's a mess!

Once I have something intially compiling for core (not tests), I will create a branch!
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288129#comment-13288129 ] 

Michael McCandless commented on LUCENE-2858:
--------------------------------------------

bq. Not entirely sure why but it looks like the commits here slowed down our NRT reopen latency.

OK, sorry, I was wrong about this!

I went back and re-ran the NRTPerfTest and isolated the slowdown to LUCENE-3728 (also committed on the same day)... it was because we lost the SegmentInfo.sizeInBytes caching... but then in LUCENE-4055 we got it back and we got the performance back ... so all's well that ends well :)
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2858-FCinsanity.patch, LUCENE-2858-FixSlowEnsureOpen.patch, LUCENE-2858.patch, LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195904#comment-13195904 ] 

Uwe Schindler commented on LUCENE-2858:
---------------------------------------

I renamed the enclosing classes and also removed the public ctors from ReaderContexts (to prevent stupid things already reported on mailing lists).

The renameing of ReaderContexts all to the same name Context, but with different enclosing class is a refactoring, Eclipse cannot do (it creates invalid code). It seems only NetBeans can do this, I will try to find a solution. The problem is that Eclipse always tries to import the inner class, what causes conflicts.

Finally, e.g. the method getDocIdSet should look like getDocIdSet(AtomicReader.Context,...) [only importing AtomicReader], but Eclipse always tries to use Context [and import oal.AtomicReader.Context]. At the end we should have abstract IndexReader.Context, AtomicReader.Context, CompositeReader.Context.

Will go to bed now.
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196024#comment-13196024 ] 

Uwe Schindler commented on LUCENE-2858:
---------------------------------------

After sleeping one more night about it, the easiest and most simple way to remove the stupid verbosity in imports:

I will move the ReaderContexts out of their enclosing class and make a top-level class without ctor out of it. Then import statements look nice. The RCs are now so important for search, so they should be on its own.
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2858:
----------------------------------

    Attachment: LUCENE-2858.patch

Here the final patch. Thanks for the good work and lots of help from Robert and Mike.

As this patch contains heavy changes, we will commit it as soon as possible so work can go on. It would be nice, if you would not commit anything until this is done.

There are some minor issues open:
- Directory is final and has only static factory methods. It is not possible to subclass it in any way. The problem is mainly Solr, as Solr accesses directory(), IndexCommits,... and therefore cannot work on abstract IndexReader anymore. This should be changed, by e.g. handling reopening in the IRFactory, also versions, commits,... Currently its not possible to implement any other IRFactory that returns something else.
- The PayloadProcessorProvider has a broken API, this should be fixed. The current patch mimics the old behaviour, but not 100%.
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196518#comment-13196518 ] 

Robert Muir commented on LUCENE-2858:
-------------------------------------

Thanks for all the work here Uwe! I think its a great step forward. 
The previous situation where half the methods were UOE is really unreleasable.
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2858.patch, LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190662#comment-13190662 ] 

Uwe Schindler commented on LUCENE-2858:
---------------------------------------

Some TODOs fixed:

- -The strange generics in ReaderContext caused by the final field will go away, when changing reader field to aaccessor method returning the correct type (by return type overloading).-
- Some refactoring done on docFreq()
- Moved the ReaderContext impls to the corresponding abstract reader class: CompositeReaderContext -> CompositeIndexReader, AtomicReaderContext -> AtomicIndexReader

                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Issue Comment Edited] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169760#comment-13169760 ] 

Uwe Schindler edited comment on LUCENE-2858 at 12/14/11 10:42 PM:
------------------------------------------------------------------

I try to do this now, Robert will help.

Most crap is removed form IndexReader, so this should be easy now. We should use AtomicIndexReader and CompositeIndexReader.

- IndexReader base class only contains the methods that can actually be implemented in *every* reader.
- The current BaseMultiReader (pkg-private) would be the later CompositeIndexReader
- AtomicIndexReader is also abstract, SegmentReader and SlowMultiReaderWrapper would implement it.
- The static open methods should maybe in a separate class or the abstract IndexReader base with the limited API, but they would return CompositeIndexReader
- FilterIndexReader could be split in two classes, but a FilterMultiReader makes no sense :-) FilterIndexReader will be FilterAtomicIndexReader (also the superclass of SlowMultiReaderWrapper)

                
      was (Author: thetaphi):
    I try to do this now, Robert will help.

Most crap is removed form IndexReader, so this should be easy now. We should use AtomicIndexReader and CompositeIndexReader.

- IndexReader base class only contains the methods that can actually be implemented in *every* reader.
- The current BaseMultiReader (pkg-private) would be the later CompositeIndexReader
- AtomicIndexReader is also abstract, SegmentReader and SlowMultiReaderWrapper would implement it.
- The static open methods should maybe in a separate class or the abstract IndexReader base with the limited API, but they would return CompositeIndexReader
- FilterIndexReader would be split in two classes

                  
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196561#comment-13196561 ] 

Michael McCandless commented on LUCENE-2858:
--------------------------------------------

Nice work guys!
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2858.patch, LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169760#comment-13169760 ] 

Uwe Schindler commented on LUCENE-2858:
---------------------------------------

I try to do this now, Robert will help.

Most crap is removed form IndexReader, so this should be easy now. We should use AtomicIndexReader and CompositeIndexReader.

- IndexReader base class only contains the methods that can actually be implemented in *every* reader.
- The current BaseMultiReader (pkg-private) would be the later CompositeIndexReader
- AtomicIndexReader is also abstract, SegmentReader and SlowMultiReaderWrapper would implement it.
- The static open methods should maybe in a separate class or the abstract IndexReader base with the limited API, but they would return CompositeIndexReader
- FilterIndexReader would be split in two classes

                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Marvin Humphrey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982152#action_12982152 ] 

Marvin Humphrey commented on LUCENE-2858:
-----------------------------------------

> Deleting docs from IR has advantages over deleting from IW: the change is
> "live" to any searches running on that IR; you get an immediate count of how
> many docs were deleted; you can delete by docID.

Alternate plan:

  * Move responsibility for deletions to a pluggable DeletionsReader
    subcomponent of SegmentReader.
  * Have the default DeletionsReader be read-only.
  * People who need the esoteric functionality described above can use a
    subclass of DeletionsReader.


> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195876#comment-13195876 ] 

Uwe Schindler commented on LUCENE-2858:
---------------------------------------

Jaja, will fix this...
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195757#comment-13195757 ] 

Uwe Schindler commented on LUCENE-2858:
---------------------------------------

I now fixed the branch's test-framework and all remaining TODOs about the API.

Now the horrible stupid slave-work to port all test starts. I assume the API is now fixed, as nobody complained after one week.
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980388#action_12980388 ] 

Earwin Burrfoot commented on LUCENE-2858:
-----------------------------------------

bq. On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader?
There is a freakload of places that "upgrade" SegmentReader in various ways, with deletions guilty only for the part of the cases. I'll try getting back to LUCENE-2355 at the end of the week.

> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195873#comment-13195873 ] 

Michael McCandless commented on LUCENE-2858:
--------------------------------------------

+1 for those names.
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Issue Comment Edited] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196479#comment-13196479 ] 

Uwe Schindler edited comment on LUCENE-2858 at 1/30/12 10:38 PM:
-----------------------------------------------------------------

Here the final patch. Thanks for the good work and lots of help from Robert and Mike.

As this patch contains heavy changes, we will commit it as soon as possible so work can go on. It would be nice, if you would not commit anything until this is done.

There are some minor issues open:
- Directory is final and has only static factory methods. It is not possible to subclass it in any way. The problem is mainly Solr, as Solr accesses directory(), IndexCommits,... and therefore cannot work on abstract IndexReader anymore. This should be changed, by e.g. handling reopening in the IRFactory, also versions, commits,... Currently its not possible to implement any other IRFactory that returns something else.
- The PayloadProcessorProvider has a broken API, this should be fixed. The current patch mimics the old behaviour, but not 100%.
- ParallelReader is now atomic. We should add a sugar wrapper method to allow synchronized composite readers (with same segment sizes) to be aligned with MultiReaders or wrapped by Slow.

The remaining issues will be fixed in later issues!
                
      was (Author: thetaphi):
    Here the final patch. Thanks for the good work and lots of help from Robert and Mike.

As this patch contains heavy changes, we will commit it as soon as possible so work can go on. It would be nice, if you would not commit anything until this is done.

There are some minor issues open:
- Directory is final and has only static factory methods. It is not possible to subclass it in any way. The problem is mainly Solr, as Solr accesses directory(), IndexCommits,... and therefore cannot work on abstract IndexReader anymore. This should be changed, by e.g. handling reopening in the IRFactory, also versions, commits,... Currently its not possible to implement any other IRFactory that returns something else.
- The PayloadProcessorProvider has a broken API, this should be fixed. The current patch mimics the old behaviour, but not 100%.

The remaining issues will be fixed in later issues!
                  
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2858.patch, LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982057#action_12982057 ] 

Uwe Schindler commented on LUCENE-2858:
---------------------------------------

Any comments about removing write access from IndexReaders? I think setNorms() will be removed soo, but how about the others? I would propose to also make all IndexReaders simply *readers* not writers?

> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982102#action_12982102 ] 

Robert Muir commented on LUCENE-2858:
-------------------------------------

bq. I think setNorms() will be removed soon

Why do you think this?

On the norms cleanup issue, i only removed setNorm(float), because its completely useless.
All it did was call Similarity.getDefault().encode(float) + setNorm(byte).


> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Issue Comment Edited] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196479#comment-13196479 ] 

Uwe Schindler edited comment on LUCENE-2858 at 1/30/12 10:21 PM:
-----------------------------------------------------------------

Here the final patch. Thanks for the good work and lots of help from Robert and Mike.

As this patch contains heavy changes, we will commit it as soon as possible so work can go on. It would be nice, if you would not commit anything until this is done.

There are some minor issues open:
- Directory is final and has only static factory methods. It is not possible to subclass it in any way. The problem is mainly Solr, as Solr accesses directory(), IndexCommits,... and therefore cannot work on abstract IndexReader anymore. This should be changed, by e.g. handling reopening in the IRFactory, also versions, commits,... Currently its not possible to implement any other IRFactory that returns something else.
- The PayloadProcessorProvider has a broken API, this should be fixed. The current patch mimics the old behaviour, but not 100%.

The remaining issues will be fixed in later issues!
                
      was (Author: thetaphi):
    Here the final patch. Thanks for the good work and lots of help from Robert and Mike.

As this patch contains heavy changes, we will commit it as soon as possible so work can go on. It would be nice, if you would not commit anything until this is done.

There are some minor issues open:
- Directory is final and has only static factory methods. It is not possible to subclass it in any way. The problem is mainly Solr, as Solr accesses directory(), IndexCommits,... and therefore cannot work on abstract IndexReader anymore. This should be changed, by e.g. handling reopening in the IRFactory, also versions, commits,... Currently its not possible to implement any other IRFactory that returns something else.
- The PayloadProcessorProvider has a broken API, this should be fixed. The current patch mimics the old behaviour, but not 100%.
                  
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982107#action_12982107 ] 

Uwe Schindler commented on LUCENE-2858:
---------------------------------------

I was talking about replacing norms by CSF, maybe It's just not soon.

> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190663#comment-13190663 ] 

Uwe Schindler commented on LUCENE-2858:
---------------------------------------

More TODOs:

- Rename SlowMultiReaderWrapper -> SlowCompositeReaderWrapper (as its not only for MultiReaders, also other CompositeIndexReaders like DirectoryReader. Name is not in line with our current naming
- All version/commit/reopen stuff should go into DirectoryReader, its useless for the abstract IR interfaces
                
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Assigned] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler reassigned LUCENE-2858:
-------------------------------------

    Assignee: Uwe Schindler
    
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982166#action_12982166 ] 

Earwin Burrfoot commented on LUCENE-2858:
-----------------------------------------

APIs have to be there still. All that commity, segment-deletery, mutabley stuff (that spans both atomic and composite readers).
So, while your plan is viable, it won't remove that much cruft.

> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982126#action_12982126 ] 

Earwin Burrfoot commented on LUCENE-2858:
-----------------------------------------

bq. Any comments about removing write access from IndexReaders? I think setNorms() will be removed soon, but how about the others like deleteDocument()? I would propose to also make all IndexReaders simply readers not writers? 

Voting with all my extremities - yes!!

> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982057#action_12982057 ] 

Uwe Schindler edited comment on LUCENE-2858 at 1/15/11 5:26 AM:
----------------------------------------------------------------

Any comments about removing write access from IndexReaders? I think setNorms() will be removed soon, but how about the others like deleteDocument()? I would propose to also make all IndexReaders simply *readers* not writers?

      was (Author: thetaphi):
    Any comments about removing write access from IndexReaders? I think setNorms() will be removed soo, but how about the others? I would propose to also make all IndexReaders simply *readers* not writers?
  
> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org