You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Jason Rutherglen (JIRA)" <ji...@apache.org> on 2009/01/10 02:01:59 UTC

[jira] Created: (LUCENE-1516) Integrate IndexReader with IndexWriter

Integrate IndexReader with IndexWriter 
---------------------------------------

                 Key: LUCENE-1516
                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
             Project: Lucene - Java
          Issue Type: Improvement
    Affects Versions: 2.4
            Reporter: Jason Rutherglen
            Priority: Minor
             Fix For: 2.9


The current problem is an IndexReader and IndexWriter cannot be open
at the same time and perform updates as they both require a write
lock to the index. While methods such as IW.deleteDocuments enables
deleting from IW, methods such as IR.deleteDocument(int doc) and
norms updating are not available from IW. This limits the
capabilities of performing updates to the index dynamically or in
realtime without closing the IW and opening an IR, deleting or
updating norms, flushing, then opening the IW again, a process which
can be detrimental to realtime updates. 

This patch will expose an IndexWriter.getReader method that returns
the currently flushed state of the index as a class that implements
IndexReader. The new IR implementation will differ from existing IR
implementations such as MultiSegmentReader in that flushing will
synchronize updates with IW in part by sharing the write lock. All
methods of IR will be usable including reopen and clone. 


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

-	commitMergedDeletes uses the docIdMap from commitMerge SegmentMerger to map new deletes to the newly merged segment

-	Removed applyDeletesImmediately

-	Haven't solved the decRef issue which appears in TestIndexWriterReader.testDeleteFromIndexWriter.  It doesn't show up in the debugger.

{code}
[junit] java.lang.AssertionError
    [junit] 	at org.apache.lucene.index.SegmentReader$Ref.incRef(SegmentReader.java:103)
    [junit] 	at org.apache.lucene.index.SegmentReader.incRef(SegmentReader.java:341)
    [junit] 	at org.apache.lucene.index.IndexWriter$SegmentReaderPool.get(IndexWriter.java:647)
    [junit] 	at org.apache.lucene.index.IndexWriter$SegmentReaderPool.getReadOnlyClone(IndexWriter.java:615)
    [junit] 	at org.apache.lucene.index.MultiSegmentReader.<init>(MultiSegmentReader.java:86)
    [junit] 	at org.apache.lucene.index.ReadOnlyMultiSegmentReader.<init>(ReadOnlyMultiSegmentReader.java:35)
    [junit] 	at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:396)
    [junit] 	at org.apache.lucene.index.TestIndexWriterReader.testDeleteFromIndexWriter(TestIndexWriterReader.java:159)
    [junit] 	at org.apache.lucene.util.LuceneTestCase.runTest(LuceneTestCase.java:88)

{code}


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

- Added segmentReaders, segmentReaderClone to MergePolicy.OneMerge
- mergeFinish decRefs the OneMerge.segmentReaders
- _mergeInit clones segmentreaders
- commitMergedDeletes uses OneMerge.segmentReaders instead of loading 
bitvectors from the directory
- testAddIndexesAndDoDeletesThreads fails with 20 less documents (80 vs 100)
(an index that was supposed to be added isn't showing up)
- Getting exceptions from org.apache.lucene.TestSnapshotDeletionPolicy, such as:

{code}
Caused by: java.io.IOException: read past EOF
    [junit] 	at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:135)
    [junit] 	at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:92)
    [junit] 	at org.apache.lucene.store.IndexOutput.copyBytes(IndexOutput.java:172)
    [junit] 	at org.apache.lucene.index.TermVectorsWriter.addRawDocuments(TermVectorsWriter.java:185)
    [junit] 	at org.apache.lucene.index.SegmentMerger.mergeVectors(SegmentMerger.java:447)
    [junit] 	at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:145)
    [junit] 	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4823)
    [junit] 	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4408)
    [junit] 	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:218)
    [junit] 	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:274)
{code}

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679939#action_12679939 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

{quote}
mergeMiddle isn't synchronized so if we use a pooled reader
(instead of a frozen clone) couldn't deletes be applied to the SR as
merging is happening?
{quote}
Hmm -- good point.  You're right, we need to clone the reader up front, so deletions don't change during the merging process.

In which case let's just stick with the current approach in the patch?

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720569#action_12720569 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

{quote}
Currently we check the info for deletes, however with this
patch, I think we need to check the segmentReader which could
have deletes that don't show up in the info.
{quote}

Good catch!  Can you open a new issue & attach patch?  Though: how would you do this?  Right now MergePolicy never receives a SegmentReader, and makes all its decisions based on the SegmentInfo.  Each SegmentReader tracks its own pendingDelCount... maybe we add a private pendingDelCount to SegmentInfo, and change SegmentReader to use that instead?  That'd be a single source, and then the merge policy could retrieve it...

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png, ssd2.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679404#action_12679404 ] 

Jason Rutherglen commented on LUCENE-1516:
------------------------------------------

TestIndexWriterDelete.testOperationsOnDiskFull fails with
MockRAMDirectory.close (still open files) because IW.close isn't
called.

TestIndexWriter.testImmediateDiskFullWithThreads fails because
IW.close fails on the disk full exception. Should IW.closeInternal ->
SegmentReaderPool.close be placed in the finally clause?

TestTransactions fails because the purposely random failure occurs
during IW.close (SRP.close specifically) 

The TestStressIndexing2 failure could be tricky

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720427#action_12720427 ] 

Jason Rutherglen commented on LUCENE-1516:
------------------------------------------

In LogMergePolicy.findMergesToExpungeDeletes I think we need to
change how deletes are checked for. I added this change into
LUCENE-1313, however as we're about to release 2.9, we probably
need to implement it. 

Currently we check the info for deletes, however with this
patch, I think we need to check the segmentReader which could
have deletes that don't show up in the info.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png, ssd2.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693932#action_12693932 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

Nice work to you too -- I just picked things up where you left off!  I'm still working on the nocommits, but I think we can commit this soon.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png, ssd2.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

- Revised to trunk
- Still an incRef issue.
TestIndexWriterReader.testDeleteFromIndexWriter fails.
TestConcurrentMergeScheduler fails, as well as others
- Removed the working copy stuff from SegmentReaderPool as the
deletes need to be applied to the internal readers, the external
IW.getReader is a clone of the internal SRs.


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

* In IW.checkpoint, SegmentReaderPool.closeUnusedSegmentReaders is
called which gathers a set of all currently active segment names and
closes the segment readers not in the set. When the IW is closed an
assertion checks to insure the IW.segmentInfos names are the same as
those in the pool. 

* Bunches of private variables were made package protected and need to
be changed back


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

* The patch is updated to trunk, in most tests "there are still open
files" failures occur now (maybe it's not related to the latest revision)

* mergeMiddle isn't synchronized so if we use a pooled reader
(instead of a frozen clone) couldn't deletes be applied to the SR as
merging is happening? 

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662603#action_12662603 ] 

Jason Rutherglen commented on LUCENE-1516:
------------------------------------------

It looks like DirectoryIndexReader needs to have an IndexWriter mode
as unfortunately subclassing won't work. In this context overriding a
method implies the IW mode is being used.

I assume we'll share the segmentInfos object from IW rather than
share a clone with the IR?

* DirectoryIndexReader.doCommit needs to be overridden 
* IndexReader uses the IndexDeletionPolicy from IndexWriter 
* DirectoryIndexReader.acquireWriteLock is overridden and synchronizes
  on the write lock of IndexWriter, other details need to be worked out
* Method requiring synchronization in IW: optimize, expungeDeletes,
  prepareCommit, flush, addIndexes, methods that modify segmentInfos.
  Do we synchronize on IW or writeLock? 
* addDocument, updateDocument, deleteDocument do not seem to require 
  synchronization 
* IW.getReader returns a cloned reader. IW keeps it's own reference. 
  This is to allow IW to perform deletes using the internal reader rather than 
  open a new reader and in general not affect IW's reader while still sharing
  resources.


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675206#action_12675206 ] 

Jason Rutherglen commented on LUCENE-1516:
------------------------------------------

The path forward seems to be exposing a cloned readonly reader
from IW.getReader. This would be easier than doing hula hoops to do
segment genealogy (at least for now ☺)

{quote}can't we move away from allowing any changes via IR? (Ie
deprecate deleteDocuments/setNorms/etc.){quote}

This would simplify things however as a thought experiment how would
the setNorms work if it were a part of IndexWriter? 

{quote} And, clone should not be reopening segments...? {quote}

DirectoryIndexReader.clone(boolean openReadonly) calls
doReopen(SegmentInfos infos, boolean doClone, boolean openReadOnly)
which is an abstract method that in SegmentReader and
MultiSegmentReader reopens the segments? The segment infos for a
ReaderIW is obtained from IW, which is how it knows about the new
segments. Perhaps not desired behavior?

{quote} do we need delete by docID once we have realtime search? I
think the last compelling reason to keep IR's delete by docID was
immediacy, but realtime search can give us that, from IW, even when
deleting by Term or Query? {quote}

Good point! I think we may want to support it but for now it's
shouldn't be necessary. I'm thinking of the case where someone is
using the field cache (or some variant), performs some sort of query
on it and then needs to delete based on doc id. What do they do?
Would we expose a callback mechanism where a deleteFrom(IndexReader
ir) method is exposed and deletes occur at the time of the IW's
choosing?

{quote} It seems like calling reader.reopen() (on reader obtained
from writer) should basically do the same thing as calling
writer.getReader(). Ie they are nearly synonyms? (Except for small
difference in ref counting - I think writer.getReader() should always
incRef, but reopen only incRefs if it returns a new reader). {quote}

Perhaps ReaderIW.reopen will call IW.getReader underneath instead of
using IR's usual mechanism.




> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700459#action_12700459 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

This is the alg I use:

{code}
{ "BuildIndex"
  - OpenIndex
  - NearRealtimeReader(1)
   { "UpdateDocs" UpdateDoc > : 100000 : 50/sec
  - CloseIndex
}
{code}

Ie, using a BG thread, the NRT reader reopens once per second, runs a fixed search, and prints timing info.

Meanwhile, a single thread replaces the docs at the rate of 50/sec up to 100K docs.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png, ssd2.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

- Fixed the basic concurrency merge issue (TestSnapshotDeletionPolicy passes)
- Many unit tests fail due to other reasons

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679355#action_12679355 ] 

Jason Rutherglen commented on LUCENE-1516:
------------------------------------------

Here's a summary of the which tests are failing with a grouped by the
guessed cause:

* TestDeletionPolicy, TestIndexFileDeleter, TestIndexReader fails (not
all methods but they seem to be related, i.e. files are being left
open that should not be)

* In TestIndexWriter most test methods pass, however a few such as
testCommitOnCloseDiskUsage fail.
TestIndexWriterDelete.testUpdatesOnDiskFull fails for reasons
presumably similar to TestIndexWriter.testCommitOnCloseDiskUsage

* TestTransactions and TestStressIndexing2 fails. At first glance I'm
not sure why 

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675499#action_12675499 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

bq. I guess it depends on the performance hit.

It's challenging to implement truly live updates w/ decent
performance: I think we'd need to build the reader impl that can
search DocumentsWriter buffer.

Whereas the approach (patch) here is actually quite simple (all the
hard work was already done -- IndexReader.reopen,
collection/sorting/filtering by segment, etc.).

bq. In other words, my guess is that over time, as the performance proves out, it will be the default choice, not "expert".

I agree: realtime search will likely be a popular feature once we
finish it, release it, it proves stable, performant, etc.  Eventually
(maybe soon) it should be made the default.

I think IndexAccessor makes alot of sense, but it's a big change and
I'd rather not couple it to this issue.  There are many questions to
be hashed out (under a new issue): is it a simple pass-through?  Or
does it manage the lifecycle of the readers for you?  Does it warm new
readers?  Should it emulate "live" update semantics?  Should getReader
get it from the writer if there is one (ie, making realtime search the
"default")?  Etc.


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675719#action_12675719 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

Woops, sorry -- I downloaded the wrong patch.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1516:
---------------------------------------

    Attachment: LUCENE-1516.patch

New patch:  Fixed a few small issues... and made some changes to
contrib/benchmark to help in running more realistic near real-time
tests:

  * Fixed LineDocMaker to properly set "docid" primary key field.

  * Added UpdateDocTask that calls IndexWriter.updateDocument,
    randomly picking a docid.

  * Added NearRealTimeReader task, that creates BG thread that every N
    seconds opens a reader, runs a static search and prints results.

This patch also contains patch from LUCENE-1579.

So, using this you can 1) create a large index, 2) create an alg that
does doc updates at a fixed rate and then tests the near real-time
reader performance.


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675394#action_12675394 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------


{quote}
maybe we don't need all those variants? String, File and Directory are all easily enough collapsed down to just Directory.
{code}
new IndexWriter(new Directory(indexFile));
{code}
{quote}

(You'd presumably need to close that Directory).  But, yeah, we may be
able to drop some of them, although I do think they are convenient for
new users of Lucene.  And forcing users to switch to a totally new yet
pass through API on ugprade, but not giving them one to one carryover,
is not very nice.

bq. Additionally, there are no more variants than there already are on the IW and IR, right?

Right, I'm just saying IndexAccessor will have many methods.  And then
you're asking every app to make this switch, on upgrade.  It's alot of
API swapping/noise vs a single added expert method to IW.

{quote}
As for pass-through or not, I think it would just pass-through, at
least initially, but it certainly leaves open the possibility for
reference counting in the future if someone wants to implement that.
{quote}

If we think it'll be more than just pass through, we should try to
hash out, somewhat, what it will & won't do up front (changing it
later is a big change)?  And we should start from LUCENE-390.

{quote}
As someone who teaches people these APIs on a regular basis, I feel
pretty confident in saying that adding an IR to the IW as a public API
is going to confuse a good chunk of people just as the delete stuff on
the IR currently does now.
{quote}

But this will be an expert/advanced API, a single added method to IW.
I wouldn't expect users to be confused: on upgrade I think most users
will not even notice its existence!

bq. You wouldn't ask FileWriter for a FileReader, would you?

I'm not sure that's the right comparison -- Lucene's IW does far more
than a FileWriter.  And the fact that Lucene allows "point in time"
searching (which is very useful and rather unique) is a very big
difference vs FileReader/Writer.

{quote}
Likewise, isn't it just as logical to ask for an IW from an IR?
{quote}

I don't think so: the functionality is not symmetric, because Lucene
allows only one writer open at a time, but many readers (eg on
different commits).  Since a writer is the one making changes, it
makes sense that you'd ask it, right now, to give you a reader
reflecting all changes up to that point.  And call it again later to
get a reader seeing changes after that, etc.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682720#action_12682720 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

OK, don't worry about cleaning things up -- I'll start from what's here now.  Thanks!

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676811#action_12676811 ] 

Jason Rutherglen commented on LUCENE-1516:
------------------------------------------

{quote}
I think we should change commitMergedDeletes to use the
already computed doc ID maps, instead of loading the old deletes.
{quote}

Where do I get the already computed doc ID maps?  

{quote}
I think getReader does need to flush, else added docs won't be
seen in the reopend reader. You should call flush(true, true,
true) - because you want to triggerMerges, flushDocStores,
and flushDeletes.
{quote}

I added a flushDeletesToDir flag that defaults to true except for IW.getReader.



> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675523#action_12675523 ] 

jasonrutherglen edited comment on LUCENE-1516 at 2/20/09 5:42 PM:
-------------------------------------------------------------------

There's concurrency issues to work out.

- IW.getReader returns a cloned read only reader 
- Removed IW.reopenReader 
- All test methods pass except testAddIndexesAndDoDeletesThreads. testAddIndexesAndDoDeletesThreads
currently merges indexes concurrently (and fails). In the future the
method will test merging, deleting, and searching concurrently. 
- Concurrent merges fail when "ant test-core" is run 
- DocumentsWriter.applyDeletes deletes again at the SegmentReader level



      was (Author: jasonrutherglen):
    There's concurrency issues to work out.

- IW.getReader returns a cloned read only reader 
- Removed IW.reopenReader 
- All test methods pass except testAddIndexesAndDoDeletesThreads. testAddIndexesAndDoDeletesThreads
currently merges indexes concurrently (and fails). In the future the
method will test merging, deleting, and searching concurrently. 
- Concurrent merges fail when "ant test-core" is run 
- DocumentsWriter.applyDeletes deletes again at the SegmentReader level
- What happens when a user executes IR.clone(true) on a read only
reader? 

  
> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676652#action_12676652 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

Jason could you rebase the patch on current trunk?  (Or post the revision you're on?).

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725779#action_12725779 ] 

Jason Rutherglen commented on LUCENE-1516:
------------------------------------------

I think we will want to do something like what field cache does
with CreationPlaceholder for IndexWriter.readerPool. Otherwise
we have the (I think somewhat problematic) issue of all other
readerPool.get* methods waiting for an SR to warm. 

{code}
  public synchronized SegmentReader get(SegmentInfo info, boolean doOpenStores, int readBufferSize) throws IOException {
      if (poolReaders) {
        readBufferSize = BufferedIndexInput.BUFFER_SIZE;
      }

      SegmentReader sr = (SegmentReader) readerMap.get(info);
      if (sr == null) {
        // TODO: we may want to avoid doing this while
        // synchronized
        // Returns a ref, which we xfer to readerMap:
        sr = SegmentReader.get(info, readBufferSize, doOpenStores);
        readerMap.put(info, sr);
      } else if (doOpenStores) {
        sr.openDocStores();
      }

      // Return a ref to our caller
      sr.incRef();
      return sr;
    }
{code}

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png, ssd2.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1516:
---------------------------------------

    Attachment: LUCENE-1516.patch

Attached patch: added more test cases, and fixed the near real-time reader to NOT try to open segments in a foreign Directory (which happens when addIndexes* is running).

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png, ssd2.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

Some terminology for this patch, an internal reader is the
IndexWriter's reader. An external reader given via the
IW.reopenReader method. 

If DirectoryIndexReader has a writer, no lock is acquired on updates.
IR.clone normally passes the writeLock to the new reader, however the
external IR and the IW internal IR both need to hold the write lock.
For this reason the user must be careful when flushing to insure the
proper instance of the IR's deletes are merged with the writer. 

The external IR.flush does not flush the deletes to disk, instead it
merges with the IW's internal IR which is in RAM. IW.flush causes
deletes and new segments to be flushed to the directory.

The test cases from TestIndexWriterReader testIndexWriterDeletes and
testIndexWriterReopenSegment fail when the IW is opened again after
commit and close. The index files are being deleted during IW.commit.
I traced this to IW.finishCommit -> deleter.checkpoint ->
deleter.deletePendingFiles. 





> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1516:
---------------------------------------

    Attachment: LUCENE-1516.patch

Added another test case to TestIndexWriterReader, stress testing adding/deleting docs while constant opening near real-time reader.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png, ssd2.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678525#action_12678525 ] 

Jason Rutherglen commented on LUCENE-1516:
------------------------------------------

The IndexFileDeleter$RefCount.DecRef exception is related to an incorrect ref count of .del files when TestConcurrentMergeScheduler is run.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

- The patch is not committable as the debugging and comments need cleaning
- The clone issue was hacked around so more tests pass
- TestAtomicUpdate fails

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676749#action_12676749 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

Looks good!:

  * The SegmentReaderPool class should be package private.

  * You're still passing readOnly & closeDirectory into the new
    DirectoryIndexReader ctor; I think you can pass in only the
    writer, and then it can get all it needs from that (pool, dir,
    segmentInfos)?

  * Hmm... I think we should change commitMergedDeletes to use the
    already computed doc ID maps, instead of loading the old deletes.
    This way you don't need the old SR.

  * I think you don't need SegmentInfoKey?  Can't you key off of just
    the SegmentInfo?  I think for each segment name we only ever need
    one SR instance in the pool (dir is always the same)?  (The
    readers we give out will make clones of these).

  * Why do you applyDeletesImmediately?  I think we should
    buffer/flush like we normally do?  That code shouldn't need to
    change.

  * I think getReader does need to flush, else added docs won't be
    seen in the reopend reader.  You should call flush(true, true,
    true) -- because you want to triggerMerges, flushDocStores,
    and flushDeletes.

  * Are segmentInfosList/printSegmentInfosList temporary debugging?
    Can you stick "nocommits" on them?

For SegmentReaderPool:

  * I think IW should always use the pool API internally to load &
    decRef the SRs it uses.

  * If getReader has never been called, then the pool should not hold
    refs, ie IW should behave exactly as it does today (get SR, use
    it, commit it, close it).  Ie, the pool should detect on decRef
    that the RC is now 1, and release the SR.

  * If getReader has been called, then the pool should hold onto an SR
    as long as the writer's segmentInfos still references that
    segment.  This means, after a merge commits we should prune the
    pool.  Ie, we decRef and remove the SR from the pool, but we don't
    close it since a reader may still be using it.  Maybe add a
    prune() method?

  * Somehow, we should not let the SR commit itself on close -- that's
    up to IW to decide.  If getReader has not been called then IW must
    commit deletes as soon as applies them (like today).  Else, it's
    only on closing the pool that deletes are committed.  EG pending
    deletes on SRs that have been merged can be freely discarded,
    since those deletes "made it" into the newly merged segment.

  * So I don't think we need a BG thread to do the culling.


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

* TestConcurrentMergeScheduler is passing

* A handful of other tests don't pass, looking into them next

* DirectoryIndexReader.open with a writer isn't necessary so the
related method is removed

* I'm not sure we need to write out the deletes of unused segment
readers because they are no longer used and so should not be
required. I realized the unused readers needs to also check with
the deletion policy?

* The TestConcurrentMergeScheduler. testNoWaitClose exception is
caused by the way .del files are created when SR.commitChanges is
called, altering the SR segmentinfo without IFD.incRefing the .del
file. The solution, manually incref the .del file after the pool
commits on the SRs. Before this process was handled transparently by
IW.checkpoint. Also the other issue was the SegmentInfo files
variable was created before the delGen change, then the .del file was
not being returned by SI.files(). With the .del file being returned
the IFD related exception went away.

* SegmentInfoKey is removed, SegmentInfo is used as the key for
SegmentReaderPool.get

> You commented out the last part of commitMergedDeletes, that
actually saves the deletes. You need to instead get the reader for
the merged segment from the pool and hand it the new deletes.

I wasn't sure what you meant by this, in the patch the deletes are
copied into the merged reader. Do you mean instead the merged reader
should not be opened and instead the deletes file needs to be written
to?

> We lost the cutover to using the already computed docMap in
commitMergedDeletes?

SegmentMerger wasn't always returning the docMap so I stopped using
it.  

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695306#action_12695306 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

Good catch!  I'll fix.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png, ssd2.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679576#action_12679576 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

{quote}
TestIndexReader.testDocsOutOfOrderJIRA140 fails because IW.close
isn't called before dir.close. Is this a bug in the unit test?
{quote}
Yes, this looks like a bug in the test.  This also means when we eventually commit this, we'll have to first fix that bug on the back-compat tests branch.

{quote}
TestIndexWriterDelete.testOperationsOnDiskFull fails with
MockRAMDirectory.close (still open files) because IW.close isn't
called.
{quote}
I don't understand: that test looks like it does call IW.close for all IW's opened?  (It's a little tricky, because modifier.close gets called the 2nd time the for(int x = 0...) loop runs).

{quote}
TestIndexWriter.testImmediateDiskFullWithThreads fails because
IW.close fails on the disk full exception. Should IW.closeInternal ->
SegmentReaderPool.close be placed in the finally clause?
{quote}
Does the 2nd call to close (close(false)) also hit an exception?  Perhaps, modify the test so that if that 2nd close hits an exception, call abort?

It's good that you're down to mainly the exceptions-based test failures... though I think you should focus more on the bigger structural changes to the approach (eg switching back to docMap for merging deletes, adding SegmentReaderPool.release, which should write changes to the dir & calling it from all places that do a .get(), and the other comments above) before trying to get all tests to pass.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695185#action_12695185 ] 

Jason Rutherglen commented on LUCENE-1516:
------------------------------------------

In ReaderPool.get(SegmentInfo info, boolean doOpenStores, int readBufferSize) the readBufferSize needs to be passed into SegmentReader.get

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png, ssd2.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1516:
---------------------------------------

    Attachment: LUCENE-1516.patch

New patch: sync'd to trunk, cleaned up the nocommits (some fixing, some turned into TODOs).

I'd like to add some more tests, but otherwise I think this is ready to commit.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png, ssd2.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682707#action_12682707 ] 

Jason Rutherglen commented on LUCENE-1516:
------------------------------------------

Hi Mike, yes please!  I have not had time to work on it.  All my changes are in the last patch.  I can clean it up, but it would not be until later today.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

- Added SegmentReaderPool that maintains a map of SegmentInfoKey ->
SegmentReader. SegmentInfoKey returns a hashcode of a SegmentInfo
that also includes the deletes and norms generations.
SegmentReaderPool also maintains a map of SegmentReaders that
represent "working copies" (cloned readers) for realtime deletes
occurring. When IW.getReader is called the "working copy" readers
replace the existing readers keyed with the same segmentinfo in the
SegmentReaderPool. This way IW.deleteDocument applies to the in
memory cloned "working copies" but does not alter SegmentReaders
already returned from SegmentReaderPool. 
- The test methods in TestIndexWriterReader pass 
- I'm hesitant to automatically flush on IW.getReader as IW.flush has
a number of parameters. However for updateDocument to work properly
we'll probably need to do this? 
- DocumentsWriter.applyDeletes should no longer need to apply deletes
as they are already applied in realtime in IW.deleteDocument? It
should only need to call SR.commitChanges? 
- "ant test-core" fails in ConcurrentMergeScheduler with the same
exception as above in TestSnapshotDeletionPolicy (and some others). I
suspect in the case of TestSnapshotDeletionPolicy the errors could be
due to the SegmentReader holding files open while they are being
copied. Perhaps this would explain the EOFExceptions? 

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675069#action_12675069 ] 

Jason Rutherglen commented on LUCENE-1516:
------------------------------------------

Mike, good points... 

{quote} since you can't predict when IW materializes deletes, your reader
will suddenly see a bunch of deletes appear.{quote}

The reader would need to be reopened to see the deletes. Isn't that
expected behavior?

{quote} Instead of having to merge readers, I think we need a single
source to obtain an SR from {quote}

I like this however how would IR.clone work? I like having the
internal reader separate from the external reader. The main reason to
expose IR from IW is to allow delete by doc id and norms updates
(eventually column stride fields updates). I don't see how we can
grab a reader during a merge, and block realtime deletes occurring on
the external reader. However it is difficult to rectify deletes to an
external SR that's been merged away. 

It seems like we're getting closer to using a unique long UID for
each doc that is carried over between merges. I was going to
implement this above LUCENE-1516 however we may want to make UIDs a
part of LUCENE-1516 to implement the behavior we're discussing. 

If the updates to SR are queued, then it seems like the only way to
achieve this is a doc UID. This way merges can happen in the
background, the IR has a mechanism for mapping it's queue to the
newly merged segments when flushed. Hopefully we aren't wreaking
havoc with the IndexReader API?

The scenario I think we're missing is if there's multiple cloned SRs
out there. With the IW checkout an SR model how do we allow cloning?
A clone's updates will be placed into a central original SR queue?
The queue is drained automatically on a merge or IW.flush? What
happens when we want the IR deletes to be searchable without flushing
to disk? Do a reopen/clone? 

bq. number of iterations before settling

Agreed, if it were simple it wouldn't be fun. ☺

{quote} It's not necessary for IW to write new .del files when it
materializes deletes.{quote}

Good point, DocumentsWriter.applyDeletes shouldn't be flushing to
disk and this sounds like a test case to add to TestIndexWriterReader.

{quote} IW.reopenInternalReader only does a clone not a reopen; however
does it cover the newly flushed segment? {quote}

The segmentinfos is obtained from the Writer. In the test case
testIndexWriterReopenSegment it looks like using clone reopens the
new segments.

{quote} I think it's better if no deletes appear, ever, until you reopen
your reader. Maybe we simply prevent deletion through the IR? {quote}

Preventing deletion through the IR would seem to defeat the purpose
of the patch unless there's some alternative mechanism for deleting
by doc id? 

{quote} commitMergedDeletes to decouple computing the new BitVector from
writing the .del file to disk.{quote}

A hidden method I never noticed. I'll keep it in mind.

{quote} It seems like reader.reopen() (where reader was obtained with
IW.getReader()) doesn't do the right thing? (ie it's looking for the
most recent segments_N in the Directory, but it should be looking for
it @ IW.segmentInfos).{quote}

Using the reopen method implementation for a Reader with IW does not
seem necessary. It seems like it could call clone underneath?



> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700414#action_12700414 ] 

Shai Erera commented on LUCENE-1516:
------------------------------------

look at extractWikipedia.alg, which uses the EnwikiDocMaker, the enwiki.xml file and the WriteLineDoc task. BTW, with LUCENE-1591 you can run this alg with bzip.compression=true and feed it with the .bz2 file rather than the xml.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png, ssd2.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

The previously mentioned issue in the test case is fixed. Perhaps
IW.reopenInternalReader() should be called in IW.checkpoint? 

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

* In TestConcurrentMergeScheduler.testNoWaitClose I'm seeing a couple
of exceptions. I'm not quite sure what to make of them. The
testNoWaitClose method calls IW.close(false) a number of times on an
IW that uses ConcurrentMergeScheduler. 

{code}
[junit] Caused by: java.lang.AssertionError
    [junit] 	at org.apache.lucene.index.IndexFileDeleter$RefCount.DecRef(IndexFileDeleter.java:553)
    [junit] 	at org.apache.lucene.index.IndexFileDeleter.decRef(IndexFileDeleter.java:470)
    [junit] 	at org.apache.lucene.index.IndexFileDeleter.decRef(IndexFileDeleter.java:481)
    [junit] 	at org.apache.lucene.index.IndexWriter.decrefMergeSegments(IndexWriter.java:4485)
    [junit] 	at org.apache.lucene.index.IndexWriter.commitMerge(IndexWriter.java:4473)
    [junit] 	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4910)
    [junit] 	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4539)
    [junit] 	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:218)
    [junit] 	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:274)
{code}
{code}
[junit] java.lang.AssertionError
    [junit] 	at org.apache.lucene.index.IndexWriter.finishMerges(IndexWriter.java:3350)
    [junit] 	at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:2090)
    [junit] 	at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:2045)
    [junit] 	at org.apache.lucene.index.TestConcurrentMergeScheduler.testNoWaitClose(TestConcurrentMergeScheduler.java:211)
    [junit] 	at org.apache.lucene.util.LuceneTestCase.runTest(LuceneTestCase.java:88)
{code}

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

Added TestIndexWriterReader.testIndexWriterDeletes,
testIndexWriterDeletesOptimize, testIndexWriterReopenSegment,
testIndexWriterReopenSegmentOptimize. Where the optimize methods
fail, the non optimize ones work. The optimize methods delete _0.fdt
and _0.fdx and so fail when the writer is created again because it
cannot find those files. It could be a segment infos merging problem
or something else.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jeremy Volkman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677523#action_12677523 ] 

Jeremy Volkman commented on LUCENE-1516:
----------------------------------------

I noticed the comments about IW.getReader() flushing the current state of the writer, launching merges, etc. Does this mean that the point of interaction between index writes and reads will still be the Directory? (i.e. the disk)

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675345#action_12675345 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

{quote}
The path forward seems to be exposing a cloned readonly reader
from IW.getReader.
{quote}

+1

{quote}
> can't we move away from allowing any changes via IR? (Ie
> deprecate deleteDocuments/setNorms/etc.)

This would simplify things however as a thought experiment how would
the setNorms work if it were a part of IndexWriter?
{quote}

I think it'd look like this?
{code}
IndexWriter.setNorm(Term term, String field, byte norm)
{code}

Ie the Term IDs the doc(s) you want to set the norm for.

{quote}

> And, clone should not be reopening segments...? 

DirectoryIndexReader.clone(boolean openReadonly) calls
doReopen(SegmentInfos infos, boolean doClone, boolean openReadOnly)
which is an abstract method that in SegmentReader and
MultiSegmentReader reopens the segments? The segment infos for a
ReaderIW is obtained from IW, which is how it knows about the new
segments. Perhaps not desired behavior?
{quote}

OK, I think it does not reopen *existing* segments.  Meaning, if a
segment is in common w/ old and new, it truly clones it (does not
reopen norms nor del).  But if there is a new segment that did not
exist in old, it opens a whole new segment reader?  I'll commit an
assert that this doesn't happen -- if caller passes in "doClone=true"
then caller should not have passed in a segmentInfos with changes?
Else the reader is on thin ice (mismatch what's in RAM vs what
SegmentInfo says).

{quote}
> do we need delete by docID once we have realtime search? I
> think the last compelling reason to keep IR's delete by docID was
> immediacy, but realtime search can give us that, from IW, even when
> deleting by Term or Query? 

Good point! I think we may want to support it but for now it's
shouldn't be necessary. I'm thinking of the case where someone is
using the field cache (or some variant), performs some sort of query
on it and then needs to delete based on doc id. What do they do?
Would we expose a callback mechanism where a deleteFrom(IndexReader
ir) method is exposed and deletes occur at the time of the IW's
choosing?
{quote}

Wouldn't delete-by-Query cover this?  Ie one could always make a
Filter implementing the "look @ field cache, do some logic, provide
docIDs to delete", wrap as Query, then delete-by-Query?

{quote}
> It seems like calling reader.reopen() (on reader obtained
> from writer) should basically do the same thing as calling
> writer.getReader(). Ie they are nearly synonyms? (Except for small
> difference in ref counting - I think writer.getReader() should always
> incRef, but reopen only incRefs if it returns a new reader). 

Perhaps ReaderIW.reopen will call IW.getReader underneath instead of
using IR's usual mechanism.
{quote}

Right, that's what I'm thinking.  Once you've obtained reader coupled
to a writer, you can then simply reopen it whenever you want to see
(materialize) changes done by the writer.

We still need a solution for the "warm the just merged
segment"... else we will not be realtime, especially when big merge
finishes.  It seems like after merge finishes, it should immediately
1) open a SegmentReader on the new segment, 2) invoke the method you
passed in (or you subclassed -- not sure which), 3) carry over deletes
that materialized during the merge, 4) commit the merge (replace old
segments w/ new one).


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696444#action_12696444 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

I think NRT search is finally ready to go in... I'll wait another day or two.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png, ssd2.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700899#action_12700899 ] 

Jason Rutherglen commented on LUCENE-1516:
------------------------------------------

We need an IndexWriter.getMergedSegmentWarmer method?

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png, ssd2.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675343#action_12675343 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

Jason, I think you need to "svn up".  Or, tell us which revision you're on and we can downgrade to that revision before applying the patch.  (We need "svn patch"!).

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679747#action_12679747 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

Jason can you resync to trunk?

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675349#action_12675349 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------


bq. I suppose if I know I'm not going to be changing my index, I can still just get a read-only IR, right?

Right, I think we still want to allow opening a standalone (uncoupled)
reader.

bq. Then, everyone has a single point of entry for both writers and readers and all of this stuff can just be done through package private methods on the IW and it allows us to change things if we decide otherwise and it means that the IW is not coupled with the IR publicly.

I'm torn... the IndexAccessor would need to expose many variants to
carry over all the options we now have (String or File or Directory,
IndexCommit or not, IndexDeletionPolicy or not, create or not).  It
will end up exposing a number of new methods...  and, would it try to
be "smart" (like IndexModifier, and the LuceneIndexAccessor class in
LUCENE-390), keeping track of references to the readers it's handed
out, etc.?  Or is it simply a pass-through to the underlying
open/ctors we have today?

The alternative (as of right now, unless we are missing something
further with these changes) is adding one method to IndexWriter,
getReader, that returns a readOnly IndexReader, "coupled" to the
writer you got it from in that it's able to search un-committed
changes and if you reopen it, writer will materialize all changes and
make them visible to the reopened reader.

I guess so far I don't really see why this small (one method) API
change merits a switch to a whole new accessor API for creating
readers & writers on an index?  Maybe there is a
straw-that-breaks-the-camel's-back argument that I'm missing...


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693922#action_12693922 ] 

Jason Rutherglen commented on LUCENE-1516:
------------------------------------------

Mike, nice work!  

I will hopefully test this week.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png, ssd2.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700354#action_12700354 ] 

Jason Rutherglen commented on LUCENE-1516:
------------------------------------------

I noticed NearRealtimeReaderTask is in trunk, can you include the .alg?  How does one create a line based wiki file?

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png, ssd2.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675354#action_12675354 ] 

Grant Ingersoll commented on LUCENE-1516:
-----------------------------------------

Good points, MIke, but maybe we don't need all those variants?  String, File and Directory are all easily enough collapsed down to just Directory.
{code}
new IndexWriter(new Directory(indexFile));
{code}

Additionally, there are no more variants than there already are on the IW and IR, right?   

As for pass-through or not, I think it would just pass-through, at least initially, but it certainly leaves open the possibility for reference counting in the future if someone wants to implement that.

As someone who teaches people these APIs on a regular basis, I feel pretty confident in saying that adding an IR to the IW as a public API is going to confuse a good chunk of people just as the delete stuff on the IR currently does now.  You wouldn't ask FileWriter for a FileReader, would you?  I don't see why it would be good to ask a IW for an IR, API-wise (I get why we are doing this, it makes sense).

Likewise, isn't it just as logical to ask for an IW from an IR?  If I have an IR already and I want it to be aware of the writes I want to do, why wouldn't we then add IR.getIW()?  And then we can have total circular dependency. 



> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676713#action_12676713 ] 

Jason Rutherglen commented on LUCENE-1516:
------------------------------------------

We'll need a method to check the SegmentReaderPool for readers that
may be removed. It seems it would check IW.segmentInfos, pending
merges, current merges for segmentinfo(s) that are in use, then all
others in the pool are expunged. I'm not sure if this method would
need to run in a background thread. 

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

Added the test case to the patch.  

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

On IW.close -> SegmentReaderPool.flushChanged ->
segmentreader.commitChanges was being called on SRs that had no
changes. TestDeletionPolicy and TestIndexFileDeleter passes.



> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1516.
----------------------------------------

    Resolution: Fixed

I just committed this.  Thanks Jason!

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png, ssd2.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675334#action_12675334 ] 

Grant Ingersoll commented on LUCENE-1516:
-----------------------------------------

This latest patch doesn't seem to apply.



> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Assigned: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned LUCENE-1516:
------------------------------------------

    Assignee: Michael McCandless

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1516:
---------------------------------------

    Attachment: magnetic.png
                ssd.png

OK I ran a basic initial test of the latency when opening a near
real-time reader.

Using contrib/benchmark, I index wikipedia docs like normal, but then
I added a NearRealTimeReaderTask, which runs a BG thread that once
every N (I did 3) seconds it gets a new reader from the writer.

Then it does a simple search for term "1" in the body, and sorts by
the docdate field.

I measured milli-seconds to reopen and to run the search, and plot
those as a function of index size.

I ran two tests.  The first (attached as ssd.png) stores the index on
an Intel X25M solid-state disk; the second (attached as magnetic.png)
on a WD Velociraptor.

Notes:

  * The reopen time is ~700 msec in both cases, and doesn't change
    much as index grows (which is nice).

  * It is quite noisy, likely due to merges committing.

  * I logged (but did not graph) the flush time vs actual reopen time,
    and it's the flush time that dominates.  This is good because with
    a slower indexing rate, this flush time would go way down.  My
    guess is flushing is CPU bound not IO bound.

  * The search time ramps up linearly (expected), and also shows
    spikes due to merging.  There's one massive spike at the end of
    the SSD one that's odd (did not correspond to reopening after a
    merge, though perhaps during a merge).

  * This is a somewhat overly stressful test because I'm indexing docs
    at full speed.  Whereas I'd expect for the typical near realtime
    search app, the docs would usually be trickling in more slowly and
    a reopen would happen after just a few docs.

  * SSD and magnetic look pretty darn similar, though magnetic shows
    more noise and maybe is more affected by merges.

  * I'm doing no deletions in this test, but the typical near
    real-time app presumably would.


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679355#action_12679355 ] 

jasonrutherglen edited comment on LUCENE-1516 at 3/5/09 1:11 PM:
------------------------------------------------------------------

Here's a summary of the which tests are failing grouped by the
guessed cause:

* TestDeletionPolicy, TestIndexFileDeleter, TestIndexReader fails (not
all methods but they seem to be related, i.e. files are being left
open that should not be)

* In TestIndexWriter most test methods pass, however a few such as
testCommitOnCloseDiskUsage fail.
TestIndexWriterDelete.testUpdatesOnDiskFull fails for reasons
presumably similar to TestIndexWriter.testCommitOnCloseDiskUsage

* TestTransactions and TestStressIndexing2 fails. At first glance I'm
not sure why 

      was (Author: jasonrutherglen):
    Here's a summary of the which tests are failing with a grouped by the
guessed cause:

* TestDeletionPolicy, TestIndexFileDeleter, TestIndexReader fails (not
all methods but they seem to be related, i.e. files are being left
open that should not be)

* In TestIndexWriter most test methods pass, however a few such as
testCommitOnCloseDiskUsage fail.
TestIndexWriterDelete.testUpdatesOnDiskFull fails for reasons
presumably similar to TestIndexWriter.testCommitOnCloseDiskUsage

* TestTransactions and TestStressIndexing2 fails. At first glance I'm
not sure why 
  
> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675504#action_12675504 ] 

Grant Ingersoll commented on LUCENE-1516:
-----------------------------------------

OK, I agree.  Let's just mark it as expert/subject to revision and then we're good.

We can revisit IndexAccessor separately.



> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675414#action_12675414 ] 

Grant Ingersoll commented on LUCENE-1516:
-----------------------------------------

bq. Right, I'm just saying IndexAccessor will have many methods. And then
you're asking every app to make this switch, on upgrade. It's alot of
API swapping/noise vs a single added expert method to IW.

Sure, but that is already the case w/ IW/IR anyway.

I agree about the short term noise, but in the long run it seems cleaner.

bq. But this will be an expert/advanced API, a single added method to IW.
I wouldn't expect users to be confused: on upgrade I think most users
will not even notice its existence!

Hmm, I don't agree, but I guess it depends on the performance hit.  If given a choice between the semantics of a reader that sees changes as they are made versus having to do the whole reopen thing, I'm betting most users will say "duh, I want to see my changes right away" and choose the IR that is synced w/ the IW, b/c that is what people think is the logical thing to happen and it is how DBs work, which many devs. are used to.  As an app developer, if I don't have to think about IR lifecycle management, why would I want to as long as it performs?  What this patch is offering, AFAICT, is the removal of IR lifecycle managment from the user.

In other words, my guess is that over time, as the performance proves out, it will be the default choice, not "expert".  Now, if you're telling me this is going to be significantly slower even when updates are rare, then maybe I would stick to the current lifecycle, but if there isn't much difference, I'll take the one that pushes the lifecycle complexity down into Lucene instead of in my app.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679140#action_12679140 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

Thanks Jason, good progress...

{quote}
I'm not sure we need to write out the deletes of unused segment
readers because they are no longer used and so should not be
required.
{quote}

EG I think DocumentsWriter.applyDeletes should get the reader from
pool, do deletes, then call pool.release(reader), and that release
would write the changes to disk if we're not pooling readers.  Then we
don't need a new "flushDeletesToDir" propogated around.

On an explicit commit(), we should also sweep the pool and write
changes to disk for any SR that has pending changes.

{quote}
> You commented out the last part of commitMergedDeletes, that
actually saves the deletes. You need to instead get the reader for
the merged segment from the pool and hand it the new deletes.

I wasn't sure what you meant by this, in the patch the deletes are
copied into the merged reader. Do you mean instead the merged reader
should not be opened and instead the deletes file needs to be written
to?
{quote}

Woops -- sorry: you are doing it correctly (applying to the
mergedReader).  I missed that.

bq. SegmentMerger wasn't always returning the docMap so I stopped using it.

It should always return it, unless there were no deletes on that
segment when the merge started, in which case the docMap is null.  But
you can handle that case (just check if there are now any deletes, and
carry them over).

(Then we shouldn't need merge.segmentReadersClone, again).

{quote}
The solution, manually incref the .del file after the pool
commits on the SRs.
{quote}

That's spooky -- why exactly is it needed?  We should only do an extra
incRef if we can explain exactly which decRef it will correspond to.
Shouldn't IW.checkpoint still work properly for the .del files?

Also, SegmentInfo.files should be cleared whenever delGen is advanced
-- can you give more details on what path doesn't properly clear the
files?

The fact that an SR can carry deletes in memory without writing a new
.del file should not impact IFD.  Whenever we do advance delGen and
write new .del files, then we must call checkpoint().

So I think we need to really explain the root cause here...

Others:

  * You overrode decRef in DirectoryIndexReader, to not write changes
    whenever a writer is present, but I think that a better approach
    is to leave decRef as it is and then from writer, clear hasChanges
    when you want to discard them (because they were merged away).

  * You inserted applyDeletes at the top of commitMergedDeletes -- why
    was that needed?

  * There's alot of noise in the patch -- whitespace changes,
    commented out debug code, etc.  Can you remove some of it?  It
    makes it harder to separate signal from noise...


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

This is an extremely early patch and has a bunch of System.outs for debugging

- TestIndexWriterReader.testMerge tries to use IW.addIndexes and fails
because the lookup of the SegmentReader from the IW internal IR
returns the incorrect reader (I spent a lot of time on trying to
figure out why and could not). 
- TestIndexWriterReader.testReader passes 


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677533#action_12677533 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

bq. Does this mean that the point of interaction between index writes and reads will still be the Directory? (i.e. the disk)

For added docs, yes.  But deletions will carry straight through in memory.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675982#action_12675982 ] 

Jason Rutherglen commented on LUCENE-1516:
------------------------------------------

In SegmentInfo it's hashCode only takes into
account the name and directory, why not the delete generation?

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

- TestAddIndexesNoOptimize test passes
- There's an incRef issue still

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675397#action_12675397 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

Or, here's an idea: can we do both?  Put IndexAccessor as an optional
"convenience" layer that simplifies the ctors and expert methods of IW
& IR, but leave public direct access to the ctros & expert methods?
This way on upgrade nobody is forced to migrate to an entirely new yet
simply pass-through API?

Or another idea is to decouple these two discussions: go ahead and add
the single expert method to IW, but as a separate discussion/JIRA work
out how we can simplify overall access/management of IR/IW instances?


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675717#action_12675717 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

I think your latest test is missing the the new unit test source.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675490#action_12675490 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

bq. It's hard keeping up with the current proposal in big issues/threads, but I don't think anyone is proposing a reader that automatically sees changes... i.e. the view of an IndexReader instance will still be fixed.

That's right.  The current proposal is to add one method to IW:

{code}
IndexReader getReader()
{code}

that returns a point-in-time view of the index plus all changes
buffered in IW up until that point.  Then you can reopen that reader
(or call getReader() again, which does the same thing) to quickly get
a new point-in-time reader.

I think the point-in-time semantics is important to keep.

Also, you can't easily emulate point-in-time if we implemented the
"live" approach, but you can easily do vice/versa (assuming we can
keep reopen() time fast enough).

EG the IndexAccessor convenience layer could do automatic reopening so
that when you ask it for the reader it always reopens it; this would
emulate "live updates" and hide the lifecycle management.



> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679758#action_12679758 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------


bq. Where should I get the numDeletedDocs from?

SegmentMerger.getDelCounts()

bq. Should the entire docIdMap be scanned?

Yes, when the before/after delCount is different.

bq. Is there an expense in cloning the segment readers besides the extra bitvectors?

Copy-on-write of the bitvectors (time and space) is the biggest cost I
think.  Since docMap already has everything we need, I think we should
use it... we could even separate out this change and do it first, if
you want.

{quote}
I think you're saying put this patch's decRef logic in
the SRP.release method?
{quote}

Actually the release method should always commit changes, if there are
any, when an SR is removed SR from the pool.

Then, something higher up (merging, once successful) should clear
changes when it's safe, so that release doesn't save anything.

The "decRef that never commits whenever writer is present" in
DirectoryIndexReader is too low level, I think.


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682690#action_12682690 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

Jason can I have the conch here?  (Ie take your patch and make some changes / iterate).  Are all of your changes included in the last patch?

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675979#action_12675979 ] 

Jason Rutherglen commented on LUCENE-1516:
------------------------------------------

> make a new SegmentReaderPool private class

I'd prefer the SegmentReaderPool model over the patch's existing one
as it is simpler and closer to how the underlying system actually
works meaning it works directly with the segments in a systematized way.

> We would never mergeSegmentInfos, never ask the "internal reader"
to commit

Good, merging the segmentInfos is confusing and tricky to debug

> Since IndexReader will always be readonly, you can simplify the new
DirectoryIndexReader.open method

+1

> why we need merge.segmentReadersClone?

I was modeling after the segmentInfosClone. If it's not necessary
I'll happily remove it.

> I think the only reason for reader to hold a reference to writer is
so that on reopen, the reader realizes it was created from
IW.getReader and simply forwards the request to IW

+1

> Wouldn't delete-by-Query cover this? Ie one could always make a
Filter implementing the "look @ field cache, do some logic, provide
docIDs to delete", wrap as Query, then delete-by-Query? (from a
previous coment) 

Yes that will work. 

How are these sample SegmentReaderPool method signatures? 

{code}
private class SegmentReaderPool {
  public SegmentReader get(SegmentInfo si) {}
    
  public void decRef(SegmentReader sr) {}
}
{code}


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675177#action_12675177 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

{quote}
> since you can't predict when IW materializes deletes, your reader
> will suddenly see a bunch of deletes appear.

The reader would need to be reopened to see the deletes. Isn't that
expected behavior?
{quote}

Ahh right, so long as we keep internal (private) clone, materializing
the deletes won't affect the external reader.

{quote}
> Instead of having to merge readers, I think we need a single
> source to obtain an SR from 

I like this however how would IR.clone work?
{quote}

It should work fine?  The single source would only be used internally
by IW (for merging, for materializing deletes, for the internal
reader).

bq. I like having the internal reader separate from the external reader.

I think we should keep that separation.

{quote}
The main reason to
expose IR from IW is to allow delete by doc id and norms updates
(eventually column stride fields updates). I don't see how we can
grab a reader during a merge, and block realtime deletes occurring on
the external reader. However it is difficult to rectify deletes to an
external SR that's been merged away.

It seems like we're getting closer to using a unique long UID for
each doc that is carried over between merges. I was going to
implement this above LUCENE-1516 however we may want to make UIDs a
part of LUCENE-1516 to implement the behavior we're discussing.

If the updates to SR are queued, then it seems like the only way to
achieve this is a doc UID. This way merges can happen in the
background, the IR has a mechanism for mapping it's queue to the
newly merged segments when flushed. Hopefully we aren't wreaking
havoc with the IndexReader API?
{quote}

But... do we need delete by docID once we have realtime search?  I
think the last compelling reason to keep IR's delete by docID was
immediacy, but realtime search can give us that, from IW, even when
deleting by Term or Query?

(Your app can always add that long UID if it doesn't already have
something usable).

docIDs are free to changing inside IW.  I don't see how we can hand
out a reader, allow deletes by docID to it, and merge those deletes
back in at a later time, unless we track the genealogy of the
segments?

{quote}
The scenario I think we're missing is if there's multiple cloned SRs
out there. With the IW checkout an SR model how do we allow cloning?
A clone's updates will be placed into a central original SR queue?
The queue is drained automatically on a merge or IW.flush? What
happens when we want the IR deletes to be searchable without flushing
to disk? Do a reopen/clone?
{quote}

This is why I think all changes must be done through IW if you've
opened a reader from it.  In fact, with the addition of realtime
search to Lucene, if we also add updating norms/column-stride fields
to IW, can't we move away from allowing any changes via IR?  (Ie
deprecate deleteDocuments/setNorms/etc.)

{quote}
> It's not necessary for IW to write new .del files when it
> materializes deletes.

Good point, DocumentsWriter.applyDeletes shouldn't be flushing to
disk and this sounds like a test case to add to TestIndexWriterReader.
{quote}

Well, if IW has no persistent reader to hold the deletes, it must keep
doing what it does now (flush immediately to disk)?

{quote}
> IW.reopenInternalReader only does a clone not a reopen; however
> does it cover the newly flushed segment? 

The segmentinfos is obtained from the Writer. In the test case
testIndexWriterReopenSegment it looks like using clone reopens the
new segments.
{quote}

Wait, where is this test?  Maybe you need to svn add it?

And, clone should not be reopening segments...?

{quote}
> I think it's better if no deletes appear, ever, until you reopen
> your reader. Maybe we simply prevent deletion through the IR? 

Preventing deletion through the IR would seem to defeat the purpose
of the patch unless there's some alternative mechanism for deleting
by doc id?
{quote}

See above.

{quote}
> commitMergedDeletes to decouple computing the new BitVector from
> writing the .del file to disk.

A hidden method I never noticed. I'll keep it in mind.
{quote}

It's actually very important.  This is how IW allows deletes to
materialize to docIDs, while a merge is running -- any newly
materialized deletes against the just-merged segments are coalesced
and carried over to the newly created segment.  Any further deletes
must be done against the docIDs in the new segment (which is why I
don't see how we can allow deletes by docID to happen against a
checked out reader).

{quote}
> It seems like reader.reopen() (where reader was obtained with
> IW.getReader()) doesn't do the right thing? (ie it's looking for the
> most recent segments_N in the Directory, but it should be looking for
> it @ IW.segmentInfos).

Using the reopen method implementation for a Reader with IW does not
seem necessary. It seems like it could call clone underneath?
{quote}

Well, clone should be very different from reopen.  It seems like
calling reader.reopen() (on reader obtained from writer) should
basically do the same thing as calling writer.getReader().  Ie they
are nearly synonyms?  (Except for small difference in ref counting --
I think writer.getReader() should always incRef, but reopen only
incRefs if it returns a new reader).


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675863#action_12675863 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------


I think we can simplify the approach here:

  * I don't think IW should track an "internal reader" at all.
    Instead, it tracks a pool of held-open and re-used SegmentReader
    instances.  Maybe make a new SegmentReaderPool private class.  It
    exposes get() to load a SegmentReader, returning the pooled one if
    present, else falling back to the normal SegmentReader.get.

  * Then, the many places where we currently pull a segment reader (to
    apply deletes, to do a merge, to carry over new deletes after
    merge finishes) as well as the new getReader() method, should use
    this pool to get the reader.

  * We would never mergeSegmentInfos, never ask the "internal reader"
    to commit, etc.

Other things:

  * Since IndexReader will always be readonly, you can simplify the
    new DirectoryIndexReader.open method, eg we don't need to copy
    over the writeLock nor the DeletionPolicy, and we don't need the
    readOnly param, and closeDirectory is always false.  In fact we
    could simply create a ReadOnly{Multi,}SegmentReader directly.

  * I think IndexReader.clone() should not grab the writer's
    segmentInfos?  Ie it should truly be a clone(), not a reopen.
    Actually, when the reader is created we should make a private full
    copy of the SegmentInfos.

  * We lost some "private" modifiers, unecessarily; eg
    DirectoryIndexReader.writeLock.

  * I don't understand why we need merge.segmentReadersClone?  If we
    only use to detect new deletions for carrying over deletes after
    merge finishes, we should instead just grab the delCount when the
    merge kicked off?

  * I think the only reason for reader to hold a reference to writer
    is so that on reopen, the reader realizes it was created from
    IW.getReader and simply forwards the request to IW.  Otherwise
    writer should not be used in reader.

  * Once we no longer store/maintain/merge an "internal" reader,
    IW.getReader simply becomes a call to IndexReader.open, with two
    differences: 1) we pass in a SegmentInfos instance (instead of
    looking for last segments_N in dir), and 2) we pass in our own
    SegmentReaderPool that should be used to open the SR's.

  * You need to add mainDir.close() in
    testAddIndexesAndDoDeletesThreads.


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693465#action_12693465 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

Disregard the search time in the above results... we have a sneaky bug
(LUCENE-1579) that is causing FieldCache to not be re-used for shared
segments in a reopened reader.  This makes the search time after
reopen far worse than it should be.


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675338#action_12675338 ] 

Grant Ingersoll commented on LUCENE-1516:
-----------------------------------------

Is there ever a need for the normal IR construction anymore?  Or do we always just ask for it from the IW (or wherever we choose to expose this, as I still don't think it belongs on the IW API wise, but that isn't a big deal right now) every time?  I suppose if I know I'm not going to be changing my index, I can still just get a read-only IR, right?

API wise, I think we could do something like (with obvious other variations):
{code}
IndexAccessor{

  IndexWriter getWriter(Directory);

  //returns read-only reader
  IndexReader getReader(Directory);

  //returns the external IR described above
  IndexReader.getReader(IndexWriter);
}
{code}

Then, everyone has a single point of entry for both writers and readers and all of this stuff can just be done through package private methods on the IW and it allows us to change things if we decide otherwise and it means that the IW is not coupled with the IR publicly.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678528#action_12678528 ] 

Jason Rutherglen commented on LUCENE-1516:
------------------------------------------

Because the patch does not flush deletes to disk like the existing
code does, the SegmentInfo delGen etc isn't updated at the same
points as expected. In tracing the code the .del files that have 0
refs then are decRefed throwing the assertion error were never
incRefed by IndexFileDeleter. This is because the deletes are carried
over in the IW internal readers. Eventually the .del files are
written but aren't increfed as they usually would be. Is there a good
solution to this? Do we need an interim SegmentInfo variable that
indicates future flushed deletions? 

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678884#action_12678884 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------


bq. I added a flushDeletesToDir flag that defaults to true except for IW.getReader.

I think that should not be necessary.  On releasing a reader, if it
has pending changes and we want to drop it from the pool, we should
move its deletes to the directory?

SegmentMerger should clear the hasChanges on the reader after its done
merging, so we don't bother saving delete files for segments that were
merged away.

bq. In TestConcurrentMergeScheduler.testNoWaitClose I'm seeing a couple of exceptions.

I think the first is the root cause (and causes the 2nd one).

Which test case hits that?  The test cases that intentionally throw
exceptions at interesting times are especially fun to debug ;)

{quote}
Because the patch does not flush deletes to disk like the existing
code does, the SegmentInfo delGen etc isn't updated at the same
points as expected.
{quote}

IFD should be fine with this -- there must be something else at play,
causing us to over-decRef.

Other notes:

  * You still have SegmentInfoKey (see above)

  * The new DirectoryIndexReader.open should just take IndexWriter
    (see above)?

  * Rather than the getSegmentsInUse "polling" approach, I think you
    can incrementally add & remove readers from the pool?  EG after a
    merge commits, drop the readers for the just-merged segments and
    add a reader for the newly merged segment (and using it to hold
    the deletes carried over in commitMergedDeletes).  I think for
    just-flushed segments you can wait for a getReader() call to
    happen again, to init their readers, unless something else does
    first (flushing deletes, merging)?

  * You commented out the last part of commitMergedDeletes, that
    actually saves the deletes.  You need to instead get the reader
    for the merged segment from the pool and hand it the new deletes.

  * We lost the cutover to using the already computed docMap in
    commitMergedDeletes?  We should put that back -- it saves having
    to read in the prior deletions from disk.


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

Ah yes, patch from the old directory that need deleting.  Here's the correct one.  Sorry about that.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

There's concurrency issues to work out.

- IW.getReader returns a cloned read only reader 
- Removed IW.reopenReader 
- All test methods pass except testAddIndexesAndDoDeletesThreads. testAddIndexesAndDoDeletesThreads
currently merges indexes concurrently (and fails). In the future the
method will test merging, deleting, and searching concurrently. 
- Concurrent merges fail when "ant test-core" is run 
- DocumentsWriter.applyDeletes deletes again at the SegmentReader level
- What happens when a user executes IR.clone(true) on a read only
reader? 


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675462#action_12675462 ] 

Yonik Seeley commented on LUCENE-1516:
--------------------------------------

bq. a reader that sees changes as they are made versus having to do the whole reopen thing

It's hard keeping up with the current proposal in big issues/threads, but I don't think anyone is proposing a reader that automatically sees changes... i.e. the view of an IndexReader instance will still be fixed.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1516:
---------------------------------------

    Attachment: LUCENE-1516.patch

New patch attached.  All tests pass.  Back-compat tests pass with no
changes.

I made a number of changes, added asserts, cleaned things up, added
javadocs, CHANGES entry, etc.

I try to consistently refer to this as "near real-time search".
Calling it realtime is overstating it, especially when you compare it
to what "realtime operating systems" do.

I also added IndexWriter.setMergedSegmentWarmer(..) and abstract base
class so the caller can have a merged segment warmed before it's
committed to the index.  This is important for reducing turnaround
time after merges are completed; eg it means you can go ahead adding
new docs/deletes, and opening new readers, while the merged segment is
being warmed.

Net/net I think the approach here is sound, and it's a good step
forward.  I think we can get this in for 2.9.

But, I added a few nocommits that we still need to figure out.

Otherwise this is close to committable, though I'd like to let it age
for a while so people can look/test.

We also still need to pursue in-memory transactional data structures
for representing deletes/norms (LUCENE-1526 is already open for
this, and it should not block this issue).

Next up I plan to run some basic tests to see what the "typical" delay
is for opening a near-realtime reader...


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676053#action_12676053 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

bq. How are these sample SegmentReaderPool method signatures? 

Looks good for starters.  decRef would simply pass-through to the SR's
normal decRef, but would then also decRef its copy if the ref count has
dropped to 1?

{quote}
In SegmentInfo it's hashCode only takes into
account the name and directory, why not the delete generation?
{quote}

Two SegmentInfo instances are considered equal if the dir is == and
the name is .equals, so hashCode must match that (be the same value
when equals returns true).


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702461#action_12702461 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

bq. We need an IndexWriter.getMergedSegmentWarmer method?

Yes, I just committed; thanks!

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png, ssd2.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

A little over zealous in altering the flushDeletesToDisk flag in IW.flush

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679383#action_12679383 ] 

Jason Rutherglen commented on LUCENE-1516:
------------------------------------------

TestIndexReader.testDocsOutOfOrderJIRA140 fails because IW.close
isn't called before dir.close. Is this a bug in the unit test?

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676815#action_12676815 ] 

Jason Rutherglen commented on LUCENE-1516:
------------------------------------------

Found the doc ID map in SegmentMerger in IW.commitMerge

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1516:
---------------------------------------

    Attachment: ssd2.png


OK using the last patch, I ran another near real-time test, using this
alg:

{code}

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer

doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker

merge.policy=org.apache.lucene.index.LogDocMergePolicy

docs.file=/Volumes/External/lucene/wiki.txt
doc.stored = false
doc.term.vector = false
doc.add.log.step=10
max.field.length=2147483647

directory=FSDirectory
autocommit=false
compound=false
merge.factor = 10
ram.flush.mb = 128
doc.maker.forever = false
doc.random.id.limit = 3204040

work.dir=/lucene/work

{ "BuildIndex"
  - OpenIndex
  - NearRealtimeReader(1)
   { "UpdateDocs" UpdateDoc > : 100000 : 50/sec
  - CloseIndex
}

RepSumByPrefRound BuildIndex
{code}

It opens a full (3.2M docs, previously built) wikipedia index, then
randomly selects a doc and updates it (deletes old, adds new) at the
rate of 50 docs/sec.  Then, once per second I open a new reader, do
the same search (term "1", sorted by date).

I attached another graph (ssd2.png) with the results, showing reopen &
search time as a function of how many updates have been done; rough
comments:

  * Search time is pretty constant ~35 msec, except occassional
    glitches where it goes as high as ~340 msec.  Net/net very
    reasonable I think.

  * Search time is remarkably non-noisy, except for occasional
    spikes.

  * Reopen time is also fast (~ 40 msec) but is more noisy.

  * It's not clear the merges are really impacting things that much.
    It could simply be that I didn't run test for long enough for a
    big merge to run.  Also, this index has no stored fields nor term
    vectors, so if we added those, merges would get slower.

  * This is a better test than last one, since it's doing some deletes

  * Since I open writer with autoCommit false, and near-realtime
    carries all pending deletes in RAM, no *.del file ever gets
    written to the index


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png, ssd2.png
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

{quote} On an explicit commit(), we should also sweep the pool and
write changes to disk for any SR that has pending changes. {quote}

I created SegmentReaderPool.commitAll which commits changes for all
SRs in the pool in IW.startCommit.

bq. applyDeletes at the top of commitMergedDeletes

Removed

bq. switching back to docMap for merging deletes

Where should I get the numDeletedDocs from? (Used for docUpto +=
docCount - previousReader.numDeletedDocs()) Should the entire
docIdMap be scanned? Is there an expense in cloning the segment
readers besides the extra bitvectors?

* SegmentReaderPool.release is implemented instead of using
reader.decRef. I think you're saying put this patch's decRef logic in
the SRP.release method? 

* Added IW.close to TestIndexReader.testDocsOutOfOrderJIRA140

* TestTransactions may sometimes be failing legitimately during the
prepareCommit, the exception can't be reliably reproduced but perhaps
another test case can be written that does

* TestIndexWriterDelete.testErrorAfterApplyDeletes fails due to
IW.commit not throwing an expected exception 

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

- Most tests pass. Because SegmentReaders are held by the writer,
files remain open whereas before this patch they did not. This causes
some tests to fail. 
- I'm working a way to remove unreferenced segments from the pool 

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1516:
-------------------------------------

    Attachment: LUCENE-1516.patch

Patch includes the latest patch from LUCENE-1314 and is unfortunately
not readable. Is there a way to create a patch minus another patch?

The IW internal reader is always the results of the latest external
IR.reopen. In this way deletes made in IW and from the external IR
remain in sync. The latest IR always has the write lock anyways
according to IR.clone semantics. 

DocumentsWriter.applyDeletes reuses the reader in IW rather than call
SegmentReader.get/.deleteDocument

In IW.doFlush the internal reader is reopened with the new
segmentinfos

Included a very basic test case 


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662800#action_12662800 ] 

Grant Ingersoll commented on LUCENE-1516:
-----------------------------------------

It still seems to me that we should think about what objects need to be shared and how they can be produced/instantiated appropriately instead of adding a Reader onto the Writer, which, IMO, pollutes the Writer API.  I know it complicates things, but I think it will be less confusing to users of the API.

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674662#action_12674662 ] 

Michael McCandless commented on LUCENE-1516:
--------------------------------------------


Looks good, Jason.  This is big change, and I expect to go through a
number of iterations before settling... plus we still need to figure
out how the API is exposed.  Comments:

  * All this logic needs to be conditional (this also depends on what
    API we actually settle on to expose this...): right now you always
    open a reader whenever IW is created.

  * We should assume we do not need to support autoCommit=true in this
    patch (since this will land after 3.0).  This simplifies things.

  * IW.reopenInternalReader only does a clone not a reopen; how does
    it cover the newly flushed segment?

  * After a merge commits you don't seem to reopen the reader?  This
    is actually tricky to do right, for realtime search: we somehow
    need to allow for warming of the newly created (merged) segment,
    in such a way that we do not block the flushing of further
    segments and reopen of readers against those new segments.  I
    think what may be best is to subclass IW, and override a newly
    added "postMerge" method that's invoked on the new segment before
    the merge is committed into the SegmentInfos.  This is cleaner
    than allowing the change into the SegmentInfos and then having to
    make a custom deletion policy & track history of each segment.

  * It seems like reader.reopen() (where reader was obtained with
    IW.getReader()) doesn't do the right thing?  (ie it's looking for
    the most recent segments_N in the Directory, but it should be
    looking for it @ IW.segmentInfos).

  * I think we should decouple "materializing deletes down to docIDs"
    from "flushing deletes to disk".  IW does both as the same
    operation now (because it doesn't want to hold SR open for a long
    time), but once we have persistent open SegmentReaders we should
    separate these.  It's not necessary for IW to write new .del files
    when it materializes deletes.

  * Instead of having to merge readers, I think we should have a
    single source to obtain an SR from.  This way, when IW needs to
    materialize deletes, it will grab the same instance of SR for a
    given segment that the currently open MSR is using.  Also, when
    merging kicks off, it'll grab the SR from the same source (this
    way deletes in RAM will be correctly merged away).  Also, I think
    we should not use MSR for doing deletions (and still go segment by
    segment): it's quite a bit slower since every invocation must do
    the binary search again.

  * Likewise, you have to fix the commitMergedDeletes to decouple
    computing the new BitVector from writing the .del file to disk.
    That method should only create a new BitVector, for the newly
    merged segment.  It must be synchronized to prevent any new
    deletions against the segments that were just merged.  In fact,
    this is a real danger: after a merge finishes, if one continues to
    use an older reader to do deletions you get into trouble.

  * I still don't really like having both the IR and IW able to do
    deletions, with slightly different semantics.  As it stands now,
    since you can't predict when IW materializes deletes, your reader
    will suddenly see a bunch of deletes appear.  I think it's better
    if no deletes appear, ever, until you reopen your reader.  Maybe
    we simply prevent deletion through the IR?

  * We need some serious unit tests here!


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org