You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2009/01/06 18:28:44 UTC
[jira] Commented: (LUCENE-1314) IndexReader.clone

    [ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661214#action_12661214 ] 

Michael McCandless commented on LUCENE-1314:
--------------------------------------------


{quote}
> The problem is the user may get into trouble by updating the stale reader which was debated before. I got the impression insuring the reader being updated was the latest was important.
{quote}

But: when one attempts to change a stale reader, that's caught when
trying to acquire the write lock?  (Ie during clone I think you don't
need to also check for this).

{quote}
> The cost of cloning them meaning the creating a new byte array
{quote}

Yeah I was thinking CPU cost of creating & copying deleted docs &
norms; I was just curious (I don't think we have to measure this
before committing).

{quote}
> I need to reread Marvin's tombstones which at first glance seemed to be an iterative approach to saving deletions that seems like a transaction log. Correct?
{quote}

Similar to a transaction log in that the size of what's written is in
proportion to how many changes (deletions) you made.  But different in
that there is no other data structure (ie the tombstones *are* the
representation of the deletes) and so the tombstones are used "live"
(whereas transaction log is typically "played back" on next startup
after a failure).

If we had tombstones to represent deletes in Lucene then any new
deletions would not require any cloning of prior deletions.  Ie there
would be no copy-on-write.

{quote}
> M.M.: SegmentReader.Norm now has two refCounts, and I think both are necessary. One tracks refs to the Norm instance itself and the other tracks refs to the byte[]. Can you add some comments explaining the difference (because it's confusing at first blush)?
> 
> Byte[] referencing is used because a new norm object needs to be created for each clone, and the byte array is all that is needed for sharing between cloned readers. The current norm referencing is for sharing between readers whereas the byte[] referencing is for copy on write which is independent of reader references.
{quote}

Got it.  Can you put this into the javadocs in the Norm class?

{quote}
> M.M.: In SegmentReader.doClose() you are failing to call deletedDocsCopyOnWriteRef.decRef(), so you have a refCount leak. Can you create a unit test that 1) opens reader 1, 2) does deletes on reader 1, 3) clones reader 1 --> reader 2, 4) closes reader 1, 5) deletes more docs with reader 1, and 6) asserts that the
deletedDocs BitVector did not get cloned? First verify the test fails, then fix the bug...
> 
> In regards to #5, the test cannot delete from reader 1 once it's closed. A method called TestIndexReaderClone.testSegmentReaderCloseReferencing was added to test this closing use case.
{quote}

Woops -- I meant "5) deletes more docs with reader 2".  Test case
looks good!  Thanks.

A few more comments:

  * Can you update javadocs of IndexReader.reopen to remove the
    warning about not doing modification operations?  With
    copy-on-write you are now free to do deletes against the reopened
    reader with no impact to the reader you had reopened/cloned.

  * What is SegmentReader.doDecRef for?  It seems dead?

  * SegmentReader.doUndeleteAll has 4 space indent (should be 2)

  * We have this in SegmentReader.reopenSegment:
{code}
if (deletedDocsRef == null) deletedDocsRef = new Ref();
else deletedDocsRef.incRef();
{code}
   But I think if I clone a reader with no deletes, the clone then
   [incorrectly] has a deletedDocsRef set?  Can you fix that code to
   keep the invariant that if deleteDocs is null, so is
   deletedDocsRef, and v/v?  Can you sprinkle asserts to make sure
   that invariant always holds?

  * In SegmentReader.decRef we have "if (deletedDocsRef != null &&
    deletedDocsRef.refCount() > 1) deletedDocsRef.decRef();" -- but,
    you should not have to check if deletedDocsRef.refCount() > 1?
    Does something break when you remove that?  (In which case I think
    we have a refCount bug lurking...)

  * The norm cloning logic in SegmentReader.reopenSegment needs to be
    cleaned up... eg we first sweep through each Norm, incRef'ing it,
    and then make 2nd pass to do full clone.  Really we should have if
    (doClone) up front and do a single pass?  Also: I think we need
    that same logic to re-open the singleNormStream for the clone case
    as well.
.
    Hmm, in the non-single-norm stream case I think we also must
    re-open the norm file, rather than clone it, in Norm.clone().  I
    think if you 1) open reader 1(do no searching w/ it), 2) clone it
    --> reader 2, 3) close reader 1, 4) try to do a search against a
    field that then needs to load norms, you'll hit an
    AlreadyClosedException, because you had a cloned IndexInput vs a
    newly reopened one?  Can you add that test case?

  * Why was this needed:
{code}
if (doClone && normsDirty) {
  normsUpToDate = false;
}
{code}
    It seems like leaving normsUpToDate as true should have worked
    (all the Norm instances are cloned anyway?).

  * In SegmentReader.doUndeleteAll, can you conditionalize on whether
    deletedDocs != null?  In general rather than having separate
    checks for deleteDocs != null and deletedDocsRef != null, I'd
    prefer to check only deletedDocs != null and add an assert that
    deleteDocsRef != null, with else clause having assert
    deletedDocsRef == null.


> IndexReader.clone
> -----------------
>
>                 Key: LUCENE-1314
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1314
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.3.1
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch
>
>
> Based on discussion http://www.nabble.com/IndexReader.reopen-issue-td18070256.html.  The problem is reopen returns the same reader if there are no changes, so if docs are deleted from the new reader, they are also reflected in the previous reader which is not always desired behavior.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org