Posted to dev@lucene.apache.org by "Simon Willnauer (Created) (JIRA)" <ji...@apache.org> on 2011/11/01 20:41:32 UTC

[jira] [Created] (LUCENE-3551) Yet another race in IW#nrtIsCurrent

Yet another race in IW#nrtIsCurrent
-----------------------------------

                 Key: LUCENE-3551
                 URL: https://issues.apache.org/jira/browse/LUCENE-3551
             Project: Lucene - Java
          Issue Type: Bug
          Components: core/index
    Affects Versions: 4.0
            Reporter: Simon Willnauer
             Fix For: 4.0


IW#nrtIsCurrent looks like this:

{code}
  synchronized boolean nrtIsCurrent(SegmentInfos infos) {
    ensureOpen();
    return infos.version == segmentInfos.version && !docWriter.anyChanges() && !bufferedDeletesStream.any();
  }
{code}

* the version changes once we checkpoint the IW
* docWriter has changes if there are any docs in RAM or any deletes in the delQueue
* bufferedDeletesStream contains all frozen delete packets from the delQueue

Yet what actually happens is:

1. We decrement numDocsInRam in DWPT#doAfterFlush (which is executed during DWPT#flush), but before we checkpoint.
2. If we freeze deletes (empty the delQueue), we put them in the flushQueue to maintain the order. This means they are not yet in the bufferedDeletesStream.

Bottom line: there is a window in which IW#nrtIsCurrent can return true even though changes are still pending. Phew, I am not 100% sure that this is the reason for our latest failure in SOLR-2861, but from what the logs look like this could be what happens. If we randomly hit low values for maxBufferedDocs & maxBufferedDeleteTerms, this is absolutely possible.
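
The window described above can be stepped through in a small single-threaded model. This is only a sketch: the class, its fields and its method names are hypothetical stand-ins for the state that nrtIsCurrent inspects, not Lucene code, and the real flush path is concurrent and far more involved.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of the state nrtIsCurrent looks at. Hypothetical names, not Lucene code. */
class NrtWindowSketch {
  long version = 1;          // stands in for segmentInfos.version, bumped on checkpoint
  int numDocsInRam = 0;      // docs buffered in RAM
  int deletesInDelQueue = 0; // deletes that are not yet frozen
  final Queue<String> flushQueue = new ArrayDeque<>();            // flushed but unpublished work
  final Queue<String> bufferedDeletesStream = new ArrayDeque<>(); // published frozen deletes

  /** Mirrors the three checks of the method quoted above. */
  boolean nrtIsCurrent(long readerVersion) {
    boolean docWriterAnyChanges = numDocsInRam > 0 || deletesInDelQueue > 0;
    return readerVersion == version && !docWriterAnyChanges && bufferedDeletesStream.isEmpty();
  }

  void addDocument() { numDocsInRam++; }

  void deleteTerm() { deletesInDelQueue++; }

  /** Flush: counters drop and frozen deletes move to the flushQueue, but no checkpoint yet. */
  void flushWithoutPublishing() {
    if (deletesInDelQueue > 0) {
      flushQueue.add("frozenDeletes");
      deletesInDelQueue = 0;
    }
    if (numDocsInRam > 0) {
      flushQueue.add("segment");
      numDocsInRam = 0;
    }
  }

  /** Publish everything in the flushQueue, then checkpoint. */
  void publishAndCheckpoint() {
    String ticket;
    while ((ticket = flushQueue.poll()) != null) {
      if (ticket.equals("frozenDeletes")) {
        bufferedDeletesStream.add(ticket);
      }
    }
    version++;
  }

  public static void main(String[] args) {
    NrtWindowSketch iw = new NrtWindowSketch();
    long readerVersion = iw.version;
    iw.addDocument();
    iw.deleteTerm();
    System.out.println(iw.nrtIsCurrent(readerVersion)); // false: buffered changes are seen
    iw.flushWithoutPublishing();
    System.out.println(iw.nrtIsCurrent(readerVersion)); // true: the window
    iw.publishAndCheckpoint();
    System.out.println(iw.nrtIsCurrent(readerVersion)); // false: the version has moved on
  }
}
```

Between flushWithoutPublishing() and publishAndCheckpoint() none of the three checks can see the pending work, which is the window in question.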

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3551) Yet another race in IW#nrtIsCurrent

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3551:
------------------------------------

    Attachment: LUCENE-3551.patch

Here is a patch that checks the flushQueue as a last resort.
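
The idea of checking the flushQueue as a last resort can be sketched roughly as follows. This is not the attached patch: the class and all of its members are hypothetical, and the sketch only illustrates adding one more condition so that flushed but unpublished work keeps nrtIsCurrent false.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model, not Lucene code: nrtIsCurrent with an extra flushQueue check. */
class FlushQueueCheckSketch {
  private long version = 1;      // bumped on checkpoint
  private int pendingInRam = 0;  // buffered docs and unfrozen deletes
  private final Queue<String> flushQueue = new ArrayDeque<>();
  private final Queue<String> bufferedDeletesStream = new ArrayDeque<>();

  synchronized long version() { return version; }

  synchronized boolean nrtIsCurrent(long readerVersion) {
    return readerVersion == version
        && pendingInRam == 0
        && bufferedDeletesStream.isEmpty()
        && flushQueue.isEmpty(); // last resort: flushed but not yet published
  }

  synchronized void bufferChange() { pendingInRam++; }

  /** Moves buffered work into the flushQueue without checkpointing. */
  synchronized void flush() {
    if (pendingInRam > 0) {
      flushQueue.add("ticket");
      pendingInRam = 0;
    }
  }

  /** Publishes queued tickets and checkpoints if there was anything to publish. */
  synchronized void publishAndCheckpoint() {
    if (!flushQueue.isEmpty()) {
      flushQueue.clear();
      version++;
    }
  }
}
```

With the extra condition, a reader opened before bufferChange() is reported as stale at every step: first because of pendingInRam, then because of the non-empty flushQueue, and finally because of the version bump.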
                


[jira] [Commented] (LUCENE-3551) Yet another race in IW#nrtIsCurrent

Posted by "Simon Willnauer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141532#comment-13141532 ] 

Simon Willnauer commented on LUCENE-3551:
-----------------------------------------

This seems to have a deadlock. I need to investigate further how to solve this.
                


[jira] [Commented] (LUCENE-3551) Yet another race in IW#nrtIsCurrent

Posted by "Simon Willnauer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144290#comment-13144290 ] 

Simon Willnauer commented on LUCENE-3551:
-----------------------------------------

This seems to fix SOLR-2861. I went through the changes again, double-checking all conditions. It seems ready; I will commit soon if nobody objects.
                


[jira] [Updated] (LUCENE-3551) Yet another race in IW#nrtIsCurrent

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3551:
------------------------------------

    Attachment: LUCENE-3551.patch

Here is a new patch fixing this issue. With this patch I could not reproduce any failure from SOLR-2861, which usually failed fairly quickly.
                


[jira] [Resolved] (LUCENE-3551) Yet another race in IW#nrtIsCurrent

Posted by "Simon Willnauer (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-3551.
-------------------------------------

    Resolution: Fixed

Committed in revision 1197742.

                


[jira] [Updated] (LUCENE-3551) Yet another race in IW#nrtIsCurrent

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3551:
------------------------------------

    Attachment: LUCENE-3551.patch

I isolated the problem in SOLR-2861 into a Lucene test case that fails reproducibly and very, very quickly (it passes on 3.x, though).

With the fixes in this patch it passes reliably. The problem, in addition to the ones I already explained, is that once DW has flushed all threads and put all deletes in the frozenPacketBuffer, there is a small window in which those changes are not taken into account. This only happens if we flush only deletes (no documents), since we prune the frozenBufferedDeletes before we checkpoint, so nrtIsCurrent doesn't see those changes for a little while.
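
A rough single-threaded model of the deletes-only case, to make the ordering explicit. All names here are hypothetical and the model reflects only my reading of the sequence described above (freeze, prune, then checkpoint); it is not the code from the patch.

```java
/** Toy model, not Lucene code: a deletes-only flush that prunes before it checkpoints. */
class DeletesOnlyFlushSketch {
  long version = 1;          // bumped on checkpoint
  int deletesInDelQueue = 0; // deletes that are not yet frozen
  int frozenPackets = 0;     // frozen delete packets waiting to be applied

  boolean nrtIsCurrent(long readerVersion) {
    return readerVersion == version && deletesInDelQueue == 0 && frozenPackets == 0;
  }

  void deleteTerm() { deletesInDelQueue++; }

  /** Freezes the queued deletes into one packet. */
  void freezeDeletes() {
    if (deletesInDelQueue > 0) {
      frozenPackets++;
      deletesInDelQueue = 0;
    }
  }

  /** Applies the frozen deletes and drops the packets. */
  void applyAndPrune() { frozenPackets = 0; }

  void checkpoint() { version++; }

  public static void main(String[] args) {
    DeletesOnlyFlushSketch iw = new DeletesOnlyFlushSketch();
    long readerVersion = iw.version;
    iw.deleteTerm();
    iw.freezeDeletes();
    System.out.println(iw.nrtIsCurrent(readerVersion)); // false: a frozen packet is pending
    iw.applyAndPrune();
    System.out.println(iw.nrtIsCurrent(readerVersion)); // true: pruned, but not checkpointed
    iw.checkpoint();
    System.out.println(iw.nrtIsCurrent(readerVersion)); // false: the version has moved on
  }
}
```

Because no document flush is involved, nothing else keeps the check false between applyAndPrune() and checkpoint() in this model.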
                


[jira] [Updated] (LUCENE-3551) Yet another race in IW#nrtIsCurrent

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3551:
------------------------------------

    Attachment: LUCENE-3551.patch

Oops, one negation to rule them all :) This fixes the previous patch.
                