Posted to dev@couchdb.apache.org by "Randall Leeds (Created) (JIRA)" <ji...@apache.org> on 2011/12/15 06:33:30 UTC

[jira] [Created] (COUCHDB-1363) Race condition edge case when pulling local changes

Race condition edge case when pulling local changes
---------------------------------------------------

                 Key: COUCHDB-1363
                 URL: https://issues.apache.org/jira/browse/COUCHDB-1363
             Project: CouchDB
          Issue Type: Bug
          Components: Database Core
    Affects Versions: 1.1.1, 1.0.3
            Reporter: Randall Leeds
            Priority: Minor
             Fix For: 1.2, 1.3


It's necessary to re-open the #db after subscribing to notifications so that updates are not lost. In practice, this is rarely problematic because the next change will cause everything to catch up, but if a quick burst of changes happens while a replication is starting, the replication can go stale. Detected by intermittent replicator_db js test failures.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (COUCHDB-1363) Race condition edge case when pulling local changes

Posted by "Randall Leeds (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COUCHDB-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170547#comment-13170547 ] 

Randall Leeds commented on COUCHDB-1363:
----------------------------------------

Thanks for the review. I'll explain more (I was distracted by talking after the CouchBase meetup).

I've run into this during the course of my work for COUCHDB-1350. As part of that, I had to change the replicator_db test and make the continuous_replication_survives_restart test use bare db names instead of URLs (the URL changes after /_restart). I'm guessing I uncovered this race because the timing changes as a result of this different path.

In any case, unless I'm missing something fundamental it's most definitely a bug and this patch definitely fixes it. I tested by removing the line to stop CouchDB after the tests, so I could leave the replication running after the failure, which was line 700 of replicator_db.js. The replication stays open in active tasks but the fourth change, to doc 'foo1000', isn't replicated until I PUT another change to the source database. That is because the Db handle that's passed into handle_changes is pointing at an old header and the update_notifier hasn't been started yet. The patch re-opens the db after subscribing to notifications, which ensures we can't miss an update and stall in this way.
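The ordering problem described above can be sketched with a small deterministic model. This is an illustrative Python toy, not CouchDB code: `Database`, `Snapshot`, `start_feed`, and `run` are hypothetical names, and the "burst" of writes is simply placed between opening the handle and subscribing, which is the window the patch closes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Database:
    """Toy stand-in for a database whose committed state advances over time."""
    name: str
    update_seq: int = 0
    listeners: List[Callable[[str], None]] = field(default_factory=list)

    def open(self) -> "Snapshot":
        # A handle captures the committed state at the moment it is opened.
        return Snapshot(self, self.update_seq)

    def write(self) -> None:
        # Commit one change, then notify whoever is subscribed right now.
        self.update_seq += 1
        for notify in list(self.listeners):
            notify(self.name)


@dataclass
class Snapshot:
    db: Database
    update_seq: int

    def changes_since(self, since: int) -> List[int]:
        # Only changes visible through this (possibly stale) handle.
        return list(range(since + 1, self.update_seq + 1))


def start_feed(db: Database, handle: Snapshot, reopen_after_subscribe: bool,
               racing_writes: int) -> Tuple[List[int], List[str]]:
    """Start a changes feed; `racing_writes` commits land just before we subscribe."""
    wakeups: List[str] = []
    for _ in range(racing_writes):
        db.write()                        # the burst that races with startup
    db.listeners.append(wakeups.append)   # subscribe to update notifications
    if reopen_after_subscribe:
        handle = db.open()                # refresh the handle once subscribed
    sent = handle.changes_since(0)
    return sent, wakeups


def run(reopen: bool) -> Tuple[List[int], List[str]]:
    db = Database("source")
    handle = db.open()                    # opened before the burst, so it goes stale
    return start_feed(db, handle, reopen, racing_writes=3)


stalled = run(reopen=False)  # nothing sent, no wakeup pending: the feed waits
fixed = run(reopen=True)     # the refreshed handle already sees the burst
```

In the model, the stale handle sends nothing and has no notification queued, so it would sit idle until some later write. With the re-open placed after the subscription, anything committed before the subscription is visible through the fresh handle, and anything committed afterwards triggers a notification.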

It probably doesn't bite many people, though (replication starts again as soon as the next change comes), but I can think of situations I've seen where this could be a problem.

Thanks for catching the shadowing issue. I didn't notice the warning when I compiled. I thought that the DbName in the fun header would be matched against the DbName in the handle_changes header instead of shadowing. My intention was not to change that functionality and I should probably have left it alone.

Let me know if it's okay to commit (minus the shadowing change). I don't see a better fix. Do you?
                
> Race condition edge case when pulling local changes
> ---------------------------------------------------
>
>                 Key: COUCHDB-1363
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1363
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 1.0.3, 1.1.1
>            Reporter: Randall Leeds
>            Assignee: Filipe Manana
>            Priority: Minor
>             Fix For: 1.2, 1.3
>
>         Attachments: 0001-Fix-a-race-condition-starting-replications.patch
>
>
> It's necessary to re-open the #db after subscribing to notifications so that updates are not lost. In practice, this is rarely problematic because the next change will cause everything to catch up, but if a quick burst of changes happens while replication is starting the replication can go stale. Detected by intermittent replicator_db js test failures.


[jira] [Closed] (COUCHDB-1363) callback invocation for docs added during couch_changes startup can be delayed by race condition

Posted by "Randall Leeds (Closed) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Leeds closed COUCHDB-1363.
----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.1.2
                   1.0.4
         Assignee:     (was: Filipe Manana)

Fixed.

To https://git-wip-us.apache.org/repos/asf/couchdb.git
   9c377a1..e82a0c9  1.0.x -> 1.0.x
   c9b20f2..6a04e33  1.1.x -> 1.1.x
   7f9376c..1bbe619  1.2.x -> 1.2.x
   2d90a12..573a7bb  master -> master
                
> callback invocation for docs added during couch_changes startup can be delayed by race condition
> ------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-1363
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1363
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 1.0.3, 1.1.1
>            Reporter: Randall Leeds
>            Priority: Minor
>             Fix For: 1.0.4, 1.2, 1.3, 1.1.2
>
>         Attachments: 0001-Fix-a-race-condition-starting-replications.patch
>
>
> After subscribing to notifications, it's necessary to re-open the #db so that the header points at all updates for which the update notifier has already fired events. In practice, this is rarely problematic because the next change will cause everything to catch up, but if a quick burst of changes happens while, e.g., a replication is starting, the replication can go stale. Detected by intermittent replicator_db js test failures.


[jira] [Commented] (COUCHDB-1363) callback invocation for docs added during couch_changes startup can be delayed by race condition

Posted by "Filipe Manana (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COUCHDB-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170574#comment-13170574 ] 

Filipe Manana commented on COUCHDB-1363:
----------------------------------------

Go ahead Randall, it's a genuine issue. +1
                
> callback invocation for docs added during couch_changes startup can be delayed by race condition
> ------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-1363
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1363
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 1.0.3, 1.1.1
>            Reporter: Randall Leeds
>            Assignee: Filipe Manana
>            Priority: Minor
>             Fix For: 1.2, 1.3
>
>         Attachments: 0001-Fix-a-race-condition-starting-replications.patch
>
>
> After subscribing to notifications, it's necessary to re-open the #db so that the header points at all updates for which the update notifier has already fired events. In practice, this is rarely problematic because the next change will cause everything to catch up, but if a quick burst of changes happens while, e.g., a replication is starting, the replication can go stale. Detected by intermittent replicator_db js test failures.


[jira] [Updated] (COUCHDB-1363) Race condition edge case when pulling local changes

Posted by "Randall Leeds (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Leeds updated COUCHDB-1363:
-----------------------------------

    Attachment: 0001-Fix-a-race-condition-starting-replications.patch
    
> Race condition edge case when pulling local changes
> ---------------------------------------------------
>
>                 Key: COUCHDB-1363
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1363
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 1.0.3, 1.1.1
>            Reporter: Randall Leeds
>            Priority: Minor
>             Fix For: 1.2, 1.3
>
>         Attachments: 0001-Fix-a-race-condition-starting-replications.patch
>
>
> It's necessary to re-open the #db after subscribing to notifications so that updates are not lost. In practice, this is rarely problematic because the next change will cause everything to catch up, but if a quick burst of changes happens while replication is starting the replication can go stale. Detected by intermittent replicator_db js test failures.


[jira] [Commented] (COUCHDB-1363) Race condition edge case when pulling local changes

Posted by "Filipe Manana (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COUCHDB-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170157#comment-13170157 ] 

Filipe Manana commented on COUCHDB-1363:
----------------------------------------

Hi Randall,

This would only minimize the "problem", as right after re-opening the database some changes can happen.

As for the replicator_db.js test, the only case where I can see this happening is when triggering a non-continuous replication, immediately afterwards adding some docs to the source database, and then asserting that the docs were written to the target. I think this would be more a problem of the test than anything else. I don't recall if there's any test function which does that in replicator_db.js. Is there any? Which one did you find?

Also the following line you changed is a bit dangerous:

-                fun({_, DbName}) when DbName == Db#db.name ->
+                fun({_, DbName}) ->

The DbName inside the fun is shadowing the DbName in the handle_changes clause. This means you'll accept updates for any database.
The compiler should give you a warning about this.

I would also prefer you update the commit's title because this is not replication specific, but rather couch_changes specific.

I'm mostly convinced it's a test issue anyway.
                
> Race condition edge case when pulling local changes
> ---------------------------------------------------
>
>                 Key: COUCHDB-1363
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1363
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 1.0.3, 1.1.1
>            Reporter: Randall Leeds
>            Assignee: Filipe Manana
>            Priority: Minor
>             Fix For: 1.2, 1.3
>
>         Attachments: 0001-Fix-a-race-condition-starting-replications.patch
>
>
> It's necessary to re-open the #db after subscribing to notifications so that updates are not lost. In practice, this is rarely problematic because the next change will cause everything to catch up, but if a quick burst of changes happens while replication is starting the replication can go stale. Detected by intermittent replicator_db js test failures.


[jira] [Updated] (COUCHDB-1363) callback invocation for docs added during couch_changes startup can be delayed by race condition

Posted by "Randall Leeds (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Leeds updated COUCHDB-1363:
-----------------------------------

    Description: After subscribing to notifications, it's necessary to re-open the #db so that the header points at all updates for which the update notifier has already fired events. In practice, this is rarely problematic because the next change will cause everything to catch up, but if a quick burst of changes happens while, e.g., a replication is starting, the replication can go stale. Detected by intermittent replicator_db js test failures.  (was: It's necessary to re-open the #db after subscribing to notifications so that updates are not lost. In practice, this is rarely problematic because the next change will cause everything to catch up, but if a quick burst of changes happens while replication is starting the replication can go stale. Detected by intermittent replicator_db js test failures.)
        Summary: callback invocation for docs added during couch_changes startup can be delayed by race condition  (was: Race condition edge case when pulling local changes)
    
> callback invocation for docs added during couch_changes startup can be delayed by race condition
> ------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-1363
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1363
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 1.0.3, 1.1.1
>            Reporter: Randall Leeds
>            Assignee: Filipe Manana
>            Priority: Minor
>             Fix For: 1.2, 1.3
>
>         Attachments: 0001-Fix-a-race-condition-starting-replications.patch
>
>
> After subscribing to notifications, it's necessary to re-open the #db so that the header points at all updates for which the update notifier has already fired events. In practice, this is rarely problematic because the next change will cause everything to catch up, but if a quick burst of changes happens while, e.g., a replication is starting, the replication can go stale. Detected by intermittent replicator_db js test failures.


[jira] [Assigned] (COUCHDB-1363) Race condition edge case when pulling local changes

Posted by "Randall Leeds (Assigned) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Leeds reassigned COUCHDB-1363:
--------------------------------------

    Assignee: Filipe Manana
    
> Race condition edge case when pulling local changes
> ---------------------------------------------------
>
>                 Key: COUCHDB-1363
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1363
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 1.0.3, 1.1.1
>            Reporter: Randall Leeds
>            Assignee: Filipe Manana
>            Priority: Minor
>             Fix For: 1.2, 1.3
>
>         Attachments: 0001-Fix-a-race-condition-starting-replications.patch
>
>
> It's necessary to re-open the #db after subscribing to notifications so that updates are not lost. In practice, this is rarely problematic because the next change will cause everything to catch up, but if a quick burst of changes happens while replication is starting the replication can go stale. Detected by intermittent replicator_db js test failures.
