
Posted to dev@couchdb.apache.org by "Alex Markham (Created) (JIRA)" <ji...@apache.org> on 2011/12/09 17:29:39 UTC

[jira] [Created] (COUCHDB-1359) Spurious "checkpoint failure: conflict (are you replicating to yourself?)"

Spurious "checkpoint failure: conflict (are you replicating to yourself?)"
--------------------------------------------------------------------------

                 Key: COUCHDB-1359
                 URL: https://issues.apache.org/jira/browse/COUCHDB-1359
             Project: CouchDB
          Issue Type: Bug
          Components: Replication
    Affects Versions: 1.1.1
         Environment: Centos 5.6/x64 - spidermonkey 1.8.5, couch 1.1.1 patched for COUCHDB-1333 and COUCHDB-1340
            Reporter: Alex Markham


I'm seeing these errors in the log when couch just stops replicating: the replication still appears in _active_tasks, but it doesn't checkpoint again, even with _replicate being called every 5 minutes.
It seems to occur when replicating from a couch 1.1.1 (I have also seen it on 1.0.3 machines replicating from 1.1.1).

It definitely is not replicating to itself, but I suspect it is a problem in PUTing the _local doc on the source db.

log here (snipped from host33 couch.log): http://www.friendpaste.com/3FLgRFzOEAkkKazLbc7Jgw 
For that log, our replication cron does an ssh to host33, then curls it to replicate from host01 to the database (with no host specified) as a continuous pull replication.


We have occasionally seen slow PUTs of documents on that database (and only that database), which can take upwards of 10 seconds (via Futon or our app), as it is a creaking database with a scarred history of documents that contain many (thousands of) conflicts.
Could this occasional slow PUT manifest itself as this error in the log?

As a workaround to keep replication flowing, would this replication id restart if the curl first cancelled the replication ("cancel":true) and then started it again?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (COUCHDB-1359) Spurious "checkpoint failure: conflict (are you replicating to yourself?)"

Posted by "Mathias Leppich (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COUCHDB-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170840#comment-13170840 ] 

Mathias Leppich commented on COUCHDB-1359:
------------------------------------------

I ran into the same issue. I was replicating from couch-A (version 1.0.3) to couch-B (version 1.1.1) via couch-B:

    curl -X POST -H 'Content-Type: application/json' -d '{
     "source":"some-db", 
     "target":"http://couch-A/some-db",
     "filter":"filters_erl/no_design",
     "continuous":true
    }' 'http://couch-B/_replicate'

Then suddenly (after more than 2 days of replication) I got the following error in the log:
   [Thu, 15 Dec 2011 19:10:00 GMT] [error] [<0.259.0>] checkpoint failure: conflict (are you replicating to yourself?)

The replication then stopped replicating but still showed up in _active_tasks. It did not start replicating again until I canceled the replication and re-initiated it. So no couch restart was required, but I had to first cancel and then restart the replication.
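
For reference, the cancel-then-restart sequence can be sketched roughly as below. This is only a sketch, under the assumption that a replication started through _replicate is cancelled by POSTing the same request body with "cancel":true added. The function names (replication_body, post_replicate, restart_replication) and the COUCH_B variable are made up for illustration; the hosts, database and filter are the ones from the request above.

```shell
#!/bin/sh
# Assumption: couch-B is reachable under this base URL (as in the request above).
COUCH_B='http://couch-B'

# Print the replication request body. $1 is either empty or '"cancel":true,'
# so that the cancel request repeats exactly the fields of the start request.
replication_body() {
  printf '{%s"source":"some-db","target":"http://couch-A/some-db","filter":"filters_erl/no_design","continuous":true}' "$1"
}

# POST a body to the _replicate endpoint on couch-B.
post_replicate() {
  curl -sS -X POST -H 'Content-Type: application/json' -d "$1" "$COUCH_B/_replicate"
}

# First cancel the stuck replication, then start it again.
restart_replication() {
  post_replicate "$(replication_body '"cancel":true,')"
  post_replicate "$(replication_body '')"
}
```

Calling restart_replication (for example from a cron job) sends the two requests in order. Whether this is enough to clear the stuck state in every case is not something this thread confirms.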

I might have to add that even though these 2 couches run different versions, they are both based on the same database file: couch-B was created from a filesystem snapshot of couch-A. Both DBs are about 52M docs with 70M seq.
                
> Spurious "checkpoint failure: conflict (are you replicating to yourself?)"
> --------------------------------------------------------------------------
>
>                 Key: COUCHDB-1359
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1359
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 1.1.1
>         Environment: Centos 5.6/x64 - spidermonkey 1.8.5, couch 1.1.1 patched for COUCHDB-1333 and COUCHDB-1340
>            Reporter: Alex Markham
>              Labels: PUT, _local, checkpoint, conflict, replication, slow
