Posted to dev@couchdb.apache.org by "Mathias Leppich (Commented) (JIRA)" <ji...@apache.org> on 2011/12/16 10:04:30 UTC

[jira] [Commented] (COUCHDB-1359) Spurious "checkpoint failure: conflict (are you replicating to yourself?)"

    [ https://issues.apache.org/jira/browse/COUCHDB-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170840#comment-13170840 ] 

Mathias Leppich commented on COUCHDB-1359:
------------------------------------------

I ran into the same issue. I was replicating from couch-B (version 1.1.1) to couch-A (version 1.0.3), as a push replication triggered on couch-B:

    curl -X POST -H 'Content-Type: application/json' -d '{
     "source":"some-db", 
     "target":"http://couch-A/some-db",
     "filter":"filters_erl/no_design",
     "continuous":true
    }' 'http://couch-B/_replicate'

Then suddenly (after more than 2 days of replication) I got the following error in the log:
    [Thu, 15 Dec 2011 19:10:00 GMT] [error] [<0.259.0>] checkpoint failure: conflict (are you replicating to yourself?)

The replication then stopped making progress, although it still showed up in _active_tasks. It did not resume until I cancelled the replication and started it again. So no CouchDB restart was required, but I had to cancel the replication first and then restart it.
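The cancel-then-restart workaround can be sketched as a small shell script. This is a minimal sketch. It assumes that CouchDB 1.x identifies the replication to cancel when the original request body (source, target, filter, continuous) is sent again to `_replicate` with `"cancel":true` added. The `COUCH_B` variable and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch of the cancel-then-restart workaround for the push replication above.
# Assumption: the running replication is matched by repeating the same
# source/target/filter/continuous fields, with "cancel":true added.
COUCH_B=${COUCH_B:-http://couch-B}   # hypothetical base URL of couch-B

# Print the replication request body; pass "cancel" to add "cancel":true.
replication_body() {
  extra=""
  if [ "${1:-}" = "cancel" ]; then
    extra=',"cancel":true'
  fi
  printf '{"source":"some-db","target":"http://couch-A/some-db","filter":"filters_erl/no_design","continuous":true%s}\n' "$extra"
}

# POST a JSON body to couch-B's _replicate endpoint.
post_replicate() {
  curl -s -X POST -H 'Content-Type: application/json' -d "$1" "$COUCH_B/_replicate"
}

# Cancel the stuck replication, then start it again with the original body.
restart_replication() {
  post_replicate "$(replication_body cancel)"
  post_replicate "$(replication_body)"
}
```

To use it, source the script and call `restart_replication`. The script keeps the exact body from the original request in one place so that the cancel and restart requests cannot drift apart.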

I should add that, even though these two couches run different versions, they are both based on the same database file: couch-B was created from a filesystem snapshot of couch-A. Both databases hold about 52M documents, with an update sequence of about 70M.
                
> Spurious "checkpoint failure: conflict (are you replicating to yourself?)"
> --------------------------------------------------------------------------
>
>                 Key: COUCHDB-1359
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1359
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 1.1.1
>         Environment: Centos 5.6/x64 - spidermonkey 1.8.5, couch 1.1.1 patched for COUCHDB-1333 and COUCHDB-1340
>            Reporter: Alex Markham
>              Labels: PUT, _local, checkpoint, conflict, replication, slow
>
> I'm seeing these errors in the log when CouchDB stops replicating: even though the replication still appears in _active_tasks, it never checkpoints again, even with _replicate being called every 5 minutes.
> It seems to occur when replicating from a couch 1.1.1 (I have seen it on 1.0.3 machines replicating from 1.1.1)
> It definitely is not replicating to itself, but I suspect it is a problem in PUTing the _local doc on the source db.
> Log here (snipped from host33's couch.log): http://www.friendpaste.com/3FLgRFzOEAkkKazLbc7Jgw 
> For that log, our replication cron job SSHes to host33 and then uses curl to start a continuous pull replication from host01 into the local database (target given with no host).
> We have occasionally seen slow PUTs of documents to that database (and only that database), taking upwards of 10 seconds via Futon or our app. It is a creaking database with a scarred history of documents that contain many (thousands of) conflicts.
> Could this occasional slow PUT manifest itself as this error in the log?
> As a workaround to keep replication flowing, would this replication ID restart if the curl call first cancelled the replication ("cancel":true) and then started it again?
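The reporter's cron-driven pull replication might look roughly like the following. This is a hypothetical sketch. The database name `some-db`, CouchDB's default port 5984, and the function names are assumptions. The source host `host01`, the target host `host33`, the SSH-then-curl flow, and the host-less local target come from the report.

```shell
#!/bin/sh
# Hypothetical sketch of the reporter's cron job: SSH to host33 and start a
# continuous pull replication from host01 into host33's local database.
# Assumptions: database name "some-db" and CouchDB's default port 5984.
DB=${DB:-some-db}
SOURCE_HOST=${SOURCE_HOST:-host01}
TARGET_HOST=${TARGET_HOST:-host33}

# The target is a bare database name, i.e. local to the node handling the request.
pull_body() {
  printf '{"source":"http://%s:5984/%s","target":"%s","continuous":true}\n' \
    "$SOURCE_HOST" "$DB" "$DB"
}

# Run curl on the target host so the bare database name resolves there.
start_pull() {
  ssh "$TARGET_HOST" "curl -s -X POST -H 'Content-Type: application/json' -d '$(pull_body)' http://localhost:5984/_replicate"
}
```

Because the request is handled by host33 itself, the host-less `target` refers to host33's own copy of the database, which makes this a pull replication.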

--
This message is automatically generated by JIRA.