Posted to dev@couchdb.apache.org by "Adam Kocoloski (JIRA)" <ji...@apache.org> on 2010/12/14 03:38:03 UTC

[jira] Reopened: (COUCHDB-968) Duplicated IDs in _all_docs

     [ https://issues.apache.org/jira/browse/COUCHDB-968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kocoloski reopened COUCHDB-968:
------------------------------------


It turns out this series of patches does not merge key trees correctly in all cases.  It wrongly assumes that the "InsertTree" is always a linear path.  Now, it is true that every invocation of couch_key_tree:merge/2 has a linear revision path in the 2nd argument.  However, when couch_key_tree:merge_one/4 successfully merges the inserted revision path into one of the branches of an existing tree, creating a new "Merged" branch, it turns around and tries to merge that Merged branch into the next branch of the tree.  At this point, all bets are off -- the new InsertTree (a.k.a. Merged) is a full revision tree and can have an arbitrary number of siblings at each level.

I believe this commit addresses the issue:

https://github.com/kocolosk/couchdb/commit/a542113796653c6ff3673e05563fa20f041e6983
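To illustrate why the linear-path assumption matters, here is a minimal sketch in Python. It is not the CouchDB code and not the linked patch. The node layout, a (key, children) tuple, and the names merge_linear, merge_full, merge_children and leaves are all hypothetical. It only shows that a merge which follows a single child of the inserted tree drops sibling branches once the inserted tree is a full revision tree; it does not reproduce the duplicate-ID symptom itself.

```python
# Simplified model (an assumption, not CouchDB's real data layout):
# a revision node is a (key, children) tuple; children is a list of nodes.

def merge_children(children, ichildren, merge):
    """Merge each inserted child into the sibling with the same key, else append it."""
    merged = list(children)
    for inode in ichildren:
        for i, node in enumerate(merged):
            if node[0] == inode[0]:
                merged[i] = merge(node, inode)
                break
        else:
            merged.append(inode)  # no sibling with this key: new branch
    return sorted(merged, key=lambda n: n[0])

def merge_linear(tree, insert):
    """Merge that wrongly assumes `insert` is a linear path (one child per level)."""
    key, children = tree
    ikey, ichildren = insert
    if key != ikey:
        raise ValueError("roots differ")
    # Only the first child of the inserted tree is followed.
    return (key, merge_children(children, ichildren[:1], merge_linear))

def merge_full(tree, insert):
    """Merge that accepts any number of siblings at each level of `insert`."""
    key, children = tree
    ikey, ichildren = insert
    if key != ikey:
        raise ValueError("roots differ")
    return (key, merge_children(children, ichildren, merge_full))

def leaves(tree):
    """Return the leaf keys of a tree, left to right."""
    key, children = tree
    if not children:
        return [key]
    return [leaf for child in children for leaf in leaves(child)]

# An existing branch, and a previously merged tree with two siblings under 1-a.
existing = ("1-a", [("2-b", [])])
merged_insert = ("1-a", [("2-b", [("3-d", [])]), ("2-c", [])])

print(leaves(merge_linear(existing, merged_insert)))  # ['3-d']
print(leaves(merge_full(existing, merged_insert)))    # ['3-d', '2-c']
```

In this toy model the linear variant silently loses the 2-c branch, while the full variant keeps both leaves.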

> Duplicated IDs in _all_docs
> ---------------------------
>
>                 Key: COUCHDB-968
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-968
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 0.10.1, 0.10.2, 0.11.1, 0.11.2, 1.0, 1.0.1, 1.0.2
>         Environment: any
>            Reporter: Sebastian Cohnen
>            Assignee: Adam Kocoloski
>            Priority: Blocker
>             Fix For: 0.11.3, 1.0.2, 1.1
>
>
> We have a database that is causing serious trouble with compaction and replication (huge memory and CPU usage, often crashing CouchDB because all system memory is exhausted). Yesterday we discovered that db/_all_docs is reporting duplicated IDs (see [1]). Until a few minutes ago we thought there were only a few duplicates, but today I took a closer look and found 10 IDs that together account for 922 duplicates. Some of them have only 1 duplicate, others have hundreds.
> Some facts about the database in question:
> * ~13k documents, with 3-5k revs each
> * all duplicated documents are in conflict (with 1 up to 14 conflicts)
> * compaction is run on a daily basis
> * several thousand updates per hour
> * multi-master setup with pull replication from each other
> * delayed_commits=false on all nodes
> * used couchdb versions 1.0.0 and 1.0.x (*)
> Unfortunately, the database's contents are confidential and I'm not allowed to publish them.
> [1]: Part of http://localhost:5984/DBNAME/_all_docs
> ...
> {"id":"9997","key":"9997","value":{"rev":"6096-603c68c1fa90ac3f56cf53771337ac9f"}},
> {"id":"9999","key":"9999","value":{"rev":"6097-3c873ccf6875ff3c4e2c6fa264c6a180"}},
> {"id":"9999","key":"9999","value":{"rev":"6097-3c873ccf6875ff3c4e2c6fa264c6a180"}},
> ...
> [*]
> There were two (old) servers (1.0.0) in production (already having the replication and compaction issues). Then two servers (1.0.x) were added and replication was set up to bring them in sync with the old production servers since the two new servers were meant to replace the old ones (to update node.js application code among other things).
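A quick way to count duplicates like those in [1] is to tally the "id" field across the rows of an _all_docs response. The sketch below is only an illustration: the function name duplicated_ids is hypothetical, and the sample body is built from the three rows quoted in the report, wrapped in the usual "rows" array.

```python
import json
from collections import Counter

def duplicated_ids(all_docs_body):
    """Map each repeated document ID to its number of extra occurrences."""
    rows = json.loads(all_docs_body)["rows"]
    counts = Counter(row["id"] for row in rows)
    return {doc_id: n - 1 for doc_id, n in counts.items() if n > 1}

# Sample body assembled from the rows quoted in the report.
sample = """
{"rows": [
 {"id":"9997","key":"9997","value":{"rev":"6096-603c68c1fa90ac3f56cf53771337ac9f"}},
 {"id":"9999","key":"9999","value":{"rev":"6097-3c873ccf6875ff3c4e2c6fa264c6a180"}},
 {"id":"9999","key":"9999","value":{"rev":"6097-3c873ccf6875ff3c4e2c6fa264c6a180"}}
]}
"""

print(duplicated_ids(sample))  # {'9999': 1}
```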
