Posted to commits@couchdb.apache.org by ko...@apache.org on 2010/12/08 17:11:26 UTC
svn commit: r1043479 -
/couchdb/branches/1.0.x/src/couchdb/couch_db_updater.erl
Author: kocolosk
Date: Wed Dec 8 16:11:25 2010
New Revision: 1043479
URL: http://svn.apache.org/viewvc?rev=1043479&view=rev
Log:
Usort the infos during compaction to remove dupes, COUCHDB-968
This is not a bulletproof solution; it only removes dupes when
they appear in the same batch of 1000 updates. However, for dupes
that show up in _all_docs the probability of that happening is quite
high. If the dupes are only in _changes a user may need to compact
twice, once to get the dupes ordered together and a second time to
remove them.
A more complete solution would be to trigger the compaction in "retry"
mode, but this is significantly slower.
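For context, the dedupe in the patch below leans on lists:usort/2, which
sorts with a caller-supplied ordering and keeps only the first of any
elements that compare equal under it. A minimal, self-contained sketch of
that behavior (the tuples here are illustrative stand-ins for the
#doc_info{} record, not the real CouchDB type):

```erlang
%% Sketch only: shows how usort/2 prunes duplicate ids within one batch.
%% Two infos with the same id compare equal under the ordering fun, so
%% all but the first occurrence are dropped. Note the result is ordered
%% by id, not by update sequence.
-module(usort_demo).
-export([main/0]).

main() ->
    Infos = [{doc_info, <<"a">>, 1},
             {doc_info, <<"b">>, 2},
             {doc_info, <<"a">>, 3}],   % duplicate id <<"a">>
    Deduped = lists:usort(
        fun({doc_info, A, _}, {doc_info, B, _}) -> A =< B end,
        Infos),
    %% Deduped is [{doc_info,<<"a">>,1},{doc_info,<<"b">>,2}]:
    %% the first <<"a">> entry wins, the later duplicate is pruned.
    io:format("~p~n", [Deduped]).
```

This also explains the caveat above: usort/2 can only drop duplicates it
sees in the same list, so dupes split across separate 1000-update batches
survive the first compaction pass.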
Modified:
couchdb/branches/1.0.x/src/couchdb/couch_db_updater.erl
Modified: couchdb/branches/1.0.x/src/couchdb/couch_db_updater.erl
URL: http://svn.apache.org/viewvc/couchdb/branches/1.0.x/src/couchdb/couch_db_updater.erl?rev=1043479&r1=1043478&r2=1043479&view=diff
==============================================================================
--- couchdb/branches/1.0.x/src/couchdb/couch_db_updater.erl (original)
+++ couchdb/branches/1.0.x/src/couchdb/couch_db_updater.erl Wed Dec 8 16:11:25 2010
@@ -775,7 +775,10 @@ copy_rev_tree_attachments(SrcDb, DestFd,
end, Tree).
-copy_docs(Db, #db{fd=DestFd}=NewDb, InfoBySeq, Retry) ->
+copy_docs(Db, #db{fd=DestFd}=NewDb, InfoBySeq0, Retry) ->
+ % COUCHDB-968, make sure we prune duplicates during compaction
+ InfoBySeq = lists:usort(fun(#doc_info{id=A}, #doc_info{id=B}) -> A =< B end,
+ InfoBySeq0),
Ids = [Id || #doc_info{id=Id} <- InfoBySeq],
LookupResults = couch_btree:lookup(Db#db.fulldocinfo_by_id_btree, Ids),