Posted to dev@couchdb.apache.org by Sebastian Cohnen <se...@googlemail.com> on 2010/12/08 16:54:07 UTC
Re: svn commit: r1043461 - /couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl
Do I read this correctly: will two normal compaction runs take care of dupes in both _all_docs and _changes?
On 08.12.2010, at 16:48, kocolosk@apache.org wrote:
> Author: kocolosk
> Date: Wed Dec 8 15:48:52 2010
> New Revision: 1043461
>
> URL: http://svn.apache.org/viewvc?rev=1043461&view=rev
> Log:
> Usort the infos during compaction to remove dupes, COUCHDB-968
>
> This is not a bulletproof solution; it only removes dupes when
> they appear in the same batch of 1000 updates. However, for dupes
> that show up in _all_docs the probability of that happening is quite
> high. If the dupes are only in _changes a user may need to compact
> twice, once to get the dupes ordered together and a second time to
> remove them.
>
> A more complete solution would be to trigger the compaction in "retry"
> mode, but this is significantly slower.
>
> Modified:
> couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl
>
> Modified: couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl
> URL: http://svn.apache.org/viewvc/couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl?rev=1043461&r1=1043460&r2=1043461&view=diff
> ==============================================================================
> --- couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl (original)
> +++ couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl Wed Dec 8 15:48:52 2010
> @@ -775,7 +775,10 @@ copy_rev_tree_attachments(SrcDb, DestFd,
> end, Tree).
>
>
> -copy_docs(Db, #db{fd=DestFd}=NewDb, InfoBySeq, Retry) ->
> +copy_docs(Db, #db{fd=DestFd}=NewDb, InfoBySeq0, Retry) ->
> +    % COUCHDB-968, make sure we prune duplicates during compaction
> +    InfoBySeq = lists:usort(fun(#doc_info{id=A}, #doc_info{id=B}) -> A =< B end,
> +        InfoBySeq0),
>      Ids = [Id || #doc_info{id=Id} <- InfoBySeq],
>      LookupResults = couch_btree:lookup(Db#db.fulldocinfo_by_id_btree, Ids),
>
>
>
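To illustrate the caveat in the commit log, here is a minimal standalone sketch of why the usort only catches dupes that land in the same batch. It is not the CouchDB code: the module name usort_batches, the helpers dedupe_in_batches/2 and chunk/2, and the two-field #doc_info record are assumptions made up for this example. Only the comparator passed to lists:usort/2 is taken from the patch.

```erlang
-module(usort_batches).
-export([dedupe_batch/1, dedupe_in_batches/2, demo/0]).

%% Simplified stand-in for CouchDB's #doc_info record (hypothetical:
%% the real record has more fields).
-record(doc_info, {id, seq}).

%% Same comparator as the patch. Two infos compare equal when their ids
%% match, so lists:usort/2 keeps only the first of them. Note that the
%% result is ordered by id.
dedupe_batch(Infos) ->
    lists:usort(fun(#doc_info{id=A}, #doc_info{id=B}) -> A =< B end, Infos).

%% Hypothetical helper: split the input into batches of BatchSize and
%% dedupe each batch on its own, roughly as compaction would see them.
dedupe_in_batches(Infos, BatchSize) when BatchSize > 0 ->
    lists:append([dedupe_batch(B) || B <- chunk(Infos, BatchSize)]).

%% Hypothetical helper: break a list into sublists of at most N elements.
chunk([], _N) -> [];
chunk(List, N) when length(List) =< N -> [List];
chunk(List, N) ->
    {Batch, Rest} = lists:split(N, List),
    [Batch | chunk(Rest, N)].

demo() ->
    Infos = [#doc_info{id = <<"a">>, seq = 1},
             #doc_info{id = <<"b">>, seq = 2},
             #doc_info{id = <<"a">>, seq = 3},
             #doc_info{id = <<"c">>, seq = 4}],
    %% Batch size 4: both "a" entries share a batch, so one is dropped.
    SameBatch = [I#doc_info.id || I <- dedupe_in_batches(Infos, 4)],
    %% Batch size 2: the "a" entries fall into different batches,
    %% so both survive.
    SplitBatch = [I#doc_info.id || I <- dedupe_in_batches(Infos, 2)],
    {SameBatch, SplitBatch}.
```

Calling usort_batches:demo() shows the duplicate "a" disappearing when both entries share a batch and surviving when they are split across two batches. That is the situation the log describes for dupes that appear only in _changes, where a first compaction may be needed to bring them next to each other.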
Re: svn commit: r1043461 - /couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl
Posted by Adam Kocoloski <ko...@apache.org>.
With this patch applied, in ~99% of cases, yes.

Best,
Adam