Posted to user@couchdb.apache.org by Dave Cottlehuber <dc...@skunkwerks.at> on 2020/05/03 06:20:38 UTC

Re: Disable Compaction for a single database

On Tue, 28 Apr 2020, at 07:06, Andrea Brancatelli wrote:
> Hello Robert, 
> 
> I see your point and mostly understand it. The plan was not to "use"
> this secondary database as an active one, but as a passively replicated
> database from a main instance, so performances of the secondary database
> weren't a big priority - the idea is to keep the whole "journal" of the
> main database. 

Hi Andrea

I've spent some time recently dealing with a 1.7.x-era database that
made this decision in the past, and weird things start to happen when
you have a lot of versions. Best not to go against the grain.

I have a couple of suggestions for dealing with this that haven't
already come up, both based on the assumption that you will likely
not need the data on a regular basis.

1. Use Kafka or similar for storing a record. Stream-oriented storage
with potentially repeating IDs is exactly what they specialise in.
A small listener on CouchDB's changes feed that also handles
attachments does what you need. This is obviously rather one-way.

2. The same CouchDB listener, but you move the doc's _id into a
different field, or prepend a new time-ordered id to it. There are
many choices here for the new _id, but you want one that sorts
correctly for your needs, for example time-ordered unique ids,
often called "flake ids".
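Both suggestions start from the same small listener on CouchDB's _changes feed. A minimal polling sketch using only the Python standard library might look like this. It assumes a CouchDB reachable at a base URL you supply and ignores authentication; `Sink`, `changes_url`, `records_from_changes` and `poll_once` are hypothetical names, and `sink` stands in for either a Kafka producer call (option 1) or a write of the re-keyed doc into the journal database (option 2). `since`, `include_docs` and `attachments` are real _changes query parameters. Note that _changes reports the latest revision of each doc at poll time, so revisions written between two polls can be missed; a continuous feed narrows that window.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Iterator, Tuple

# Hypothetical sink: wraps a Kafka producer, or a writer that re-keys docs.
Sink = Callable[[str, Dict[str, Any]], None]


def changes_url(base: str, db: str, since: str = "0") -> str:
    """Build a _changes URL returning full docs, attachments inlined as base64."""
    query = urllib.parse.urlencode(
        {"since": since, "include_docs": "true", "attachments": "true"}
    )
    return f"{base.rstrip('/')}/{urllib.parse.quote(db, safe='')}/_changes?{query}"


def records_from_changes(body: Dict[str, Any]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (original _id, doc) per change row; the same _id recurs as the doc changes."""
    for row in body.get("results", []):
        if "doc" in row:
            yield row["id"], row["doc"]


def poll_once(base: str, db: str, since: str, sink: Sink) -> str:
    """Fetch one batch of changes, hand each to sink, return last_seq to resume from."""
    with urllib.request.urlopen(changes_url(base, db, since)) as resp:
        body = json.load(resp)
    for doc_id, doc in records_from_changes(body):
        sink(doc_id, doc)
    return str(body["last_seq"])
```

Persisting the returned `last_seq` somewhere durable is what lets the listener resume after a restart without re-sending the whole history.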

You use the latter part of the new doc id to store your original _id,
and the initial part ensures that "events" naturally sort by time.
This allows you to reconstruct the _changes feed if needed, and you
can provide a view that splits the _id to give you a per-doc view
as well.
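A minimal sketch of that re-keying, assuming a 128-bit flake laid out roughly as in the Boundary write-up (64-bit millisecond timestamp, 48-bit worker id, 16-bit sequence) and rendered as 32 lowercase hex characters. `FlakeGenerator`, `make_event_id`, `split_event_id`, `per_doc_history` and the `-` separator are all hypothetical choices for illustration. In CouchDB the per-doc view would be a JavaScript map function; `per_doc_history` only mirrors that grouping offline.

```python
import threading
import time
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

PREFIX_LEN = 32  # a 128-bit flake rendered as fixed-width lowercase hex


class FlakeGenerator:
    """Hypothetical k-ordered ids: 64-bit ms timestamp | 48-bit worker id | 16-bit sequence."""

    def __init__(self, worker_id: int):
        if not 0 <= worker_id < (1 << 48):
            raise ValueError("worker_id must fit in 48 bits")
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_hex(self) -> str:
        with self.lock:
            # Hold the last timestamp if the clock steps backwards.
            now = max(time.time_ns() // 1_000_000, self.last_ms)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFFF
                if self.seq == 0:  # 65536 ids this millisecond: wait for the next one
                    while now <= self.last_ms:
                        now = time.time_ns() // 1_000_000
            else:
                self.seq = 0
            self.last_ms = now
            value = (now << 64) | (self.worker_id << 16) | self.seq
            return format(value, f"0{PREFIX_LEN}x")


def make_event_id(flake_hex: str, original_id: str) -> str:
    """New _id = time-ordered prefix + '-' + the original _id."""
    if len(flake_hex) != PREFIX_LEN:
        raise ValueError("prefix must be fixed-width so ids sort by time")
    return f"{flake_hex}-{original_id}"


def split_event_id(event_id: str) -> Tuple[str, str]:
    """Slice off the fixed-width prefix, so a '-' inside the original _id is harmless."""
    return event_id[:PREFIX_LEN], event_id[PREFIX_LEN + 1:]


def per_doc_history(event_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group events by original _id, oldest first."""
    history: Dict[str, List[str]] = defaultdict(list)
    for event_id in sorted(event_ids):  # _id order is time order (k-ordered across workers)
        prefix, original_id = split_event_id(event_id)
        history[original_id].append(prefix)
    return dict(history)
```

The fixed-width hex prefix is what makes plain string ordering of the _id agree with time order. This sketch holds the last timestamp when the clock runs backwards; a production generator may prefer to refuse to issue ids instead.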

Both flake and UUID formats are possible here, but you must validate
that the _id works both in JavaScript for CouchDB and in whatever
language you choose to implement your listener in.
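One way to approach that validation, as a hedged sketch: JavaScript numbers are IEEE 754 doubles, so integers beyond 2^53 - 1 lose precision, and a 64- or 128-bit flake should travel as a string rather than a number. Fixed-width lowercase hex should also order the same under plain code-point comparison and CouchDB's Unicode view collation. The id pattern (32 hex characters, `-`, then the original _id) is a hypothetical scheme, and `survives_js_number`, `validate_event_id` and `hex_order_matches_numeric` are hypothetical helpers. The leading-underscore check reflects CouchDB reserving such ids (`_design/`, `_local/`).

```python
import re
from typing import List

MAX_SAFE_INTEGER = 2**53 - 1  # largest integer a JavaScript Number holds exactly
EVENT_ID = re.compile(r"[0-9a-f]{32}-.+", re.DOTALL)  # hypothetical id scheme


def survives_js_number(n: int) -> bool:
    """True if n would come through a JavaScript Number unchanged."""
    return -MAX_SAFE_INTEGER <= n <= MAX_SAFE_INTEGER


def validate_event_id(event_id: str) -> List[str]:
    """Return a list of problems; an empty list means the id passed these checks."""
    problems = []
    if event_id.startswith("_"):
        problems.append("ids starting with '_' are reserved by CouchDB")
    if not EVENT_ID.fullmatch(event_id):
        problems.append("expected 32 lowercase hex chars, '-', then the original _id")
    return problems


def hex_order_matches_numeric(prefixes: List[str]) -> bool:
    """Check that sorting the hex strings gives the same order as sorting their values."""
    return sorted(prefixes) == sorted(prefixes, key=lambda p: int(p, 16))
```

The last check fails as soon as the prefixes differ in width (for example `ff` versus `100`), which is why the sketch insists on a fixed-width prefix.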

Boundary[1] has a great write-up, as does Yeller[2], including the
relevant papers[3]. Search for "flake id" in your preferred language.
Craig's write-up in his Erlang implementation is really helpful
too[4], and the IETF RFC[5] has a more formal spec of other UUID
schemes. The proposed UUID "v6" format[6], still in draft, will
provide time-ordered UUIDs.
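To make the "time-ordered UUID" idea concrete: a version 1 UUID already carries a 60-bit timestamp, but stores its low bits first, so v1 strings don't sort by time. The v6 proposal reorders the same timestamp most-significant-first. The sketch below converts a Python `uuid.uuid1()` value into that layout, following the field order later standardised in RFC 9562, so treat it as an approximation of the draft cited here rather than a reference implementation. `v1_to_v6` is a hypothetical helper name.

```python
import uuid


def v1_to_v6(u: uuid.UUID) -> uuid.UUID:
    """Reorder a v1 UUID's timestamp most-significant-first (v6-style layout)."""
    if u.version != 1:
        raise ValueError("expected a version 1 UUID")
    ts = u.time                      # 60-bit count of 100 ns intervals since 1582-10-15
    low64 = u.int & ((1 << 64) - 1)  # variant, clock sequence and node, carried over as-is
    value = (
        ((ts >> 12) << 80)           # top 48 timestamp bits first
        | (6 << 76)                  # version nibble
        | ((ts & 0xFFF) << 64)       # remaining 12 timestamp bits
        | low64
    )
    return uuid.UUID(int=value)
```

Because the timestamp's most significant bits now lead, v6 strings generated in sequence sort by creation time, which is the property the new _id prefix needs.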

I haven't checked either of these implementations[7][8].

Maybe flake ids, or v6 UUIDs once finalised, would be a useful
addition to CouchDB.

A+
Dave

[1]: https://archive.is/2015.07.08-082503/http://www.boundary.com/blog/2012/01/flake-a-decentralized-k-ordered-unique-id-generator-in-erlang/
[2]: http://yellerapp.com/posts/2015-02-09-flake-ids.html
[3]: https://www.researchgate.net/publication/262154069_Roughly_sorting_sequential_and_parallel_approach
[4]: https://gitlab.com/zxq9/zuuid
[5]: https://tools.ietf.org/html/rfc4122
[6]: https://tools.ietf.org/html/draft-peabody-dispatch-new-uuid-format-00
[7]: https://github.com/boundary/flake
[8]: https://github.com/s-yadav/FlakeId