Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2022/06/08 19:23:40 UTC

[GitHub] [couchdb] nickva commented on issue #4044: Add support for collections

nickva commented on issue #4044:
URL: https://github.com/apache/couchdb/issues/4044#issuecomment-1150307821

   Thank you for your proposal and for reaching out, @nkosi23!
   
   Yeah, I think something like that would be nice to have. More generally, it would be nice to remove the partitioned shard size caveat altogether and have it all work as expected. Then partitions and collections would just work the same.
   
   However, given how partitions are currently implemented, that would be a bit hard to achieve. The partition prefix is used to decide which shard a document belongs to. In the normal, non-partitioned case, the scheme is `hash(doc1) -> 1.couch`, `hash(doc2) -> 2.couch`, etc. In the partitioned case, only the prefix is hashed: document `x:a` does `hash(x)` and ends up on `1.couch`; document `y:b` does `hash(y)` and the hash result puts it on shard `2.couch`. So, if partition `x` has 1B documents, that shard gets quite large, and essentially each partition behaves a bit like a `q=1` database. That's why there's that warning in the docs. Of course, multiple partitions can also map to the same shard, so `z:c` with `hash(z)` might land on `1.couch` as well.
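   A toy sketch of the two schemes (the hash function, `q` value and shard file names here are made-up stand-ins, not CouchDB's actual implementation):
   
   ```python
   # Illustration only; not CouchDB's real shard placement code.
   from zlib import crc32

   Q = 4  # number of shard ranges in the database

   def shard_unpartitioned(doc_id):
       # the whole doc id is hashed, so documents spread across all Q shards
       return f"{crc32(doc_id.encode()) % Q + 1}.couch"

   def shard_partitioned(doc_id):
       # only the partition prefix before ":" is hashed, so every "x:..."
       # document lands on the same shard file
       partition = doc_id.split(":", 1)[0]
       return f"{crc32(partition.encode()) % Q + 1}.couch"

   shard_partitioned("x:a") == shard_partitioned("x:other")      # always True
   shard_unpartitioned("x:a") == shard_unpartitioned("x:other")  # usually False
   ```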
   
   Now, the idea might be to have a multi-level shard mapping scheme: first based on the collection/partition, then another level based on the rest of the doc id. So `hash(x)` would be followed by `hash(a)`, which would determine whether the document should live in the `1-1.couch`, `1-2.couch`, ... `1-N.couch` shard files, etc. And that gets a bit tricky, as you can imagine.
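   As a rough sketch of what that hypothetical two-level mapping could look like (the names and parameters are made up for illustration):
   
   ```python
   # Hypothetical two-level placement: hashing the partition picks the top-level
   # bucket, hashing the rest of the doc id picks a sub-shard within it.
   from zlib import crc32

   def shard_two_level(doc_id, q=4, n=4):
       partition, rest = doc_id.split(":", 1)
       top = crc32(partition.encode()) % q + 1  # partition/collection bucket
       sub = crc32(rest.encode()) % n + 1       # spreads docs within that bucket
       return f"{top}-{sub}.couch"              # e.g. "1-2.couch"
   ```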
   
   At a higher level, this comes down to the fundamental sharding strategies and their associated trade-offs:
      1) **random sharding (using hashing)**, which is what CouchDB uses. This favors fast writes with a uniformly distributed load across shards. However, it makes reads cumbersome, as they have to fan the query out to all the shard ranges and then wait for all of them to respond.
      2) **range-based sharding**, which is what other databases use (FoundationDB, for instance), where the sorted key range is split. So, it might end up with range `a..r` on file `1.couch` and `s..z` on `2.couch`. This scheme makes it easy to end up with a single hot shard during monotonically increasing key inserts (inserting 1M docs starting with "user15..."). And of course, as shown on purpose in the example, the shards easily become imbalanced (a lot more `a..r` docs vs `s..z` ones). However, reads become really efficient, because start/end key ranges can pick out just the relevant shards (see the sketch after this list).
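   To make the read-path difference concrete, here is a small sketch (shard names and key ranges come from the examples above; this isn't actual CouchDB code):
   
   ```python
   # Which shards have to answer a startkey/endkey query under each strategy?
   hash_shards = ["1.couch", "2.couch", "3.couch", "4.couch"]

   # range-based: each shard owns a slice of the sorted key space (a..r, s..z)
   range_shards = [("a", "r", "1.couch"), ("s", "z", "2.couch")]

   def query_shards_hashed(startkey, endkey):
       # keys are scattered by the hash, so every shard must be queried
       return hash_shards

   def query_shards_ranged(startkey, endkey):
       # only shards whose key range overlaps [startkey, endkey] are needed
       return [f for lo, hi, f in range_shards if lo <= endkey and startkey <= hi]

   query_shards_ranged("user15", "user16")  # -> ["2.couch"] only
   query_shards_hashed("user15", "user16")  # -> all four shards
   ```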
   
   Of course, you can have multi-level schemes: hash-based sharding at one level followed by range-based sharding at another, or vice versa.
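   For example, one such hybrid might look something like this (again just a hypothetical sketch):
   
   ```python
   # Hash picks the top-level bucket for the partition, then a crude range split
   # (a..l vs m..z on the rest of the doc id) picks the sub-shard inside it.
   from zlib import crc32

   def shard_hash_then_range(doc_id, q=4):
       partition, rest = doc_id.split(":", 1)
       bucket = crc32(partition.encode()) % q + 1
       sub = 1 if rest < "m" else 2
       return f"{bucket}-{sub}.couch"
   ```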
   
   Another interesting idea we had been kicking around is to have the main db shards use the random hashing scheme but have the indices use a range-based one, to allow more efficient start/end key queries on views. That would be nice, but it would have the same hot shard issue when building the views, since that work wouldn't be parallelized as easily.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org