Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2021/07/13 16:59:58 UTC

[GitHub] [couchdb-documentation] nickva commented on pull request #651: Add RFC on sharded changes

nickva commented on pull request #651:
URL: https://github.com/apache/couchdb-documentation/pull/651#issuecomment-879250910


   Agree with @rnewson. Even if we switch the index storage format to allow parallelizable updates, adding a static Q would be a step back, it seems.
   
   One issue is at the user/API level. We'd bring back Q, which we didn't want to have to deal with now that we're using FDB. And then in the code, we just removed the sharding code in fabric, and I am not too excited about bringing parts of it back, unless it's a last resort and nothing else works. We could invent some auto-sharding, of course, but that would be even more complexity.
   
   It seems we'd also want to separate change feed improvements from indexing improvements a bit better. Could we speed up indexing without a static Q sharding of the change feed, with all the API changes involved, the hand-written resharding code (epochs) and hard values?
   
   I think we can, if we invent a new index structure that allows parallelizable updates. Like, say, an inverted JSON index for Mango queries based on https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20171020_inverted_indexes.md. 
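
To illustrate what "parallelizable updates" could mean here, a minimal sketch in Python (for illustration only, not CouchDB code). It flattens each document into `(path, value, doc_id)` keys, loosely in the spirit of the linked RFC but not its actual encoding. The names `json_paths`, `index_keys` and `lookup`, and the `"[]"` marker for array elements, are all hypothetical. Because every key carries the doc id, writers indexing different docs never touch the same key.

```python
def json_paths(value, prefix=()):
    """Yield (path, leaf) pairs for every scalar leaf in a JSON-like value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from json_paths(child, prefix + (key,))
    elif isinstance(value, list):
        for child in value:
            # "[]" is an assumed marker meaning "any array element"
            yield from json_paths(child, prefix + ("[]",))
    else:
        yield prefix, value


def index_keys(doc_id, doc):
    """Return the set of index keys for one doc: (path, leaf, doc_id)."""
    return {(path, leaf, doc_id) for path, leaf in json_paths(doc)}


def lookup(index, path, leaf):
    """Find doc ids whose doc has `leaf` at `path` (a linear scan for brevity)."""
    return sorted(d for (p, v, d) in index if p == path and v == leaf)


# Keys for different docs are disjoint, so they can be written independently.
index = index_keys("d1", {"user": {"name": "ann"}, "tags": ["x", "y"]})
index |= index_keys("d2", {"user": {"name": "ann"}})
```

In a real store the lookup would be a range read over the `(path, value)` prefix rather than a scan; the point of the sketch is only that per-doc key sets do not conflict.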
   
   The idea I had was to use the locality API to split the _changes feed into sub-sequences, and either start separate couch_jobs jobs (or just processes under a single couch_job indexer) to fetch docs, process them and write to the index in parallel. So, if the _changes sequence looks like `[10, 20, 25, 30]`, the locality API might split it as `[10, 20]`, `[25, 30]`. Then two indexers would index those in parallel. In the meantime, the doc at sequence 20 could be updated and now be at sequence `35`. Then we'd catch up from 35 up to the next db sequence, and so on. The benefit there would be to avoid managing a static Q at all. The downside is that it would work only for a write-parallelizable index, and only if we "hide" the index being built in the background from queries (it would look quite odd otherwise, as it wouldn't be built in changes feed order). Then, once it's built, if we can update the index transactionally, we'd get consistent reads on it.
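
A rough sketch of that split-then-catch-up flow, in Python for illustration only. Nothing here calls FDB or couch_jobs: `boundaries` stands in for whatever split points the locality API would report, the changes feed is modelled as a plain `{seq: doc_id}` dict, and `split_sequences`, `index_range`, `build_index` and `catch_up` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def split_sequences(seqs, boundaries):
    """Split sorted sequences into sub-ranges; a boundary starts a new range."""
    chunks, current = [], []
    remaining = sorted(boundaries)
    for seq in seqs:
        while remaining and seq >= remaining[0]:
            remaining.pop(0)
            if current:
                chunks.append(current)
                current = []
        current.append(seq)
    if current:
        chunks.append(current)
    return chunks


def index_range(changes, chunk):
    """Stand-in indexer: produce (doc_id, seq) entries for one sub-range."""
    return [(changes[seq], seq) for seq in chunk]


def build_index(changes, boundaries):
    """Index all sub-ranges in parallel; return (index, last sequence seen)."""
    seqs = sorted(changes)
    chunks = split_sequences(seqs, boundaries)
    index = {}
    with ThreadPoolExecutor() as pool:
        for entries in pool.map(lambda c: index_range(changes, c), chunks):
            for doc_id, seq in entries:
                # Keep only the newest sequence per doc.
                if seq > index.get(doc_id, -1):
                    index[doc_id] = seq
    return index, (seqs[-1] if seqs else 0)


def catch_up(index, changes, since):
    """Apply changes newer than `since` in order; return the new last seq."""
    for seq in sorted(s for s in changes if s > since):
        index[changes[seq]] = seq
    return max([since, *changes])


# Snapshot of the feed when the build starts.
snapshot = {10: "a", 20: "b", 25: "c", 30: "d"}
index, last_seq = build_index(snapshot, boundaries=[25])

# Meanwhile doc "b" was updated: its entry moved from sequence 20 to 35.
later = {10: "a", 25: "c", 30: "d", 35: "b"}
last_seq = catch_up(index, later, last_seq)
```

The index would stay hidden from queries until `catch_up` has reached the current db sequence; after that, updates would go through the normal transactional path.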

