Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2020/01/10 23:49:01 UTC

[GitHub] [couchdb] nickimho commented on issue #2329: Add option to enforce fetching data from local shards, instead of from shards on on remote nodes(if data present on local node)

nickimho commented on issue #2329: Add option to enforce fetching data from local shards, instead of from shards on on remote nodes(if data present on local node)
URL: https://github.com/apache/couchdb/issues/2329#issuecomment-573249534
 
 
   We tested some behavior with a 3-zone cluster (5 nodes per zone, n=3, q=1, and placement of one copy in each of zones a, b, and c). We used network impairment tools so that there is a 60ms RTD between zones. We used CouchDB 2.3.1.
   
   	1. Terms/Definitions
                   a. Client – This is the host that initiates the query to couchdb's port 5984.
                   b. Couchdb_QUERY_NODE – This is the couchdb node in the cluster that receives the database query from Client on port 5984. This node may or may NOT hold a shard of the database.
                   c. Couchdb_METALOOKUP_NODE – This is the couchdb node that Couchdb_QUERY_NODE queries for some meta info (not sure what it is). Couchdb_METALOOKUP_NODE is one of the Couchdb_DATA_NODES.
   				    i. The selection of Couchdb_METALOOKUP_NODE is based on the "by_range" key in couchdb:5986/dbs/mydb. The first node in the array is picked.
                   d. Couchdb_DATA_NODES – This is the set of couchdb nodes that actually hold a copy of the database the query asks for.
   	2.  General data flow we observed:
                   a.  General data flow for doc query:
   					i.  Client -> Couchdb_QUERY_NODE:5984
   					ii. Couchdb_QUERY_NODE -> Couchdb_METALOOKUP_NODE:11500  
   						1. This selection is deterministic, based on 1.c.i. Suppose the Couchdb_DATA_NODES member in zonea is listed first in the "by_range" key for mydb; it will then always be queried in this phase. As a result, queries into mydb from zoneb and zonec incur an additional 60ms RTD of network delay compared to zonea.
   					iii. Couchdb_QUERY_NODE -> “three Couchdb_DATA_NODES”:11500
   						1. Once enough Couchdb_DATA_NODES (default read quorum is 2 when n=3) return data, this phase stops.
   					iv.  Couchdb_QUERY_NODE->Client with query result
                   b. A view query largely follows the same flow as a doc query, except for the following:
   					i. Couchdb_QUERY_NODE seems to cache the view definition/metadata
   						1. During the first query to /mydb/_view/myview, it retrieves the view doc following the 2.a process
   							a. Subsequent queries to /mydb/_view/myview bypass this step.
   					2. When Couchdb_QUERY_NODE actually retrieves the myview result, it seems to ONLY query the Couchdb_DATA_NODES in the SAME zone as itself. This is good, as it saves inter-zone bandwidth for large results.
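
   The doc-query flow in 2.a can be sketched as a small latency model. This is only an illustration, not CouchDB code. It assumes that delay inside a zone is negligible, that phases ii and iii run one after the other, and that each phase costs one round trip. The names `rtd_ms` and `doc_query_delay_ms` are made up for this sketch.

```python
# Hypothetical latency model for the doc-query flow in 2.a (not CouchDB code).
INTER_ZONE_RTD_MS = 60  # from our test setup; intra-zone delay assumed to be ~0


def rtd_ms(zone_a, zone_b):
    """Round-trip delay between two zones under the assumptions above."""
    return 0 if zone_a == zone_b else INTER_ZONE_RTD_MS


def doc_query_delay_ms(query_zone, by_range_zones, read_quorum=2):
    """Estimate network delay added to a doc query.

    by_range_zones lists the zone of each data node, in the same order
    as the nodes appear under "by_range" for the database.
    """
    # Phase ii: the meta lookup always goes to the first by_range entry.
    meta_lookup = rtd_ms(query_zone, by_range_zones[0])
    # Phase iii: all data nodes are asked; the phase ends when the
    # read_quorum-th fastest reply arrives.
    replies = sorted(rtd_ms(query_zone, z) for z in by_range_zones)
    quorum_wait = replies[read_quorum - 1]
    return meta_lookup + quorum_wait


order = ["zonea", "zoneb", "zonec"]
for zone in order:
    print(zone, doc_query_delay_ms(zone, order))
# zonea 60
# zoneb 120
# zonec 120
```

   Under these assumptions the model reproduces the extra 60ms we see for queries that enter through zoneb or zonec when the zonea node is first in "by_range".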
   
   We haven't tested attachment retrieval yet, but it seems to me that it should follow the same logic as the view query in 2.b.2, if it does not already. We will try to test this some time next week.
   
   Also, not sure if this needs to be a different ticket, but we would really like to see 2.a.ii.1 optimized so that Couchdb_QUERY_NODE queries a Couchdb_METALOOKUP_NODE in its local zone first. Currently, we plan to work around this by changing the "by_range" order in 5986/dbs/mydb to favor the primary zone for our service.
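
   A rough sketch of that workaround, assuming the shard map document has a "by_range" key that maps each range to a list of node names. The node names, the `node_zones` mapping, and the helper `prefer_zone` are hypothetical. The sketch only reorders a copy of the document in memory; fetching it from and writing it back to 5986/dbs/mydb is not shown.

```python
import copy


def prefer_zone(shard_doc, node_zones, preferred_zone):
    """Return a copy of shard_doc with preferred-zone nodes first in by_range.

    node_zones is a hypothetical mapping of node name -> zone name.
    """
    doc = copy.deepcopy(shard_doc)
    for shard_range, nodes in doc["by_range"].items():
        # sorted() is stable: nodes in the preferred zone move to the front,
        # all other nodes keep their original relative order.
        doc["by_range"][shard_range] = sorted(
            nodes, key=lambda node: node_zones.get(node) != preferred_zone
        )
    return doc


# Hypothetical example data (q=1, so a single range).
shard_doc = {
    "_id": "mydb",
    "by_range": {
        "00000000-ffffffff": ["couchdb@a1", "couchdb@b1", "couchdb@c1"],
    },
}
node_zones = {"couchdb@a1": "zonea", "couchdb@b1": "zoneb", "couchdb@c1": "zonec"}

updated = prefer_zone(shard_doc, node_zones, "zoneb")
print(updated["by_range"]["00000000-ffffffff"])
# ['couchdb@b1', 'couchdb@a1', 'couchdb@c1']
```

   This only changes which node is listed first; whether CouchDB picks up the new order without further steps is something we have not verified.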
   
