Posted to notifications@iotdb.apache.org by "Chao Wang (Jira)" <ji...@apache.org> on 2021/03/04 06:51:00 UTC

[jira] [Created] (IOTDB-1182) [Distributed] improve query performance in follower node

Chao Wang created IOTDB-1182:
--------------------------------

             Summary: [Distributed] improve query performance in follower node
                 Key: IOTDB-1182
                 URL: https://issues.apache.org/jira/browse/IOTDB-1182
             Project: Apache IoTDB
          Issue Type: Improvement
          Components: Core/Cluster
            Reporter: Chao Wang
            Assignee: Chao Wang
             Fix For: 0.12.0


In the Raft algorithm, the leader always has the latest data, so query requests from clients would normally be executed on the leader. However, this sends both read and write requests to the leader, which puts too much pressure on a single node and reduces the overall throughput of the system. To relieve this hotspot, we can use follower linear read to let followers serve reads.

The cost is that before a follower executes a read request, it must ensure that its local applyindex is greater than or equal to the leader's commitindex. The follower therefore sends an RPC to the leader and decides based on the returned result. If the follower's applyindex is greater than or equal to the leader's commitindex, it can run the query directly. Otherwise, it waits passively until its local applyindex has reached the leader's commitindex.

When reads and writes run concurrently, the follower may be temporarily behind the leader, so a query routed to a follower can easily enter the passive phase of waiting for catchup detection. At that point, the batch of data that the follower is missing is most likely still in transit on the network, so the best approach is for the follower to wait passively and do nothing else.

If an RPC is lost or a network partition occurs, the leader has to actively detect that the follower's data is inconsistent and initiate catchup. The leader checks followers once per heartbeat interval, which is 1 s by default, and, to avoid unnecessary traffic, it initiates catchup only after 5 detections. Once a query reaches this stage, the expected wait for the inconsistency to be detected is 2500 ms, which may greatly increase the latency of read requests. Fortunately, this extreme condition should be rare.
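
The follower-side check described above could be sketched roughly as follows. This is only an illustration, not IoTDB's actual code: the class FollowerReadSketch and the methods onApplied and awaitCaughtUp are hypothetical names, the leader's commitindex is assumed to have already been fetched by the RPC and is passed in as a parameter, and the timeout is an assumption added so the wait cannot block forever.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of a follower linear read; names are not taken from IoTDB. */
final class FollowerReadSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition applied = lock.newCondition();
  private long applyIndex; // highest log index applied on this follower

  FollowerReadSketch(long initialApplyIndex) {
    this.applyIndex = initialApplyIndex;
  }

  /** Assumed to be called by the log applier after an entry is applied locally. */
  void onApplied(long index) {
    lock.lock();
    try {
      if (index > applyIndex) {
        applyIndex = index;
        applied.signalAll(); // wake up readers waiting to catch up
      }
    } finally {
      lock.unlock();
    }
  }

  /**
   * Waits passively until applyIndex >= leaderCommitIndex.
   * Returns true if the follower may serve the read, false on timeout or interrupt.
   */
  boolean awaitCaughtUp(long leaderCommitIndex, long timeoutMs) {
    long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    lock.lock();
    try {
      while (applyIndex < leaderCommitIndex) {
        if (remainingNanos <= 0) {
          return false; // still behind; the caller may retry or forward to the leader
        }
        remainingNanos = applied.awaitNanos(remainingNanos);
      }
      return true; // already caught up: query directly
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return false;
    } finally {
      lock.unlock();
    }
  }
}
```

In this sketch a read that finds the follower already caught up returns immediately, and a read that finds it behind does no work of its own while waiting, which matches the "wait passively" behavior proposed above. What to do on timeout (retry, or forward the query to the leader) is left open here.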



--
This message was sent by Atlassian Jira
(v8.3.4#803005)