Posted to reviews@iotdb.apache.org by GitBox <gi...@apache.org> on 2021/09/13 12:25:31 UTC

[GitHub] [iotdb] LebronAl commented on issue #3954: Integrate Apache Ratis to help manage Raft status

LebronAl commented on issue #3954:
URL: https://github.com/apache/iotdb/issues/3954#issuecomment-918139858


   > As I see it, Raft is a consensus algorithm, which may not have much to do with linearizability. I'm just putting the post[1] here for reference.
   
   In academic terms, consensus algorithms such as Raft, Multi-Paxos, and Zab exist only to reach agreement among multiple nodes and, strictly speaking, have nothing to do with consistency models or storage systems. I admit that.
   
   In engineering practice, however, what we call Raft is usually combined with a replicated state machine and a storage system. In that context, Raft and linearizability are closely related.
   
   Take TiKV, which the blog you quoted discusses, as an example. For writes, Raft must ensure that a log entry is committed only after a majority of replicas have acknowledged it. For reads, Raft needs Read-Index or Lease-Read to ensure linearizability. These designs are decisive factors for performance, and the key question is whether we need that level of safety. Such core logic is likely to be hard-coded in a Raft library and not configurable (if it is configurable, you can ignore this point). But for OLAP scenarios, do we really need such a high level of consistency? For example, [TDengine's](https://www.taosdata.com/cn/documentation/architecture#replication) default synchronization strategy is asynchronous replication rather than majority acknowledgment.
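
To make the write-side trade-off concrete, here is a minimal sketch of the two acknowledgment policies being compared. The class and method names are hypothetical and are not taken from IoTDB, Ratis, TiKV, or TDengine. It also leaves out the Raft rule that a leader only commits entries from its own term by counting replicas.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of any real library. */
final class CommitIndexSketch {

    /**
     * Majority policy: returns the highest log index that is stored on a
     * majority of replicas. matchIndex[i] is the last log index known to be
     * persisted on replica i, with the leader's own last index included.
     */
    static long majorityCommitIndex(long[] matchIndex) {
        long[] sorted = matchIndex.clone();
        Arrays.sort(sorted);                  // ascending order
        int quorum = sorted.length / 2 + 1;   // 2 of 3, 3 of 5, ...
        // The quorum-th largest value is present on at least `quorum` replicas.
        return sorted[sorted.length - quorum];
    }

    /**
     * Asynchronous policy: the write is acknowledged as soon as the leader
     * has the entry, regardless of how far the followers have caught up.
     */
    static long asyncAckIndex(long leaderLastIndex) {
        return leaderLastIndex;
    }
}
```

With three replicas at indexes 5, 3, and 4, the majority policy can acknowledge up to index 4, while the asynchronous policy acknowledges index 5 immediately. The gap is the latency saved and also the data that could be lost if the leader fails.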
   
   As another example, ZooKeeper's Zab algorithm guarantees sequential consistency, which is slightly weaker than linearizability but still a strong consistency level (see the [Jepsen](https://jepsen.io/consistency) website). In theory, it can therefore achieve higher performance than a Raft implementation that guarantees linearizability.
   
   ![image](https://user-images.githubusercontent.com/32640567/133080048-8c5bb062-eacc-4334-804b-99321087f923.png)
   
   I'm not arguing about which solution is better. Since we are talking about refactoring, I just think we should be clear about the guarantees that the whole data model provides to the outside world, because those guarantees often determine the upper bound of performance, and in most cases they are trade-offs between safety and performance. If we don't need a given level of safety, we can trade it for better performance.
   
   > As far as I know, the apply function is user-defined, we can still implement a parallel apply function according to different storage groups in the one raft log.
   
   In the etcd [example](https://github.com/etcd-io/etcd/blob/main/contrib/raftexample/kvstore.go), we do have the freedom to handle committed log entries ourselves, thanks to its excellent abstraction. In Ratis's [example](https://github.com/apache/ratis/blob/master/ratis-examples/src/main/java/org/apache/ratis/examples/counter/server/CounterStateMachine.java#L182), it looks like only a single `applyTransaction` interface is exposed for us to override, and I doubt whether we could implement our parallel asynchronous apply optimizations on top of it. Of course, I have only had a quick glance, and this area needs further investigation. I just hope some of our optimizations don't go away after the migration.
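
For reference, the kind of optimization in question can be sketched as follows. This is a rough illustration only: it assumes a library that hands committed entries to one callback in log order and lets that callback return a future, which would have to be confirmed against the actual Ratis API. `ParallelApplySketch`, `Entry`, and `apply` are hypothetical names, not IoTDB or Ratis code.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch; names are illustrative and not taken from IoTDB or Ratis. */
final class ParallelApplySketch implements AutoCloseable {

    /** A committed log entry, reduced to what the sketch needs. */
    static final class Entry {
        final long index;
        final String storageGroup;
        final Runnable write;

        Entry(long index, String storageGroup, Runnable write) {
            this.index = index;
            this.storageGroup = storageGroup;
            this.write = write;
        }
    }

    // One single-threaded executor per storage group: entries of the same
    // group run in submission (log) order, different groups run in parallel.
    private final Map<String, ExecutorService> lanes = new ConcurrentHashMap<>();

    /** Single entry point, expected to be called once per entry in log order. */
    CompletableFuture<Long> apply(Entry entry) {
        ExecutorService lane = lanes.computeIfAbsent(
                entry.storageGroup, group -> Executors.newSingleThreadExecutor());
        // Return immediately; the future completes with the applied index.
        return CompletableFuture.supplyAsync(() -> {
            entry.write.run();
            return entry.index;
        }, lane);
    }

    /** Stops accepting new work; already submitted entries still run. */
    @Override
    public void close() {
        lanes.values().forEach(ExecutorService::shutdown);
    }
}
```

Whether this pattern survives the migration depends on whether the library waits for each returned future before delivering the next entry. If it does, the per-group parallelism is lost even though the code compiles.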
   
   > What is certain is that using a raft library will definitely limit our optimization work compared with the current implementation (mixing the raft framework and business logic), but I think the availability and correctness are far greater than the performance for now.
   
   If performance drops only slightly after the migration, I support it. But if the drop is large, I think the decision needs to be considered very carefully. Of course, we can draw a preliminary conclusion after further investigation.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@iotdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org