Posted to reviews@iotdb.apache.org by GitBox <gi...@apache.org> on 2021/09/13 08:21:40 UTC

[GitHub] [iotdb] chengjianyun commented on pull request #3939: init dummyIndex after restart cluster

chengjianyun commented on pull request #3939:
URL: https://github.com/apache/iotdb/pull/3939#issuecomment-917955815


   > To solve these problems completely, I think `raftLogManager` needs a complete refactoring, including but not limited to better concurrency control, better persistence strategy, etc.
   > 
   > When I first joined the community in my senior year, the first thing I did was to implement `raftLogManager` with reference to `storage` and `unstable` interfaces in `etcd`. Because I didn't have a very deep understanding of `raft` and `etcd` at that time, after a long period, I finally realized that this was probably an awful implementation and I apologize for that. There are two main reasons:
   > 
   > * For `raft`, uncommitted logs also need to be persisted (i.e., logs on disk may also need to be truncated) in order to ensure correctness after reboot.
   > * `etcd` is an event-driven architecture, so its interior is all unlocked. However, in our architecture, `raftLogManager` can be accessed concurrently by multiple threads. So we've added a lot of patches for concurrency control right now, but this is actually an area we can think about implementing better.
   > 
   > Thank you very much for your great contributions. This week I will focus on the PR you raised about `cluster-` branch and a bug at hand. After that, I would like to discuss the refactoring of `raftLogManager` with you guys and community. And of course you're welcome to start to design the new `raftLogManager` with us right now if you are free.
   
   Thanks for your implementation, which lets us enjoy the first cluster version of IoTDB :). The Raft algorithm is really hard to implement; there are too many corner cases to consider. Getting the algorithm to work correctly in most cases in such a short time is terrific.
   
   In my opinion, guaranteeing the correctness of Raft will take too much effort if we keep working on the current cluster implementation. It also seems to me that the cluster module lacks an overall system design.
   
   My other suggestion is that we gradually separate the Raft implementation from the Raft client (the application), just as `etcd` did. The eventual goal is to bring in `Ratis` to help manage the Raft state. Maybe it's time to put `Ratis` on the agenda. I strongly suggest that someone spend some time investigating `Ratis` and evaluating the cost of integration. After that, instead of Raft correctness, we could focus on improving the core engine of IoTDB and its ecosystem.
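
To make the separation concrete, here is a minimal sketch of the kind of boundary I have in mind. Every name in it (`StateMachine`, `SketchRaftLog`, `truncateFrom`, `commitTo`) is hypothetical and is not an existing IoTDB, `etcd`, or `Ratis` API. It is in-memory and single-threaded, so it deliberately ignores persistence and concurrency control; it only shows that the Raft side can own the log (including truncation of uncommitted entries) while the application only sees committed entries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical boundary: the application implements this interface,
// and the Raft layer calls it only for committed entries.
interface StateMachine {
    void apply(long index, byte[] command);
}

// Hypothetical in-memory stand-in for a Raft log. Indexes are 1-based.
// No persistence and no locking: this is a sketch, not a real log manager.
final class SketchRaftLog {
    private final List<byte[]> entries = new ArrayList<>();
    private final StateMachine stateMachine;
    private long commitIndex = 0; // index of the last committed entry
    private long lastApplied = 0; // index of the last entry handed to the application

    SketchRaftLog(StateMachine stateMachine) {
        this.stateMachine = stateMachine;
    }

    // Appends an uncommitted entry and returns its index.
    long append(byte[] command) {
        entries.add(command);
        return entries.size();
    }

    // Drops every entry with index >= fromIndex. Only uncommitted entries may be dropped.
    void truncateFrom(long fromIndex) {
        if (fromIndex <= commitIndex) {
            throw new IllegalArgumentException("cannot truncate committed entries");
        }
        while (entries.size() >= fromIndex) {
            entries.remove(entries.size() - 1);
        }
    }

    // Advances the commit index and applies newly committed entries in order.
    void commitTo(long index) {
        if (index > entries.size()) {
            throw new IllegalArgumentException("cannot commit past the last entry");
        }
        commitIndex = Math.max(commitIndex, index);
        while (lastApplied < commitIndex) {
            lastApplied++;
            stateMachine.apply(lastApplied, entries.get((int) (lastApplied - 1)));
        }
    }

    long lastIndex() {
        return entries.size();
    }
}
```

With a boundary like this, swapping the log and consensus side for a library such as `Ratis` would mostly mean re-implementing the application against that library's own state machine interface, which is part of what the investigation should confirm.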
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@iotdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org