Posted to reviews@iotdb.apache.org by GitBox <gi...@apache.org> on 2021/09/13 03:17:50 UTC
[GitHub] [iotdb] cigarl commented on pull request #3939: init dummyIndex after restart cluster

cigarl commented on pull request #3939:
URL: https://github.com/apache/iotdb/pull/3939#issuecomment-917804418


   > To solve these problems completely, I think `raftLogManager` needs a complete refactoring, including but not limited to better concurrency control, better persistence strategy, etc.
   > 
   > When I first joined the community in my senior year, the first thing I did was to implement `raftLogManager` with reference to `storage` and `unstable` interfaces in `etcd`. Because I didn't have a very deep understanding of `raft` and `etcd` at that time, after a long period, I finally realized that this was probably an awful implementation and I apologize for that. There are two main reasons:
   > 
   > * For `raft`, uncommitted logs also need to be persisted (i.e., logs on disk may also need to be truncated) in order to ensure correctness after reboot.
   > * `etcd` is an event-driven architecture, so its interior is all unlocked. However, in our architecture, `raftLogManager` can be accessed concurrently by multiple threads. So we've added a lot of patches for concurrency control right now, but this is actually an area we can think about implementing better.
   > 
   > Thank you very much for your great contributions. This week I will focus on the PR you raised about `cluster-` branch and a bug at hand. After that, I would like to discuss the refactoring of `raftLogManager` with you guys and community. And of course you're welcome to start to design the new `raftLogManager` with us right now if you are free.
   
   IMO, we need to pursue the following two approaches in parallel:
   
   - First, we should solve this problem within the current code structure and submit the fix to the `master` and `rel/0.12.x` branches. When a user hits a problem like mine, it is hard to resolve immediately, which means some data could be lost and the cluster may enter an unstable state. This is a disaster for a production environment, so we need to provide users with a patch to avoid this problem.
   - Second, we can refactor this part on the `cluster-` branch. Once we have solved the first problem, we can spend more time refactoring this part properly. We don't have to push this work to users immediately; it can go into the next release as a better implementation.
   
   I'd be happy to work on it together :)

