You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@iotdb.apache.org by GitBox <gi...@apache.org> on 2021/09/13 13:01:36 UTC

[GitHub] [iotdb] LebronAl edited a comment on issue #3954: Integrate Apache Ratis to help manage Raft status

LebronAl edited a comment on issue #3954:
URL: https://github.com/apache/iotdb/issues/3954#issuecomment-918163937

> What do you mean by "strictly speaking it does not guarantee linearizability"? In which case, IoTDB doesn't guarantee the linearizability? In my opinion, it doesn't matter IoTDB or other products, if you are using Raft, that means you need the linearizability which is what raft provided. If IoTDB doesn't guarantee the linearizability, I think it's a bug. Right now, I didn't find any of the case.

The devil of consensus lies in the corner case. So let me give you three examples that I know so far that may violate linearizability.
1. The uncommitted Raft logs are not persisted. Consider a scenario where node A is the leader, nodes B and C are followers, node A synchronizes the log to B and C and gets all acks, then submits and applies the log, and then returns success to the client before the next heartbeat to followers. Then there was a momentary power outage, nodes B and C restarted, and node A restarted slowly. Node B and C's log are empty after restarting and recovering, but they are already a majority, so they can serve the client's read request, then this may violate linearizability. Of course, this is strictly an implementation bug and can be fixed. But even if the bug is fixed, you can see that our raft log's serialization buffer refers to the implementation of the stand-alone WAL, it does not write to the disk and call async every time a log is written, which ensures that the performance will not be limited by the IOPS of the disk. But this may also cause the corner case mentioned above. This is t
he trade-off between performance and safety.
2. In fact, to ensure linearizability Raft uses read-index or lease-read. In our current implementation, direct-read is used for the leader and read-index is used for the follower. This can violate linear consistency when a node outage causes a replacement node to execute the same read request. For more specific examples, you can refer to my [blog](https://tanxinyu.work/consistency-and-consensus/). Of course, we could also use read-index on the leader, but this would undoubtedly degrade performance.
3. In fact, Raft's naive implementation guarantees at-least-once semantics, and to ensure linearizability semantics, uuid is generated on the client side and a map is logged on the server side to ensure that each command is executed only once. You can refer to section 6.2 of [Raft's PhD thesis](https://web.stanford.edu/~ouster/cgi-bin/papers/OngaroPhD.pdf) and dragonboat's discussion on [Zhihu](https://www.zhihu.com/question/278551592).There is no doubt that such an implementation will also affect performance.

>Application(Raft user) takes care of log apply, so it doesn't matter how Raft implemented.

I hope so~Maybe we need to make a detailed investigation on `Ratis`

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@iotdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org