Posted to dev@iotdb.apache.org by 李思佳 <li...@360.cn> on 2022/05/23 04:04:10 UTC

Reply: Re: Re: Flush function in cluster

In fact, this is because we cannot compare tsfiles to determine whether the replica data is consistent.

If a user flush ensures that all copies are flushed, then on the next restart we only need to check whether the operations after that flush are consistent, and update them if they are not.

Otherwise, when a follower is far behind the leader and has to catch up via tsfiles, do we need to copy all of the data files?

BR,
-----------------------------------
Sijia Li


-----Original Message-----
From: Xiangdong Huang <sa...@gmail.com>
Sent: May 23, 2022 11:52
To: dev <de...@iotdb.apache.org>
Subject: Re: Re: Flush function in cluster

> "flush can reduce memory and speed up the restart process": this
> assumes that all copies have been flushed synchronously, so we can ensure
> that the data files are logically consistent at this point.

Sorry, I may be behind on the current cluster design.
Why do we need "all copies have been flushed synchronously, so we can ensure that the data files are logically consistent at this point"? Is it because of the Raft protocol?


-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


李思佳 <li...@360.cn> wrote on Mon, May 23, 2022 at 11:47:

> "flush can reduce memory and speed up the restart process": this
> assumes that all copies have been flushed synchronously, so we can
> ensure that the data files are logically consistent at this point.
>
> Flushing a datanode should be part of releasing resources before the
> node shuts down (although this does not guarantee that all copies are
> logically consistent at that point). For example, the shutdownHook
> should flush to disk and release resources by default. Do we need to
> provide a flush command only because our node shutdown operation is
> incomplete?
>
> BR,
> -----------------------------------
> Sijia Li
>
>
> -----Original Message-----
> From: Xiangdong Huang <sa...@gmail.com>
> Sent: May 23, 2022 11:37
> To: dev <de...@iotdb.apache.org>
> Subject: Re: Flush function in cluster
>
> I think it is meaningful to distinguish between flushing one node and
> flushing the whole cluster.
>
> As you said, flush can reduce memory and speed up the restart process.
> So what if the DBA just wants to restart one node?
>
> However, the default behavior can be discussed: flush one node by
> default, or flush the whole cluster by default.
>
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
>  黄向东
> 清华大学 软件学院
>
>
> 李思佳 <li...@360.cn> wrote on Mon, May 23, 2022 at 11:28:
>
> > Sorry, I don't understand the purpose of flushing only the current
> > datanode.
> >
> > IMO, flush all should mean that all storage groups are flushed; in
> > other words, flush sg is a subset of flush all.
> >
> > For users, the distributed layer is a black box, while the SG is an
> > exposed structure. Therefore, CLI commands do not need to be aware of
> > the relationship between a datanode and the user-created SGs.
> >
> > In addition, the flush operation may speed up our restart recovery
> > process. For example, when we flush an SG successfully, we can label
> > the associated data files to indicate that all copies are consistent
> > at that point in time (there are flush and write priorities to
> > consider here). During the next restart, we can use this flag to
> > quickly skip the verification step.
> >
> > In summary, here are my questions and thoughts:
> > 1. Is it necessary to flush a single datanode? What are the benefits?
> > 2. Can the flush operation affect the consensus group or the WAL to
> > enable a quick restart?
> >
> > BR,
> > -----------------------------------
> > Sijia Li
> >
> >
> > -----Original Message-----
> > From: Jialin Qiao <qi...@apache.org>
> > Sent: May 23, 2022 11:07
> > To: dev@iotdb.apache.org
> > Subject: Flush function in cluster
> >
> > Hi,
> >
> > Flush is a frequently used command in IoTDB; it flushes memtables
> > to disk and closes all tsfiles.
> >
> > In the new cluster, we need to redefine this function [1].
> >
> > * flush: flush the current datanode
> >
> > * flush all/cluster: flush all datanodes
> >
> > * flush sg: flush all DataRegions of a storage group
> >
> >
> > What do you think?
> >
> > [1] https://issues.apache.org/jira/browse/IOTDB-3099
> >
> > —————————————————
> > Jialin Qiao
> > Apache IoTDB PMC
> >
>

Re: Re: Re: Flush function in cluster

Posted by Jialin Qiao <qi...@apache.org>.

Hi,

We cannot ensure that all replicas have the same tsfiles. Apart from user
flushes, the storage engine automatically flushes memtables according to
its memory usage, and we cannot guarantee that different nodes have the
same memory.
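
To illustrate the point above, here is a minimal sketch (not IoTDB code). The Replica class and its memtable_limit parameter are hypothetical and only model "auto flush when the memtable reaches a size threshold". Two replicas receive the same writes followed by a user flush, yet end up with different file boundaries while holding the same logical data.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one replica's storage engine."""
    memtable_limit: int                            # points buffered before an auto flush
    memtable: list = field(default_factory=list)
    tsfiles: list = field(default_factory=list)    # each entry models one closed tsfile

    def write(self, point):
        self.memtable.append(point)
        # Auto flush driven by this node's own memory budget.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Close the current memtable into a new "tsfile" if it holds data.
        if self.memtable:
            self.tsfiles.append(tuple(self.memtable))
            self.memtable = []

    def all_points(self):
        return [p for f in self.tsfiles for p in f] + self.memtable


# Assumption: the two nodes have different memory budgets.
leader = Replica(memtable_limit=3)
follower = Replica(memtable_limit=4)

for t in range(10):
    leader.write(t)
    follower.write(t)

# A user flush on both replicas at the same logical point.
leader.flush()
follower.flush()

print(leader.tsfiles)
print(follower.tsfiles)
```

Even after the synchronized user flush, the two file lists differ, so comparing tsfiles file by file does not tell us whether the replicas are consistent.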

As for accelerating restart and catch-up in the cluster, this is the
responsibility of the consensus layer's snapshot and is not related to the
user flush.
A snapshot is a behavior of a single replica: it calls flush on the storage
engine and records the tsfiles.
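
A rough sketch of that snapshot behavior, under stated assumptions: StorageEngine, take_snapshot, the file naming, and the last_applied_index field are all hypothetical names for illustration, not the IoTDB or consensus-layer API.

```python
from dataclasses import dataclass, field


@dataclass
class StorageEngine:
    """Hypothetical storage engine of a single replica."""
    memtable: list = field(default_factory=list)
    tsfiles: list = field(default_factory=list)

    def write(self, point):
        self.memtable.append(point)

    def flush(self):
        # Persist the memtable as a new closed file (the name is illustrative).
        if self.memtable:
            self.tsfiles.append(f"file-{len(self.tsfiles)}.tsfile")
            self.memtable.clear()


def take_snapshot(engine, last_applied_index):
    """Local to one replica: flush, then record which tsfiles exist."""
    engine.flush()
    return {
        # Assumption: the consensus layer tracks how far the log was applied.
        "last_applied_index": last_applied_index,
        "tsfiles": list(engine.tsfiles),   # copy, so later flushes do not alter it
    }


engine = StorageEngine()
engine.write(1)
engine.write(2)
snap1 = take_snapshot(engine, last_applied_index=2)

engine.write(3)
snap2 = take_snapshot(engine, last_applied_index=3)

print(snap1)
print(snap2)
```

In this sketch no coordination with other replicas is involved: each replica records its own file list, and a lagging follower could be brought up to date from the files listed in a snapshot rather than from a cluster-wide user flush.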

Thanks,
—————————————————
Jialin Qiao
Apache IoTDB PMC

