Posted to commits@iotdb.apache.org by ja...@apache.org on 2020/07/23 09:33:50 UTC
[incubator-iotdb] 01/01: add docs for recover process

This is an automated email from the ASF dual-hosted git repository.

jackietien pushed a commit to branch RecoverDocs
in repository https://gitbox.apache.org/repos/asf/incubator-iotdb.git

commit 851302a9225e3588c791b1f03093bdd344f2dc1b
Author: JackieTien97 <Ja...@foxmail.com>
AuthorDate: Thu Jul 23 17:33:43 2020 +0800

    add docs for recover process
---
 docs/SystemDesign/StorageEngine/Recover.md    | 106 +++++++++++++++++++++++++
 docs/zh/SystemDesign/StorageEngine/Recover.md | 107 ++++++++++++++++++++++++++
 site/src/main/.vuepress/config.js             |   6 +-
 3 files changed, 217 insertions(+), 2 deletions(-)

diff --git a/docs/SystemDesign/StorageEngine/Recover.md b/docs/SystemDesign/StorageEngine/Recover.md
new file mode 100644
index 0000000..4d0f64e
--- /dev/null
+++ b/docs/SystemDesign/StorageEngine/Recover.md
@@ -0,0 +1,106 @@
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+        http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+-->
+
+# Recovery Process
+
+Recovery is performed at the granularity of the storage group. The entry point is the `recover()` method of `StorageGroupProcessor`.
+
+## Recovery Process Of Storage Group
+
+* First, collect all data files ending with `.tsfile` in the storage group, returned as `TsFileResource` objects. The files fall into the following lists:
+
+	* Sequence files
+		* 0.10 version TsFiles (sealed/unsealed)
+		* 0.9 version TsFiles (sealed)
+	* Unsequence files
+		* 0.10 version TsFiles (sealed/unsealed)
+		* 0.9 version TsFiles (sealed)
+
+
+* If the storage group contains any 0.9 version TsFiles, add the old version's sequence and unsequence files to `upgradeSeqFileList` and `upgradeUnseqFileList` respectively, for upgrade and query.
+
+* Group the sequence and unsequence files by partition, as a `Map<Long, List<TsFileResource>>`.
+
+* Recover the sequence files of each partition: pass the sequence TsFiles of each partition obtained in the previous step to `recoverTsFiles`. This method puts each recovered sequence TsFile into `sequenceFileTreeSet` as a `TsFileResource`. If the TsFile is the last one in its partition and is not sealed, construct a `TsFileProcessor` object for it and add it to `workSequenceTsFileProcessors`. The specific details of this method wi [...]
+
+* Recover the unsequence files of each partition: pass the unsequence TsFiles of each partition obtained in the previous step to `recoverTsFiles`. This method puts each recovered unsequence TsFile into `unSequenceFileList` as a `TsFileResource`. If the TsFile is the last one in its partition and is not sealed, construct a `TsFileProcessor` object for it and add it to `workUnsequenceTsFileProcessors`. The specific details of [...]
+
+* Traverse the `sequenceFileTreeSet` and `unSequenceFileList` obtained in the previous two steps, and update the version number of the corresponding partition
+
+* Check whether a Modification file left over from a merge exists, and call the `RecoverMergeTask.recoverMerge` method to recover the merge
+
+* Call the `updateLastestFlushedTime()` method to update `latestTimeForEachDevice`, `partitionLatestFlushedTimeForEachDevice` and `globalLatestFlushedTimeForEachDevice` using the sequence TsFiles of version 0.9
+
+	* `latestTimeForEachDevice` records, for each partition, the latest timestamp inserted for each device (both flushed and unflushed data)
+	* `partitionLatestFlushedTimeForEachDevice` records, for each partition, the latest flushed timestamp of each device. It is used to determine whether a newly inserted point is out of order.
+	* `globalLatestFlushedTimeForEachDevice` records the latest flushed timestamp of each device (a summary of the latest timestamps across all partitions)
+
+* Finally, traverse `sequenceFileTreeSet` and use the recovered sequence files to update `latestTimeForEachDevice`, `partitionLatestFlushedTimeForEachDevice` and `globalLatestFlushedTimeForEachDevice` again
+
+## Recover the TsFiles (Seq/Unseq) of each partition
+
+* org.apache.iotdb.db.engine.storagegroup.StorageGroupProcessor.recoverTsFiles
+
+This method is mainly responsible for traversing all TsFiles passed in and recovering them one by one.
+
+* Construct a `TsFileRecoverPerformer` object to recover the TsFile. The recovery logic is encapsulated in the `recover()` method of `TsFileRecoverPerformer` (explained in the next section), which returns a recovered `RestorableTsFileIOWriter` object.
+
+	* If the recovery process fails, log the failure and skip the TsFile
+
+* If the TsFile is not the last file, or it is the last file but has been closed or marked as closed, just set the `closed` attribute of its in-memory `TsFileResource` object to `true`.
+
+* If the TsFile can still be written to, it is the last TsFile of this partition and is unsealed, and it stays unsealed. Construct a `TsFileProcessor` object for it and place it in `workSequenceTsFileProcessors` or `workUnsequenceTsFileProcessors`.
+
+* Finally, put the in-memory `TsFileResource` object of the recovered TsFile into `sequenceFileTreeSet` or `unSequenceFileList`
+
+
+### Details about recovering a TsFile
+
+* org.apache.iotdb.db.writelog.recover.TsFileRecoverPerformer.recover
+
+This method is mainly responsible for recovering each individual TsFile.
+
+* First, use the TsFile to construct a `RestorableTsFileIOWriter` object. The constructor of `RestorableTsFileIOWriter` checks the content of the TsFile and truncates it if necessary
+	* If the file is empty, write `MAGIC_STRING` and `VERSION_NUMBER` to it and return directly. In this case `crashed` is `false` and `canWrite` is `true`;
+	* If the file has content, construct a `TsFileSequenceReader` object to parse it and call the `selfCheck` method to check the file and find where incomplete content must be cut off. `truncatedSize` is initialized to `HeaderLength`
+		* If the content of the file is complete (it has a complete header of `MAGIC_STRING` and `VERSION_NUMBER`, and a tail of `MAGIC_STRING`), return `TsFileCheckStatus.COMPLETE_FILE`
+		* If the file length is less than `HeaderLength(len(MAGIC_STRING) + len(VERSION_NUMBER))`, or the content of the file header is not `MAGIC_STRING + VERSION_NUMBER`, return `INCOMPATIBLE_FILE`
+		* If the file length is exactly equal to `HeaderLength`, and the file content is `MAGIC_STRING + VERSION_NUMBER`, then return `HeaderLength`
+		* If the file length is greater than `HeaderLength` and the file header is legal, but there is no `MAGIC_STRING` at the end of the file, the file is incomplete and needs to be truncated. Starting from the `VERSION_NUMBER` position, read the data in each following chunk and recover the ChunkMetadata from it. When a `CHUNK_GROUP_FOOTER` is encountered, the entire ChunkGroup is complete, so update `truncatedSize` to the current position
+		* Return `truncatedSize`
+	* Truncate the file according to the returned `truncatedSize`
+		* If `truncatedSize` is equal to `TsFileCheckStatus.COMPLETE_FILE`, set `crashed` and `canWrite` to `false`, and close the output stream of the file
+		* If `truncatedSize` is equal to `TsFileCheckStatus.INCOMPATIBLE_FILE`, close the output stream of the file and throw an exception
+		* Otherwise, set `crashed` and `canWrite` to `true` and truncate the file to `truncatedSize`
+
+* Use the returned `RestorableTsFileIOWriter` to determine whether the file is complete
+
+	* If the TsFile is complete
+		* If the resource file corresponding to the TsFile exists, deserialize it (it includes the minimum and maximum timestamps of each device in the TsFile) and restore the file version number
+		* If the resource file corresponding to the TsFile does not exist, regenerate the resource file and persist it to disk.
+		* Return the generated `RestorableTsFileIOWriter`
+
+	* If the TsFile is incomplete
+		* Call `recoverResourceFromWriter` to recover the resource information from the ChunkMetadata held by the `RestorableTsFileIOWriter`
+		* Call the `redoLogs` method to write the data in the one or more WAL files corresponding to this file to a temporary Memtable, and persist it to this incomplete TsFile
+			* For sequence files, skip WAL entries whose timestamps are less than or equal to those recorded in the current resource
+			* For unsequence files, redo all WAL entries; this may write duplicate ChunkGroups for multiple devices
+		* If the TsFile is not the last TsFile of the current partition, or a `.closing` file exists for the TsFile, call the `endFile()` method of `RestorableTsFileIOWriter` to seal the file, delete the `.closing` file and generate a resource file for it.
\ No newline at end of file
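
Editor's sketch for the "Details about recovering a TsFile" section of the English Recover.md above. It models the outcomes of the self-check and how they map to the `crashed` and `canWrite` flags. It is a simplified illustration, not the IoTDB code: the class name, the sentinel values, the header bytes and the `lastCompleteChunkGroupEnd` parameter (standing in for the real chunk-by-chunk scan) are all assumptions made for this example.

```java
/** Illustrative sketch only; the real logic lives in RestorableTsFileIOWriter and TsFileSequenceReader. */
final class SelfCheckSketch {
  // Hypothetical sentinels and header bytes, chosen for this example only.
  static final long COMPLETE_FILE = -1;
  static final long INCOMPATIBLE_FILE = -2;
  static final byte[] MAGIC_STRING = {'T', 's', 'F', 'i', 'l', 'e'};
  static final byte[] VERSION_NUMBER = {'0', '0', '0', '0', '0', '2'};
  static final int HEADER_LENGTH = MAGIC_STRING.length + VERSION_NUMBER.length;

  /** Flags and resulting file length after the check has been applied. */
  static final class WriterState {
    final boolean crashed;
    final boolean canWrite;
    final long length;

    WriterState(boolean crashed, boolean canWrite, long length) {
      this.crashed = crashed;
      this.canWrite = canWrite;
      this.length = length;
    }
  }

  /** True if data contains prefix at the given offset. */
  static boolean hasBytesAt(byte[] data, int offset, byte[] prefix) {
    if (offset < 0 || offset + prefix.length > data.length) {
      return false;
    }
    for (int i = 0; i < prefix.length; i++) {
      if (data[offset + i] != prefix[i]) {
        return false;
      }
    }
    return true;
  }

  /**
   * Returns COMPLETE_FILE, INCOMPATIBLE_FILE, or the size to truncate to.
   * lastCompleteChunkGroupEnd is supplied by the caller here; the real code
   * finds it by reading chunks until the last complete ChunkGroup footer.
   */
  static long selfCheck(byte[] file, long lastCompleteChunkGroupEnd) {
    boolean headerOk = file.length >= HEADER_LENGTH
        && hasBytesAt(file, 0, MAGIC_STRING)
        && hasBytesAt(file, MAGIC_STRING.length, VERSION_NUMBER);
    if (!headerOk) {
      return INCOMPATIBLE_FILE;
    }
    if (file.length == HEADER_LENGTH) {
      return HEADER_LENGTH;
    }
    boolean tailOk = file.length >= HEADER_LENGTH + MAGIC_STRING.length
        && hasBytesAt(file, file.length - MAGIC_STRING.length, MAGIC_STRING);
    if (tailOk) {
      return COMPLETE_FILE;
    }
    // Incomplete file: keep only the header plus the complete ChunkGroups.
    return Math.max(HEADER_LENGTH, Math.min(lastCompleteChunkGroupEnd, file.length));
  }

  /** Maps the check result to the writer flags described in the document. */
  static WriterState apply(long truncatedSize, long fileLength) {
    if (truncatedSize == COMPLETE_FILE) {
      return new WriterState(false, false, fileLength);
    }
    if (truncatedSize == INCOMPATIBLE_FILE) {
      throw new IllegalStateException("not a compatible TsFile");
    }
    return new WriterState(true, true, truncatedSize);
  }
}
```

In this sketch a file that passes the header check but lacks the trailing magic string is cut back to the end of its last complete ChunkGroup and stays writable, which is the case where the WAL redo step then fills in the missing data.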
diff --git a/docs/zh/SystemDesign/StorageEngine/Recover.md b/docs/zh/SystemDesign/StorageEngine/Recover.md
new file mode 100644
index 0000000..cd4735b
--- /dev/null
+++ b/docs/zh/SystemDesign/StorageEngine/Recover.md
@@ -0,0 +1,107 @@
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+        http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+-->
+
+# 重启恢复流程
+
+重启恢复是以存储组为粒度进行的，恢复的入口是 StorageGroupProcessor 的 recover()
+
+## 存储组恢复流程
+
+* org.apache.iotdb.db.engine.storagegroup.StorageGroupProcessor.recover()
+
+* 首先获得该存储组下所有以.tsfile结尾的数据文件，返回 TsFileResource，共有如下几个文件列表
+
+* 顺序文件
+	* 0.10 版本的文件（封口/未封口）
+	* 0.9 版本的文件（封口）
+* 乱序文件
+	* 0.10 版本的文件（封口/未封口）
+	* 0.9 版本的文件（封口） 
+
+
+* 若该存储组下有 0.9 版本的 TsFile 文件，则将旧版本的顺序和乱序文件分别加入`upgradeSeqFileList`和`upgradeUnseqFileList`中，供升级和查询使用。
+
+* 将顺序、乱序文件按照分区分组 Map<Long, List<TsFileResource>>
+
+* 恢复每个分区的顺序文件，将上一步获得的每个分区的顺序 TsFile 文件作为参数，调用`recoverTsFiles`进行恢复，该方法会将恢复后的顺序 TsFile 以TsFileResource 的形式放入`sequenceFileTreeSet`中，若该 TsFile 是此分区的最后一个，且未封口，则还要为其构造`TsFileProcessor`对象，并加入`workSequenceTsFileProcessors`中，该方法的具体细节会在下一小节阐述。
+
+* 恢复每个分区的乱序文件，将上一步获得的每个分区的乱序 TsFile 文件作为参数，调用`recoverTsFiles`进行恢复，该方法会将恢复后的乱序 TsFile 以 TsFileResource 的形式放入`unSequenceFileList`中，若该 TsFile 是此分区的最后一个，且未封口，则还要为其构造`TsFileProcessor`对象，并加入`workUnsequenceTsFileProcessors`中，该方法的具体细节会在下一小节阐述。
+
+* 分别遍历上两步得到的`sequenceFileTreeSet`和`unSequenceFileList`，更新分区对应的版本号
+
+* 检查有没有merge时候的Modification文件，并调用`RecoverMergeTask.recoverMerge`方法对merge进行恢复
+
+* 调用`updateLastestFlushedTime()`方法，用 0.9 版本的顺序tsfile文件，更新`latestTimeForEachDevice`, `partitionLatestFlushedTimeForEachDevice`以及`globalLatestFlushedTimeForEachDevice`
+
+	* `latestTimeForEachDevice` 记录了所有device已经插入的各个分区下的最新的时间戳(包括未flush的和已flush的)
+	* `partitionLatestFlushedTimeForEachDevice` 记录了所有device已经flush的各个分区下的最新的时间戳，它用来判断一个新插入的点是不是乱序点
+	* `globalLatestFlushedTimeForEachDevice` 记录了所有device已经flush的最新时间戳(是各个分区的最新时间戳的汇总)
+
+* 最后遍历`sequenceFileTreeSet`，用恢复出来的顺序文件，再次更新`latestTimeForEachDevice`, `partitionLatestFlushedTimeForEachDevice`以及`globalLatestFlushedTimeForEachDevice`
+
+## 恢复一个分区的（顺序/乱序） TsFile
+
+* org.apache.iotdb.db.engine.storagegroup.StorageGroupProcessor.recoverTsFiles
+
+该方法主要负责遍历传进来的所有 TsFile，挨个进行恢复。
+
+* 构造出`TsFileRecoverPerformer`对象，对 TsFile 文件进行恢复，恢复的逻辑封装在`TsFileRecoverPerformer`的`recover()`方法中（具体细节在下一小节展开阐述），该方法会返回一个恢复后的`RestorableTsFileIOWriter`对象。
+	* 若恢复过程失败，则记录log，并跳过该tsfile
+
+* 若该 TsFile 文件不是最后一个文件，或者该 TsFile 文件是最后一个文件，但已经被关闭或标记被关闭，只需将该 TsFile 文件在内存中对应的`TsFileResource`对象的`closed`属性置成`true`即可。 
+
+* 若该 TsFile 文件可以继续写入，则表示这是此分区的最后一个 TsFile，且未封口，则继续保持其未封口的状态，需要为它构造一个`TsFileProcessor`对象，并将其放到`workSequenceTsFileProcessors`或`workUnsequenceTsFileProcessors`中。
+
+* 最后将恢复出来的 TsFile 文件在内存中对应的`TsFileResource`对象放入`sequenceFileTreeSet`或`unSequenceFileList`中
+
+### 恢复一个 TsFile 文件
+
+* org.apache.iotdb.db.writelog.recover.TsFileRecoverPerformer.recover
+
+该方法主要负责每个具体的 TsFile 文件的恢复。
+
+* 首先用tsfile文件构造出一个`RestorableTsFileIOWriter`对象，在`RestorableTsFileIOWriter`的构造方法中，会对该tsfile的文件内容进行检查，必要时进行截断
+	* 如果这个文件中没有任何内容，则为其写入`MAGIC_STRING`和`VERSION_NUMBER`后，直接返回，此时的`crashed`为`false`，`canWrite`为`true`；
+	* 如果文件中有内容，构造`TsFileSequenceReader`对象对内容进行解析，调用`selfCheck`方法进行自检，并将不完整的内容截断，初始化`truncatedSize`为`HeaderLength`
+		* 若文件内容完整（有完整的头部的`MAGIC_STRING`和`VERSION_NUMBER`，以及尾部的`MAGIC_STRING`），则返回`TsFileCheckStatus.COMPLETE_FILE`
+		* 若文件长度小于`HeaderLength(len(MAGIC_STRING) + len(VERSION_NUMBER))`，或者文件头部内容不是`MAGIC_STRING`，则返回`INCOMPATIBLE_FILE`
+		* 若文件长度刚好等于`HeaderLength`，且文件内容就是`MAGIC_STRING + VERSION_NUMBER`，则返回`HeaderLength`
+		* 若文件长度大于`HeaderLength`，且文件头合法，但文件尾部没有`MAGIC_STRING`，表示该文件不完整，需要进行截断。从`VERSION_NUMBER`往后读，读出chunk中的数据，并根据chunk中的数据恢复出ChunkMetadata，若遇到`CHUNK_GROUP_FOOTER`，则表示整个ChunkGroup是完整的，更新`truncatedSize`至当前位置
+		* 返回`truncatedSize`
+	* 根据返回的`truncatedSize`，对文件进行截断
+		* 若`truncatedSize`等于`TsFileCheckStatus.COMPLETE_FILE`，则将`crashed`和`canWrite`置为`false`，并关闭文件的输出流
+		* 若`truncatedSize`等于`TsFileCheckStatus.INCOMPATIBLE_FILE`，则关闭文件的输出流，并抛异常
+		* 否则，将`crashed`和`canWrite`置为`true`，并将文件截断至`truncatedSize`
+
+		
+* 通过返回的 RestorableTsFileIOWriter 判断文件是否完整
+	
+	* 若该 TsFile 文件是完整的
+		* 若 TsFile 文件对应的 resource 文件存在，则将 resource 文件反序列化(包括每个设备在该tsfile文件中的最小和最大时间戳），并恢复文件版本号
+		* 若 TsFile 文件对应的 resource 文件不存在，则重新生成resource 文件
+		* 返回生成的 `RestorableTsFileIOWriter`
+
+	* 若 TsFile 不完整
+		* 调用`recoverResourceFromWriter`，通过`RestorableTsFileIOWriter`中的ChunkMetadata信息，恢复出resource信息
+		* 调用`redoLogs`方法将这个文件对应的一个或多个写前日志文件中的数据都写到一个临时 Memtable 中，并持久化到这个不完整的 TsFile 中
+			* 对于顺序文件，跳过时间戳小于等于当前 resource 的 WAL
+			* 对于乱序文件，将 WAL 全部重做，有可能重复写入多个 device 的 ChunkGroup
+		* 如果该 TsFile 不是当前分区的最后一个 TsFile，或者该 TsFile 有`.closing`文件存在，则调用`RestorableTsFileIOWriter`的`endFile()`方法，将文件封口，并删除`.closing`文件，并为其生成resource文件
\ No newline at end of file
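
Editor's sketch for the timestamp bookkeeping described in both versions of Recover.md above. It shows one way the per-partition and global "latest flushed time" maps could be rebuilt from recovered sequence files and then used to decide whether a new point is out of order. The class and method names are hypothetical, and the comparison rule (a point is out of order when its timestamp is not greater than the latest flushed time) is an assumption of this example, not something the document states.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; not the StorageGroupProcessor implementation. */
final class LatestTimeSketch {
  // partition id -> (device -> latest flushed timestamp)
  final Map<Long, Map<String, Long>> partitionLatestFlushedTimeForEachDevice = new HashMap<>();
  // device -> latest flushed timestamp across all partitions
  final Map<String, Long> globalLatestFlushedTimeForEachDevice = new HashMap<>();

  /** Folds the per-device end times of one recovered sequence file into both maps. */
  void recordRecoveredFile(long partition, Map<String, Long> deviceEndTimes) {
    Map<String, Long> perDevice =
        partitionLatestFlushedTimeForEachDevice.computeIfAbsent(partition, p -> new HashMap<>());
    for (Map.Entry<String, Long> entry : deviceEndTimes.entrySet()) {
      perDevice.merge(entry.getKey(), entry.getValue(), Long::max);
      globalLatestFlushedTimeForEachDevice.merge(entry.getKey(), entry.getValue(), Long::max);
    }
  }

  /** Assumed rule: a point is out of order if it is not newer than the flushed data. */
  boolean isOutOfOrder(long partition, String device, long timestamp) {
    Long flushed = partitionLatestFlushedTimeForEachDevice
        .getOrDefault(partition, Collections.emptyMap())
        .get(device);
    return flushed != null && timestamp <= flushed;
  }
}
```

Because the check is made per partition, a device with flushed data in one partition can still receive in-order writes in a partition that has no flushed data yet.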
diff --git a/site/src/main/.vuepress/config.js b/site/src/main/.vuepress/config.js
index ec97b4b..a94fa84 100644
--- a/site/src/main/.vuepress/config.js
+++ b/site/src/main/.vuepress/config.js
@@ -537,7 +537,8 @@ var config = {
 							['StorageEngine/FlushManager','FlushManager'],
 							['StorageEngine/MergeManager','MergeManager'],
 							['StorageEngine/DataPartition','DataPartition'],
-							['StorageEngine/DataManipulation','DataManipulation']
+							['StorageEngine/DataManipulation','DataManipulation'],
+							['StorageEngine/Recover','Recover']
 						]
 					},
 					{
@@ -1052,7 +1053,8 @@ var config = {
 							['StorageEngine/FlushManager','FlushManager'],
 							['StorageEngine/MergeManager','文件合并机制'],
 							['StorageEngine/DataPartition','数据分区'],
-							['StorageEngine/DataManipulation','数据增删改']
+							['StorageEngine/DataManipulation','数据增删改'],
+							['StorageEngine/Recover','重启恢复'],
 						]
 					},
 					{