Posted to commits@doris.apache.org by zh...@apache.org on 2019/10/25 14:27:48 UTC

[incubator-doris] branch master updated: Update doc for FE metadata recover (#2073)

This is an automated email from the ASF dual-hosted git repository.

zhaoc pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git


The following commit(s) were added to refs/heads/master by this push:
     new 1859819  Update doc for FE metadata recover (#2073)
1859819 is described below

commit 1859819aa7b03af74d16d9ca56d12009441138ea
Author: kangkaisen <ka...@apache.org>
AuthorDate: Fri Oct 25 22:27:41 2019 +0800

    Update doc for FE metadata recover (#2073)
---
 .../operation/metadata-operation.md                 | 21 ++++++++++++++++++++-
 .../operation/metadata-operation_EN.md              | 19 +++++++++++++++++++
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/docs/documentation/cn/administrator-guide/operation/metadata-operation.md b/docs/documentation/cn/administrator-guide/operation/metadata-operation.md
index f10d2f4..6f1c819 100644
--- a/docs/documentation/cn/administrator-guide/operation/metadata-operation.md
+++ b/docs/documentation/cn/administrator-guide/operation/metadata-operation.md
@@ -232,7 +232,26 @@ FE currently has the following ports
 
     After modifying the configuration, simply restart the FE. This only affects the MySQL connection target.
 
-    
+
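+### Recover metadata from FE memory
+
+In some extreme cases, the image file on disk may be corrupted while the metadata in memory is intact. In that case, we can dump the metadata from memory and use it to replace the image file on disk to recover the metadata. The steps for doing this **without stopping the query service** are as follows:
+1. Stop all Load, Create, and Alter operations on the cluster.
+2. Run the following command to dump the metadata from the Master FE's memory (referred to below as image_mem):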
+```
+curl -u $root_user:$password http://$master_hostname:8410/dump
+```
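+3. Replace the image file in the `meta_dir/image` directory on the OBSERVER FE node with the image_mem file, restart the OBSERVER FE node,
+and verify the integrity and correctness of the image_mem file (check on the FE Web page that the DB and Table metadata look normal, and check fe.log for exceptions and for journals being replayed normally).
+4. One at a time, replace the image file in the `meta_dir/image` directory on each FOLLOWER FE node with the image_mem file, restart that FOLLOWER FE node,
+and confirm that the metadata and the query service are normal.
+5. Replace the image file in the `meta_dir/image` directory on the Master FE node with the image_mem file, restart the Master FE node,
+and confirm that the Master FE switchover works normally and that the Master FE node can generate a new image file through checkpoint.
+6. Resume all Load, Create, and Alter operations on the cluster.
+
+**Note: If the image file is large, the whole procedure may take a long time, so during this period make sure the Master FE does not generate a new image file through checkpoint.
+When the `image.ckpt` file in the `meta_dir/image` directory on the Master FE node is nearly as large as the `image.xxx` file, you can delete the `image.ckpt` file directly.**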
+
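 ## Best Practices
 
 FE deployment recommendations are covered in the [Installation and Deployment document](../../installing/install-deploy.md); some supplementary notes are given here.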
diff --git a/docs/documentation/en/administrator-guide/operation/metadata-operation_EN.md b/docs/documentation/en/administrator-guide/operation/metadata-operation_EN.md
index 7f30ae9..0dd119e 100644
--- a/docs/documentation/en/administrator-guide/operation/metadata-operation_EN.md
+++ b/docs/documentation/en/administrator-guide/operation/metadata-operation_EN.md
@@ -232,6 +232,25 @@ FE currently has the following ports
 
 	After modifying the configuration, restart FE directly. This only affects mysql's connection target.
 
+### Recover metadata from FE memory
+In some extreme cases, the image file on disk may be damaged while the metadata in memory is intact. In that case, we can dump the metadata from memory and use it to replace the image file on disk to recover the metadata. The steps for doing this without stopping the query service are as follows:
+
+1. Stop all Load, Create, Alter operations.
+
+2. Execute the following command to dump metadata from the Master FE memory: (hereafter called image_mem)
+```
+curl -u $root_user:$password http://$master_hostname:8410/dump
+```
+3. Replace the image file in the `meta_dir/image` directory on the OBSERVER FE node with the image_mem file, restart the OBSERVER FE node, and verify the integrity and correctness of the image_mem file. You can check on the FE Web page whether the DB and Table metadata are normal, and check `fe.log` for exceptions and for whether journals are being replayed normally.
+
+4. On each FOLLOWER FE node in turn, replace the image file in the `meta_dir/image` directory with the image_mem file, restart that FOLLOWER FE node, and confirm that the metadata and query services are normal.
+
+5. Replace the image file in the `meta_dir/image` directory on the Master FE node with the image_mem file, restart the Master FE node, and then confirm that the Master FE switchover works normally and that the Master FE node can generate a new image file through checkpoint.
+
+6. Resume all Load, Create, Alter operations.
+
+**Note: If the image file is large, the entire process can take a long time, so during this time, make sure the Master FE does not generate a new image file via checkpoint. When the `image.ckpt` file in the `meta_dir/image` directory on the Master FE node is observed to be nearly as large as the `image.xxx` file, the `image.ckpt` file can be deleted directly.**
+
 
 ## Best Practices
 


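The dump request in step 2 and the checkpoint size check described in the note could be wrapped in small shell helpers. The following is only a sketch under stated assumptions: the function names `dump_url` and `ckpt_nearly_complete` are hypothetical and not part of Doris, and the 90 percent threshold is an illustrative stand-in for "nearly as large", which the commit does not quantify. The script does not contact any FE and does not delete anything.

```shell
#!/bin/sh
# Hypothetical helpers for the recovery procedure; names are illustrative only.

# Build the dump URL from the Master FE host and HTTP port
# (the documentation above uses port 8410).
dump_url() {
    printf 'http://%s:%s/dump\n' "$1" "$2"
}

# Succeed (exit status 0) when image.ckpt has reached at least pct percent
# of the size of the current image file.
# Arguments: ckpt size in bytes, image size in bytes, threshold percent.
# The threshold is an assumption; the documentation only says "nearly as large".
ckpt_nearly_complete() {
    ckpt_bytes=$1
    image_bytes=$2
    pct=$3
    [ "$image_bytes" -gt 0 ] && [ $((ckpt_bytes * 100)) -ge $((image_bytes * pct)) ]
}

# Example invocation of step 2 (left commented out so the sketch has no side effects):
# curl -u "$root_user:$password" "$(dump_url "$master_hostname" 8410)"
```

An operator would obtain the two sizes with whatever tool is available on the Master FE host and decide manually whether to remove `image.ckpt`; the helper only encodes the comparison.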