Posted to commits@doris.apache.org by ya...@apache.org on 2022/06/14 01:20:09 UTC

[incubator-doris] branch master updated: [docs] Add common error messages to doris backup (#10048)

This is an automated email from the ASF dual-hosted git repository.

yangzhg pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git


The following commit(s) were added to refs/heads/master by this push:
     new dc4761593b [docs] Add common error messages to doris backup (#10048)
dc4761593b is described below

commit dc4761593b009261fbe6fa475fb60db935485ba4
Author: caoliang-web <71...@users.noreply.github.com>
AuthorDate: Tue Jun 14 09:20:04 2022 +0800

    [docs] Add common error messages to doris backup (#10048)
---
 docs/en/docs/admin-manual/data-admin/backup.md    | 5 ++++-
 docs/zh-CN/docs/admin-manual/data-admin/backup.md | 3 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/docs/en/docs/admin-manual/data-admin/backup.md b/docs/en/docs/admin-manual/data-admin/backup.md
index 52e677c3c8..e3339e68a4 100644
--- a/docs/en/docs/admin-manual/data-admin/backup.md
+++ b/docs/en/docs/admin-manual/data-admin/backup.md
@@ -145,7 +145,10 @@ It is recommended to import the new and old clusters in parallel for a period of
 3. Both backup and recovery can operate at the partition (Partition) level, the smallest supported granularity. When a table holds a large amount of data, it is recommended to operate partition by partition to reduce the cost of retrying after a failure.
 4. Backup and restore operate on the actual data files. Therefore, when a table has too many shards, or a shard has too many small versions, a backup or restore may take a long time even if the total amount of data is small. Users can run `SHOW PARTITIONS FROM table_name;` and `SHOW TABLET FROM table_name;` to view the number of shards in each partition and the number of file versions in each shard, and use them to estimate job execution time. The number of files [...]
 5. When checking job status with the `SHOW BACKUP` or `SHOW RESTORE` command, error messages may appear in the `TaskErrMsg` column. As long as the `State` column is not `CANCELLED`, however, the job is still running, and the failed tasks may succeed on retry. Some task errors do cause the job to fail directly.
-6. If the recovery job is an overwrite operation (restoring data into an existing table or partition), then from the `COMMIT` phase of the recovery job onward, the overwritten data on the current cluster may no longer be recoverable. If the recovery job fails or is canceled at this point, the previous data may be damaged and inaccessible. In this case, the only remedy is to run the recovery operation again and wait for the job to complete. Therefore, we recommend that if unnece [...]
+   Common `TaskErrMsg` errors are as follows:
+      Q1: When backing up to HDFS, the job state stays at UPLOADING and `TaskErrMsg` shows: [13333: Close broker writer failed, broker:TNetworkAddress(hostname=10.10.0.0, port=8000) msg:errors while close file output stream, cause by: DataStreamer Exception : ]
+      This is generally a network communication problem. Check the broker log to see whether a certain IP or port is unreachable. If the cluster runs on a cloud service, check whether the broker is accessing HDFS over the intranet. If so, add an hdfs-site.xml file to the broker/conf folder, set dfs.client.use.datanode.hostname=true in it, and configure the hostname mapping of the Hadoop cluster on the broker node.
+7. If the recovery job is an overwrite operation (restoring data into an existing table or partition), then from the `COMMIT` phase of the recovery job onward, the overwritten data on the current cluster may no longer be recoverable. If the recovery job fails or is canceled at this point, the previous data may be damaged and inaccessible. In this case, the only remedy is to run the recovery operation again and wait for the job to complete. Therefore, we recommend that if unnece [...]
 
 ## Related Commands
 
diff --git a/docs/zh-CN/docs/admin-manual/data-admin/backup.md b/docs/zh-CN/docs/admin-manual/data-admin/backup.md
index 80a80245e1..b1c32056ee 100644
--- a/docs/zh-CN/docs/admin-manual/data-admin/backup.md
+++ b/docs/zh-CN/docs/admin-manual/data-admin/backup.md
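@@ -145,6 +145,9 @@ For more usage of BACKUP, see [here](../../sql-manual/sql-reference/Data-Defi
 3. Both backup and recovery can operate at the partition (Partition) level, the smallest supported granularity. When a table holds a large amount of data, it is recommended to operate partition by partition to reduce the cost of retrying after a failure.
 4. Backup and restore operate on the actual data files. Therefore, when a table has too many shards, or a shard has too many small versions, a backup or restore may take a long time even if the total amount of data is small. Users can run `SHOW PARTITIONS FROM table_name;` and `SHOW TABLET FROM table_name;` to view the number of shards in each partition and the number of file versions in each shard, and use them to estimate job execution time. The number of files has a very large impact on job execution time, so it is recommended to plan partitions and buckets sensibly when creating a table to avoid too many shards.
 5. When checking job status with the `SHOW BACKUP` or `SHOW RESTORE` command, error messages may appear in the `TaskErrMsg` column. As long as the `State` column is not `CANCELLED`, however, the job is still running, and the failed tasks may succeed on retry. Some task errors do cause the job to fail directly.
+   Common `TaskErrMsg` errors are as follows:
+      Q1: When backing up to HDFS, the job state stays at UPLOADING and `TaskErrMsg` shows: [13333: Close broker writer failed, broker:TNetworkAddress(hostname=10.10.0.0,port=8000) msg:errors while close file output stream, cause by: DataStreamer Exception: ]
+      This is generally a network communication problem. Check the broker log to see whether a certain IP or port is unreachable. If the cluster runs on a cloud service, check whether the broker is accessing HDFS over the intranet. If so, add an hdfs-site.xml file to the broker/conf folder, set dfs.client.use.datanode.hostname=true in it, and configure the hostname mapping of the Hadoop cluster on the broker node.
 6. If the recovery job is an overwrite operation (restoring data into an existing table or partition), then from the `COMMIT` phase of the recovery job onward, the overwritten data on the current cluster may no longer be recoverable. If the recovery job fails or is canceled at this point, the previous data may be damaged and inaccessible. In this case, the only remedy is to run the recovery operation again and wait for the job to complete. Therefore, we recommend not restoring data by overwriting unless it is necessary and you have confirmed that the current data is no longer in use.
 
 ## Related Commands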
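
The hdfs-site.xml change described in the Q1 answer can be sketched as the fragment below. This is a minimal sketch and not part of the commit: the property name dfs.client.use.datanode.hostname and the broker/conf location come from the text of the diff, the <configuration>/<property> layout is assumed to be the standard Hadoop configuration file format, and any other properties a given cluster needs are omitted.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed location: broker/conf/hdfs-site.xml on each broker node -->
<configuration>
  <property>
    <!-- Make the HDFS client connect to DataNodes by hostname
         instead of the IP address reported by the NameNode -->
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
  </property>
</configuration>
```

With this setting the broker resolves DataNode hostnames itself, which is why the text also asks for the Hadoop cluster's hostname mapping to be configured on the broker node (for example in its hosts file, as an assumption about the usual setup).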

