Posted to common-dev@hadoop.apache.org by "zhuyaogai (Jira)" <ji...@apache.org> on 2023/02/14 03:13:00 UTC

[jira] [Created] (HADOOP-18629) Hadoop DistCp supports specifying favoredNodes for data copying

zhuyaogai created HADOOP-18629:
----------------------------------

             Summary: Hadoop DistCp supports specifying favoredNodes for data copying
                 Key: HADOOP-18629
                 URL: https://issues.apache.org/jira/browse/HADOOP-18629
             Project: Hadoop Common
          Issue Type: New Feature
          Components: tools
            Reporter: zhuyaogai


When importing large-scale data into HBase, we typically generate the HFiles on a separate Hadoop cluster, copy them to the HBase cluster with the DistCp tool, and then bulk load them into the HBase table. However, the data locality of the copied files is rather low, which may result in high query latency. Locality recovers only after a compaction. We could therefore improve data locality up front by letting DistCp specify favoredNodes when it writes the copied files.
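
As a rough illustration of what the input to such a feature might look like, here is a minimal sketch that parses a comma-separated "host:port" list into the InetSocketAddress[] form that, as far as I know, the favoredNodes overload of DistributedFileSystem#create accepts. The class name FavoredNodesParser, the list format, and the example host names and port are all assumptions made for illustration. They are not an existing DistCp option or part of any patch.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: turns "host1:port1,host2:port2"
// into socket addresses that could be passed on as favored nodes.
final class FavoredNodesParser {

    static InetSocketAddress[] parse(String spec) {
        List<InetSocketAddress> nodes = new ArrayList<>();
        if (spec == null || spec.trim().isEmpty()) {
            return new InetSocketAddress[0];
        }
        for (String entry : spec.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException(
                    "Expected host:port but got: " + trimmed);
            }
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            // Unresolved on purpose so the sketch does no DNS lookups;
            // real code may prefer resolved addresses.
            nodes.add(InetSocketAddress.createUnresolved(host, port));
        }
        return nodes.toArray(new InetSocketAddress[0]);
    }

    public static void main(String[] args) {
        // Placeholder DataNode addresses, assumed for this example only.
        InetSocketAddress[] nodes =
            parse("dn1.example.com:9866, dn2.example.com:9866");
        for (InetSocketAddress node : nodes) {
            System.out.println(node.getHostString() + ":" + node.getPort());
        }
    }
}
```

In an actual change, the parsed addresses would presumably be handed to the HDFS client when DistCp creates each target file, but how that is wired into the copy path is left open here.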

May I submit a pull request to implement this?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
