
Posted to common-issues@hadoop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/02/14 08:01:00 UTC

[jira] [Commented] (HADOOP-18629) Hadoop DistCp supports specifying favoredNodes for data copying

    [ https://issues.apache.org/jira/browse/HADOOP-18629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688351#comment-17688351 ] 

ASF GitHub Bot commented on HADOOP-18629:
-----------------------------------------

zhuyaogai opened a new pull request, #5391:
URL: https://github.com/apache/hadoop/pull/5391

   ### Description of PR
   Hadoop DistCp supports specifying favoredNodes for data copying.
   
   ### How was this patch tested?
   Added a new unit test.
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
   
   




> Hadoop DistCp supports specifying favoredNodes for data copying
> ---------------------------------------------------------------
>
>                 Key: HADOOP-18629
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18629
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: tools
>            Reporter: zhuyaogai
>            Priority: Major
>
> When importing large-scale data into HBase, we typically generate the HFiles on a separate Hadoop cluster, copy them to the HBase cluster with the DistCp tool, and then bulk-load them into the HBase table. However, the resulting data locality is rather low, which may lead to high query latency until a compaction restores locality. We can therefore improve data locality by specifying favoredNodes in DistCp.
> Could I submit a pull request to optimize it?
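
As a rough illustration of the idea in the description above, the sketch below parses a comma-separated `host:port` list into `InetSocketAddress` values. That is the type accepted by the `favoredNodes` parameter of the HDFS `DistributedFileSystem#create` overload. The class name, method name, and input format here are hypothetical and are not taken from the pull request. How the patch actually exposes the option in DistCp is not shown in this message.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop or of the pull request.
final class FavoredNodesSketch {

  // Parse "host1:port1,host2:port2" into unresolved socket addresses.
  // Unresolved addresses avoid DNS lookups while parsing.
  static InetSocketAddress[] parseFavoredNodes(String spec) {
    if (spec == null || spec.trim().isEmpty()) {
      return new InetSocketAddress[0];
    }
    List<InetSocketAddress> nodes = new ArrayList<>();
    for (String entry : spec.split(",")) {
      String trimmed = entry.trim();
      if (trimmed.isEmpty()) {
        continue; // tolerate stray commas
      }
      int colon = trimmed.lastIndexOf(':');
      if (colon <= 0 || colon == trimmed.length() - 1) {
        throw new IllegalArgumentException("Expected host:port but got: " + trimmed);
      }
      String host = trimmed.substring(0, colon);
      // NumberFormatException is a subclass of IllegalArgumentException.
      int port = Integer.parseInt(trimmed.substring(colon + 1));
      nodes.add(InetSocketAddress.createUnresolved(host, port));
    }
    return nodes.toArray(new InetSocketAddress[0]);
  }

  public static void main(String[] args) {
    // Example host names and port are assumptions for illustration only.
    for (InetSocketAddress node
        : parseFavoredNodes("dn1.example.com:9866, dn2.example.com:9866")) {
      System.out.println(node.getHostString() + " " + node.getPort());
    }
  }
}
```

The resulting array could then be passed as the `favoredNodes` argument when the copy target is an HDFS file. HDFS treats favored nodes as a placement hint, so the resulting block locations should still be verified.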



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
