Posted to user@hadoop.apache.org by Shady Xu <sh...@gmail.com> on 2016/08/15 08:06:39 UTC

How to distcp data between two clusters which are not in the same local network?

Hi all,

Recently I tried to use distcp to copy data between two clusters that are
not on the same local network. Fortunately, the nodes of the source cluster
each have an extra interface and IP that can be reached from the
destination cluster. But during the distcp run, the map tasks always
used the local IPs of the source cluster nodes, which they cannot reach.

I tried setting the property 'dfs.datanode.dns.interface' to the interface
I want, and I also tried setting 'dfs.datanode.use.datanode.hostname'
to true. Neither worked.

Does Hadoop support this now, or am I missing something?

Re: How to distcp data between two clusters which are not in the same local network?

Posted by Shady Xu <sh...@gmail.com>.
Thanks Iain, it works now. I had read the doc you mentioned, but forgot to
set the `dfs.client.use.datanode.hostname` property in the destination
cluster.

I still don't know why the `dfs.datanode.dns.interface` property does not
work, though. I read through the related source code but didn't find
anything wrong.
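
For anyone landing on this thread later, here is a minimal sketch of the
client-side setting that resolved it. It only generates an hdfs-site.xml
fragment containing `dfs.client.use.datanode.hostname`; the helper name
`hdfs_site_fragment` is made up for illustration, and where the fragment
has to be merged (the hdfs-site.xml seen by the distcp map tasks on the
destination cluster) is an assumption based on the description above.

```python
import xml.etree.ElementTree as ET


def hdfs_site_fragment(props):
    """Build a <configuration> element from a dict of property names to values.

    Hypothetical helper for illustration; it is not part of Hadoop.
    """
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


# Ask HDFS clients (here: the distcp map tasks) to connect to DataNodes by
# hostname instead of the IP address handed out by the NameNode.
CLIENT_PROPS = {"dfs.client.use.datanode.hostname": "true"}

fragment = hdfs_site_fragment(CLIENT_PROPS)
xml_text = ET.tostring(fragment, encoding="unicode")
print(xml_text)
```

With this set, the DataNode hostnames of the source cluster still have to
resolve, on the destination nodes, to the addresses that are reachable from
there (for example via DNS or /etc/hosts). The same property can probably
also be passed per job with the generic `-D` option of distcp instead of
editing hdfs-site.xml.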


Re: How to distcp data between two clusters which are not in the same local network?

Posted by iain wright <ia...@gmail.com>.
@Shady, please see:
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html


-- 
Iain Wright


Re: How to distcp data between two clusters which are not in the same local network?

Posted by Shady Xu <sh...@gmail.com>.
Does anyone have any idea?


Re: How to distcp data between two clusters which are not in the same local network?

Posted by Shady Xu <sh...@gmail.com>.
Thanks Wei-Chiu and Sunil, I had read the docs you mentioned before
starting. The specific problem now is that the DataNodes of the source
cluster report their local IP instead of the public one, and the local IP
cannot be reached from the NodeManagers of the destination cluster. It seems
the solution is to set the `dfs.datanode.dns.interface` property, but
unfortunately it doesn't work.


Re: How to distcp data between two clusters which are not in the same local network?

Posted by Sunil Govind <su...@gmail.com>.
Hi

I think you can also refer to the link below.
http://aajisaka.github.io/hadoop-project/hadoop-distcp/DistCp.html

Thanks
Sunil


Re: How to distcp data between two clusters which are not in the same local network?

Posted by Wei-Chiu Chuang <we...@apache.org>.
Hello,
if I understand your question correctly, you are actually building a
multihomed Hadoop cluster, correct?
A multihomed Hadoop cluster can be tricky to set up, to the extent that
Cloudera does not recommend it. I've not set up a multihomed Hadoop cluster
before, but I think you have to make sure reverse resolution works for
the IP addresses.

https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html
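
As a rough way to check the resolution point above, the sketch below tests
whether a hostname resolves to an IP and that IP resolves back to the same
name. The function name `round_trip_ok` and the injectable `forward` and
`reverse` resolvers are assumptions made for illustration; by default it
uses the standard `socket` lookups on the machine it runs on.

```python
import socket


def round_trip_ok(hostname,
                  forward=socket.gethostbyname,
                  reverse=socket.gethostbyaddr):
    """Return True if hostname -> IP -> hostname leads back to the same name.

    `forward` maps a hostname to an IP string; `reverse` maps an IP to a
    (name, aliases, addresses) tuple, like socket.gethostbyaddr.
    """
    try:
        ip = forward(hostname)
        name, aliases, _addresses = reverse(ip)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return False
    candidates = {name.lower()} | {alias.lower() for alias in aliases}
    return hostname.lower() in candidates
```

Running `round_trip_ok("<datanode hostname>")` on a node of the destination
cluster for each source DataNode would show which names do not resolve
consistently from that side.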

