Posted to user@hadoop.apache.org by akshay naidu <ak...@gmail.com> on 2019/05/23 12:54:56 UTC

hadoop distcp error.

Hello Users,
I'm trying to copy data from a Hadoop cluster in London to another cluster
in Singapore. I'm using distCp for the first time; for testing purposes
I've created a Hadoop cluster in each data center.
I'm using the following command for distCp:

> hadoop distcp
> hftp://123.45.672:54310/data-analytics/strike/myLogs/today/815_19_104_150_2019-05-22.access.log.gz
> hdfs://987.65.43.21:50070/distCp/


I'm getting the following error:
19/05/23 12:07:27 INFO tools.OptionsParser: parseChunkSize: blocksperchunk
false
19/05/23 12:07:48 INFO ipc.Client: Retrying connect to server:
li868-219.members.linode.com/987.65.43.21:8020. Already tried 0 time(s);
maxRetries=45
.
.
19/05/23 12:22:29 INFO ipc.Client: Retrying connect to server:
li868-219.members.linode.com/987.65.43.21:8020. Already tried 44 time(s);
maxRetries=45
19/05/23 12:22:49 ERROR tools.DistCp: Invalid arguments:
org.apache.hadoop.net.ConnectTimeoutException: Call From
HDP-master/123.45.672 to li868-219.members.linode.com:8020 failed on socket
timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000
millis timeout while waiting for channel to be ready for connect. ch :
java.nio.channels.SocketChannel[connection-pending
remote=li868-219.members.linode.com/987.65.43.21:8020]; For more details
see: http://wiki.apache.org/hadoop/SocketTimeout
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
        at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:774)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
        at org.apache.hadoop.ipc.Client.call(Client.java:1439)
        at org.apache.hadoop.ipc.Client.call(Client.java:1349)
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
        at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:796)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1717)
        at
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1526)
        at
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1523)
        at
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1523)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1627)
        at
org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:234)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:138)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:519)
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis
timeout while waiting for channel to be ready for connect. ch :
java.nio.channels.SocketChannel[connection-pending
remote=li868-219.members.linode.com/987.65.43.21:8020]
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
        at
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:687)
        at
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:790)
        at
org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:411)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1554)
        at org.apache.hadoop.ipc.Client.call(Client.java:1385)
        ... 25 more
Invalid arguments: Call From HDP-master/123.45.672 to
li868-219.members.linode.com:8020 failed on socket timeout exception:
org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while
waiting for channel to be ready for connect. ch :
java.nio.channels.SocketChannel[connection-pending
remote=li868-219.members.linode.com/987.65.43.21:8020]; For more details
see: http://wiki.apache.org/hadoop/SocketTimeout


Any guidance or hint would be very helpful. Thanks.

Re: hadoop distcp error.

Posted by "yangtao.yt" <ya...@alibaba-inc.com>.
Does the conf you checked belong to 987.65.43.21? If no rpc-address is configured in hdfs-site.xml, 8020 is taken as the default RPC port.
I'd first check whether there is a live NameNode process running on 987.65.43.21 and then find the right port; that's easy if you can log onto that machine. See the sketch below.
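
For example, a rough sketch to run on 987.65.43.21 itself (this assumes a stock Apache install with jps and ss available; adjust for your distribution):

    # Is a NameNode JVM running at all?
    jps | grep -i namenode

    # Which URI (and therefore which port) do clients expect?
    hdfs getconf -confKey fs.defaultFS

    # Which ports is the NameNode actually listening on?
    ss -lntp | grep java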

> On May 24, 2019, at 4:10 PM, akshay naidu <ak...@gmail.com> wrote:
> 
> Yes, "hadoop fs -ls hdfs://987.65.43.21:8020/" is not working.
> 
> I checked: ping 987.65.43.21 works fine, but
> 
> telnet 987.65.43.21 8020 (also tried ports 80 and 50070) throws the error telnet: Unable to connect to remote host: Connection timed out
> I think this is where the problem is.
> 
> Also checked the conf folder.
> There isn't any property related to rpc-address in hdfs-site.xml, and the cluster is non-HA.
> 
> On Fri, May 24, 2019 at 1:18 PM yangtao.yt <yangtao.yt@alibaba-inc.com> wrote:
> Hi, akshay
> 
> Can you successfully execute a command such as "hadoop fs -ls hdfs://987.65.43.21:8020/" on the machine where you ran distcp? I think the answer is no.
> If both servers are reachable, check dfs.namenode.rpc-address or dfs.namenode.rpc-address.<nameservice>.nn1 (if HA is enabled) in hdfs-site.xml to find the correct RPC port.
> 
> Best,
> Tao Yang
> 
>> On May 24, 2019, at 3:20 PM, akshay naidu <akshaynaidu.9@gmail.com> wrote:
>> 
>> Hello Joey,
>> I'm trying to copy one file just to understand distcp; the data actually to be copied is > 1.5 TB.
>> Anyway, I tried running -cp, but it looks like the issue is in connectivity. See logs:
>> hdfs dfs -cp hdfs://123.45.67.89:54310/data-analytics/spike/beNginxLogs/today/123.45.67.89_2019-05-22.access.log.gz hdfs://987.65.43.21:50070/distCp/
>> 19/05/24 07:15:22 INFO ipc.Client: Retrying connect to server: li868-219.members.linode.com/987.65.43.21:50070. Already tried 0 time(s); maxRetries=45
>> 19/05/24 07:15:42 INFO ipc.Client: Retrying connect to server: li868-219.members.linode.com/987.65.43.21:50070. Already tried 1 time(s); maxRetries=45
>> 19/05/24 07:16:02 INFO ipc.Client: Retrying connect to server: li868-219.members.linode.com/987.65.43.21:50070. Already tried 2 time(s); maxRetries=45
>> .
>> .
>> Facing the same issue.
>> Any idea?
>> Thanks. Regards
>> 
>> On Fri, May 24, 2019 at 8:10 AM Joey Krabacher <jkrabacher@gmail.com> wrote:
>> It looks like you're just trying to copy 1 file?
>> Why not use 'hdfs dfs -cp ...' instead?
>> 
>> On Thu, May 23, 2019, 21:22 yangtao.yt <yangtao.yt@alibaba-inc.com> wrote:
>> Hi, akshay
>> 
>> This doesn't seem to be distcp's fault. SocketTimeout exceptions are usually caused by an unreachable network or an unavailable remote server; as a test, try talking to the target HDFS cluster directly from the machine where you ran the distcp command.
>> The full set of causes and suggestions from the community is here: https://wiki.apache.org/hadoop/SocketTimeout
>> 
>> One doubt about your distcp command: why use port 50070 (the HTTP port) instead of 8020 (the RPC port) for the target HDFS cluster? Confusingly, your logs show it trying to connect to 8020 anyway.
>> 
>> Best,
>> Tao Yang
>> 
>>> On May 23, 2019, at 8:54 PM, akshay naidu <akshaynaidu.9@gmail.com> wrote:
>>> 
>>> sun.reflect.NativeConstructorAccessorImpl.newInstance0
>> 
> 


Re: hadoop distcp error.

Posted by akshay naidu <ak...@gmail.com>.
Yes, "hadoop fs -ls hdfs://987.65.43.21:8020/" is not working.

I checked: ping 987.65.43.21 works fine, but

telnet 987.65.43.21 8020 (also tried ports 80 and 50070) throws the error *telnet:
Unable to connect to remote host: Connection timed out*
I think this is where the problem is.


Also checked the conf folder.
There isn't any property related to rpc-address in hdfs-site.xml, and the
cluster is non-HA.
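
Since the cluster is non-HA and hdfs-site.xml sets no rpc-address, the
NameNode port should be whatever fs.defaultFS says in core-site.xml. A
rough check, assuming the usual conf layout (the path varies by install):

    # Show the configured filesystem URI, e.g. hdfs://host:port
    grep -E -A1 'fs.defaultFS|fs.default.name' /etc/hadoop/conf/core-site.xml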

On Fri, May 24, 2019 at 1:18 PM yangtao.yt <ya...@alibaba-inc.com>
wrote:

> Hi, akshay
>
> Can you successfully execute a command such as "hadoop fs -ls
> hdfs://987.65.43.21:8020/" on the machine where you ran distcp? I think
> the answer is no.
> If both servers are reachable, check dfs.namenode.rpc-address or
> dfs.namenode.rpc-address.<nameservice>.nn1 (if HA is enabled) in
> hdfs-site.xml to find the correct RPC port.
>
> Best,
> Tao Yang
>
> On May 24, 2019, at 3:20 PM, akshay naidu <ak...@gmail.com> wrote:
>
> Hello Joey,
> I'm trying to copy one file just to understand distcp; the data actually
> to be copied is > 1.5 TB.
> Anyway, I tried running -cp, but it looks like the issue is in
> connectivity. See logs:
> hdfs dfs -cp hdfs://
> 123.45.67.89:54310/data-analytics/spike/beNginxLogs/today/123.45.67.89_2019-05-22.access.log.gz
> hdfs://987.65.43.21:50070/distCp/
> 19/05/24 07:15:22 INFO ipc.Client: Retrying connect to server:
> li868-219.members.linode.com/987.65.43.21:50070. Already tried 0
> time(s); maxRetries=45
> 19/05/24 07:15:42 INFO ipc.Client: Retrying connect to server:
> li868-219.members.linode.com/987.65.43.21:50070. Already tried 1
> time(s); maxRetries=45
> 19/05/24 07:16:02 INFO ipc.Client: Retrying connect to server:
> li868-219.members.linode.com/987.65.43.21:50070. Already tried 2
> time(s); maxRetries=45
> .
> .
> Facing the same issue.
> Any idea?
> Thanks. Regards
>
> On Fri, May 24, 2019 at 8:10 AM Joey Krabacher <jk...@gmail.com>
> wrote:
>
>> It looks like you're just trying to copy 1 file?
>> Why not use 'hdfs dfs -cp ...' instead?
>>
>> On Thu, May 23, 2019, 21:22 yangtao.yt <ya...@alibaba-inc.com>
>> wrote:
>>
>>> Hi, akshay
>>>
>>> This doesn't seem to be distcp's fault. SocketTimeout exceptions are
>>> usually caused by an unreachable network or an unavailable remote
>>> server; as a test, try talking to the target HDFS cluster directly from
>>> the machine where you ran the distcp command.
>>> The full set of causes and suggestions from the community is here:
>>> https://wiki.apache.org/hadoop/SocketTimeout
>>>
>>> One doubt about your distcp command: why use port 50070 (the HTTP port)
>>> instead of 8020 (the RPC port) for the target HDFS cluster? Confusingly,
>>> your logs show it trying to connect to 8020 anyway.
>>>
>>> Best,
>>> Tao Yang
>>>
>>> On May 23, 2019, at 8:54 PM, akshay naidu <ak...@gmail.com> wrote:
>>>
>>> sun.reflect.NativeConstructorAccessorImpl.newInstance0
>>>
>>>
>>>
>

Re: hadoop distcp error.

Posted by "yangtao.yt" <ya...@alibaba-inc.com>.
Hi, akshay

Can you successfully execute a command such as "hadoop fs -ls hdfs://987.65.43.21:8020/" on the machine where you ran distcp? I think the answer is no.
If both servers are reachable, check dfs.namenode.rpc-address or dfs.namenode.rpc-address.<nameservice>.nn1 (if HA is enabled) in hdfs-site.xml to find the correct RPC port; a sketch follows below.
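
A minimal sketch of that check (run with the target cluster's configuration on the classpath; if the key is unset, fall back to what fs.defaultFS reports):

    # Non-HA:
    hdfs getconf -confKey dfs.namenode.rpc-address

    # HA (substitute your actual nameservice and namenode ids):
    hdfs getconf -confKey dfs.namenode.rpc-address.<nameservice>.nn1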

Best,
Tao Yang

> On May 24, 2019, at 3:20 PM, akshay naidu <ak...@gmail.com> wrote:
> 
> Hello Joey,
> I'm trying to copy one file just to understand distcp; the data actually to be copied is > 1.5 TB.
> Anyway, I tried running -cp, but it looks like the issue is in connectivity. See logs:
> hdfs dfs -cp hdfs://123.45.67.89:54310/data-analytics/spike/beNginxLogs/today/123.45.67.89_2019-05-22.access.log.gz hdfs://987.65.43.21:50070/distCp/
> 19/05/24 07:15:22 INFO ipc.Client: Retrying connect to server: li868-219.members.linode.com/987.65.43.21:50070. Already tried 0 time(s); maxRetries=45
> 19/05/24 07:15:42 INFO ipc.Client: Retrying connect to server: li868-219.members.linode.com/987.65.43.21:50070. Already tried 1 time(s); maxRetries=45
> 19/05/24 07:16:02 INFO ipc.Client: Retrying connect to server: li868-219.members.linode.com/987.65.43.21:50070. Already tried 2 time(s); maxRetries=45
> .
> .
> Facing the same issue.
> Any idea?
> Thanks. Regards
> 
> On Fri, May 24, 2019 at 8:10 AM Joey Krabacher <jkrabacher@gmail.com> wrote:
> It looks like you're just trying to copy 1 file?
> Why not use 'hdfs dfs -cp ...' instead?
> 
> On Thu, May 23, 2019, 21:22 yangtao.yt <yangtao.yt@alibaba-inc.com> wrote:
> Hi, akshay
> 
> This doesn't seem to be distcp's fault. SocketTimeout exceptions are usually caused by an unreachable network or an unavailable remote server; as a test, try talking to the target HDFS cluster directly from the machine where you ran the distcp command.
> The full set of causes and suggestions from the community is here: https://wiki.apache.org/hadoop/SocketTimeout
> 
> One doubt about your distcp command: why use port 50070 (the HTTP port) instead of 8020 (the RPC port) for the target HDFS cluster? Confusingly, your logs show it trying to connect to 8020 anyway.
> 
> Best,
> Tao Yang
> 
>> On May 23, 2019, at 8:54 PM, akshay naidu <akshaynaidu.9@gmail.com> wrote:
>> 
>> sun.reflect.NativeConstructorAccessorImpl.newInstance0
> 


Re: hadoop distcp error.

Posted by akshay naidu <ak...@gmail.com>.
Hello Joey,
I'm trying to copy one file just to understand distcp; the data actually
to be copied is > 1.5 TB.
Anyway, I tried running -cp, but it looks like the issue is in
connectivity. See logs:
hdfs dfs -cp hdfs://
123.45.67.89:54310/data-analytics/spike/beNginxLogs/today/123.45.67.89_2019-05-22.access.log.gz
hdfs://987.65.43.21:50070/distCp/
19/05/24 07:15:22 INFO ipc.Client: Retrying connect to server:
li868-219.members.linode.com/987.65.43.21:50070. Already tried 0 time(s);
maxRetries=45
19/05/24 07:15:42 INFO ipc.Client: Retrying connect to server:
li868-219.members.linode.com/987.65.43.21:50070. Already tried 1 time(s);
maxRetries=45
19/05/24 07:16:02 INFO ipc.Client: Retrying connect to server:
li868-219.members.linode.com/987.65.43.21:50070. Already tried 2 time(s);
maxRetries=45
.
.
Facing the same issue.
Any idea?
Thanks. Regards

On Fri, May 24, 2019 at 8:10 AM Joey Krabacher <jk...@gmail.com> wrote:

> It looks like you're just trying to copy 1 file?
> Why not use 'hdfs dfs -cp ...' instead?
>
> On Thu, May 23, 2019, 21:22 yangtao.yt <ya...@alibaba-inc.com> wrote:
>
>> Hi, akshay
>>
>> This doesn't seem to be distcp's fault. SocketTimeout exceptions are
>> usually caused by an unreachable network or an unavailable remote server;
>> as a test, try talking to the target HDFS cluster directly from the
>> machine where you ran the distcp command.
>> The full set of causes and suggestions from the community is here:
>> https://wiki.apache.org/hadoop/SocketTimeout
>>
>> One doubt about your distcp command: why use port 50070 (the HTTP port)
>> instead of 8020 (the RPC port) for the target HDFS cluster? Confusingly,
>> your logs show it trying to connect to 8020 anyway.
>>
>> Best,
>> Tao Yang
>>
>> On May 23, 2019, at 8:54 PM, akshay naidu <ak...@gmail.com> wrote:
>>
>> sun.reflect.NativeConstructorAccessorImpl.newInstance0
>>
>>
>>

Re: hadoop distcp error.

Posted by Joey Krabacher <jk...@gmail.com>.
It looks like you're just trying to copy 1 file?
Why not use 'hdfs dfs -cp ...' instead?
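
For a single file that would look roughly like this (hosts here are
illustrative placeholders; the target must use the remote NameNode's RPC
port, not the 50070 web UI port):

    hdfs dfs -cp \
        hdfs://source-nn:54310/path/to/file.log.gz \
        hdfs://target-nn:8020/distCp/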

On Thu, May 23, 2019, 21:22 yangtao.yt <ya...@alibaba-inc.com> wrote:

> Hi, akshay
>
> This doesn't seem to be distcp's fault. SocketTimeout exceptions are
> usually caused by an unreachable network or an unavailable remote server;
> as a test, try talking to the target HDFS cluster directly from the
> machine where you ran the distcp command.
> The full set of causes and suggestions from the community is here:
> https://wiki.apache.org/hadoop/SocketTimeout
>
> One doubt about your distcp command: why use port 50070 (the HTTP port)
> instead of 8020 (the RPC port) for the target HDFS cluster? Confusingly,
> your logs show it trying to connect to 8020 anyway.
>
> Best,
> Tao Yang
>
> On May 23, 2019, at 8:54 PM, akshay naidu <ak...@gmail.com> wrote:
>
> sun.reflect.NativeConstructorAccessorImpl.newInstance0
>
>
>

Re: hadoop distcp error.

Posted by akshay naidu <ak...@gmail.com>.
Hey Yang,
Both servers are reachable from each other; I am able to transfer files
between them in both directions.

> One doubt about your distcp command: why use port 50070 (the HTTP port)
> instead of 8020 (the RPC port) for the target HDFS cluster? Confusingly,
> your logs show it trying to connect to 8020 anyway.
>
I had read about this on Stack Overflow, so I had already tried port 8020
instead of 50070, but that attempt also gave the same error.

On Fri 24 May, 2019, 7:52 AM yangtao.yt, <ya...@alibaba-inc.com> wrote:

> Hi, akshay
>
> This doesn't seem to be distcp's fault. SocketTimeout exceptions are
> usually caused by an unreachable network or an unavailable remote server;
> as a test, try talking to the target HDFS cluster directly from the
> machine where you ran the distcp command.
> The full set of causes and suggestions from the community is here:
> https://wiki.apache.org/hadoop/SocketTimeout
>
> One doubt about your distcp command: why use port 50070 (the HTTP port)
> instead of 8020 (the RPC port) for the target HDFS cluster? Confusingly,
> your logs show it trying to connect to 8020 anyway.
>
> Best,
> Tao Yang
>
> On May 23, 2019, at 8:54 PM, akshay naidu <ak...@gmail.com> wrote:
>
> sun.reflect.NativeConstructorAccessorImpl.newInstance0
>
>
>

Re: hadoop distcp error.

Posted by "yangtao.yt" <ya...@alibaba-inc.com>.
Hi, akshay

This doesn't seem to be distcp's fault. SocketTimeout exceptions are usually caused by an unreachable network or an unavailable remote server; as a test, try talking to the target HDFS cluster directly from the machine where you ran the distcp command.
The full set of causes and suggestions from the community is here: https://wiki.apache.org/hadoop/SocketTimeout

One doubt about your distcp command: why use port 50070 (the HTTP port) instead of 8020 (the RPC port) for the target HDFS cluster? Confusingly, your logs show it trying to connect to 8020 anyway. A corrected invocation is sketched below.
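
A sketch of the corrected invocation, assuming 8020 really is the target NameNode's RPC port (substitute whatever each cluster's fs.defaultFS reports; using hdfs:// on both sides also assumes the two clusters run wire-compatible Hadoop versions):

    hadoop distcp \
        hdfs://<source-nn>:<source-rpc-port>/data-analytics/strike/myLogs/today/815_19_104_150_2019-05-22.access.log.gz \
        hdfs://987.65.43.21:8020/distCp/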

Best,
Tao Yang

> On May 23, 2019, at 8:54 PM, akshay naidu <ak...@gmail.com> wrote:
> 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0