Posted to common-user@hadoop.apache.org by praveenesh kumar <pr...@gmail.com> on 2011/10/05 07:15:55 UTC

Error using hadoop distcp

I am trying to use distcp to copy a file from one HDFS to another.

But while copying, I am getting the following exception:

hadoop distcp hdfs://ub13:54310/user/hadoop/weblog
hdfs://ub16:54310/user/hadoop/weblog

11/10/05 10:41:01 INFO mapred.JobClient: Task Id : attempt_201110031447_0005_m_000007_0, Status : FAILED
java.net.UnknownHostException: unknown host: ub16
        at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:850)
        at org.apache.hadoop.ipc.Client.call(Client.java:720)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy1.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:113)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:215)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
        at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:48)
        at org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:124)
        at org.apache.hadoop.mapred.Task.runJobSetupTask(Task.java:835)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:296)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

It says it cannot find ub16, but the entry is there in the /etc/hosts files.
I am able to ssh to both machines. Do I need passwordless ssh between
these two NNs?
What can be the issue? Is there anything I am missing before using distcp?

Thanks,
Praveenesh
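
A quick way to verify the resolution claim on any node that runs map tasks
(a generic sketch, not from the original post; getent and ping assumed
available on the task nodes):

# run on each node that executes map tasks, not only on the client
getent hosts ub16     # should print the IP recorded in /etc/hosts
ping -c 1 ub16        # basic reachability check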

Re: Error using hadoop distcp

Posted by praveenesh kumar <pr...@gmail.com>.
I tried that as well. When I use the IP address, it says I should use the
hostname:

*hadoop@ub13:~$ hadoop distcp hdfs://162.192.100.53:54310/user/hadoop/weblog hdfs://162.192.100.16:54310/user/hadoop/weblog*
11/10/05 14:53:50 INFO tools.DistCp: srcPaths=[hdfs://162.192.100.53:54310/user/hadoop/weblog]
11/10/05 14:53:50 INFO tools.DistCp: destPath=hdfs://162.192.100.16:54310/user/hadoop/weblog
java.lang.IllegalArgumentException: Wrong FS: hdfs://162.192.100.53:54310/user/hadoop/weblog, expected: hdfs://ub13:54310
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
        at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:99)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:155)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:464)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)
        at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:621)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:638)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:857)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:884)

I have the entries for both machines in /etc/hosts...
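
(For context, a detail the error message does not state: the client's
FileSystem for the default cluster is registered under the exact URI in
fs.default.name, and the "Wrong FS" check compares URI authorities as plain
strings, so hdfs://162.192.100.53:54310 will not match a filesystem
configured as hdfs://ub13:54310 even though both point at the same
NameNode. The setting on ub13 presumably looks something like this sketch,
with the conf path an assumption:)

hadoop@ub13:~$ cat /usr/local/hadoop/conf/core-site.xml
<configuration>
  <property>
    <!-- URIs passed to distcp must use this exact authority -->
    <name>fs.default.name</name>
    <value>hdfs://ub13:54310</value>
  </property>
</configuration>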


On Wed, Oct 5, 2011 at 1:55 PM, <be...@gmail.com> wrote:

> Hi praveenesh
>             Can you try repeating the distcp using the IP instead of the
> host name? From the error it looks like an RPC exception where the host
> cannot be resolved, so I believe it cannot be due to passwordless ssh not
> being set up. Just try it out.
> Regards
> Bejoy K S
>
> [...]

Re: Error using hadoop distcp

Posted by be...@gmail.com.
Hi praveenesh
             Can you try repeating the distcp using the IP instead of the host name? From the error it looks like an RPC exception where the host cannot be resolved, so I believe it cannot be due to passwordless ssh not being set up. Just try it out.
Regards
Bejoy K S

-----Original Message-----
From: trang van anh <an...@vtc.vn>
Date: Wed, 05 Oct 2011 14:06:11 
To: <co...@hadoop.apache.org>
Reply-To: common-user@hadoop.apache.org
Subject: Re: Error using hadoop distcp

Which host ran the task that threw the exception? Ensure that each data
node knows the other data nodes in the Hadoop cluster: add a "ub16" entry
to /etc/hosts on the host where the task runs.
On 10/5/2011 12:15 PM, praveenesh kumar wrote:
> [...]


Re: Error using hadoop distcp

Posted by Uma Maheswara Rao G 72686 <ma...@huawei.com>.
DistCp runs as a MapReduce job.
Here the TaskTrackers need the hostname mappings to contact the other nodes.
Please configure the mapping correctly on both machines and try again.
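
As a minimal sketch (using the IPs quoted elsewhere in this thread),
/etc/hosts on every node of both clusters would carry both mappings:

hadoop@ub13:~$ cat /etc/hosts
127.0.0.1        localhost
162.192.100.53   ub13    # source NameNode
162.192.100.16   ub16    # destination NameNode

Each map task opens its own RPC connection to the destination NameNode,
which is why the mapping must exist on the TaskTracker nodes and not just
on the client.
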
Regards,
Uma

----- Original Message -----
From: trang van anh <an...@vtc.vn>
Date: Wednesday, October 5, 2011 1:41 pm
Subject: Re: Error using hadoop distcp
To: common-user@hadoop.apache.org

> Which host ran the task that threw the exception? Ensure that each data
> node knows the other data nodes in the Hadoop cluster: add a "ub16"
> entry to /etc/hosts on the host where the task runs.
> On 10/5/2011 12:15 PM, praveenesh kumar wrote:
> > [...]

Re: Error using hadoop distcp

Posted by trang van anh <an...@vtc.vn>.
Which host ran the task that threw the exception? Ensure that each data
node knows the other data nodes in the Hadoop cluster: add a "ub16" entry
to /etc/hosts on the host where the task runs.
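
A quick way to check that across the whole cluster (a sketch; the conf
path and passwordless ssh from the master to the slaves are assumptions):

# run from the master of the cluster executing the job
for node in $(cat /usr/local/hadoop/conf/slaves); do
  echo "== $node =="
  ssh "$node" "getent hosts ub16" || echo "$node cannot resolve ub16"
done
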
On 10/5/2011 12:15 PM, praveenesh kumar wrote:
> [...]